Benchmark for Planning and Control with Large Language Model Agents: Blocksworld with Model Context Protocol