MIT Researchers Develop VLM-Guided Formal Planning

MIT researchers developed VLM-guided formal planning (VLMFP), a two-step generative AI system. Vision-language models first simulate actions from a single image of a scene, and the system then generates PDDL (Planning Domain Definition Language) files that a classical planner can solve. The paper will be presented at ICLR. VLMFP achieved roughly 70% average success versus about 30% for baselines, generalized to unseen problems, and solved multiple 2D and 3D tasks, including multirobot collaboration and robotic assembly.
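To make the "PDDL files for classical planners" step concrete, here is a minimal sketch of what a solver-ready problem file could look like for a toy grid-navigation scene. It is not the authors' implementation: the symbolic scene (grid size, start, goal, walls) stands in for whatever a vision-language model would extract from the image, and the function name `grid_to_pddl_problem`, the domain name `grid-world`, the types `agent` and `cell`, and the predicates `at` and `adjacent` are all hypothetical choices made for illustration.

```python
from typing import Iterable, Tuple

Cell = Tuple[int, int]  # (row, col) on a hypothetical grid


def cell_name(cell: Cell) -> str:
    """Turn a grid coordinate into a PDDL-safe object name."""
    row, col = cell
    return f"c_{row}_{col}"


def grid_to_pddl_problem(rows: int, cols: int, start: Cell, goal: Cell,
                         walls: Iterable[Cell] = ()) -> str:
    """Emit a PDDL problem string for a toy grid scene.

    Assumes a separate, hypothetical domain file named "grid-world" that
    declares the types `agent` and `cell`, the predicates `at` and
    `adjacent`, and a move action between adjacent cells.
    """
    blocked = set(walls)
    free = [(r, c) for r in range(rows) for c in range(cols)
            if (r, c) not in blocked]
    free_set = set(free)
    if start not in free_set or goal not in free_set:
        raise ValueError("start and goal must be free cells inside the grid")

    # Initial state: the robot's position plus 4-neighbour adjacency facts.
    init = [f"(at robot {cell_name(start)})"]
    for r, c in free:
        for dr, dc in ((0, 1), (1, 0), (0, -1), (-1, 0)):
            neighbour = (r + dr, c + dc)
            if neighbour in free_set:
                init.append(
                    f"(adjacent {cell_name((r, c))} {cell_name(neighbour)})")

    objects = " ".join(cell_name(c) for c in free)
    init_block = "\n    ".join(init)
    return (
        "(define (problem grid-nav)\n"
        "  (:domain grid-world)\n"
        f"  (:objects robot - agent {objects} - cell)\n"
        f"  (:init\n    {init_block})\n"
        f"  (:goal (at robot {cell_name(goal)})))\n"
    )


if __name__ == "__main__":
    # A 2x2 grid with one wall; the robot must get from (0, 0) to (1, 1).
    print(grid_to_pddl_problem(2, 2, start=(0, 0), goal=(1, 1),
                               walls=[(0, 1)]))
```

The resulting text, paired with a matching domain file, is the kind of input a classical planner consumes. The appeal of this split is that perception only has to produce a correct symbolic description, while the long-horizon search is left to the solver.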
Key Points
- Develops VLMFP, a two-step pipeline that converts images into PDDL files for classical long-horizon planners
- Achieves roughly 70% success, outperforming baselines (~30%), by combining VLM perception with formal solvers
- Generalizes to unseen visual planning scenarios, producing solver-ready plans usable by robots
Scoring Rationale
Credible ICLR/MIT research with a novel, promising VLM-to-PDDL pipeline, offset by limited real-world validation and scalability evidence.
