GRASP Planner Aims to Make Long-Horizon Robot Control Feasible with World Models
New Gradient-Based Planner Overcomes Critical Fragility in Learned Dynamics
A team of researchers from Meta and leading universities has unveiled GRASP, a novel gradient-based planner designed to make long-horizon planning with learned world models practical for the first time. The approach directly addresses the brittleness that has plagued high-dimensional visual planning, according to a paper released today.

“GRASP tackles the core ill-conditioning that makes long-horizon optimization in latent spaces fragile,” said lead author Dr. Amir Bar, who conducted the work with Mike Rabbat, Aditi Krishnapriyan, and Yann LeCun. “We’re essentially rewriting how gradients flow through a world model to give actions clean, useful signals.”
Three Key Innovations
The planner introduces three interlocking mechanisms. First, it lifts the trajectory into virtual states, enabling parallel optimization across time steps. Second, it injects stochasticity directly into state iterates to encourage exploration. Third, it reshapes gradients to avoid brittle state-input paths through high-dimensional vision models.
- Parallel virtual states: Optimization runs concurrently across the entire horizon, avoiding sequential bottlenecks.
- Stochastic state updates: Random noise is added to intermediate states during optimization, preventing premature convergence.
- Gradient reshaping: Clean action gradients are computed, bypassing noisy gradients from vision encoders.
These changes allow GRASP to plan effectively over hundreds of steps in learned dynamics, a regime where previous methods failed completely.
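To make the three mechanisms concrete, here is a minimal sketch of collocation-style gradient planning on a toy linear latent model. Everything in it — the dynamics matrices, horizon, learning rate, and noise schedule — is an illustrative assumption, not taken from the paper; the action gradients flowing only through the local dynamics term are a crude stand-in for GRASP's reshaped, encoder-free gradients.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linear latent dynamics z' = A z + B a, standing in for a learned
# world model. A, B, the horizon T, and all hyperparameters below are
# illustrative choices, not values from the paper.
D, T = 4, 12
A = 0.9 * np.eye(D)
B = np.eye(D) + 0.1 * rng.normal(size=(D, D))
z0 = np.zeros(D)
goal = np.ones(D)

def f(z, a):
    """Batched one-step prediction for a stack of (state, action) pairs."""
    return z @ A.T + a @ B.T

# Virtual states: treat z_1..z_T as free variables and optimize them in
# parallel with the actions, instead of rolling the model out sequentially.
Z = rng.normal(0.0, 0.1, (T, D))
acts = np.zeros((T, D))

lr, noise0 = 0.1, 0.05
for it in range(800):
    Z_prev = np.vstack([z0[None], Z[:-1]])
    resid = Z - f(Z_prev, acts)        # dynamics-consistency residuals
    # Gradients of  sum_t ||resid_t||^2 + ||z_T - goal||^2.
    gZ = 2.0 * resid
    gZ[:-1] -= 2.0 * (resid[1:] @ A)   # coupling to the next residual
    gZ[-1] += 2.0 * (Z[-1] - goal)
    # Action gradients flow only through the local dynamics term,
    # not back through a long state chain or an encoder.
    gA = -2.0 * resid @ B
    # Stochastic state updates: decaying noise injected into state iterates.
    Z -= lr * gZ + noise0 * (0.99 ** it) * rng.normal(size=Z.shape)
    acts -= lr * gA

# Evaluate the plan open-loop against the true dynamics.
z = z0
for a in acts:
    z = A @ z + B @ a
final_dist = float(np.linalg.norm(z - goal))
print(f"distance to goal after rollout: {final_dist:.3f}")
```

Because all T virtual states are updated in one vectorized step, the optimization has no sequential rollout bottleneck, and the injected noise decays so the iterates can settle once a good trajectory basin is found.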
Background: The Promise and Pain of World Models
Learned world models—neural networks that predict future observations given actions—have become increasingly powerful. They can simulate long sequences in high-dimensional visual spaces and generalize across tasks. However, using them for control or planning has been notoriously difficult.

“Having a powerful world model is not the same as using it for effective control,” the paper notes. “Long-horizon planning becomes ill-conditioned, non-greedy structure creates bad local minima, and high-dimensional latents introduce subtle failure modes.” Prior approaches either relied on short-term receding horizon control or required heavy compute for Monte Carlo tree search.
GRASP’s gradient-based approach offers a computationally efficient alternative that scales to longer horizons without requiring hand-tuned exploration.
What This Means
The advance could unlock autonomous planning in robotics, gaming, and simulation. For example, a robot with a world model of its environment could plan a multi-step manipulation task—like assembling furniture—without needing to replan every second. The method’s robustness suggests it could replace heuristic planners in many real-world systems.
“This is a significant step toward making learned simulators truly usable for decision-making,” commented co-author Yann LeCun, Meta’s Chief AI Scientist. “GRASP demonstrates that gradient-based planning can be both practical and reliable at scale.”
Future work will focus on integrating GRASP with real-time control loops and extending it to partially observable settings. The source code is publicly available for researchers.