GRASP: Making Long-Horizon Planning with World Models Practical

Introduction

Modern world models—learned simulators that predict future observations from actions—have grown remarkably powerful. They can forecast long sequences in high-dimensional visual spaces and generalize across tasks, resembling general-purpose simulators rather than narrow predictors. However, wielding these models for effective planning remains a challenge, especially over long horizons. The optimization often becomes ill-conditioned, trapped in local minima, or undermined by high-dimensional latent spaces. In this article, we explore a new approach called GRASP (Gradient-based Ascent for Robust Sampling and Planning), which redesigns gradient-based planning to make it robust for long-term decision-making.

GRASP: Making Long-Horizon Planning with World Models Practical
Source: bair.berkeley.edu

The Challenge of Long-Horizon Planning

Why Traditional Planning Fails

Standard gradient-based planners optimize a sequence of actions by backpropagating through a world model. This works well for short horizons but breaks down as the horizon grows: the gradient signal becomes noisy or vanishes, the loss landscape develops sharp ravines, and the high dimensionality of the latent states amplifies both problems. Greedy local improvements also tend to overlook strategic long-term consequences.
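
A minimal sketch of the gradient decay, using scalar linear dynamics s_{t+1} = c*s_t + u_t (an illustrative toy of my own, not the paper's model): the gradient of the final state with respect to the first action is c^(H-1), which shrinks geometrically when |c| < 1, so long horizons starve early actions of signal.

```python
# Toy illustration: for s_{t+1} = c * s_t + u_t, the chain rule gives
# d s_H / d u_0 = c**(H-1), which vanishes geometrically for |c| < 1
# (and explodes for |c| > 1) as the horizon H grows.

def final_state(actions, c, s0=0.0):
    """Roll the scalar dynamics forward through all actions."""
    s = s0
    for u in actions:
        s = c * s + u
    return s

def grad_wrt_first_action(horizon, c):
    """Analytic d s_H / d u_0 for the linear toy dynamics."""
    return c ** (horizon - 1)

short = grad_wrt_first_action(horizon=5, c=0.9)    # ~0.66: usable signal
long = grad_wrt_first_action(horizon=100, c=0.9)   # ~3e-5: nearly vanished
```

In a learned, nonlinear model the per-step Jacobians are not a single constant c, but the same multiplicative chain applies, which is what makes step-by-step backpropagation fragile at long horizons.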

The Role of World Models

A world model, as we define it, predicts future states given the current state and actions. Formally, it approximates P_θ(s_{t+1} | s_{t−h:t}, a_t): the distribution over the next state given a window of recent states and the current action. These models are typically learned from data and can serve as differentiable simulators for planning. Yet even with an accurate model, the planning procedure itself introduces fragility.
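
As a rough sketch of this interface (the class name, history length, and linear "learned" dynamics below are illustrative stand-ins, not the paper's architecture):

```python
# Hypothetical minimal interface for a model approximating
# P_theta(s_{t+1} | s_{t-h:t}, a_t). A real world model would be a
# neural network over latent or visual states; here the "learned"
# parameters are hand-set scalars so the sketch is self-contained.

class WorldModel:
    def __init__(self, history_len=2):
        self.h = history_len
        # Toy parameters: weights over the state history plus an action gain.
        self.state_weights = [0.7, 0.2]
        self.action_gain = 0.5

    def predict(self, state_history, action):
        """Predict s_{t+1} from the last h states and the current action."""
        assert len(state_history) == self.h
        return (sum(w * s for w, s in zip(self.state_weights, state_history))
                + self.action_gain * action)

model = WorldModel()
next_state = model.predict([1.0, 0.5], action=0.2)  # 0.7*1.0 + 0.2*0.5 + 0.5*0.2
```

Because `predict` is differentiable in both states and actions, a planner can chain it over a horizon and backpropagate a cost, which is exactly where the fragility discussed above enters.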

GRASP: A Robust Gradient-Based Planner

GRASP tackles these problems with three key innovations that together make gradient-based planning practical for long horizons.

1. Virtual State Lifting

Instead of processing one time step at a time, GRASP lifts the entire trajectory into a set of virtual states—one per future time step—that are optimized in parallel. This parallelization removes the sequential bottleneck and allows the gradient to propagate uniformly across the horizon, avoiding the decay that plagues step-by-step methods.
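
One way to picture the lifting, sketched here in a collocation style on scalar linear dynamics s_{t+1} = c*s_t + u_t (my reading of the idea, not the paper's exact algorithm): each virtual state is a free variable, a consistency penalty ties it to the model's one-step prediction, and every state receives gradient from its immediate neighbors rather than through one long backpropagated chain.

```python
# Lifted planning sketch: optimize virtual states z_1..z_H directly,
# with a penalty lam * (z_t - c*z_{t-1} - u_{t-1})**2 enforcing agreement
# with the model's one-step prediction, plus a terminal goal cost.

def lifted_loss(z, u, s0, c, goal, lam):
    prev = [s0] + z[:-1]
    consistency = sum((zt - c * zp - ut) ** 2
                      for zt, zp, ut in zip(z, prev, u))
    return lam * consistency + (z[-1] - goal) ** 2

def lifted_grad(z, u, s0, c, goal, lam):
    prev = [s0] + z[:-1]
    resid = [zt - c * zp - ut for zt, zp, ut in zip(z, prev, u)]
    g = [0.0] * len(z)
    for t in range(len(z)):
        g[t] += 2 * lam * resid[t]               # pull toward model prediction
        if t + 1 < len(z):
            g[t] -= 2 * lam * c * resid[t + 1]   # neighbor's consistency term
    g[-1] += 2 * (z[-1] - goal)                  # terminal cost, last step only
    return g

# All virtual states are updated in parallel at every iteration; no
# gradient has to survive a product of H Jacobians.
H, c, s0, goal, lam = 20, 0.9, 0.0, 1.0, 10.0
u = [0.1] * H
z = [0.0] * H
for _ in range(2000):
    g = lifted_grad(z, u, s0, c, goal, lam)
    z = [zt - 0.01 * gt for zt, gt in zip(z, g)]
```

Each state's gradient involves only its two neighbors, so the signal reaching step 1 and step H has the same scale, which is the uniform propagation the paragraph above describes.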

2. Stochastic Exploration in State Space

To escape poor local minima, GRASP injects controlled stochasticity directly into the state iterates during optimization. This is not noise in the actions, but in the predicted states themselves, which helps the planner explore diverse trajectories and avoid premature convergence.
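
The mechanism can be illustrated on a one-dimensional double-well objective (a toy stand-in for the planner's loss landscape, not GRASP's actual objective, which perturbs predicted states rather than a scalar): plain gradient descent stalls in the poor minimum, while annealed noise on the iterates lets runs cross the barrier.

```python
import random

# f has a poor local minimum near x = 1.13 and a better one near
# x = -1.31. Plain gradient descent from x = 1.5 gets stuck; adding
# annealed Gaussian noise to the iterates lets runs explore both basins.

def f(x):
    return x ** 4 - 3 * x ** 2 + x

def df(x):
    return 4 * x ** 3 - 6 * x + 1

def descend(x, steps, lr=0.02, noise=0.0, rng=None):
    for i in range(steps):
        x -= lr * df(x)
        if noise > 0.0:
            # Anneal the perturbation to zero over the run.
            x += rng.gauss(0.0, noise * (1.0 - i / steps))
        x = max(-3.0, min(3.0, x))  # keep iterates in a trust region
    return x

x_plain = descend(1.5, steps=300)  # deterministic: stuck in the poor minimum

finals = []
for seed in range(20):
    rng = random.Random(seed)
    x = descend(1.5, steps=300, noise=0.8, rng=rng)
    x = descend(x, steps=400)      # noiseless polish once noise has annealed
    finals.append(f(x))
best = min(finals)                 # some runs typically reach the better basin
```

Annealing matters: large early noise provides exploration, while the decaying schedule and final noiseless polish let each run settle cleanly into whichever basin it ends up in, avoiding the premature convergence the paragraph above mentions.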

3. Gradient Reshaping for Clean Action Signals

When gradients flow through high-dimensional vision models, they can become brittle—especially the “state-to-action” gradients. GRASP reshapes these gradients to give clean, actionable signals to the action sequence while bypassing the noisy gradients from pixel-level predictions. This separation stabilizes updates and makes optimization more reliable.
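
One plausible reading of this reshaping, sketched as per-step renormalization of the state-to-action gradient (an assumption on my part; the paper's exact rule may differ):

```python
# Sketch (assumed form, not the paper's exact rule): cap the norm of each
# per-step action gradient so badly scaled, noisy components coming from
# pixel-level predictions cannot dominate the action update, while
# well-scaled gradients pass through unchanged.

def reshape_gradient(grad, max_norm=1.0):
    norm = sum(g * g for g in grad) ** 0.5
    if norm > max_norm:
        scale = max_norm / norm
        return [g * scale for g in grad]
    return grad

# Per-step reshaping over a trajectory of action gradients: the middle
# step's huge (noisy) gradient is tamed, the others are left alone.
traj_grads = [[0.2, 0.1], [30.0, -40.0], [0.0, 0.5]]
shaped = [reshape_gradient(g) for g in traj_grads]
```

The point of the sketch is the separation: the update direction for each action is preserved, but its magnitude is decoupled from the raw scale of gradients flowing out of the high-dimensional vision model.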

Results and Implications

Empirical Validation

In experiments across several continuous control tasks with visual observations, GRASP consistently outperforms baseline planners, especially as horizon lengths increase. It achieves higher success rates, lower cumulative costs, and better sample efficiency in planning.

Broader Impact

The ability to plan reliably over long horizons opens the door to using learned world models as true simulators for reinforcement learning, robotics, and autonomous systems. GRASP makes it feasible to leverage powerful models without the fragility that previously limited their deployment.

Conclusion

GRASP introduces a principled way to make gradient-based planning robust over long horizons. Through virtual state lifting, stochastic exploration in state space, and gradient reshaping, it addresses the core difficulties of optimization in high-dimensional latent spaces. As world models continue to scale, techniques like GRASP will be essential for translating predictive power into effective control. For more details, see the full paper: Gradient-based Planning for World Models at Longer Horizons (with Mike Rabbat, Aditi Krishnapriyan, Yann LeCun, and Amir Bar).
