Closing the Train-Test Gap in World Models for Gradient-Based Planning

AI-generated keywords: robotic tasks, agent's actions, environment impact, prediction and planning, world models

AI-generated Key Points

  • Understanding how an agent's actions impact the environment is crucial for prediction and planning in robotic tasks.
  • Traditional methods use analytical models derived from known principles, while learning-based approaches directly infer models from data.
  • World models predict the next state based on the current state and an action, enhancing planning with model predictive control (MPC).
  • Gradient-based planning offers a computationally efficient alternative to traditional MPC methods but historically lags behind other approaches in performance.
  • Enhanced training methods for world models improve gradient-based planning efficiency by bridging the train-test gap through data synthesis techniques.
  • The proposed approach outperforms or matches the classical gradient-free cross-entropy method (CEM) across a variety of object manipulation and navigation tasks while using 10% of the time budget.
  • Closing the train-test gap through improved training strategies yields better gradient-based planning with world models, making planning for robotic tasks more efficient.
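
The train-test gap named in the key points can be written out as two different optimization problems. The notation below is an illustrative assumption and is not taken from the paper: f_θ is the world model, D is the offline dataset of trajectories, H is the planning horizon, s_goal is the target state, and the squared-error form of both objectives is assumed for simplicity.

```latex
% Training: one-step next-state prediction on offline trajectories
\min_{\theta} \; \mathbb{E}_{(s_t, a_t, s_{t+1}) \sim \mathcal{D}}
  \left\| f_\theta(s_t, a_t) - s_{t+1} \right\|^2

% Test time: optimize an action sequence through the learned model
\min_{a_0, \dots, a_{H-1}} \left\| \hat{s}_H - s_{\text{goal}} \right\|^2
  \quad \text{where } \hat{s}_0 = s_0, \;\; \hat{s}_{t+1} = f_\theta(\hat{s}_t, a_t)
```

In the first problem the actions are fixed by the data and only θ is optimized. In the second, θ is fixed and the actions are the free variables, so the model may be queried on action sequences that need not resemble those it was trained on.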

Authors: Arjun Parthasarathy, Nimit Kalra, Rohun Agrawal, Yann LeCun, Oumayma Bounou, Pavel Izmailov, Micah Goldblum

License: CC BY 4.0

Abstract: World models paired with model predictive control (MPC) can be trained offline on large-scale datasets of expert trajectories and enable generalization to a wide range of planning tasks at inference time. Compared to traditional MPC procedures, which rely on slow search algorithms or on iteratively solving optimization problems exactly, gradient-based planning offers a computationally efficient alternative. However, the performance of gradient-based planning has thus far lagged behind that of other approaches. In this paper, we propose improved methods for training world models that enable efficient gradient-based planning. We begin with the observation that although a world model is trained on a next-state prediction objective, it is used at test-time to instead estimate a sequence of actions. The goal of our work is to close this train-test gap. To that end, we propose train-time data synthesis techniques that enable significantly improved gradient-based planning with existing world models. At test time, our approach outperforms or matches the classical gradient-free cross-entropy method (CEM) across a variety of object manipulation and navigation tasks in 10% of the time budget.
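
As a rough illustration of what gradient-based planning means here, the sketch below optimizes an action sequence by gradient descent through a rollout of a world model. It is not the authors' implementation: the linear model `s_next = A @ s + B @ a` is a toy stand-in for a learned world model, and the names `rollout`, `plan_gradient`, the squared-distance cost, and all hyperparameters are assumptions made for this example. The gradient is computed by hand with a backward pass through the rollout, which an autodiff library would do automatically for a neural model.

```python
import numpy as np

def rollout(A, B, s0, actions):
    """Roll a toy linear world model forward: s_{t+1} = A s_t + B a_t."""
    states = [s0]
    for a in actions:
        states.append(A @ states[-1] + B @ a)
    return states

def plan_gradient(A, B, s0, goal, horizon=10, steps=100, lr=1.0):
    """Gradient descent on the action sequence to minimize ||s_H - goal||^2."""
    actions = np.zeros((horizon, B.shape[1]))
    for _ in range(steps):
        s_final = rollout(A, B, s0, actions)[-1]
        adjoint = 2.0 * (s_final - goal)   # gradient of the cost w.r.t. s_H
        grads = np.zeros_like(actions)
        # Backward pass through the rollout (hand-written backpropagation).
        for t in reversed(range(horizon)):
            grads[t] = B.T @ adjoint       # gradient w.r.t. a_t
            adjoint = A.T @ adjoint        # gradient w.r.t. s_t
        actions -= lr * grads
    return actions

# Hypothetical 2-D point-mass task: each action nudges the state.
A = np.eye(2)
B = 0.1 * np.eye(2)
s0 = np.zeros(2)
goal = np.array([1.0, -0.5])

actions = plan_gradient(A, B, s0, goal)
final_state = rollout(A, B, s0, actions)[-1]
```

In an MPC loop, only the first planned action would typically be executed before replanning from the newly observed state. Each planning step above costs one forward and one backward pass through the model, in contrast to sampling-based planners that evaluate many candidate sequences per iteration.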

Submitted to arXiv on 10 Dec. 2025

Results of the summarizing process for the arXiv paper: 2512.09929v1

Understanding how an agent's actions affect its environment is crucial for prediction and planning in robotic tasks. Traditional methods rely on analytical models derived from known principles, while learning-based approaches infer models directly from data, which can capture complex dynamics better and increase robustness to uncertainty. World models have emerged as a powerful tool in this regard: they predict the next state from the current state and an action.

This paper focuses on improving gradient-based planning with world models paired with model predictive control (MPC). Traditional MPC procedures can be slow because they rely on search algorithms or on iteratively solving optimization problems exactly; gradient-based planning offers a more computationally efficient alternative. However, its performance has historically lagged behind that of other approaches.

The authors propose improved training methods for world models that enable efficient gradient-based planning. They observe that although a world model is trained on a next-state prediction objective, it is used at test time to estimate a sequence of actions. To close this train-test gap, they introduce train-time data synthesis techniques that significantly improve gradient-based planning with existing world models.

At test time, the approach outperforms or matches the classical gradient-free cross-entropy method (CEM) across a variety of object manipulation and navigation tasks while using 10% of the time budget. Closing the train-test gap through better training thus makes planning with world models both more effective and more efficient.
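
For context on the baseline mentioned in the summary, here is a minimal sketch of the cross-entropy method (CEM) used as a gradient-free planner. It is a generic textbook-style version, not the configuration used in the paper: the toy dynamics `s_{t+1} = s_t + step * a_t`, the function names `rollout_cost` and `plan_cem`, and the population size, elite count, and iteration count are all assumptions chosen for illustration.

```python
import numpy as np

def rollout_cost(actions, s0, goal, step=0.1):
    """Squared distance to the goal under toy dynamics s_{t+1} = s_t + step * a_t."""
    s_final = s0 + step * actions.sum(axis=0)
    return float(np.sum((s_final - goal) ** 2))

def plan_cem(s0, goal, horizon=10, action_dim=2,
             iters=20, pop=64, n_elite=8, seed=0):
    """Cross-entropy method: sample, score, keep elites, refit a Gaussian."""
    rng = np.random.default_rng(seed)
    mean = np.zeros((horizon, action_dim))
    std = np.ones((horizon, action_dim))
    for _ in range(iters):
        # Sample candidate action sequences around the current mean.
        noise = rng.standard_normal((pop, horizon, action_dim))
        samples = mean + std * noise
        # Score every candidate with a full rollout (no gradients needed).
        costs = np.array([rollout_cost(a, s0, goal) for a in samples])
        # Refit the sampling distribution to the lowest-cost candidates.
        elite = samples[np.argsort(costs)[:n_elite]]
        mean = elite.mean(axis=0)
        std = elite.std(axis=0) + 1e-6
    return mean

s0 = np.zeros(2)
goal = np.array([1.0, -0.5])
cem_actions = plan_cem(s0, goal)
```

With these assumed settings, every CEM iteration evaluates 64 full rollouts, which illustrates why sampling-based planning can be costly when each rollout requires a pass through a large learned model.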
Created on 19 Dec. 2025
