MPC-RL

Accelerating and Scaling MPC-Guided Reinforcement Learning for Humanoid Locomotion and Manipulation

AMBER Lab, California Institute of Technology    Johns Hopkins University

MPC-guided reinforcement learning for robust humanoid locomotion and manipulation.

Abstract

In humanoid motion control, model predictive control (MPC) offers physically grounded prediction and constraint handling, while reinforcement learning (RL) enables robust whole-body skills through large-scale simulation. However, using MPC inside RL often requires time-consuming problem construction or incurs excessive training overhead, making such frameworks difficult to justify in practice. This work studies efficient training-time MPC guidance for humanoid locomotion and manipulation, termed MPC-RL. We introduce a centroidal-dynamics MPC reward formulation that uses MPC trajectories as guidance at training time. To make this practical in massively parallel RL, we develop πⁿMPC, a parallel-in-horizon, construction-free, batched GPU MPC solver that operates directly on time-varying dynamics, avoiding high memory usage and pre-compilation. Comparative studies and hardware validation show that MPC-RL achieves superior performance in locomotion and manipulation skills.
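To make the idea of an MPC-guided reward concrete, the following is a minimal sketch, not the formulation used in this work. It assumes a reward that scores how closely the policy's center-of-mass (CoM) position tracks a reference taken from an MPC trajectory, using an exponential kernel. The function name `mpc_tracking_reward`, the `sigma` parameter, and the choice of kernel are all hypothetical and chosen only for illustration.

```python
import math


def mpc_tracking_reward(com_pos, com_ref, sigma=0.25):
    """Hypothetical reward: exponential kernel on squared CoM tracking error.

    com_pos: current CoM position from the simulator, as [x, y, z] in meters.
    com_ref: CoM position from the MPC reference trajectory at the same step.
    sigma:   assumed length scale in meters; a smaller value penalizes error more.
    """
    err_sq = sum((p - r) ** 2 for p, r in zip(com_pos, com_ref))
    return math.exp(-err_sq / (sigma ** 2))


# Perfect tracking gives the maximum reward.
perfect = mpc_tracking_reward([0.0, 0.0, 0.9], [0.0, 0.0, 0.9])

# A 5 cm lateral deviation from the MPC reference lowers the reward.
offset = mpc_tracking_reward([0.0, 0.05, 0.9], [0.0, 0.0, 0.9])
```

In a massively parallel training setup, a term of this kind would be evaluated per environment and per step, which is why the cost of producing the MPC reference matters for overall training time.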

System Architecture

Video Gallery

Humanoid Locomotion

Push Recovery

Cart Pushing Manipulation

Payload Carrying

BibTeX

@article{li2026mpcrl,
  title={Accelerating and Scaling MPC-Guided Reinforcement Learning for Humanoid Locomotion and Manipulation},
  author={Li, Junheng and Wu, Liang and Esteban, Sergio A. and Yang, Lizhi and Drgo\v{n}a, J\'an and Ames, Aaron D.},
  journal={arXiv preprint arXiv:2606.05687},
  year={2026},
}