Notion Blog by Xiaoyang Li, Weixun Wang and Yancheng He | Project Leader: Weixun Wang | March 5, 2026
Chinese Version: Save, Load and Learn: Boosting Agentic LLMs via Rollback-based Curriculum Learning (中文版)
<aside>
📄 Technical Report: https://arxiv.org/pdf/2512.24873
🧠 Model: https://huggingface.co/FutureLivingLab/iFlow-ROME
🧩 Framework:
📊 Benchmarks: https://github.com/alibaba/terminal-bench-pro
</aside>
In our technical report Let It Flow: Agentic Crafting on Rock and Roll, we introduced Rollback-based Curriculum Learning (referred to as Chunk-Level Initialized Resampling in the report) to address the challenges of agentic training on long-horizon, extremely difficult tasks. Due to space constraints, the report covers only part of the method, leaving many design details and practical considerations unexplored.
In this blog, we present the framework of Rollback-based Curriculum Learning (hereafter referred to as Rollback) holistically, including its core algorithm, motivation, key implementation choices, and several practical variants that make it work in real agentic environments. Along the way, we will answer three questions:
<aside>
If you’re interested in other parts of our agentic training pipeline, please refer to our technical report and the blog:
Report: https://arxiv.org/pdf/2512.24873
Blog: The Bitter Lesson Behind Building Agentic RL in Terminal Environments
</aside>
Rollback is a curriculum-learning framework for long-horizon agentic tasks. Starting from a verified successful trajectory, it forms a temporal curriculum by treating intermediate states as checkpoints. Training begins from checkpoints near the end and progressively rolls back to earlier ones as the model improves, until the agent can reliably solve the task end-to-end from the original initial state.

Illustration of the Rollback Algorithm. (Generated by Gemini)
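To make the schedule above concrete, here is a minimal sketch of a rollback curriculum scheduler. All names (`RollbackCurriculum`, `promote_threshold`, `window`) are hypothetical, not from our codebase: it assumes the checkpoints of one verified successful trajectory are indexed 0 (initial state) through N−1 (near the goal), starts training from the latest checkpoint, and rolls back one checkpoint earlier whenever the recent success rate clears a threshold.

```python
class RollbackCurriculum:
    """Sketch of rollback-based curriculum scheduling (hypothetical API).

    Checkpoint index 0 is the task's original initial state; index
    num_checkpoints - 1 is the checkpoint closest to the goal. Training
    begins near the end and moves toward index 0 as the agent improves.
    """

    def __init__(self, num_checkpoints, promote_threshold=0.8, window=32):
        self.idx = num_checkpoints - 1   # start from the checkpoint nearest the end
        self.promote_threshold = promote_threshold
        self.window = window             # episodes per evaluation window
        self.recent = []                 # rolling success history

    def current_start(self):
        """Checkpoint index rollouts should currently start from."""
        return self.idx

    def record(self, success: bool):
        """Log one episode outcome; roll back to an earlier checkpoint
        once the success rate over a full window clears the threshold."""
        self.recent.append(success)
        if len(self.recent) > self.window:
            self.recent.pop(0)
        if (len(self.recent) == self.window
                and sum(self.recent) / self.window >= self.promote_threshold
                and self.idx > 0):
            self.idx -= 1        # move the start one checkpoint earlier
            self.recent.clear()  # fresh window at the new start state
```

In use, the trainer would call `current_start()` to pick the rollout's initial state and `record()` after each episode; once `current_start()` reaches 0 and the success rate stays high, the agent solves the task end-to-end from the original initial state. The threshold and window are tunable design choices, not values from the report.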