Notion Blog by Xiaoyang Li, Weixun Wang and Yancheng He | Project Leader: Weixun Wang | March 5, 2026
Chinese Version: Save, Load and Learn: Boosting Agentic LLMs via Rollback-based Curriculum Learning (中文版)
<aside>
📄 Technical Report: https://arxiv.org/pdf/2512.24873
🧠 Model: https://huggingface.co/FutureLivingLab/iFlow-ROME
🧩 Framework:
📊 Benchmarks: https://github.com/alibaba/terminal-bench-pro
</aside>
In our technical report Let It Flow: Agentic Crafting on Rock and Roll, we introduced Rollback-based Curriculum Learning (referred to as Chunk-Level Initialized Resampling in the report) to address the challenges of agentic training on long-horizon, extremely difficult tasks. Due to space constraints, the report covers only part of the method, leaving many design details and practical considerations unexplored.
In this blog, we present the framework of Rollback-based Curriculum Learning (hereafter referred to as Rollback) holistically, including its core algorithm, motivation, key implementation choices, and several practical variants that make it work in real agentic environments. Along the way, we will answer three questions:
<aside>
If you’re interested in other parts of our agentic training pipeline, please refer to our technical report and the blog:
Report: https://arxiv.org/pdf/2512.24873
Blog: The Bitter Lesson Behind Building Agentic RL in Terminal Environments
</aside>
Rollback is a curriculum-learning framework for long-horizon agentic tasks. Starting from a verified successful trajectory, it forms a temporal curriculum by treating intermediate states as checkpoints. Training begins from checkpoints near the end and progressively rolls back to earlier ones as the model improves, until the agent can reliably solve the task end-to-end from the original initial state.

Illustration of the Rollback Algorithm. (Generated by Gemini)
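To make the schedule above concrete, here is a minimal sketch of a rollback curriculum scheduler. All names (`RollbackCurriculum`, `promote_threshold`, `window`) are hypothetical, not from our codebase: it assumes the checkpoints of one verified successful trajectory are indexed 0 (initial state) through N−1 (near the goal), starts training from the latest checkpoint, and rolls back one checkpoint earlier whenever the recent success rate clears a threshold.

```python
class RollbackCurriculum:
    """Sketch of rollback-based curriculum scheduling (hypothetical API).

    Checkpoint index 0 is the task's original initial state; index
    num_checkpoints - 1 is the checkpoint closest to the goal. Training
    begins near the end and moves toward index 0 as the agent improves.
    """

    def __init__(self, num_checkpoints, promote_threshold=0.8, window=32):
        self.idx = num_checkpoints - 1   # start from the checkpoint nearest the end
        self.promote_threshold = promote_threshold
        self.window = window             # episodes per evaluation window
        self.recent = []                 # rolling success history

    def current_start(self):
        """Checkpoint index rollouts should currently start from."""
        return self.idx

    def record(self, success: bool):
        """Log one episode outcome; roll back to an earlier checkpoint
        once the success rate over a full window clears the threshold."""
        self.recent.append(success)
        if len(self.recent) > self.window:
            self.recent.pop(0)
        if (len(self.recent) == self.window
                and sum(self.recent) / self.window >= self.promote_threshold
                and self.idx > 0):
            self.idx -= 1        # move the start one checkpoint earlier
            self.recent.clear()  # fresh window at the new start state
```

In use, the trainer would call `current_start()` to pick the rollout's initial state and `record()` after each episode; once `current_start()` reaches 0 and the success rate stays high, the agent solves the task end-to-end from the original initial state. The threshold and window are tunable design choices, not values from the report.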