ACuTE: Automatic Curriculum Transfer from Simple to Complex Environments

Y Shukla, C Thierauf, R Hosseini, G Tatiya, J Sinapov. Proceedings of the 21st International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2022).

Reinforcement Learning problems can often be broken into simpler steps that make learning easier. Can we automate that process?

This post summarizes work with a collaborator (Yash Shukla). The development and implementation of the ACuTE algorithm were entirely the work of Yash and his collaborators in Prof. Sinapov’s lab; I worked on the robot deployment. Any correspondence about the algorithm should be directed to Yash. You can read the paper here.

This project came out of an ongoing challenge in reinforcement learning—how to make learning more efficient in environments that are expensive to simulate or interact with. Most real-world robotic tasks fall into this category. Training an agent from scratch in high-fidelity physics or the real world takes forever, and methods like domain randomization or system identification often introduce more engineering overhead than they save.

The core idea behind ACuTE (Automatic Curriculum Transfer from Simple to Complex Environments) is simple: don’t learn everything in the hard environment. Instead, generate a curriculum of tasks in a much simpler version of it, and then transfer that curriculum’s structure—not the policies themselves—to the high-fidelity setting. The robot can then learn its high-fidelity tasks in a sequence that’s already been optimized elsewhere. In other words, ACuTE doesn’t move weights or networks across domains—it transfers the schema of learning.

We defined this as a curriculum transfer problem: the process of learning a sequence of tasks in a low-fidelity (LF) environment and mapping them to their high-fidelity (HF) counterparts. The LF version could be a grid world, a simple simulator, or even an abstract symbolic model. The HF version might be a physics simulator like PyBullet or a real robot. The goal is to automatically produce a curriculum that can bootstrap learning in the HF setting, without the agent ever having to explore that complex world from scratch.

At its core, ACuTE generates, sequences, and transfers curricula through affine mappings of task parameters—so if a “task” in the simple environment means cutting down trees and crafting a tool, its HF version preserves the logic but adjusts spatial scale, object count, and sensor modalities. The agent never assumes shared action or observation spaces; it only needs consistent task semantics. That makes it far more flexible than standard policy or value transfer methods, which often fail when the two domains differ too much.
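To make that concrete, here’s a minimal sketch of what such a parameter mapping might look like. None of these names come from the paper (`TaskSpec`, `lf_to_hf`, and the scale/offset values are all hypothetical); the point is just that the affine map touches spatial parameters while leaving the task semantics alone:

```python
from dataclasses import dataclass, replace

@dataclass
class TaskSpec:
    """Hypothetical task parameterization shared across fidelities."""
    width: float   # arena size along x
    height: float  # arena size along y
    n_trees: int   # resources the task requires
    goal: str      # task semantics, e.g. "craft_axe"

def lf_to_hf(task: TaskSpec, scale: float = 0.5, offset: float = 1.0) -> TaskSpec:
    """Affine map from grid-world units to simulator units.

    Only the spatial parameters change here; the goal and object count
    carry over unchanged. A fuller mapping would also adjust object
    counts and sensor configuration, as described above.
    """
    return replace(
        task,
        width=scale * task.width + offset,
        height=scale * task.height + offset,
    )

# A 10x10 grid task becomes a 6m x 6m continuous arena with the same goal.
lf_task = TaskSpec(width=10, height=10, n_trees=3, goal="craft_axe")
hf_task = lf_to_hf(lf_task)
```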

We tested ACuTE in a toy crafting world we called Crafter-TurtleBot. In the LF version, the agent moved in a grid world to collect resources and craft an item. In the HF version, those same resources were placed in a continuous PyBullet simulation, and later, the trained policy was transferred to an actual TurtleBot platform using fiducials for object detection. The transfer worked without further learning—showing that the sequence of skills learned in simulation carried over to the real robot as long as the overall structure of the curriculum was preserved.

The automated version of ACuTE used a beam search to optimize the task sequence, balancing exploration of new goals with efficiency. It consistently beat baselines like domain adaptation, self-play, and teacher-student curricula, both in learning speed and “jumpstart” performance—the immediate advantage at the beginning of training. Even when we deliberately added noise to the mappings between LF and HF tasks, the system still outperformed the alternatives. That robustness is critical, because in practice, your mappings will never be perfect.
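For intuition, a beam search over curricula might look something like the sketch below. This is my reconstruction, not the paper’s code: `beam_search_curriculum` and the `score` function are made-up names, and in practice the scoring would come from cheap training runs in the LF environment (e.g., jumpstart or time-to-threshold on the target task):

```python
import itertools

def beam_search_curriculum(tasks, score, target, beam_width=3, max_len=4):
    """Illustrative beam search over task orderings.

    `score(sequence, target)` is assumed to estimate how well training
    through `sequence` bootstraps the target task, measured in the
    cheap low-fidelity environment.
    """
    beam = [[]]  # start from the empty curriculum
    for _ in range(max_len):
        candidates = [
            seq + [t]
            for seq, t in itertools.product(beam, tasks)
            if t not in seq  # don't repeat a source task
        ]
        if not candidates:
            break
        # keep only the most promising partial curricula
        beam = sorted(candidates, key=lambda s: score(s, target),
                      reverse=True)[:beam_width]
    return beam[0]
```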

We also showed that the method doesn’t depend on a particular reinforcement learning algorithm. We used a simple policy-gradient approach for curriculum generation, but when we switched to PPO or DQN in the HF environment, it still learned faster through the transferred curriculum. That independence makes ACuTE more of a general design principle than a single model.
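That agnosticism falls out of the structure: the transferred curriculum is just an ordered list of HF tasks, so the learner is a pluggable component. Here’s a rough sketch assuming a Stable-Baselines3-style interface; `make_env`, `steps_per_task`, and the step budget are placeholders of mine, not values from the paper:

```python
from stable_baselines3 import PPO  # could equally be DQN

def train_through_curriculum(make_env, curriculum, steps_per_task=50_000):
    """Train a single agent through a sequence of HF tasks.

    `make_env(task)` is assumed to build the high-fidelity environment
    for a given task spec. The learner keeps its weights across tasks,
    so later tasks start from what earlier ones taught it.
    """
    model = None
    for task in curriculum:
        env = make_env(task)
        if model is None:
            model = PPO("MlpPolicy", env)
        else:
            model.set_env(env)
        model.learn(total_timesteps=steps_per_task, reset_num_timesteps=False)
    return model
```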

Finally, we deployed it on a real TurtleBot. The robot used a depth camera and LIDAR to navigate to fiducial-marked objects, performing break and craft actions just like in simulation. Nothing was retrained; it was a pure Sim2Real transfer. The result wasn’t flashy, but it proved that the curriculum schema itself—when learned efficiently in a simpler environment—can bridge the simulation gap without the cost of full-scale domain adaptation.

In the end, ACuTE was about finding a middle ground between simplicity and realism. Instead of endlessly tuning simulations to match the physical world, we asked: what if the robot’s path to competence could be learned elsewhere, then mapped into place? That question opened up a lot of room for future work—especially scaling to multi-agent systems, or automating the environment abstraction process itself. The key insight still holds: the order in which a robot learns matters, and that order doesn’t have to be discovered in the hardest possible world.