Household chores are among the most time-consuming items on many automation wish lists.
The goal for many roboticists is to find the right combination of hardware and software so that a machine can learn "generalist" policies (the rules and strategies that guide the robot's behavior) that work anywhere, under any conditions. But if you own a domestic robot, you probably care little about how well it would perform in your neighbors' homes. With that in mind, researchers at MIT's Computer Science and Artificial Intelligence Laboratory (CSAIL) set out to make it easy to train robust robot policies for very specific environments.
"We aim for robots to perform exceptionally well under disturbances, distractions, varying lighting conditions, and changes in object poses, all within a single environment," says Marcel Torne Villasevil, a research assistant at MIT CSAIL's Improbable AI lab and lead author of a recent paper on the work. "We propose a method to create digital twins on the fly using the latest advances in computer vision. With just a phone, anyone can capture a digital replica of the real world, and robots can train in a simulated environment much faster than in the real world, thanks to GPU parallelization. Our approach eliminates the need for extensive reward engineering by leveraging a few real-world demonstrations to kickstart the training process."
Bringing Your Robot Home
RialTo, of course, is a bit more involved than waving a phone around and, presto, having a domestic robot at your service. It starts with using your device to scan the target environment with tools like NeRFStudio, ARCode, or Polycam. Once the scene is reconstructed, users can upload it to the RialTo interface to make detailed adjustments, add the joints the robot will need, and more.
The refined scene is exported and loaded into the simulator. Here, the goal is to develop a policy based on actions and observations from the real world, such as picking up a cup from a counter. These real-world demonstrations are replicated in the simulation, providing valuable data for reinforcement learning. "This helps create a robust policy that works well both in simulation and in the real world. An improved algorithm using reinforcement learning helps guide this process, ensuring the policy is effective when applied outside the simulator," explains Torne.
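The loop described above, replaying a few demonstrations inside a simulated twin and then fine-tuning with reinforcement learning under randomization, can be sketched with a toy example. Everything here (the `TwinEnv` class, the 1-D reach task, the tabular Q-learning fine-tune) is a hypothetical stand-in chosen for brevity, not RialTo's actual API or algorithm; it only illustrates the general pattern of bootstrapping RL from a handful of demonstrations.

```python
import random

# Toy stand-in for a digital-twin simulator: a gripper on a 1-D track must
# reach the cup's position. (Illustrative only -- not RialTo's actual API.)
class TwinEnv:
    def __init__(self, size=10, goal=7):
        self.size, self.goal = size, goal

    def reset(self, start=0):
        self.pos = start
        return self.pos

    def step(self, action):  # action is -1 (move left) or +1 (move right)
        self.pos = max(0, min(self.size - 1, self.pos + action))
        done = self.pos == self.goal
        return self.pos, (1.0 if done else 0.0), done

# A few "real-world demonstrations" replayed in the twin as (state, action)
# pairs -- the data that kickstarts training, as the article describes.
demos = [(s, +1) for s in range(7)]

def finetune(demos, env, episodes=300, eps=0.2, seed=0):
    """Q-learning fine-tune seeded by demonstrations, with randomized start
    states as a stand-in for randomized object poses in simulation."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(env.size) for a in (-1, +1)}
    for s, a in demos:  # bias initial values toward the demonstrated actions
        q[(s, a)] = 0.5
    for _ in range(episodes):
        s = env.reset(start=rng.randrange(env.size))
        for _ in range(2 * env.size):
            a = rng.choice((-1, +1)) if rng.random() < eps else \
                max((-1, +1), key=lambda x: q[(s, x)])
            s2, r, done = env.step(a)
            target = r + 0.9 * max(q[(s2, x)] for x in (-1, +1))
            q[(s, a)] += 0.5 * (target - q[(s, a)])
            s = s2
            if done:
                break
    # Extract the greedy policy after fine-tuning.
    return {s: max((-1, +1), key=lambda a: q[(s, a)]) for s in range(env.size)}

env = TwinEnv()
robust_policy = finetune(demos, env)
```

The demonstrations only cover states to the left of the goal, yet the fine-tuned policy also recovers from states the demonstrator never visited, which is the point of the simulation stage: cheap, parallel experience fills in the gaps the few real demonstrations leave open.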
Tests have shown that RialTo created robust policies for a variety of tasks, whether in controlled lab environments or more unpredictable real-world settings, improving imitation learning by 67% with the same number of demonstrations. Tasks included opening a toaster, placing a book on a shelf, setting a plate on a shelf, placing a cup on a shelf, opening a drawer, and opening a cabinet. For each task, researchers tested the system’s performance under three increasing levels of difficulty: randomizing object poses, adding visual distractions, and applying physical disturbances during task execution. When paired with real-world data, the system outperformed traditional imitation learning methods, especially in situations with many visual distractions or physical disturbances.
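The evaluation protocol above, sweeping increasing levels of difficulty and recording success rates, can be mirrored in a small harness. The task, policies, and numbers below are toy stand-ins invented for illustration (the source does not publish this code); the sketch only shows the shape of such a sweep, with a brittle policy that assumes the nominal object pose degrading as randomization grows while a pose-tracking policy stays robust.

```python
import random

# Hypothetical evaluation sweep echoing the article's difficulty levels:
# run many trials per level, perturbing the object pose more at each level,
# and record the success rate.
def evaluate(policy, perturb, trials=200, seed=0):
    rng = random.Random(seed)
    successes = 0
    for _ in range(trials):
        goal = 5 + rng.randint(-perturb, perturb)  # randomized object pose
        pos = 0
        for _ in range(12):
            pos += policy(pos, goal)
            if pos == goal:
                successes += 1
                break
    return successes / trials

# A brittle policy that always heads for the nominal pose (5), versus a
# robust one that tracks the observed pose.
brittle = lambda pos, goal: 1 if pos < 5 else 0
robust = lambda pos, goal: 1 if pos < goal else -1

for level, perturb in enumerate((0, 2, 4), start=1):
    print(f"level {level} (pose +/-{perturb}): "
          f"brittle {evaluate(brittle, perturb):.0%}, "
          f"robust {evaluate(robust, perturb):.0%}")
```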
"These experiments show that if we want to be very robust in a particular environment, the best idea is to leverage digital twins instead of trying to achieve robustness with large-scale data collection in various environments," says Pulkit Agrawal, director of the Improbable AI Lab, associate professor of electrical engineering and computer science (EECS) at MIT, principal investigator at MIT CSAIL, and senior author of the work.
As for limitations, RialTo currently takes about three days to fully train a policy. To speed this up, the team points to improving the underlying algorithms and using foundation models. Training in simulation has its limits, too: effortless sim-to-real transfer remains difficult, as does simulating deformable objects or liquids.
The Next Level
So, what’s the next step in RialTo’s journey? Building on previous efforts, the scientists are working to maintain robustness against various disturbances while improving the model’s adaptability to new environments. "Our next effort is to use pre-trained models, speeding up the learning process, minimizing human input, and achieving broader generalization capabilities," explains Torne.
"We are incredibly excited about our concept of 'on-the-fly' robot programming, where robots can autonomously scan their environment and learn to solve specific tasks in simulation. Although our current method has limitations (it requires a few initial human demonstrations and significant computation time, up to three days, to train these policies), we see it as a significant step towards achieving 'on-the-fly' robot learning and deployment," says Torne. "This approach brings us closer to a future where robots won't need a pre-existing policy covering all scenarios. Instead, they can quickly learn new tasks without extensive real-world interaction. In my opinion, these advancements could accelerate the practical application of robotics much sooner than relying solely on a universal, all-encompassing policy."
"To deploy robots in the real world, researchers traditionally rely on methods like imitation learning from expert data, which can be costly, or reinforcement learning, which can be dangerous," says Zoey Chen, a PhD student in computer science at the University of Washington who was not involved in the paper. "RialTo directly addresses both the safety constraints of real-world RL [reinforcement learning] and the data-efficiency constraints of data-driven learning methods, with its new real-to-sim-to-real pipeline. This pipeline not only ensures safe and robust training in simulation before real-world deployment but also significantly improves the efficiency of data collection. RialTo has the potential to greatly expand robot learning and enable robots to adapt much more effectively to complex real-world scenarios."
"Simulation has shown impressive capabilities on real robots by providing cheap, even infinite, data for policy learning," adds Marius Memmel, a PhD student in computer science at the University of Washington who was not involved in the work. "However, these methods are limited to a few specific scenarios, and building the corresponding simulations is costly and laborious. RialTo provides an easy-to-use tool to reconstruct real environments in minutes instead of hours. Moreover, it extensively uses collected demonstrations during policy learning, minimizing the burden on the operator and reducing the sim2real gap. RialTo demonstrates its robustness to object poses and disturbances, showing incredible real-world performance without requiring extensive simulator construction and data collection."
Torne wrote the paper alongside senior authors Abhishek Gupta, assistant professor at the University of Washington, and Agrawal. Four other CSAIL members are also credited: EECS PhD student Anthony Simeonov SM ’22, research assistant Zechu Li, undergraduate student April Chan, and Tao Chen PhD ’24. Members of the Improbable AI Lab and the WEIRD Lab provided valuable feedback and support during the project’s development.
This work was supported in part by the Sony Research Award, the U.S. government, and Hyundai Motor Co., with assistance from the WEIRD (Washington Embodied Intelligence and Robotics Development) Lab. The researchers presented their work at the Robotics: Science and Systems (RSS) conference earlier this month.