MIT researchers recently unveiled a new approach to training robots that takes its inspiration from language models. Instead of teaching each new skill from a small, task-specific dataset, the method draws on vast amounts of diverse data, much as large language models do during pretraining.
Imitation learning, in which a robot learns by observing a human perform a task, can fail when the robot meets conditions its demonstrations never covered, such as a change in lighting, a new obstacle, or a different environment. In those cases the robot simply lacks the data it would need to adapt.
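To see why, consider a minimal behavior-cloning sketch of imitation learning: a policy is fit to (observation, action) pairs recorded from human demonstrations. All names, shapes, and data below are illustrative assumptions, not the team's code.

```python
import torch
import torch.nn as nn

obs_dim, act_dim = 32, 7  # e.g., proprioception in, 7-DoF arm commands out

policy = nn.Sequential(
    nn.Linear(obs_dim, 256), nn.ReLU(),
    nn.Linear(256, 256), nn.ReLU(),
    nn.Linear(256, act_dim),
)
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

# Stand-in for a dataset of human demonstrations.
demo_obs = torch.randn(1024, obs_dim)
demo_act = torch.randn(1024, act_dim)

for step in range(100):
    pred = policy(demo_obs)
    loss = nn.functional.mse_loss(pred, demo_act)  # match demonstrated actions
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# The learned policy only covers states seen in the demos, which is why small
# shifts in lighting or layout can push it outside its training distribution.
```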
To address this, the team took a cue from models like GPT-4, which succeed in large part by being pretrained on enormous and varied data. Lead author Lirui Wang explained that while language data all arrive as sentences, robotic data are heterogeneous, spanning different sensors, embodiments, and environments, so pretraining on them requires a different architecture. That insight led to Heterogeneous Pretrained Transformers (HPT).
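A rough sketch of what such an architecture can look like: embodiment-specific input modules tokenize heterogeneous signals (camera images, joint readings) into a shared token space, and a single transformer trunk is pretrained across all of them. The module names, dimensions, and pooling below are illustrative assumptions, not the released HPT code.

```python
import torch
import torch.nn as nn

d_model = 256

class VisionStem(nn.Module):
    """Tokenize an image into a short sequence of d_model tokens."""
    def __init__(self, num_tokens=16):
        super().__init__()
        self.conv = nn.Conv2d(3, d_model, kernel_size=16, stride=16)  # patchify
        self.num_tokens = num_tokens

    def forward(self, img):                                 # img: (B, 3, 64, 64)
        tokens = self.conv(img).flatten(2).transpose(1, 2)  # (B, 16, d_model)
        return tokens[:, : self.num_tokens]

class ProprioStem(nn.Module):
    """Project a proprioceptive state vector to a single token."""
    def __init__(self, state_dim):
        super().__init__()
        self.proj = nn.Linear(state_dim, d_model)

    def forward(self, state):                    # state: (B, state_dim)
        return self.proj(state).unsqueeze(1)     # (B, 1, d_model)

# Shared trunk: one transformer reused across every robot and sensor suite.
trunk = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True),
    num_layers=4,
)
action_head = nn.Linear(d_model, 7)  # per-embodiment head, e.g. a 7-DoF arm

# One robot's input modules; another robot would get its own, sharing the trunk.
vision_stem, proprio_stem = VisionStem(), ProprioStem(state_dim=14)

img = torch.randn(2, 3, 64, 64)
state = torch.randn(2, 14)
tokens = torch.cat([vision_stem(img), proprio_stem(state)], dim=1)
action = action_head(trunk(tokens).mean(dim=1))  # pooled features -> action
print(action.shape)  # torch.Size([2, 7])
```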
HPT aligns data from varied sensors and environments into a shared representation on which a transformer is trained, and the researchers found that the larger the transformer, the better its output. To use the system, a person supplies the robot's design, its configuration, and the task they want performed.
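Continuing the sketch above, one plausible way this user-facing flow could work is to freeze the pretrained trunk and fit only fresh input modules and an action head matched to the new robot's design, sensor configuration, and task. This adaptation recipe is an assumption for illustration, not the paper's exact procedure.

```python
# Reuses trunk, VisionStem, ProprioStem, and d_model from the sketch above.
for p in trunk.parameters():
    p.requires_grad_(False)          # keep the pretrained trunk fixed

new_vision = VisionStem()
new_proprio = ProprioStem(state_dim=9)
new_head = nn.Linear(d_model, 4)     # e.g., a 4-DoF mobile gripper

params = [*new_vision.parameters(), *new_proprio.parameters(),
          *new_head.parameters()]
optimizer = torch.optim.Adam(params, lr=3e-4)

# A small robot- and task-specific demo set (random stand-in data).
demo_img = torch.randn(64, 3, 64, 64)
demo_state = torch.randn(64, 9)
demo_act = torch.randn(64, 4)

for step in range(50):
    tokens = torch.cat([new_vision(demo_img), new_proprio(demo_state)], dim=1)
    pred = new_head(trunk(tokens).mean(dim=1))
    loss = nn.functional.mse_loss(pred, demo_act)
    optimizer.zero_grad()
    loss.backward()                  # gradients flow through the frozen trunk
    optimizer.step()                 # but only the new modules are updated
```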
The ultimate goal of the research is a universal robot brain that could be downloaded and used without any additional training. CMU associate professor David Held expressed optimism about the approach, noting that while the work is in its early stages, continued scaling could lead to a breakthrough in robotic policies, much as it did for large language models.
Funded in part by the Toyota Research Institute, the research builds on TRI's earlier work on robot training methods. TRI's recent partnership with Boston Dynamics marks a notable step toward pairing robot learning research with advanced robotic hardware, paving the way for further advances in the field.