Researchers at the University of Tokyo and Alternative Machine have developed a humanoid robot system called Alter3. Using the GPT-4 language model, the robot can interpret commands given in natural language and act on them, performing tasks that range from taking selfies to mimicking behaviors such as pretending to be a ghost.
Integrating large language models (LLMs) with robotic systems is an active area of robotics research. The combination is not yet commercially scalable, but it has shown considerable promise and is driving new work in the field.
Alter3 utilizes GPT-4 as its backend model to interpret natural language instructions and translate them into actionable commands for the robot. The model uses an “agentic framework” to plan a sequence of actions that the robot must follow to accomplish the given task. This planning process involves determining the necessary steps to achieve the desired outcome.
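The planning step described above can be illustrated with a minimal sketch. It assumes the model is asked for a numbered list of movements and that the reply is parsed into individual steps. The prompt wording, the function names, and the `fake_llm` stand-in are all hypothetical and are not taken from the Alter3 project.

```python
import re
from typing import Callable, List

# Hypothetical planner prompt; the wording used by the researchers is not given in this article.
PLANNER_PROMPT = (
    "You control a humanoid robot. Break the instruction into a short, "
    "numbered list of body movements.\nInstruction: {instruction}\n"
)


def parse_numbered_steps(text: str) -> List[str]:
    """Extract the items of a numbered list ("1. ..." or "2) ...") from model output."""
    steps = []
    for line in text.splitlines():
        match = re.match(r"^\s*\d+[.)]\s+(.*\S)\s*$", line)
        if match:
            steps.append(match.group(1))
    return steps


def plan_actions(instruction: str, llm: Callable[[str], str]) -> List[str]:
    """Ask the language model for a plan and return it as an ordered list of steps."""
    prompt = PLANNER_PROMPT.format(instruction=instruction)
    return parse_numbered_steps(llm(prompt))


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call; returns canned text so the sketch runs offline."""
    return "Here is the plan:\n1. Raise the right arm\n2. Tilt the head\n3) Smile"


if __name__ == "__main__":
    for step in plan_actions("take a selfie", fake_llm):
        print(step)
```

Passing the model in as a plain callable keeps the parsing logic testable without network access. A real deployment would replace `fake_llm` with a call to the chosen model.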
To ensure precise execution of commands, Alter3 uses different GPT-4 prompt formats to analyze instructions and convert them into robot commands. The model generates a series of API commands based on the action plan, which are then executed by the robot. Additionally, researchers have incorporated a feedback mechanism that allows humans to provide corrections or adjustments to the robot’s actions, enhancing its performance.
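One way to picture the conversion from model output to robot commands is the sketch below. It assumes the model returns a JSON list of axis/value pairs, which are validated before being sent to the robot. The JSON format, the `set_axis` method, the `RecordingRobot` class, and the axis count and value range are illustrative assumptions, not the actual Alter3 API.

```python
import json
from dataclasses import dataclass
from typing import List, Tuple

# Illustrative limits (assumptions for this sketch, not confirmed hardware specifications).
NUM_AXES = 43
MAX_VALUE = 255


@dataclass(frozen=True)
class AxisCommand:
    axis: int
    value: int


def parse_commands(model_output: str) -> List[AxisCommand]:
    """Parse a JSON list of {"axis": ..., "value": ...} objects and reject out-of-range entries."""
    commands = []
    for item in json.loads(model_output):
        axis, value = int(item["axis"]), int(item["value"])
        if not 1 <= axis <= NUM_AXES:
            raise ValueError(f"axis {axis} is outside 1..{NUM_AXES}")
        if not 0 <= value <= MAX_VALUE:
            raise ValueError(f"value {value} is outside 0..{MAX_VALUE}")
        commands.append(AxisCommand(axis, value))
    return commands


class RecordingRobot:
    """Hypothetical stand-in for the robot: it only logs the commands it receives."""

    def __init__(self) -> None:
        self.log: List[Tuple[int, int]] = []

    def set_axis(self, axis: int, value: int) -> None:
        self.log.append((axis, value))


def execute(commands: List[AxisCommand], robot: RecordingRobot) -> None:
    """Send each validated command to the robot in order."""
    for command in commands:
        robot.set_axis(command.axis, command.value)


if __name__ == "__main__":
    robot = RecordingRobot()
    execute(parse_commands('[{"axis": 1, "value": 200}, {"axis": 16, "value": 0}]'), robot)
    print(robot.log)
```

Validating the generated commands before execution is a design choice made for this sketch. It guards against a model producing values the hardware cannot accept.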
The researchers conducted various tests to evaluate Alter3’s capabilities, including tasks like taking selfies, drinking tea, and mimicking behaviors such as pretending to be a ghost. Alter3 demonstrated the ability to respond to complex scenarios that require strategic planning and execution.
The combination of human feedback and memory storage has improved Alter3’s performance: the robot can refine its actions based on corrections from users and retain those refinements, which helps it mimic human behaviors and emotions more accurately.
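The idea of storing corrected motions for later reuse can be sketched as follows. This is a simplified illustration: feedback is modeled as a numeric adjustment to one axis, whereas the article describes corrections given by humans and does not specify their form. The `MotionMemory` class, the `apply_feedback` function, and the command format are hypothetical.

```python
from typing import Dict, List, Optional


class MotionMemory:
    """Hypothetical store that maps an instruction to the last corrected motion for it."""

    def __init__(self) -> None:
        self._store: Dict[str, List[dict]] = {}

    @staticmethod
    def _key(instruction: str) -> str:
        # Normalize case and whitespace so small wording differences still match.
        return " ".join(instruction.lower().split())

    def recall(self, instruction: str) -> Optional[List[dict]]:
        """Return the stored motion for this instruction, or None if nothing is stored."""
        return self._store.get(self._key(instruction))

    def remember(self, instruction: str, commands: List[dict]) -> None:
        """Store a copy of the motion so later edits by the caller do not alter the memory."""
        self._store[self._key(instruction)] = [dict(c) for c in commands]


def apply_feedback(commands: List[dict], axis: int, delta: int, max_value: int = 255) -> List[dict]:
    """Return a new command list with one axis shifted by delta, clamped to 0..max_value."""
    adjusted = []
    for command in commands:
        command = dict(command)  # copy so the original list is left unchanged
        if command["axis"] == axis:
            command["value"] = max(0, min(max_value, command["value"] + delta))
        adjusted.append(command)
    return adjusted


if __name__ == "__main__":
    memory = MotionMemory()
    first_try = [{"axis": 1, "value": 100}]
    corrected = apply_feedback(first_try, axis=1, delta=40)  # e.g. "raise the arm a bit more"
    memory.remember("Take a selfie", corrected)
    print(memory.recall("take a selfie"))
```

On a later request, a system built this way could check the memory first and only fall back to fresh generation when no stored motion exists.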
The use of foundation models like GPT-4 in robotics research is gaining popularity, as seen at companies like Figure, which uses OpenAI models to understand human instructions and perform real-world actions. More advanced models are being developed for specific robotics tasks, but fundamental challenges remain in building robots that can reliably perform basic functions such as grasping objects and maintaining balance.
In conclusion, Alter3 represents a notable step in robotics research, showing the potential of integrating language models with humanoid robots. As the technology advances, closer collaboration between AI and robotics is likely to produce more sophisticated and capable robotic systems.