
New Algorithm Teaches Robots Through Human Perspective
A new data creation paradigm and algorithmic breakthrough from Georgia Tech has laid the groundwork for humanoid assistive robots to help with laundry, dishwashing, and other household chores. The framework enables these robots to learn new skills by mimicking actions from first-person videos of everyday activities.
Current training methods prevent robots from being produced at the scale needed to put a robot in every home, said Simar Kareer, a Ph.D. student in the School of Interactive Computing.
“Traditionally, collecting data for robotics means creating demonstration data,” Kareer said. “You operate the robot’s joints with a controller to move it and achieve the task you want, and you do this hundreds of times while recording sensor data, then train your models. This is slow and difficult. The only way to break that cycle is to detach the data collection from the robot itself.”
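The pipeline Kareer describes is imitation learning from teleoperated demonstrations: log what the robot's cameras see and what the operator commands, then fit a policy that maps one to the other. A minimal sketch of that kind of training loop, with purely illustrative data shapes and model choices that are not drawn from Kareer's setup, might look like this:

```python
# Hypothetical sketch only: a behavior-cloning loop over teleoperated demos.
# The data shapes, network, and hyperparameters are illustrative, not Kareer's setup.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Stand-in demo log: 500 recorded steps of 64x64 RGB frames paired with 7-DoF joint commands.
frames = torch.rand(500, 3, 64, 64)
joint_commands = torch.rand(500, 7)
loader = DataLoader(TensorDataset(frames, joint_commands), batch_size=32, shuffle=True)

# Small convolutional policy: camera image in, joint command out.
policy = nn.Sequential(
    nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
    nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
    nn.Flatten(),
    nn.Linear(32 * 13 * 13, 128), nn.ReLU(),  # 13x13 feature map for 64x64 inputs
    nn.Linear(128, 7),
)
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-4)

for epoch in range(10):
    for obs, act in loader:
        loss = nn.functional.mse_loss(policy(obs), act)  # imitate the operator's commands
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```

Every observation-action pair in that loop has to come from someone physically driving the robot, which is exactly the bottleneck Kareer wants to remove.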
Other fields, such as computer vision and natural language processing (NLP), already leverage training data passively culled from the internet to build powerful generative AI and large language models (LLMs).
Many roboticists, however, have instead turned to approaches in which individual users teach their robots how to perform tasks. Kareer believes a similarly passive source of data could enable practical, generalized training that scales the production of humanoid robots.
This is why Kareer collaborated with School of IC Assistant Professor Danfei Xu and his Robot Learning and Reasoning Lab to develop EgoMimic, an algorithmic framework that leverages data from egocentric videos.
Meta’s Ego4D dataset inspired Kareer’s project. The benchmark dataset consists of first-person videos of humans performing daily activities, and the open-source collection is used to train AI models from a first-person human perspective.
“When I looked at Ego4D, I saw a dataset that’s the same as all the large robot datasets we’re trying to collect, except it’s with humans,” Kareer said. “You just wear a pair of glasses, and you go do things. It doesn’t need to come from the robot. It should come from something more scalable and passively generated, which is us.”
Kareer acquired a pair of Meta’s Project Aria research glasses, which contain a rich sensor suite and can record video from a first-person perspective through external RGB and SLAM cameras.
Wearing the glasses, Kareer recorded himself folding a shirt over and over, then repeated the process with other tasks, such as placing a toy in a bowl and packing groceries into a bag. He then built a humanoid robot with pincers for hands and mounted the glasses on top of it to mimic a first-person viewpoint.
The robot performed each task repeatedly for two hours. Kareer said training the robot the traditional way would take days of teleoperating it and recording its sensor data; for his project, he only needed to gather a baseline of robot sensor data against which to measure improvement.
Kareer bridged the gap between the two training sets with the EgoMimic algorithm. The robot’s task performance increased by as much as 400% across various tasks with just 90 minutes of additional recorded footage. It could also perform these tasks in previously unseen environments.
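Conceptually, bridging the two training sets means co-training one policy on both data streams: a shared visual encoder sees human egocentric frames and robot camera frames alike, while separate output heads predict human hand motion and robot arm actions. The sketch below illustrates that idea only; the dataset shapes, network, heads, and equal loss weighting are assumptions made for the example, not the released EgoMimic code.

```python
# Illustrative co-training loop over human egocentric video and a small robot demo set.
# Dataset contents, dimensions, heads, and the equal loss weighting are assumptions for
# this sketch; they are not the released EgoMimic implementation.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

human_data = TensorDataset(torch.rand(900, 3, 64, 64), torch.rand(900, 3))  # frame -> hand position
robot_data = TensorDataset(torch.rand(120, 3, 64, 64), torch.rand(120, 7))  # frame -> joint command

encoder = nn.Sequential(  # shared visual backbone used for both domains
    nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
    nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
    nn.Flatten(),
    nn.Linear(32 * 13 * 13, 128), nn.ReLU(),  # 13x13 feature map for 64x64 inputs
)
human_head = nn.Linear(128, 3)   # predicts human hand motion from the shared features
robot_head = nn.Linear(128, 7)   # predicts robot joint commands from the same features

params = list(encoder.parameters()) + list(human_head.parameters()) + list(robot_head.parameters())
optimizer = torch.optim.Adam(params, lr=1e-4)

human_loader = DataLoader(human_data, batch_size=32, shuffle=True)
robot_loader = DataLoader(robot_data, batch_size=32, shuffle=True)

for epoch in range(10):
    for (h_obs, h_act), (r_obs, r_act) in zip(human_loader, robot_loader):
        human_loss = nn.functional.mse_loss(human_head(encoder(h_obs)), h_act)
        robot_loss = nn.functional.mse_loss(robot_head(encoder(r_obs)), r_act)
        loss = human_loss + robot_loss  # plentiful human footage shapes the shared encoder
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```

Loosely, the scarce robot demonstrations ground the policy in the robot's own body, while the much larger pool of human footage carries most of the task knowledge.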
If enough people wear Aria glasses or other smart glasses while performing daily tasks, they could create the bank of passive data needed to train robots at a massive scale.
This type of data collection could open nearly endless possibilities for roboticists to help humans achieve more in their everyday lives. Humanoid robots could be produced and trained at an industrial scale, able to perform tasks the same way humans do.
“This work is most applicable to jobs that you can get a humanoid robot to do,” Kareer said. “In whatever industry we are allowed to collect egocentric data, we can develop humanoid robots.”
Kareer will present his paper on EgoMimic at the IEEE International Conference on Robotics and Automation (ICRA 2025), which will take place from May 19 to 23 in Atlanta. The paper was co-authored by Xu, School of IC Assistant Professor Judy Hoffman, fellow Georgia Tech students Dhruv Patel, Ryan Punamiya, Pranay Mathur, and Shuo Cheng, and Chen Wang, a Ph.D. student at Stanford.