Design Engineering

Researchers use video to train robots

By DE Staff   


Carnegie Mellon researchers teach robots by having them watch people perform tasks.

A team from Carnegie Mellon University’s Robotics Institute used affordances to teach robots how to interact with objects.
(Photo credit: Carnegie Mellon University)

Researchers at Carnegie Mellon University have enabled robots to learn household chores by watching videos of people performing everyday tasks in their homes. According to the CMU team, two robots successfully learned 12 tasks, including opening a drawer, oven door and lid; taking a pot off the stove; and picking up a telephone, vegetable or can of soup.

“The robot can learn where and how humans interact with different objects through watching videos,” said Deepak Pathak, an assistant professor in the Robotics Institute at CMU’s School of Computer Science. “From this knowledge, we can train a model that enables two robots to complete similar tasks in varied environments.”

In past research by Pathak and his students, robots were able to learn by watching humans in the same environment as the robot. Their latest work, called Vision-Robotics Bridge (VRB), eliminates the need for live human demonstrations or for the robot to perform in an identical environment to that in which it learned. According to the researchers, the robot still needs to practice a task, but can learn a new task in as little as 25 minutes.

To teach the robot how to interact with an object, the team applied the concept of affordances, a psychology concept that refers to what an environment offers, but expanded to include potential actions perceived by an individual.


For VRB, affordances define where and how a robot might interact with an object based on observed human behavior. As a robot watches a human open a drawer, for example, it identifies the handle and the direction of the drawer's movement. After watching several videos of humans opening drawers, the robot can determine how to open any drawer.
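The idea of an affordance here, a contact point plus a motion direction extracted from watched demonstrations, can be illustrated with a toy sketch. This is not the VRB implementation; the `Observation` fields and helper names below are hypothetical, chosen only to show how several observed drawer-openings might be averaged into one affordance estimate.

```python
import math
from dataclasses import dataclass

@dataclass
class Observation:
    """One human interaction seen in a video (hypothetical fields):
    where the hand contacted the object, and which way it then moved."""
    contact_xy: tuple    # pixel coordinates of the contact point
    direction_xy: tuple  # approximate unit vector of the object's motion

def aggregate_affordance(observations):
    """Combine several watched demonstrations into a single affordance:
    the mean contact point and the normalized mean motion direction."""
    n = len(observations)
    cx = sum(o.contact_xy[0] for o in observations) / n
    cy = sum(o.contact_xy[1] for o in observations) / n
    dx = sum(o.direction_xy[0] for o in observations) / n
    dy = sum(o.direction_xy[1] for o in observations) / n
    norm = math.hypot(dx, dy) or 1.0  # avoid division by zero
    return (cx, cy), (dx / norm, dy / norm)

# Three toy "videos" of a drawer being pulled open toward the camera
obs = [
    Observation((120, 200), (0.0, 1.0)),
    Observation((118, 204), (0.1, 0.99)),
    Observation((122, 198), (-0.1, 0.99)),
]
contact, direction = aggregate_affordance(obs)
```

The averaging washes out per-video noise in where the handle was grasped, leaving a consensus contact region and pull direction the robot can then refine through its own practice.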

The team used videos from large datasets such as Ego4D and Epic Kitchens, which offer thousands of hours of daily activity videos, including cooking, cleaning and other kitchen tasks. Both datasets helped train the computer vision models.

“We are using these datasets in a new and different way,” said Shikhar Bahl, a Ph.D. student in robotics at CMU. “This work could enable robots to learn from the vast amount of internet and YouTube videos available.”

