Google says Gemini AI is making its robots smarter

Google is training its robots with Gemini AI to better navigate and complete tasks. The DeepMind robotics team explained in a new research paper how using Gemini 1.5 Pro’s long context window — which determines how much information an AI model can process — allows users to more easily interact with its RT-2 robots using natural language instructions.

This works by filming a video tour of a designated area, such as a home or office space, with researchers using Gemini 1.5 Pro to have the robot “watch” the video and learn about its surroundings. The robot can then carry out commands based on what it has observed, responding with verbal and/or visual output, such as directing a user to an electrical outlet after the user holds up a phone and asks, “where can I charge this?” DeepMind says its Gemini-powered robot had a 90 percent success rate across more than 50 user instructions given in a work area of more than 9,000 square feet.
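For a rough sense of the underlying pattern, here is a minimal sketch using the public google-generativeai Python SDK rather than DeepMind's actual robot stack; the file name, prompt wording, and model choice are illustrative assumptions, not details from the paper.

```python
# Sketch: feed a video tour plus a natural-language question to Gemini 1.5 Pro.
# This is NOT DeepMind's robot pipeline; it only illustrates the long-context,
# video-plus-text prompting pattern the article describes.
import time

import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")

# Upload the video tour so the model can "watch" it (hypothetical file name).
tour = genai.upload_file(path="office_tour.mp4")
while tour.state.name == "PROCESSING":
    time.sleep(5)
    tour = genai.get_file(tour.name)

model = genai.GenerativeModel("gemini-1.5-pro")

# Ask a question grounded in what the video shows.
response = model.generate_content(
    [tour, "I'm holding a phone with a dead battery. Where in this space can I charge it?"]
)
print(response.text)
```

In the actual system, an answer like this would be translated into navigation actions on the robot; the sketch stops at the text response.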

Researchers also found “preliminary evidence” that Gemini 1.5 Pro enabled its droids to plan how to carry out instructions beyond just navigation. For example, when a user with a lot of cans of Coke on their desk asks the droid if their favorite drink is available, the team said Gemini “knows that the robot should navigate to the refrigerator, check for Cokes, and then return to the user to report the result.” DeepMind says it plans to investigate these results further.
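As an illustration of that kind of multi-step planning, the same prompting pattern could be extended to ask the model for a plan before acting. Under the same assumptions as the sketch above, and with a hypothetical file ID and prompt wording:

```python
# Illustrative only: ask Gemini 1.5 Pro to lay out a plan as a list of steps.
# The prompt, file ID, and output shape are assumptions, not DeepMind's pipeline.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-1.5-pro")
tour = genai.get_file("files/office-tour")  # previously uploaded tour video (hypothetical ID)

plan = model.generate_content(
    [
        tour,
        "A user at the desk covered in Coke cans asks whether their favorite "
        "drink is available. List the steps the robot should take, one per line.",
    ]
)
print(plan.text)
# Illustrative output shape:
# 1. Navigate to the refrigerator seen in the kitchen.
# 2. Check whether it contains any Cokes.
# 3. Return to the user and report the result.
```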

Google’s video demonstrations are impressive, though the stark cuts after the droid acknowledges each request hide the fact that it takes anywhere from 10 to 30 seconds to process these instructions, according to the research paper. It may be a while before we share our homes with more advanced mapping robots, but when they arrive, they might at least be able to find our missing keys or wallets.