RT-2 is the new version of what Google calls its vision-language-action (VLA) model. The model helps robots better recognize visual and language patterns so they can interpret instructions and infer which objects best suit a request.
Researchers tested RT-2 with a robotic arm in an office kitchen, asking it to decide what makes a good improvised hammer (it was a rock) and to choose a drink to give an exhausted person (a Red Bull). They also told the robot to move a Coke can to a picture of Taylor Swift. The robot is a Swiftie, and that is good news for humanity.
The new model was trained on web and robotics data, leveraging research advances in large language models like Google's own Bard and combining them with robotic data (such as which joints to move), the company said in a paper. It also understands directions in languages other than English.
For years, researchers have tried to give robots better inference so they can work out how to operate in a real-world environment. As The Verge's James Vincent pointed out, real life is uncompromisingly messy. Robots need explicit instructions even for tasks that are simple for humans, such as cleaning up a spilled drink. Humans instinctively know what to do: pick up the glass, get something to sop up the mess, throw that out, and be careful next time.
Previously, teaching a robot took a long time because researchers had to program each direction individually. But with the power of VLA models like RT-2, robots can draw on a much larger set of information to infer what to do next.
Google's first foray into smarter robots started last year, when it announced it would use its LLM PaLM in robotics, creating the awkwardly named PaLM-SayCan system to integrate LLMs with physical robots.
Google's new robot isn't perfect. The New York Times got to see a live demo of the robot and reported that it incorrectly identified soda flavors and mislabeled fruit as white.
Depending on the type of person you are, this news is either welcome or a reminder of the scary robot dogs from Black Mirror (influenced by Boston Dynamics robots). Either way, we should expect an even smarter robot next year. It might even clean up a spill with minimal instructions.
https://www.theverge.com/2023/7/28/23811109/google-smart-robot-generative-ai