Gemini, known for its versatile capabilities such as generating text, poetry, code, and images, has extended its reach into robotics through Gemini Robotics. The latest addition, the Gemini Robotics On-Device model, removes the dependency on a network connection. The release comes with a full software development kit (SDK) that lets robotics engineers train robots for a variety of tasks.
Google introduced Gemini Robotics in March to bring Gemini’s multimodal reasoning into the physical world, enabling robots of varying shapes and sizes to carry out real-world tasks. The launch included two new AI models. The first, Gemini Robotics, is an advanced vision-language-action (VLA) model built on Gemini 2.0 that adds physical actions as an output so it can control robots directly. The second, Gemini Robotics-ER, leverages Gemini’s spatial understanding, allowing roboticists to tap into its embodied reasoning abilities.
Earlier versions of these models required powerful computing systems connected to remote data centers, putting them out of reach for robots without continuous internet connectivity or for scenarios that cannot tolerate network latency. Gemini Robotics On-Device addresses these limitations. The foundation model, tailored for bi-arm robots, is resource-efficient and runs locally with minimal latency. It handles a range of tasks, from following natural language instructions to performing dexterous actions such as folding clothes or manipulating objects, all without relying on external servers.
Gemini Robotics On-Device adapts not only to new tasks but also to different types of robots. Although initially trained on ALOHA robots, the model has been fine-tuned to operate other systems, including the dual-arm Franka FR3 and Apptronik’s Apollo humanoid. Despite its compact design, the On-Device model generalizes well and responds quickly, making it effective in a range of real-world applications.
Furthermore, DeepMind has introduced the Gemini Robotics SDK, which lets developers evaluate the model and customize it for specific requirements using the MuJoCo physics engine. The SDK supports rapid fine-tuning from a small number of demonstration examples, underscoring the model’s adaptability to new use cases. Advances like Gemini Robotics On-Device point to a future in which robots handle a far wider range of tasks and intelligent machines play a larger role across industries.
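To give a concrete sense of what simulation-based evaluation looks like, the sketch below rolls a robot forward in MuJoCo, the physics engine the SDK builds on. It is only an illustration under stated assumptions: the actual Gemini Robotics SDK interfaces are not shown, "robot.xml" is a placeholder MJCF file, and the policy function is a hypothetical stand-in for a fine-tuned on-device model that maps observations to joint commands.

```python
# Minimal sketch of a policy-evaluation loop in the MuJoCo physics engine.
# Assumptions: "robot.xml" is a placeholder MJCF robot description, and
# `policy` is a hypothetical stand-in for a fine-tuned on-device model;
# the real Gemini Robotics SDK exposes its own interfaces, not shown here.
import numpy as np
import mujoco

model = mujoco.MjModel.from_xml_path("robot.xml")  # load the robot description
data = mujoco.MjData(model)                        # simulation state

def policy(qpos: np.ndarray, qvel: np.ndarray) -> np.ndarray:
    """Hypothetical placeholder: a real evaluation would query the
    fine-tuned model here to turn observations into joint commands."""
    return np.zeros(model.nu)  # hold zero actuation for illustration

# Step the simulation, applying the policy's action at each timestep.
for _ in range(1000):
    data.ctrl[:] = policy(data.qpos.copy(), data.qvel.copy())
    mujoco.mj_step(model, data)

print("final joint positions:", data.qpos)
```

In practice, a developer would swap the placeholder policy for the model under test and score task success over many such rollouts before deciding whether further fine-tuning on new demonstrations is needed.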
