Nvidia has introduced Isaac GR00T N1.6, a Vision-Language-Action (VLA) model designed to improve how humanoid robots perceive, understand language, reason, and act in complex environments. Revealed at CES 2026, GR00T N1.6 marks Nvidia's next step toward “physical AI”: robots that not only observe their surroundings but also acquire generalized skills and carry out coordinated actions. The model extends Nvidia's open robotics ecosystem, integrating perception, reasoning, and whole-body control for real-world operation.
Isaac GR00T N1.6 is an open VLA model intended as a foundation model for humanoid robots. It combines inputs such as camera imagery and natural-language instructions and outputs continuous control actions that drive the robot's movements and task execution. As a VLA model, GR00T N1.6 lets robots interpret their environment and perform tasks in context, adapting and generalizing across different settings.
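To make that interface concrete, here is a minimal Python sketch of a VLA-style inference call: camera frames, a language instruction, and joint state go in, and a short chunk of continuous actions comes out. The class and method names (`VLAPolicy`, `predict_actions`) and the dimensions are hypothetical illustrations, not Nvidia's actual API.

```python
import numpy as np

class VLAPolicy:
    """Hypothetical stand-in for a VLA model such as GR00T N1.6.

    A real policy wraps a vision-language backbone and an action
    head; this stub only illustrates the shape of the interface.
    """

    def __init__(self, action_dim: int = 29, horizon: int = 16):
        self.action_dim = action_dim  # e.g. whole-body joint targets
        self.horizon = horizon        # actions predicted per call

    def predict_actions(self, images: list[np.ndarray],
                        instruction: str,
                        proprio: np.ndarray) -> np.ndarray:
        # A real model fuses images + text + joint state and decodes
        # a chunk of continuous actions; we return zeros as a stub.
        return np.zeros((self.horizon, self.action_dim))

# One control step: camera frames plus a language goal in,
# a short chunk of continuous joint-space actions out.
policy = VLAPolicy()
frames = [np.zeros((224, 224, 3), dtype=np.uint8)]  # RGB camera frame
joint_state = np.zeros(29)                           # proprioception
actions = policy.predict_actions(frames, "pick up the red cup", joint_state)
print(actions.shape)  # (16, 29): an action chunk, replanned as needed
```

Predicting a chunk of actions per call, rather than a single action, is a common pattern in VLA policies because it amortizes the cost of the large backbone across several control ticks.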
Unlike traditional robotics systems that rely on pre-programmed actions for specific tasks, GR00T N1.6 aims for flexible, generalized behavior through cross-embodiment adaptability, multimodal understanding, general task learning, and full-body control. Together, these capabilities are a significant step toward robots that can reason, plan, and act autonomously in dynamic human environments.
GR00T N1.6 runs on a hybrid architecture that pairs a vision-language foundation model with an action-generation module and the integration layers that connect them, translating perceptions and instructions into executable control signals. The model is trained on large datasets of humanoid and bimanual robot trajectories, semi-humanoid data, and synthetic data, enabling it to generalize across diverse tasks.
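As a rough illustration of this hybrid layout, the following PyTorch sketch wires a stand-in vision-language backbone to a stand-in action head. The module structure, layer choices, and sizes are assumptions made for illustration; GR00T N1.6's actual backbone and action decoder are far larger and more sophisticated.

```python
import torch
import torch.nn as nn

class VisionLanguageBackbone(nn.Module):
    """Toy stand-in for the vision-language foundation model."""
    def __init__(self, embed_dim: int = 512):
        super().__init__()
        self.vision = nn.Linear(3 * 224 * 224, embed_dim)  # real models use a ViT
        self.text = nn.Embedding(10_000, embed_dim)        # real models use an LLM

    def forward(self, image: torch.Tensor, token_ids: torch.Tensor) -> torch.Tensor:
        img_emb = self.vision(image.flatten(1))
        txt_emb = self.text(token_ids).mean(dim=1)
        return img_emb + txt_emb  # fused perception/instruction context

class ActionHead(nn.Module):
    """Toy stand-in for the action-generation module: context -> action chunk."""
    def __init__(self, embed_dim: int = 512, horizon: int = 16, action_dim: int = 29):
        super().__init__()
        self.decode = nn.Linear(embed_dim, horizon * action_dim)
        self.horizon, self.action_dim = horizon, action_dim

    def forward(self, context: torch.Tensor) -> torch.Tensor:
        return self.decode(context).view(-1, self.horizon, self.action_dim)

backbone, head = VisionLanguageBackbone(), ActionHead()
image = torch.zeros(1, 3, 224, 224)          # one camera frame
tokens = torch.randint(0, 10_000, (1, 8))    # tokenized instruction
actions = head(backbone(image, tokens))
print(actions.shape)  # torch.Size([1, 16, 29])
```

The design point this sketch captures is the separation of concerns: a slow, semantically rich backbone produces a fused context, and a lighter action module decodes that context into executable control signals.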
GR00T N1.6 is already finding applications across industries: testing advanced behaviors on humanoid research platforms, training and validating robot policies in simulation-to-real workflows (as sketched below), enabling natural-language interaction and physical manipulation in enterprise robotics, and supporting experimentation and custom task learning in education and research.
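A simulation-to-real workflow typically gates real-robot deployment on simulated performance. The sketch below shows one plausible evaluation loop; the `policy` and `env` interfaces are hypothetical stand-ins, not the API of Isaac Sim or any other Nvidia tool.

```python
def evaluate_policy(policy, env, episodes: int = 50, max_steps: int = 400) -> float:
    """Roll a policy out in simulation and report its success rate.

    `policy` and `env` are hypothetical interfaces: in practice the
    environment would be a simulator (e.g. Isaac Sim) and the policy
    a fine-tuned VLA checkpoint.
    """
    successes = 0
    for _ in range(episodes):
        obs = env.reset()
        for _ in range(max_steps):
            chunk = policy.predict_actions(
                obs["images"], obs["instruction"], obs["proprio"])
            for action in chunk:          # execute the predicted chunk
                obs, done, success = env.step(action)
                if done:
                    break
            if done:
                successes += int(success)
                break
    return successes / episodes

# Gate hardware deployment on a simulation success threshold:
# rate = evaluate_policy(policy, sim_env)
# deploy_to_robot = rate >= 0.9
```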
Despite these advances, deploying GR00T N1.6 in the real world calls for robust hardware integration, safety and ethical safeguards in human environments, and high-quality multimodal data for fine-tuning. Nvidia positions its robotics stack, including Cosmos Reason, simulation tools, and Jetson robotics hardware, as a comprehensive ecosystem for “physical AI,” in which robots perceive, reason, and interact autonomously.
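For a sense of what fine-tuning on collected data can look like, here is a minimal behavior-cloning step on demonstration action chunks. The `TinyVLA` module and all tensor shapes are illustrative assumptions, not Nvidia's fine-tuning workflow.

```python
import torch
import torch.nn as nn

class TinyVLA(nn.Module):
    """Toy stand-in for a pretrained VLA checkpoint (illustrative only)."""
    def __init__(self, horizon: int = 16, action_dim: int = 29):
        super().__init__()
        self.net = nn.Linear(128, horizon * action_dim)
        self.horizon, self.action_dim = horizon, action_dim

    def forward(self, obs_emb: torch.Tensor) -> torch.Tensor:
        return self.net(obs_emb).view(-1, self.horizon, self.action_dim)

policy = TinyVLA()
optimizer = torch.optim.AdamW(policy.parameters(), lr=1e-4)

# One behavior-cloning step on a (synthetic) demonstration batch:
obs_emb = torch.randn(8, 128)          # encoded multimodal observations
demo_actions = torch.randn(8, 16, 29)  # teleoperated action chunks
loss = nn.functional.mse_loss(policy(obs_emb), demo_actions)
optimizer.zero_grad()
loss.backward()
optimizer.step()
print(f"fine-tuning loss: {loss.item():.4f}")
```

In practice, the quality of the collected demonstrations dominates the outcome of a step like this, which is why the data-collection concern above is not a side issue but a core deployment requirement.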
Robotics more broadly is shifting from task-specific programming toward integrating large multimodal foundation models with control systems to produce generalized autonomous behavior. Models like GR00T N1.6 are an early step in this transformation, pointing toward humanoid robots that can operate as intelligent, adaptable collaborators alongside humans in diverse environments.
