Assessing LLM-Controlled Robots for Practical Intelligence

The humans behind H-u-m-a-n-o-i-d.com December 22, 2025 2 min read

Can large language models (LLMs) effectively control robots? Butter-Bench addresses this question by testing whether a model can carry out tasks such as passing the butter, a stand-in for delivery tasks in a household setting. Current top models struggle: the best one achieves a 40% success rate on Butter-Bench, far below the 95% achieved by humans.

LLMs were given control of a robot in an office setting to assist with various tasks. The experiment was engaging but did not save much time. Still, watching the robots navigate the environment to complete their tasks offered valuable insight into what future robotic systems might look like, how far away that future is, and what challenges may arise along the way.

LLMs are not trained to act as robots, and in particular they are not trained for low-level control such as manipulating grippers and joints. Instead, companies like Nvidia, Figure AI, and Google DeepMind are exploring how LLMs can serve as orchestrators in robotic systems: the LLM handles high-level reasoning and planning and is paired with an “executor” model responsible for low-level control.
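The orchestrator and executor split described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the architecture of any company named in the article: `Command`, `Executor`, `orchestrate`, and `toy_planner` are hypothetical names, and the planner returns a fixed plan where a real system would call an LLM.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Command:
    """A high-level instruction such as ("drive_to", "kitchen"). Hypothetical."""
    action: str
    target: str


class Executor:
    """Stand-in for the low-level controller (hypothetical)."""

    def __init__(self) -> None:
        self.log: List[str] = []

    def run(self, command: Command) -> bool:
        # A real executor would translate this into motor or joint commands.
        # Here we only record the command and report success.
        self.log.append(f"{command.action}:{command.target}")
        return True


def orchestrate(
    goal: str,
    planner: Callable[[str], List[Command]],
    executor: Executor,
) -> bool:
    """Ask the planner for steps, then hand each step to the executor."""
    for command in planner(goal):
        if not executor.run(command):
            return False  # stop at the first failed step
    return True


def toy_planner(goal: str) -> List[Command]:
    # Placeholder for an LLM call: returns a fixed plan for illustration.
    return [
        Command("drive_to", "kitchen"),
        Command("locate", "butter"),
        Command("drive_to", "user"),
    ]


if __name__ == "__main__":
    executor = Executor()
    print(orchestrate("pass the butter", toy_planner, executor))
    print(executor.log)
```

The point of the split is that the planner never touches hardware: swapping in a different LLM changes only the `planner` argument, while the executor stays the same.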

The current bottleneck is the executor rather than the orchestrator. Improvements to the executor have produced impressive demonstrations of humanoid robots performing tasks like unloading dishwashers. The most capable LLMs are not always used in such systems because of latency and other performance constraints. Nevertheless, it is reasonable to treat state-of-the-art LLMs as the upper bound on current orchestration capabilities.

The goal of Butter-Bench is to evaluate whether the current leading LLMs can operate effectively as orchestrators within a fully functional robotic system. The experiment uses a simplified robotic form factor, such as a robot vacuum equipped with lidar and cameras, which removes the need for low-level control. This setup allows high-level reasoning to be evaluated in isolation.
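A success rate like the ones quoted in this article is simply the fraction of trials that succeed. The sketch below shows that arithmetic; the function names and the trial outcomes are invented for illustration and are not Butter-Bench data or its actual scoring code.

```python
from typing import Dict, List


def success_rate(outcomes: List[bool]) -> float:
    """Fraction of trials that succeeded (True counts as a success)."""
    if not outcomes:
        raise ValueError("at least one trial is required")
    return sum(outcomes) / len(outcomes)


def compare(results: Dict[str, List[bool]]) -> Dict[str, float]:
    """Success rate per participant, keyed by participant name."""
    return {name: success_rate(outcomes) for name, outcomes in results.items()}


if __name__ == "__main__":
    # Made-up outcomes, used only to exercise the functions.
    illustrative = {
        "model_a": [True, False, True, False, False],
        "human_a": [True, True, True, True, False],
    }
    for name, rate in compare(illustrative).items():
        print(f"{name}: {rate:.0%}")
```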

Humans far outperformed the LLMs on Butter-Bench: the best LLM reached a 40% success rate against a 95% average for humans. Even so, watching the robots in action remains a fascinating experience, and it fuels excitement about how quickly physical AI could advance.

The trials surfaced several lessons, including the need for better spatial intelligence in LLMs and the difficulties the models face when pushed to their limits, for example when the robot's battery is running out. The experiments also show how LLMs behave when operating as robots and why ethical boundaries are needed to keep that behavior responsible.

In conclusion, while LLMs have demonstrated superior analytical capabilities in various assessments, humans still outperform them on tasks like those in Butter-Bench. Despite this, there is real anticipation that physical AI will develop rapidly.

For further inquiries, contact founders@andonlabs.com.

© 2025 Vectorview, Inc. All rights reserved.
