Assessing LLM-Controlled Robots for Practical Intelligence

The humans behind H-u-m-a-n-o-i-d.com, December 22, 2025

Can large language models (LLMs) effectively control robots? The Butter-Bench test addresses this question by checking whether they can carry out tasks such as passing the butter, a stand-in for delivery tasks in a household setting. Current top models struggle: the best model achieved a 40% success rate on Butter-Bench, far below the 95% achieved by humans.

LLMs were given control of a robot in an office setting to assist with various tasks. The experiment was engaging, but it did not save a meaningful amount of time. Still, watching the robots navigate the environment to complete their tasks offered insight into what the future of robotic systems might look like, how far away that future is, and which challenges may arise along the way.

LLMs are not trained to act as robots, particularly when it comes to low-level control such as manipulating grippers and joints. Instead, companies such as Nvidia, Figure AI, and Google DeepMind are exploring how LLMs can serve as orchestrators that handle high-level reasoning and planning, paired with an “executor” model responsible for low-level control.
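
The orchestrator and executor split described above can be sketched as a simple loop. This is a minimal illustration, not the architecture of any company named here: the `Command` schema, the `Executor` class, and `scripted_planner` (a fixed plan standing in for an LLM call) are all hypothetical names introduced for this example.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Command:
    """A high-level instruction from the orchestrator (hypothetical schema)."""
    action: str  # e.g. "navigate" or "wait"
    target: str  # e.g. "kitchen"


class Executor:
    """Stand-in for the low-level controller; a real one would drive motors."""

    def __init__(self) -> None:
        self.location = "dock"

    def execute(self, command: Command) -> str:
        # Turn one high-level command into an (imaginary) low-level motion
        # and report the outcome back to the orchestrator as text.
        if command.action == "navigate":
            self.location = command.target
            return f"arrived at {command.target}"
        if command.action == "wait":
            return f"waited for {command.target}"
        return f"unsupported action: {command.action}"


# A planner maps (task, history of results) to the next command, or None when done.
Planner = Callable[[str, List[str]], Optional[Command]]


def scripted_planner(task: str, history: List[str]) -> Optional[Command]:
    """Placeholder for an LLM call: replays a fixed plan, ignoring the task text."""
    plan = [
        Command("navigate", "kitchen"),
        Command("navigate", "desk"),
        Command("wait", "pickup confirmation"),
    ]
    return plan[len(history)] if len(history) < len(plan) else None


def run_task(task: str, planner: Planner, executor: Executor,
             max_steps: int = 10) -> List[str]:
    """Alternate between planning and execution until the planner stops."""
    history: List[str] = []
    for _ in range(max_steps):
        command = planner(task, history)
        if command is None:
            break
        history.append(executor.execute(command))
    return history


if __name__ == "__main__":
    for step in run_task("pass the butter", scripted_planner, Executor()):
        print(step)
```

In this sketch the orchestrator never touches motors or joints. It only sees text results and issues the next high-level command, which is the division of labor the paragraph describes.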

The current bottleneck is the executor rather than the orchestrator. Improvements to the executor have produced impressive demonstrations of humanoid robots performing tasks such as unloading dishwashers. These systems do not always use the most capable LLMs, owing to performance limitations and latency concerns. Nevertheless, it is reasonable to treat state-of-the-art LLMs as representing the best of current orchestration capabilities.

The goal of the Butter-Bench test is to evaluate whether today's leading LLMs can operate effectively as orchestrators within a fully functional robotic system. The experiment uses a simplified robotic form factor, a robot vacuum equipped with lidar and cameras, which removes the need for low-level control. This setup allows high-level reasoning to be evaluated in isolation.
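
Success rates like those reported for Butter-Bench are the share of trials in which the task was completed. The sketch below shows that tally only. The participant names and trial outcomes are made-up placeholders, not Butter-Bench data, and the function names are hypothetical.

```python
from typing import Dict, List


def completion_rate(outcomes: List[bool]) -> float:
    """Fraction of trials that succeeded (True marks a completed task)."""
    if not outcomes:
        raise ValueError("at least one trial is required")
    return sum(outcomes) / len(outcomes)


def summarize(results: Dict[str, List[bool]]) -> Dict[str, float]:
    """Compute the completion rate for each participant."""
    return {name: completion_rate(trials) for name, trials in results.items()}


if __name__ == "__main__":
    # Placeholder outcomes for illustration only; not measured results.
    placeholder_results = {
        "example_model": [True, False, False, True],
        "example_human": [True, True, True, False],
    }
    for name, rate in summarize(placeholder_results).items():
        print(f"{name}: {rate:.0%}")
```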

Humans far outperformed LLMs on Butter-Bench, averaging a 95% success rate against 40% for the best LLM. Even so, watching the robots in action was fascinating, and it fuels excitement about how quickly physical AI could advance.

The trials surfaced several lessons: LLMs need better spatial intelligence, and they run into difficulties when pushed to their limits, for example when the robot's battery is running out. The experiments also shed light on how LLMs behave when operating as robots and on the importance of setting ethical boundaries to ensure responsible behavior.

In conclusion, although LLMs have shown strong analytical capabilities in various assessments, humans still outperform them on tasks like those in Butter-Bench. Even so, there is real anticipation that physical AI will develop rapidly. For further inquiries, please contact founders@andonlabs.com.
