Humanoids

Figure's Helix-02 Demo: Two Humanoids Reset a Bedroom Without a Shared Brain

The setup: Two Figure humanoids walk into a minimalist bedroom. They hang a coat, close a laptop, put headphones away, dispose of trash, reposition furniture, and make a bed together. They finish in under two minutes. There is no human in the loop, no central controller orchestrating them, and — the interesting part — no direct communication between the two robots. Each one is figuring out what the other is doing by watching it through onboard cameras and inferring intent from movement. The full clip is on Figure’s YouTube channel, and the writeup is in Interesting Engineering’s coverage by Jijo Malayil.


What’s new about this isn’t the bed

Plenty of humanoid demos in the last two years have shown a single robot doing increasingly impressive things — folding laundry, sorting groceries, walking up stairs without falling on its face. This one is different in three specific ways, and only one of them is really visible in the video.

1. The two robots coordinate without talking to each other

This is the big one. In a traditional multi-robot system — the kind you’d see in a warehouse fulfillment center or a fleet of autonomous mobile robots — the robots have a shared planner. A central service hands out tasks, computes coordination, sequences moves, and pushes those decisions to each robot over a network. Even decentralized swarms usually have some form of explicit message-passing: position broadcasts, intent declarations, lock-step protocols.

Figure’s setup deliberately doesn’t have any of that. Each robot runs its own onboard policy off its own cameras. When one humanoid grabs a corner of the comforter, the other doesn’t get a message that says “I am pulling east, please pull west to maintain tension.” It watches and infers. The mental model is closer to how two humans who don’t speak the same language can make a bed together: one of them picks up a corner, the other one figures out from the movement what’s needed on their end and acts accordingly.

For multi-robot architecture, this is a real choice with consequences. The upside is that it scales without a network: no shared planner means one fewer single point of failure, one fewer protocol to design, one fewer thing to break when WiFi flakes out. The downside is that it puts a much heavier burden on each robot’s perception model, which has to understand the intent behind another agent’s motion in real time. That is closer to a theory-of-mind problem than an autonomous-system (AS) routing one. The fact that it apparently works on something as physically subtle as a comforter lift is the interesting datapoint.
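
To make the contrast concrete, here is a minimal Python sketch of the observe-and-infer idea, under heavy assumptions. Nothing here reflects Figure’s actual policy, which is a learned model running off camera input. The `Robot` class, the one-dimensional hand positions, the `dead_band` threshold, and the "pull the opposite way" rule are all hypothetical stand-ins. The only point it illustrates is that the follower never receives a message: it estimates the partner’s motion from two consecutive observations and reacts to that.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Robot:
    """Toy agent; all fields and rules are illustrative assumptions."""
    name: str
    x: float                            # own hand position along the bed (metres)
    last_seen: Optional[float] = None   # partner position in the previous frame

    def observe(self, partner_x: float) -> float:
        """Estimate the partner's per-frame displacement from what the camera sees."""
        velocity = 0.0 if self.last_seen is None else partner_x - self.last_seen
        self.last_seen = partner_x
        return velocity

    def act(self, partner_velocity: float, step: float = 0.05,
            dead_band: float = 1e-3) -> None:
        """Pull the opposite way to keep the fabric taut; ignore tiny motions."""
        if partner_velocity > dead_band:
            self.x -= step
        elif partner_velocity < -dead_band:
            self.x += step


def simulate(frames: int = 5) -> tuple:
    """Run the toy scenario and return (leader_x, follower_x)."""
    leader = Robot("A", x=0.5)
    follower = Robot("B", x=-0.5)
    for _ in range(frames):
        leader.x += 0.05                      # leader's own policy: pull east
        seen = follower.observe(leader.x)     # no message, just an observation
        follower.act(seen)
    return leader.x, follower.x


if __name__ == "__main__":
    leader_x, follower_x = simulate()
    print(f"leader at {leader_x:.2f} m, follower at {follower_x:.2f} m")
```

Note that the follower lags by one frame, because it needs two observations before it can estimate any motion at all. That lag is the price of having no intent message, and it is one reason the perception model has to be fast.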

2. The comforter

Most household-robotics demos so far have stuck to rigid objects — boxes, cans, dishes — because rigid objects have stable geometry and predictable grasp points. The robot can plan: “I will close my gripper on these coordinates in this orientation, and the object will be where I expect.”

A comforter has none of that. Pull on one corner and the whole shape changes. The grasp points move. The fabric folds, stretches, slips. For two robots cooperating on the same comforter, every motion one makes alters the environment for the other. Their plans have to update continuously, and they have to do it without the explicit “I’m going to do X next” message that would make it easy.

This is the kind of capability that doesn’t sound impressive in a one-line summary (“they made a bed”) but is a far harder problem than the rigid-object tasks in most household demos. Deformable-object manipulation has been an open problem in robotics research for decades.
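
A toy model shows why the grasp points move. The sketch below treats one edge of the comforter as a chain of nodes joined by identical springs, which is a big simplification (real cloth bends, slips, and drapes in 3D). The node count, the distances, and the `relax_chain` and `grasp_drift` helpers are made up for illustration. One robot holds node 0, the other pulls the far end, and the node the first robot planned to grab next ends up somewhere else.

```python
def relax_chain(xs, iterations=500):
    """Settle a 1D spring chain with both ends held fixed.

    Each interior node is repeatedly moved to the midpoint of its
    neighbours (Gauss-Seidel relaxation), which is the equilibrium of
    identical springs when both end nodes are held in place.
    """
    xs = list(xs)
    for _ in range(iterations):
        for i in range(1, len(xs) - 1):
            xs[i] = 0.5 * (xs[i - 1] + xs[i + 1])
    return xs


def grasp_drift(pull=0.5, target_node=7, nodes=11, spacing=0.2):
    """How far the planned grasp point moves when the partner pulls the far end."""
    edge = [spacing * i for i in range(nodes)]
    planned = edge[target_node]      # where robot A expected to grab
    edge[-1] += pull                 # robot B pulls its corner
    settled = relax_chain(edge)
    return settled[target_node] - planned


if __name__ == "__main__":
    print(f"planned grasp point drifted {grasp_drift():.2f} m")
```

With these numbers the target drifts by 0.35 m, or 70% of the partner’s pull, even though the first robot did nothing. A plan computed before the pull is simply aimed at the wrong place afterwards.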

3. Sim-to-real transfer without fine-tuning

Buried in the Interesting Engineering writeup is a sentence worth lingering on: Figure says the locomotion behaviors were trained end-to-end via reinforcement learning in simulation across heavily randomized terrains, and the learned policy transferred directly to the physical robots — no calibration step, no fine-tuning, no per-robot adjustments.

If true, that’s a big deal. “Sim-to-real” is one of robotics’ classic hard problems. The simulator never models the real world perfectly: friction, motor response curves, sensor noise, joint slop, lighting, and cable hysteresis all differ between sim and reality, and most systems trained in sim need some amount of real-world fine-tuning before they’re usable. Eliminating that step (or at least claiming to) means you can train a fleet’s worth of behaviors on a simulation cluster and ship them as-is. That’s the same scaling story you saw in language models when training-compute became the bottleneck instead of data collection.

I’d want to see independent confirmation of “no fine-tuning at all” — companies generally aren’t lying about this kind of thing, but the definition of “fine-tuning” can be elastic. Either way, the direction is what matters: making sim training transfer cleanly enough that the fleet doesn’t need a human PhD per robot.
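
For readers who haven’t seen it, training “across heavily randomized terrains” is in the spirit of domain randomization: every training episode gets its own physics, so the policy can’t overfit to one simulator configuration. Here is a minimal sketch of just the sampling step. The parameter names and ranges are my assumptions for illustration, not values from Figure or the article, and the actual reinforcement-learning loop is left out.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class SimParams:
    """Physics settings for one simulated episode (hypothetical fields)."""
    friction: float           # ground friction coefficient
    motor_gain: float         # scale factor on commanded torque
    sensor_noise_std: float   # std-dev of noise added to joint readings
    terrain_roughness: float  # max bump height in metres


# Illustrative ranges only; real values would come from hardware measurements.
RANGES = {
    "friction": (0.4, 1.2),
    "motor_gain": (0.8, 1.2),
    "sensor_noise_std": (0.0, 0.05),
    "terrain_roughness": (0.0, 0.10),
}


def sample_params(rng: random.Random) -> SimParams:
    """Draw one independent uniform sample per parameter."""
    return SimParams(**{name: rng.uniform(lo, hi)
                        for name, (lo, hi) in RANGES.items()})


def episode_params(n_episodes: int, seed: int = 0):
    """Yield a fresh randomized configuration for each training episode."""
    rng = random.Random(seed)
    for _ in range(n_episodes):
        yield sample_params(rng)


if __name__ == "__main__":
    for params in episode_params(3):
        print(params)
```

The bet is that if the ranges are wide enough to bracket the real robot, the real world looks like just one more sample. Whether that holds without any fine-tuning is exactly the claim that needs independent confirmation.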


The production side, briefly

The other Figure announcement worth noting: at their BotQ facility in California, they’re scaling Figure 03 production from one robot per day to one per hour, over a four-month ramp. If the line runs around the clock, that’s a roughly 24× throughput jump.
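
The 24× figure is simple arithmetic, but it rests on an assumption worth stating: it only holds if “one per hour” means 24 robots per day, that is, a line running around the clock. The sketch below makes that explicit. It also works out what the ramp would look like month to month if growth were geometric, which is my assumption and not something Figure has said.

```python
def throughput_jump(before_per_day: float, after_per_hour: float,
                    hours_per_day: float = 24.0) -> float:
    """Ratio of daily output after the ramp to daily output before it."""
    return (after_per_hour * hours_per_day) / before_per_day


def monthly_growth_factor(total_jump: float, months: int) -> float:
    """Constant month-over-month multiplier that compounds to total_jump."""
    return total_jump ** (1.0 / months)


if __name__ == "__main__":
    jump = throughput_jump(before_per_day=1, after_per_hour=1)
    print(f"overall jump: {jump:.0f}x")
    print(f"per-month factor over 4 months: {monthly_growth_factor(jump, 4):.2f}x")
    # With two 8-hour shifts instead of 24/7 operation the jump is smaller.
    print(f"16 h/day: {throughput_jump(1, 1, hours_per_day=16):.0f}x")
```

Under the geometric assumption, output would have to a bit more than double every month (about 2.2×) for four months straight.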

The “production capacity” story tends to get less coverage than the “look what this one robot can do” videos, but it’s the thing that actually determines whether humanoids show up in your warehouse or your hotel housekeeping closet in the next few years. Capability without throughput is a science project. Throughput without capability is a vacuum cleaner. Both at once, sustained, is what changes the world.


What this isn’t

A few honest framings, because demo videos are demo videos:

  • The bedroom was minimalist. A staged environment with few obstacles, predictable lighting, and curated objects. The robots aren’t doing the same task in a kid’s bedroom with toys, cables, laundry on the floor, and a cat. Not yet.
  • We don’t see the failure rate. Figure picked the takes that worked. We don’t know how often a comforter grab slips, how often a head-nod gets misread, how often the whole sequence has to restart.
  • “Under two minutes” is fast for robots, slow by human standards. That time covers the whole room reset, and a skilled human makes a bed alone in about 45 seconds. The robots are deliberate and a little slow because they’re being careful. This is what good engineering looks like at this stage; speed usually comes later.
  • No autonomy without supervision yet. Even in this video, the assumption is that a human set up the task, monitored it, and would have intervened if something went wrong. Full unsupervised deployment is a different problem.

None of these are reasons to dismiss what was shown. They’re just reasons to keep the bar where it actually is.


What to watch next

A few specific things worth tracking from here:

  • Whether the coordination-without-comms approach generalizes. Bed-making is one scenario. Does the same observe-and-infer architecture hold up when two humanoids have to cooperate on something with less visual overlap — assembling furniture in different rooms, or one carrying while the other navigates?
  • How quickly Helix-02 (or whatever’s next) absorbs new task types. The model was reportedly trained on logistics, laundry, and home tasks. The fact that bedroom reset emerged suggests the underlying vision-language-action (VLA) architecture can compose skills it wasn’t explicitly trained for. That’s the more interesting claim, and it’s the one that should make AI researchers pay attention regardless of their interest in humanoids specifically.
  • Real-world deployment data. Demos are demos. The interesting moment is when a Figure 03 is doing this in a hotel or a hospital or a customer’s facility for eight hours a day, and someone publishes the failure logs.
  • Whether other humanoid programs (Tesla Optimus, Apptronik Apollo, 1X NEO, Agility Digit, Unitree H1/H2, Boston Dynamics’ electric Atlas) converge on the same coordination-without-comms architecture, or go a different route. The field is small enough that this kind of design choice tends to spread fast if it works.

Why this section exists on TravTeks

I’ve been writing about networking, security, and automation here, and humanoid robotics keeps showing up at the edges of all three. There’s a network in every fleet of robots. There’s a security model in every Vision-Language-Action system that decides what objects are valid grasp targets. And the whole point of these robots is automation — just automation of physical work instead of pipelines and packets.

So this is the start of a separate category here for the humanoid robotics stuff specifically — not exhaustive coverage, just the demos and announcements and architecture choices that look like they actually matter. Figure’s bedroom video is a good place to begin.


Source: “US humanoid robots team up to clean bedroom, make bed in under 2 mins” — Interesting Engineering, May 11, 2026, by Jijo Malayil. Video on Figure’s YouTube channel.
