2026-05-03

Physical AI roundup — humanoid foundation models in 2026 Q2

Four humanoid foundation models shipped real-world demos in 2026 Q2: NVIDIA GR00T N2, Tesla Optimus Gen 3, Figure 03, and Physical Intelligence π0.5. The sim-to-real gap is closing — but only on dexterous tasks where teleoperation data is plentiful.

Humanoid robots are having their “ChatGPT moment” — but slower, messier, and bottlenecked by data. Here’s what shipped this quarter and what it actually means for builders.

The four releases that matter

1. NVIDIA GR00T N2 — generalist humanoid foundation model

GR00T N2 ships as a pretrained transformer that takes RGB images, proprioception, and language instructions as input and outputs joint-space actions for any humanoid platform. The headline number is 70+ tasks zero-shot across 5 robot bodies, but the more useful number is the fine-tune ratio: ~30 minutes of teleop data per new task vs ~8 hours for a from-scratch policy, roughly a 16× reduction. It is available via Isaac Lab and the Jetson Thor dev kit.
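To make the fine-tune ratio concrete, here is a small back-of-the-envelope sketch. The two per-task figures come from the paragraph above; the task count of 20 is purely an illustrative assumption, and the function names are hypothetical.

```python
# Teleop-budget arithmetic using the two per-task figures quoted above.
FINE_TUNE_MIN_PER_TASK = 30        # ~30 minutes per new task (from the text)
FROM_SCRATCH_HOURS_PER_TASK = 8    # ~8 hours per new task (from the text)


def teleop_hours(num_tasks: int, hours_per_task: float) -> float:
    """Total teleoperation hours needed to cover num_tasks tasks."""
    return num_tasks * hours_per_task


def data_reduction_factor() -> float:
    """How many times less teleop data fine-tuning needs per task."""
    return FROM_SCRATCH_HOURS_PER_TASK / (FINE_TUNE_MIN_PER_TASK / 60)


if __name__ == "__main__":
    tasks = 20  # hypothetical task count, for illustration only
    print("reduction factor:", data_reduction_factor())
    print("fine-tune hours:", teleop_hours(tasks, FINE_TUNE_MIN_PER_TASK / 60))
    print("from-scratch hours:", teleop_hours(tasks, FROM_SCRATCH_HOURS_PER_TASK))
```

Under those assumptions, a 20-task program needs about 10 hours of teleop with fine-tuning versus about 160 hours from scratch.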

2. Tesla Optimus Gen 3 — vertical-integration thesis

Gen 3 dropped weight from 57 kg to 48 kg and added 22-DOF hands (vs 11 DOF in Gen 2). The interesting bit isn’t the hardware; it’s that Tesla is now training Optimus on the same Dojo-trained vision-language stack that powers FSD V14. They’re betting that driving-data scale compounds into manipulation policies. Skeptics note that “look at the road” and “thread a screw” involve very different action distributions.

3. Figure 03 — commercial deployment first

Figure 03 sacrifices DOF for reliability: 28 DOF total, but a 95%+ success rate on a fixed BMW-Spartanburg part-loading task across 10,000+ trials. The lesson: in 2026 Q2, factory floor adoption favors narrow-task reliability over generalist demos. Figure announced a 5-figure backlog with two German automakers.
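One way to see why trial count matters as much as the success rate is to put a confidence interval on it. The sketch below uses the standard Wilson score interval. The exact counts (9,500 successes in 10,000 trials) are an assumption standing in for the "95%+" and "10,000+" figures above, and the 20-trial comparison is hypothetical.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 is ~95% confidence)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p_hat = successes / trials
    denom = 1 + z**2 / trials
    center = (p_hat + z**2 / (2 * trials)) / denom
    half_width = (z / denom) * math.sqrt(
        p_hat * (1 - p_hat) / trials + z**2 / (4 * trials**2)
    )
    return center - half_width, center + half_width


if __name__ == "__main__":
    # Assumed counts: the text only gives "95%+" and "10,000+".
    print("10,000 trials:", wilson_interval(9_500, 10_000))
    # Hypothetical small demo run with the same observed success rate.
    print("20 trials:", wilson_interval(19, 20))
```

With the assumed counts, the 10,000-trial interval is roughly 94.6% to 95.4%, while the same 95% rate over 20 trials only supports roughly 76% to 99%. That gap is the statistical version of "narrow-task reliability over generalist demos."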

4. Physical Intelligence π0.5 — the dataset moat

π0.5 (the half-step toward π1) is the open-weights surprise of the quarter. Trained on the Open X-Embodiment 2.0 dataset (1.2M trajectories, 35 robot embodiments), it matches GR00T N2 on benchmarks despite being ~6× smaller. The takeaway: data diversity is now beating parameter count for embodied policies.

What this means for builders

Sim-to-real is closing on dexterous tasks but stuck on locomotion. Pick-and-place across novel objects: works. Walking on uneven terrain: still hand-tuned per platform.
Teleoperation data is the new training corpus. ALOHA-2 rigs ($35K) are now the default lab setup. If you want to train a custom skill, plan for ~50 hours of teleop per task.
Reasoning latency caps task complexity. GR00T N2 inference at 30 Hz on Jetson Thor (one action roughly every 33 ms) is fine for manipulation but too slow for reactive obstacle avoidance. Hybrid stacks that pair a fast low-level controller with a slower vision-language-action (VLA) model are dominating.
The deployment bottleneck is now safety certification, not capability. All four platforms above can do useful work today; getting them past ISO 10218 + ISO/TS 15066 is what’s gating revenue.
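The hybrid-stack pattern from the latency point above can be sketched as a two-rate loop: a slow policy refreshes a setpoint at 30 Hz while a fast controller tracks it on every tick. This is a toy, single-threaded simulation and not any vendor's implementation. The 600 Hz fast rate, the 1-D state, the gains, and both function bodies are assumptions chosen for illustration.

```python
from dataclasses import dataclass

FAST_HZ = 600   # hypothetical low-level control rate (assumption, not from the text)
SLOW_HZ = 30    # VLA inference rate quoted above
TICKS_PER_PLAN = FAST_HZ // SLOW_HZ  # fast ticks between slow-policy updates


@dataclass
class State:
    position: float = 0.0   # 1-D stand-in for a joint angle
    target: float = 0.0     # latest setpoint from the slow policy


def slow_policy(goal: float, position: float) -> float:
    """Stand-in for a VLA call: returns a setpoint halfway toward the goal."""
    return position + 0.5 * (goal - position)


def fast_controller(state: State, gain: float = 0.2) -> float:
    """Proportional step toward the current setpoint, run on every fast tick."""
    return state.position + gain * (state.target - state.position)


def run(goal: float, seconds: float) -> tuple[State, int, int]:
    """Simulate the two-rate loop; returns final state and call counts."""
    state = State()
    slow_calls = fast_calls = 0
    for tick in range(int(seconds * FAST_HZ)):
        if tick % TICKS_PER_PLAN == 0:            # slow loop: replan at 30 Hz
            state.target = slow_policy(goal, state.position)
            slow_calls += 1
        state.position = fast_controller(state)   # fast loop: every tick
        fast_calls += 1
    return state, slow_calls, fast_calls


if __name__ == "__main__":
    final, slow_calls, fast_calls = run(goal=1.0, seconds=1.0)
    print(final.position, slow_calls, fast_calls)
```

The design point is that the fast loop never blocks on the slow one: it always acts on the most recent setpoint, so reaction time is set by the fast rate rather than by model inference. In a real stack the two loops would typically run in separate threads or processes, and the fast loop would also consume fresh sensor data.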

What to watch in Q3

Open-weights GR00T variant (rumored)
π1 release with action-chunking transformer architecture
First public Optimus customer beyond internal Tesla factories
Boston Dynamics Atlas commercial program (electric Atlas only, hydraulic version retired)