2026-05-03
Physical AI roundup — humanoid foundation models in 2026 Q2
Four humanoid foundation models shipped real-world demos in 2026 Q2: NVIDIA GR00T N2, Tesla Optimus Gen 3, Figure 03, and Physical Intelligence π0.5. The sim-to-real gap is closing — but only on dexterous tasks where teleoperation data is plentiful.
Humanoid robots are having their “ChatGPT moment” — but slower, messier, and bottlenecked by data. Here’s what shipped this quarter and what it actually means for builders.
The four releases that matter
1. NVIDIA GR00T N2 — generalist humanoid foundation model
GR00T N2 ships as a pretrained transformer that takes RGB, proprioception, and language as input and outputs joint-space actions for any humanoid platform. The headline number is 70+ tasks zero-shot across 5 robot bodies, but the more useful number is the fine-tune ratio: ~30 minutes of teleoperation (teleop) data per new task versus ~8 hours for a from-scratch policy, roughly a 16× reduction. Available via Isaac Lab and the Jetson Thor dev kit.
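To see what that ratio means for a data-collection budget, here is a back-of-envelope sketch. It uses only the two per-task figures quoted above. The function names and the 20-task example are illustrative assumptions, not part of any NVIDIA tooling.

```python
# Per-task teleop figures quoted in the text above; both are approximate.
FINE_TUNE_MIN_PER_TASK = 30          # ~30 minutes per new task (fine-tuning)
FROM_SCRATCH_MIN_PER_TASK = 8 * 60   # ~8 hours per new task (from scratch)


def teleop_hours(num_tasks: int, minutes_per_task: float) -> float:
    """Total teleop hours needed to cover num_tasks new tasks."""
    return num_tasks * minutes_per_task / 60


def data_reduction_factor() -> float:
    """How many times less teleop data fine-tuning needs per task."""
    return FROM_SCRATCH_MIN_PER_TASK / FINE_TUNE_MIN_PER_TASK


if __name__ == "__main__":
    num_tasks = 20  # hypothetical skill count for a small deployment
    print("fine-tune hours:   ", teleop_hours(num_tasks, FINE_TUNE_MIN_PER_TASK))
    print("from-scratch hours:", teleop_hours(num_tasks, FROM_SCRATCH_MIN_PER_TASK))
    print("reduction factor:  ", data_reduction_factor())
```

The calculation assumes teleop time scales linearly with task count, which may not hold if tasks share structure.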
2. Tesla Optimus Gen 3 — vertical-integration thesis
Gen 3 dropped weight from 57 kg to 48 kg and added 22 DOF hands (vs 11 in Gen 2). The interesting bit isn’t the hardware — it’s that Tesla is now training Optimus on the same Dojo-trained vision-language stack that powers FSD V14. They’re betting that driving-data scale compounds into manipulation policies. Skeptics note that “look at the road” and “thread a screw” are very different action distributions.
3. Figure 03 — commercial deployment first
Figure 03 trades DOF for reliability: 28 DOF total, with a 95%+ success rate on a fixed part-loading task at BMW Spartanburg across 10,000+ trials. The lesson: in 2026 Q2, factory-floor adoption favors narrow-task reliability over generalist demos. Figure announced a five-figure backlog with two German automakers.
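A trial count that large pins the success rate down tightly, which is part of why the number is credible. The sketch below computes a Wilson score interval for a binomial proportion. The inputs (9,500 successes out of 10,000) are the lower-end reading of "95%+" and "10,000+", not reported figures, and the calculation assumes trials are independent.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 is ~95% confidence)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p_hat = successes / trials
    denom = 1 + z * z / trials
    center = (p_hat + z * z / (2 * trials)) / denom
    half_width = (
        z * math.sqrt(p_hat * (1 - p_hat) / trials + z * z / (4 * trials * trials))
    ) / denom
    return center - half_width, center + half_width


if __name__ == "__main__":
    # Illustrative lower-end inputs: 95% of 10,000 trials.
    low, high = wilson_interval(9_500, 10_000)
    print(f"95% interval: {low:.4f} to {high:.4f}")
```

Under these assumptions the interval is under one percentage point wide. The same 95% rate from 100 trials would leave an interval roughly ten times wider.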
4. Physical Intelligence π0.5 — the dataset moat
π0.5 (the half-step toward π1) is the open-weights surprise of the quarter. Trained on the Open X-Embodiment 2.0 dataset (1.2M trajectories, 35 robot embodiments), it matches GR00T N2 on benchmarks despite being ~6× smaller. The takeaway: data diversity is now beating parameter count for embodied policies.
What this means for builders
- The sim-to-real gap is closing on dexterous tasks but remains open on locomotion. Pick-and-place across novel objects: works. Walking on uneven terrain: still hand-tuned per platform.
- Teleoperation data is the new training corpus. ALOHA-2 rigs ($35K) are now the default lab setup. If you want to train a custom skill, plan for ~50 hours of teleop per task.
- Reasoning latency caps task complexity. GR00T N2 inference at 30 Hz on Jetson Thor is fine for manipulation but too slow for reactive obstacle avoidance. Hybrid stacks, which pair a fast low-level controller with a slow vision-language-action (VLA) policy, are dominating.
- The deployment bottleneck is now safety certification, not capability. All four platforms above can do useful work today; getting them past ISO 10218 + ISO/TS 15066 is what’s gating revenue.
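The hybrid-stack pattern from the latency point above can be sketched as a dual-rate loop: a slow policy refreshes the target at 30 Hz (the GR00T N2 rate quoted above) while a fast controller tracks that target on every tick. Everything else here is an assumption for illustration: the 500 Hz inner rate, the proportional gain, the one-dimensional state, and the class and method names. The slow policy is a stand-in, not a real VLA call.

```python
from dataclasses import dataclass


@dataclass
class HybridController:
    fast_hz: int = 500    # assumed low-level control rate (hypothetical)
    slow_hz: int = 30     # VLA inference rate quoted in the text
    gain: float = 0.2     # assumed proportional gain for the fast loop
    target: float = 0.0
    position: float = 0.0
    slow_calls: int = 0

    def slow_policy(self, goal: float) -> float:
        """Stand-in for a VLA forward pass; here it just returns the goal."""
        self.slow_calls += 1
        return goal

    def fast_step(self) -> None:
        """One low-level tick: move a fraction of the way toward the target."""
        self.position += self.gain * (self.target - self.position)

    def run(self, goal: float, seconds: float) -> float:
        last_slow_index = -1
        for tick in range(int(self.fast_hz * seconds)):
            # Integer arithmetic avoids float drift when deciding slow updates.
            slow_index = tick * self.slow_hz // self.fast_hz
            if slow_index != last_slow_index:
                self.target = self.slow_policy(goal)
                last_slow_index = slow_index
            self.fast_step()
        return self.position


if __name__ == "__main__":
    ctrl = HybridController()
    final = ctrl.run(goal=1.0, seconds=1.0)
    print(f"slow policy calls: {ctrl.slow_calls}, final position: {final:.6f}")
```

The point of the split is that the fast loop keeps reacting between policy updates, so reaction time is bounded by the inner rate and not by VLA inference. A real stack would run the two loops in separate processes or on separate hardware.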
What to watch in Q3
- Open-weights GR00T variant (rumored)
- π1 release with action-chunking transformer architecture
- First public Optimus customer beyond internal Tesla factories
- Boston Dynamics Atlas commercial program (electric Atlas only, hydraulic version retired)