AI-Daily-Builder

2026-06-17

Compute for Physical AI — The Silicon Powering the Robot Ramp (Mid-2026)

Benchmarking the inference and training chips powering autonomous vehicles and humanoid robots — Jetson Thor, HW4, Dojo, EyeQ Ultra — through mid-2026.

The compute layer is the quiet constraint on physical AI timelines

The robot ramp has a silicon floor. Autonomous vehicles need onboard inference chips that fuse camera, radar, and lidar data (or camera data alone, in Tesla’s case) in real time under strict power budgets. Humanoid robots need edge processors that run foundation models on battery power, inside a chassis weighing under 70 kilograms. Training those models requires massive cloud or private compute clusters capable of ingesting terabytes of robot demonstration data.

Every timeline projection for Waymo, Tesla, Figure AI, or any other physical AI company is, at its base, a projection about available silicon. This article benchmarks the chips driving the field — inference hardware deployed in vehicles and robots, and the training compute behind the models.

One definitional note: TOPS (tera-operations per second) figures in this article refer to INT8 precision, the usual basis for quoting inference throughput. Training chips are rated in TFLOPS (trillions of floating-point operations per second) at BF16 or FP16. The two are not directly comparable: they count different operation types at different precisions, and training and inference are distinct workloads. The tables below separate them accordingly.
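A minimal sketch of keeping those metrics apart in a comparison script, assuming each headline figure is stored together with its unit and precision. The `Throughput` class and its `ratio_to` method are hypothetical names for illustration, not part of any vendor tool; the numbers are the headline figures used in this article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Throughput:
    """A headline compute figure tagged with its unit and numeric precision."""
    value: float     # tera-operations (TOPS) or tera-FLOP (TFLOPS) per second
    unit: str        # "TOPS" or "TFLOPS"
    precision: str   # e.g. "INT8", "BF16", "FP16"

    def ratio_to(self, other: "Throughput") -> float:
        """Return self / other, refusing comparisons across units or precisions."""
        if (self.unit, self.precision) != (other.unit, other.precision):
            raise ValueError(
                f"cannot compare {self.unit}@{self.precision} "
                f"with {other.unit}@{other.precision}"
            )
        return self.value / other.value


orin_nx = Throughput(100, "TOPS", "INT8")    # inference module
thor = Throughput(800, "TOPS", "INT8")       # inference module
dojo_d1 = Throughput(362, "TFLOPS", "BF16")  # training chip

print(thor.ratio_to(orin_nx))  # 8.0
try:
    thor.ratio_to(dojo_d1)
except ValueError as err:
    print(err)  # cannot compare TOPS@INT8 with TFLOPS@BF16
```

Refusing the comparison outright, rather than converting, is deliberate: there is no fixed exchange rate between INT8 TOPS and BF16 TFLOPS that holds across chips.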


Section 1 — Master silicon benchmark table

The table below covers the primary inference chips relevant to physical AI as of mid-2026. All TOPS figures are INT8 unless noted. Power figures are typical operating draw, not peak TDP unless specified. “Available” indicates general commercial availability; some chips remain on allocation or in staged rollout.

| Chip | Maker | TOPS (INT8) | Power (W) | Memory / Bandwidth | Primary Use Case | Commercial Status |
|---|---|---|---|---|---|---|
| Jetson Orin NX | NVIDIA | 100 | 10–25 | 16 GB LPDDR5, 102 GB/s | Edge robotics, drone payloads, industrial | Generally available |
| Jetson Thor | NVIDIA | 800 | ~60 | 128 GB LPDDR5X, 273 GB/s | Next-gen humanoid robots, advanced robotics | Staged / on allocation |
| HW4 (FSD Computer) | Tesla | 1,000+ (Tesla claim) | ~50–80 per unit | Custom LPDDR5 | Tesla vehicle autonomy inference | In production (Model S/X/3/Y, Cybertruck, Cybercab) |
| HW4 Dual-Chip | Tesla | 2,000+ (Tesla claim) | ~100–160 combined | Two HW4 units in parallel | High-redundancy Tesla vehicles | In production |
| Dojo D1 | Tesla | N/A (training chip) | ~350 per chip | 900 GB/s per chip | Neural net training (not inference) | Training cluster only |
| Snapdragon Ride Elite | Qualcomm | 700+ | Not fully disclosed | LPDDR5 with automotive-grade ECC | ADAS, L2+/L3 AV | In production (OEM rollout) |
| EyeQ Ultra | Mobileye | 176 | Under 100 (Mobileye figure) | Integrated LPDDR5 | L4 autonomous driving inference | Available since 2025 |
| TPU v5e | Google | N/A (training chip) | ~170 per chip | 16 GB HBM2e, 819 GB/s per chip | Cloud model training (e.g., Waymo neural nets) | Google Cloud (not sold standalone) |

Reading the table: TOPS figures vary significantly with how each manufacturer measures them, including whether sparsity is assumed, which accelerator blocks are counted, and at what clock speed. Tesla’s 1,000+ claim uses Tesla’s internal benchmark methodology, which may not be directly comparable to NVIDIA’s published INT8 numbers. Treat cross-vendor TOPS comparisons as directional, not precise. Power efficiency (TOPS per watt) is often a more meaningful metric for mobile and vehicle applications. Dividing headline TOPS by the power figures above gives roughly 12–20 TOPS/W for HW4 (est.), about 13 TOPS/W for Jetson Thor, roughly 1.8 TOPS/W or better for EyeQ Ultra (176 TOPS within a sub-100 W envelope), and 4–10 TOPS/W for Jetson Orin NX depending on operating point. The upper Orin NX figure is likely optimistic, since headline TOPS are quoted at a module’s highest power mode, not its 10 W setting.
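The TOPS-per-watt arithmetic can be reproduced with a small helper. This is a sketch assuming the headline TOPS and power figures from the table; it divides by both ends of a power range and says nothing about sustained throughput. `tops_per_watt` is a hypothetical helper name.

```python
from __future__ import annotations


def tops_per_watt(tops: float, watts_low: float, watts_high: float) -> tuple[float, float]:
    """Naive efficiency range: headline TOPS divided by the high and low power figures.

    Returns (worst, best). The best case is optimistic whenever a chip does not
    sustain its headline TOPS at its lowest power setting.
    """
    return tops / watts_high, tops / watts_low


# Headline figures from the table (vendor claims and estimates): (TOPS, W low, W high).
chips = {
    "HW4 (single unit)": (1000, 50, 80),
    "Jetson Thor": (800, 60, 60),
    "Jetson Orin NX": (100, 10, 25),
}

for name, (tops, low, high) in chips.items():
    worst, best = tops_per_watt(tops, low, high)
    print(f"{name}: {worst:.1f}-{best:.1f} TOPS/W")
```

Reporting a range rather than a single number keeps the operating-point assumption visible, which matters most for modules like Orin NX with a wide power envelope.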

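Tesla Dojo D1: Each D1 chip delivers 362 TFLOPS at BF16. Tesla’s ExaPOD configuration, 3,000 D1 chips (120 training tiles of 25 chips each) plus switching fabric, adds up to roughly 1.1 exaFLOPS of aggregate BF16 training compute; Tesla’s widely reported 100-exaFLOPS goal referred to its total training compute, not a single ExaPOD. This is a training system, not an inference chip. It is not deployed in vehicles.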
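Multiplying the per-chip figure by the chip count is a quick sanity check on any aggregate training claim. A sketch of that arithmetic, assuming Tesla’s published figures of 362 TFLOPS BF16 per D1 chip, 3,000 chips per ExaPOD, and 25 chips per training tile; the constant names are illustrative.

```python
# Tesla's published Dojo figures (peak BF16); constant names are illustrative.
D1_BF16_TFLOPS = 362          # per D1 chip
EXAPOD_CHIPS = 3_000          # D1 chips in one ExaPOD
CHIPS_PER_TRAINING_TILE = 25  # 5 x 5 array of D1 chips per training tile

TFLOPS_PER_EXAFLOPS = 1_000_000  # 10^18 / 10^12

aggregate_exaflops = D1_BF16_TFLOPS * EXAPOD_CHIPS / TFLOPS_PER_EXAFLOPS
training_tiles = EXAPOD_CHIPS // CHIPS_PER_TRAINING_TILE

print(f"{aggregate_exaflops:.2f} exaFLOPS BF16 across {training_tiles} training tiles")
# prints: 1.09 exaFLOPS BF16 across 120 training tiles
```

Peak figures multiply cleanly; delivered training throughput also depends on utilization and interconnect, which this arithmetic ignores.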


Section 2 — Who uses what: company-level compute stack

The inference chip in a vehicle or robot is only half the picture. Training compute — the cloud or private cluster used to build the model in the first place — is equally consequential. The table below maps the major physical AI companies to both layers.

| Company | Onboard Inference Chip | Training Compute | Notes |
|---|---|---|---|
| Waymo | Custom ASIC (Waymo Driver chip, 5th gen) | Google Cloud TPU v4/v5 clusters | In-vehicle chip details limited; Google Cloud relationship provides training scale |
| Tesla | HW4 (single or dual) | Dojo + NVIDIA H100 clusters (transitional) | Active vertical integration: moving training toward Dojo; HW4 designed in-house |
| Figure AI | NVIDIA Jetson Thor | NVIDIA DGX / H100 clusters | Foundation model trained offboard; Thor handles onboard inference |
| Agility Robotics (Digit) | Intel / NVIDIA edge compute (mixed) | AWS cloud compute | Amazon is an investor and pilot customer; onboard chip details limited |
| 1X Technologies | NVIDIA Jetson Thor platform | NVIDIA DGX-based | OpenAI partnership influences model training stack |
| Boston Dynamics (Atlas) | Custom actuator compute + NVIDIA Isaac platform | NVIDIA Isaac Sim / cloud training | Isaac platform used for simulation-to-real transfer |
| Apptronik (Apollo) | NVIDIA-based edge compute | AWS / NVIDIA (est.) | Google/Samsung investment; training stack not fully disclosed |

What this table reveals: Tesla and Waymo have vertically integrated or deeply partnered on both inference and training. The humanoid startup cohort — Figure, 1X, Apptronik — converges on NVIDIA Jetson Thor for inference and NVIDIA DGX infrastructure for training. This creates a single-vendor dependency with supply implications discussed in Section 5.


Section 3 — Tesla’s vertical integration advantage

Tesla occupies a structurally different position from every other physical AI company in the compute layer. It designs both the inference chip deployed in its vehicles (HW4) and the training silicon used to build the models (Dojo D1). No other physical AI company controls both ends of this stack.

What vertical integration buys Tesla

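Less dependence on NVIDIA for training. Dojo D1 is a Tesla design (fabricated by TSMC), so training capacity built on Dojo does not hinge on NVIDIA’s allocation decisions or pricing. Export controls are a different matter: US rules on advanced AI chips are defined by performance thresholds rather than by vendor, so shipping Dojo hardware to a restricted destination would face the same rules as shipping H100s. The asymmetry Tesla gains is independence from a single GPU supplier, and it compounds over time as GPU supply conditions and export controls evolve.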

Cost per TOPS at vehicle scale. HW4 is designed in-house and integrated on Tesla’s vehicle production line (the chips themselves are fabricated by an outside foundry), so the cost of inference compute is absorbed into the vehicle’s hardware bill of materials. Buying Mobileye EyeQ Ultra or Qualcomm Snapdragon Ride Elite as a third-party component adds a chip vendor’s margin on top of manufacturing cost and creates a procurement dependency. Designing in-house removes that vendor layer, though Tesla still pays foundry costs.
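The vendor-margin argument can be made concrete with a toy cost-per-TOPS calculation. Every number here is a hypothetical placeholder, not a disclosed price for HW4, EyeQ Ultra, or Snapdragon Ride Elite; `cost_per_tops` is an illustrative helper, and `markup` is expressed as a fraction over the supplier’s cost.

```python
def cost_per_tops(unit_cost_usd: float, tops: float, markup: float = 0.0) -> float:
    """Dollars per headline TOPS after applying a vendor markup to the unit cost.

    markup is a fraction over cost: 0.4 means the buyer pays 1.4x the unit cost.
    """
    if tops <= 0:
        raise ValueError("tops must be positive")
    return unit_cost_usd * (1 + markup) / tops


# Hypothetical placeholder inputs: identical silicon cost, bought two ways.
in_house = cost_per_tops(unit_cost_usd=500, tops=1_000)  # no vendor layer
third_party = cost_per_tops(unit_cost_usd=500, tops=1_000, markup=0.4)
print(f"in-house ${in_house:.2f}/TOPS vs third-party ${third_party:.2f}/TOPS")
```

Holding unit cost and TOPS equal isolates the markup; real chips also differ in TOPS, power, and integration cost, so this captures only the margin component of the comparison.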

Training compute: Dojo vs. NVIDIA H100 cluster comparison. Tesla’s ExaPOD combines 3,000 D1 chips at 362 TFLOPS BF16 each, roughly 1.1 exaFLOPS of BF16 training compute. Matching that peak with NVIDIA H100 SXM5 GPUs, each rated at roughly 989 TFLOPS dense BF16, would take on the order of 1,100 GPUs; Tesla’s 100-exaFLOPS total-compute goal would correspond to roughly 100,000 H100s. At data center pricing (est.), an H100 fleet of that larger size represents billions of dollars in hardware, before power and cooling infrastructure. Tesla’s Dojo is designed to deliver equivalent scale at lower total cost of ownership, though external validation of this cost claim is not publicly available.
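Cluster-equivalence claims reduce to dividing a target by a per-GPU figure, and the usual pitfall is mixing tera- and exa-scale units. A sketch assuming NVIDIA’s published H100 SXM5 peak of roughly 989 TFLOPS dense BF16 and comparing peak to peak; `gpus_needed` is a hypothetical helper, and real clusters deliver well below peak.

```python
import math

# NVIDIA's published H100 SXM5 peak for dense BF16 (structured sparsity roughly doubles it).
H100_BF16_DENSE_TFLOPS = 989
TFLOPS_PER_EXAFLOPS = 1_000_000  # 10^18 / 10^12


def gpus_needed(target_tflops: float, per_gpu_tflops: float) -> int:
    """Smallest whole number of GPUs whose combined peak meets the target."""
    return math.ceil(target_tflops / per_gpu_tflops)


exapod_tflops = 3_000 * 362               # 3,000 D1 chips x 362 TFLOPS BF16
goal_tflops = 100 * TFLOPS_PER_EXAFLOPS   # a 100-exaFLOPS compute target

print(gpus_needed(exapod_tflops, H100_BF16_DENSE_TFLOPS))  # 1099
print(gpus_needed(goal_tflops, H100_BF16_DENSE_TFLOPS))    # 101113
```

Keeping every quantity in TFLOPS and converting exaFLOPS through one named constant is what prevents the off-by-a-thousand errors that creep into these comparisons.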

The trade-off: Vertical integration carries engineering risk. Dojo development has taken longer than initial public timelines suggested, and Tesla has continued to use NVIDIA H100 clusters for training during the transition. The full shift to Dojo for primary training is a stated goal, not a completed transition as of mid-2026.


Section 4 — Waymo’s compute stack

Waymo’s approach differs from Tesla’s in one important respect: it does not build its own training silicon, but it has deeply integrated access to the most powerful training infrastructure in the industry through its parent company, Alphabet/Google.

In-vehicle inference: the Waymo Driver chip

Waymo has developed a custom ASIC for onboard inference — the fifth-generation Waymo Driver chip. The detailed specifications of this chip are not publicly disclosed, which is consistent with Waymo’s practice of protecting technical differentiation. What is known from Waymo’s public communications:

Waymo does not sell or license its inference chip. It is purpose-built for the vehicles in Waymo’s own fleet, which run its Waymo One ride-hailing service, and is not a general-purpose automotive chip.

Training: Google Cloud TPU scale advantage

Waymo trains its neural networks on Google Cloud TPU v4 and v5 infrastructure. Google’s TPU pods reach exaFLOP scale: a full TPU v4 pod links 4,096 chips at a peak of 275 TFLOPS BF16 each, roughly 1.1 exaFLOPS in aggregate. This gives Waymo access to training compute comparable to or exceeding what any humanoid startup can provision through NVIDIA cloud instances, at a cost structure that reflects the Google parent relationship rather than market rates.

The structural implication: Waymo’s training scale advantage is not something a startup can replicate by raising another round. The access to Google TPU infrastructure at cost is a structural moat. The constraint for Waymo is not compute — it is data diversity (driving miles across more cities, conditions, and edge cases) and vehicle manufacturing scale.


Section 5 — The NVIDIA bottleneck for humanoid startups

The convergence of humanoid robot startups on a single inference platform — NVIDIA Jetson Thor — creates a supply concentration risk that is not widely discussed in coverage of the humanoid robot ramp.

Why Thor is the default choice

Jetson Thor offers the highest TOPS-per-watt ratio in its class for a commercially available humanoid-scale edge compute module. At 800 TOPS and approximately 60W, it enables onboard inference for large vision-language-action models without requiring an external compute tether. NVIDIA’s Isaac robotics platform — simulation, training pipeline, and deployment tooling — integrates natively with Jetson hardware. For a startup that wants to move fast without building its own silicon team, Thor plus Isaac is the rational choice.

The allocation problem

NVIDIA Jetson Thor is a high-complexity system-on-module that competes for NVIDIA’s engineering resources and its share of foundry capacity alongside data center GPU demand. As of mid-2026, Jetson Thor is reported to be on allocation, meaning demand from humanoid robot manufacturers exceeds immediately available supply. This is consistent with the standard product lifecycle for a new Jetson module: initial production runs are limited, and allocation priority is managed by NVIDIA.

What this means for ramp timelines

For companies like Figure AI, 1X Technologies, and Apptronik — all of which depend on Jetson Thor for their onboard compute — the robot hardware ramp is partly gated by NVIDIA’s production allocation decisions. A company can design the best humanoid robot frame, train the best foundation model, and sign the best customer contracts, but if Thor modules are on a 6–12 month lead time, the physical production rate is constrained by silicon, not engineering.
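The gating effect can be sketched as a toy monthly model in which each robot needs one compute module and output is the smaller of frame capacity and modules on hand. The capacities and delivery schedule are hypothetical placeholders, not reported figures for any company; `monthly_output` is an illustrative helper.

```python
from __future__ import annotations


def monthly_output(frame_capacity: list[int], modules_received: list[int]) -> list[int]:
    """Robots completed per month, assuming exactly one compute module per robot.

    Unused modules carry over as inventory; unused frame capacity does not.
    """
    built_per_month = []
    inventory = 0
    for capacity, received in zip(frame_capacity, modules_received):
        inventory += received
        built = min(capacity, inventory)
        inventory -= built
        built_per_month.append(built)
    return built_per_month


# Hypothetical: 500 frames/month, first module shipment lands in month 3.
capacity = [500] * 6
deliveries = [0, 0, 800, 200, 300, 300]
print(monthly_output(capacity, deliveries))  # [0, 0, 500, 500, 300, 300]
```

Over the six months the line could build 3,000 robots but completes 1,600; the shortfall comes entirely from the module schedule, which is the sense in which silicon rather than engineering sets the production rate.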

Tesla and Waymo are insulated from this constraint. Tesla uses HW4, its own chip. Waymo uses its own custom ASIC. Neither depends on NVIDIA for in-vehicle inference. The constraint falls exclusively on the humanoid startups that chose the fast path of using commercially available NVIDIA hardware rather than investing in custom silicon — a trade-off that made sense at early stage but becomes a bottleneck at production scale.

The longer-term resolution: Humanoid robots reaching meaningful production volumes will face a make-or-buy decision on silicon. Companies that reach Series C and beyond will have the capital to explore custom ASIC development — a 3–5 year program — or to negotiate preferred allocation agreements with NVIDIA. Neither is a short-term solution. For the 2026–2028 period, the NVIDIA Thor allocation ceiling is a real constraint on how fast the humanoid robot industry can scale.


Benchmark context: this is the fifth article in the physical AI series

This tracker is the fifth in a series covering physical AI from multiple angles:

  1. Operational ramp metrics — production counts, deployment scale, miles driven
  2. Humanoid robot technology — hardware generations, dexterity benchmarks, foundation model capabilities
  3. AV safety and regulation — California DMV data, NHTSA crash reporting, state permit maps
  4. Investment and valuation — capital flows, funding rounds, implied valuations
  5. Compute and silicon — this article

The compute layer sits beneath all four prior topics. The operational ramp (article 1) is partly a function of how many inference chips are available. The humanoid technology benchmarks (article 2) depend on what models can run onboard in real time. The investment picture (article 4) will ultimately be shaped by which companies control their own silicon stack and which remain dependent on third-party allocations. Silicon is not the most visible layer of physical AI — but it is the most foundational one.

