2026-06-18
AV Simulation — How Waymo and Tesla Train on Billions of Virtual Miles
Waymo runs an estimated 20B simulated miles per year; Tesla trains on video from 6M vehicles via Dojo. Simulation is the multiplier that separates AV leaders.
Article 74 in the Physical AI Benchmark Series — AV Simulation and Synthetic Data
You cannot teach an autonomous vehicle to handle a pedestrian running a red light, a tire blowing out at highway speed, or a child darting into traffic by waiting for those events to happen on real roads. Simulation is the training multiplier that lets AV companies encounter rare and dangerous scenarios billions of times in software before a real vehicle ever faces them.
Waymo runs an estimated 20 billion simulated miles per year. Tesla uses its Dojo supercomputer to train on video from 6 million real-world vehicles. The gap between AV leaders and followers is not just miles driven: it is the capacity to simulate, to generate synthetic training data at scale, and to close the loop between real-world edge cases and simulated training environments.
This article maps the simulation architectures, the synthetic data pipelines, and what simulation capability means for the Physical AI ramp benchmark.
Section 1 — Why Simulation Is Non-Negotiable
The fundamental problem of AV training is the long tail: the distribution of real-world driving scenarios is extremely wide, but the scenarios that matter most for safety — rare, dangerous, near-miss events — appear extremely infrequently in organic real-world data. Waiting for those events to occur on real roads is not a viable training strategy.
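A simple Poisson model gives a rough sense of why waiting is not viable. The sketch below assumes an illustrative rate of one critical event per million miles and a constant event rate. Both are simplifying assumptions, and the function names are hypothetical.

```python
import math

# Assumed for illustration only: one critical event per million miles.
MILES_PER_EVENT = 1_000_000


def prob_at_least_one(miles: float, miles_per_event: float) -> float:
    """Probability of seeing at least one event, assuming a constant-rate Poisson process."""
    return 1.0 - math.exp(-miles / miles_per_event)


def miles_for_expected_count(target_events: int, miles_per_event: int) -> int:
    """Miles needed for the expected number of events to reach target_events."""
    return target_events * miles_per_event


if __name__ == "__main__":
    for miles in (100_000, 1_000_000, 10_000_000):
        p = prob_at_least_one(miles, MILES_PER_EVENT)
        print(f"{miles:>12,} miles: P(at least one event) = {p:.3f}")
    # Collecting 10,000 organic examples would take 10 billion miles at this rate.
    print(f"{miles_for_expected_count(10_000, MILES_PER_EVENT):,} miles")
```

Under these assumptions, a training set of ten thousand organic examples of a single event type would require on the order of ten billion real miles, which is the scale at which simulation becomes the only practical source.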
| Training challenge | Real-world approach | Simulation approach |
|---|---|---|
| Rare but critical events | Wait for a pedestrian to run a red light — may happen once per million miles | Generate millions of synthetic red-light-running scenarios with randomized timing, speed, and vehicle positions |
| Fatal scenarios | Cannot intentionally crash a real vehicle into a cyclist | Simulate collisions at high physical fidelity; train avoidance policies without risk to people or hardware |
| Edge case coverage | Real fleet accumulates data organically — slow, geographically biased | Simulation can generate data for any geography, weather, time of day, traffic density |
| Policy iteration speed | Deploy new software → gather real miles → evaluate: weeks per cycle | Test new policy in simulation → evaluate in hours; iterate on the order of 100x faster (est.) |
| Corner cases (long tail) | Rare scenarios occur too infrequently in real-world data to be covered systematically | Simulation can generate targeted long-tail scenarios on demand |
| Safety | Training on truly dangerous scenarios is impossible on real roads | Simulation is safe by definition; no risk to humans or hardware |
The core simulation principle: every 1 mile of real-world driving can be multiplied into thousands of simulated variations — different weather, different road users, different initial conditions. The company that can simulate most effectively can improve its policy network faster than any company relying solely on real-world miles.
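The multiplication described above can be sketched as a parameter sweep over a single logged scenario. This is a minimal illustration, not any company's actual tooling: the `Scenario` fields, the `expand_variations` helper, and all parameter values are hypothetical.

```python
from dataclasses import dataclass, replace
from itertools import product


@dataclass(frozen=True)
class Scenario:
    """Hypothetical, highly simplified description of one logged driving scenario."""
    name: str
    weather: str
    time_of_day: str
    pedestrian_speed_mps: float
    ego_speed_mps: float


def expand_variations(seed, weathers, times_of_day, pedestrian_speeds, ego_speeds):
    """Return one scenario per combination of the swept parameters."""
    return [
        replace(seed, weather=w, time_of_day=t,
                pedestrian_speed_mps=p, ego_speed_mps=e)
        for w, t, p, e in product(weathers, times_of_day,
                                  pedestrian_speeds, ego_speeds)
    ]


# One real-world observation (values are placeholders).
seed = Scenario("pedestrian_runs_red_light", "clear", "day", 3.0, 12.0)

variations = expand_variations(
    seed,
    weathers=["clear", "rain", "fog", "snow"],
    times_of_day=["day", "dusk", "night"],
    pedestrian_speeds=[1.5, 3.0, 4.5, 6.0],
    ego_speeds=[8.0, 12.0, 16.0, 20.0, 24.0],
)
print(len(variations))  # 4 * 3 * 4 * 5 = 240
```

Even this coarse grid turns one event into 240 variants; adding a few more swept dimensions (agent positions, timing offsets, sensor noise) quickly reaches the thousands.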
Section 2 — Waymo’s Simulation Platform: Carcraft
Waymo operates Carcraft, an internal simulation platform built over more than a decade alongside its real-world AV program. Carcraft is not a supplementary tool — it is Waymo’s primary training environment.
| Attribute | Details |
|---|---|
| Name | Carcraft (Waymo’s internal simulation platform) |
| Scale | Approximately 20 billion simulated miles per year (est.) |
| Architecture | High-fidelity physics simulation; realistic sensor modeling (LIDAR point clouds, camera renders, radar returns); agent behavior models for other vehicles, pedestrians, cyclists |
| Sensor simulation | Waymo simulates the full sensor suite — a simulated LIDAR point cloud must be physically accurate enough for the real perception stack to process it without modification |
| Scenario generation | Real-world driving data fed back into simulation to create systematic variation of edge cases encountered on real roads |
| Agent behavior | Other vehicles and pedestrians in Waymo simulations are modeled with calibrated behavior distributions derived from real-world observations |
| Infrastructure | Runs on Google Cloud TPUs (Waymo and Google share a parent company, Alphabet, which enables access to massive compute); one of the largest dedicated simulation compute clusters in any industry (est.) |
| Real-to-sim loop | When a real Waymo vehicle encounters something unexpected, that scenario is automatically ingested into simulation for training and regression testing |
The real-to-sim loop is Waymo’s structural advantage: every real-world edge case becomes simulation training data within hours. A vehicle that encounters an unusual pedestrian behavior in San Francisco can trigger the generation of thousands of synthetic variations of that scenario — different speeds, different lighting, different weather — before the next software update ships.
The scale of Carcraft also enables regression testing at a level impossible for real-world programs. When Waymo ships a software update, it must pass simulation regression against tens of thousands of previously recorded scenarios before a real vehicle runs updated code. This is the simulation safety net.
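A regression gate of this kind can be sketched as a loop over a scenario library that blocks release on any failure. The sketch below is an assumption about the general pattern, not Waymo's implementation: `run_scenario` stands in for a simulator call, and the pass criteria (no collision, a minimum gap threshold) are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class ScenarioResult:
    scenario_id: str
    collision: bool
    min_gap_m: float  # closest distance to any other road user, in metres


def regression_gate(
    scenario_ids: Iterable[str],
    run_scenario: Callable[[str], ScenarioResult],
    min_gap_threshold_m: float = 0.5,  # hypothetical safety margin
) -> tuple[bool, list[ScenarioResult]]:
    """Replay every scenario; the gate passes only if none of them fail."""
    failures = []
    for scenario_id in scenario_ids:
        result = run_scenario(scenario_id)
        if result.collision or result.min_gap_m < min_gap_threshold_m:
            failures.append(result)
    return (not failures, failures)


# Canned results standing in for real simulator output.
_CANNED = {
    "s1": ScenarioResult("s1", collision=False, min_gap_m=2.1),
    "s2": ScenarioResult("s2", collision=False, min_gap_m=0.3),
    "s3": ScenarioResult("s3", collision=True, min_gap_m=0.0),
}


def fake_run(scenario_id: str) -> ScenarioResult:
    return _CANNED[scenario_id]


passed, failures = regression_gate(["s1", "s2", "s3"], fake_run)
print(passed, [f.scenario_id for f in failures])  # False ['s2', 's3']
```

Returning the failing results rather than a bare boolean lets the failing scenarios be triaged and, once fixed, kept in the library as permanent regression cases.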
Section 3 — Tesla’s Approach: Real Video Plus Dojo
Tesla’s training philosophy is fundamentally different from Waymo’s. Where Waymo builds synthetic worlds, Tesla harvests the real one.
| Attribute | Tesla | Waymo |
|---|---|---|
| Primary training data | Real video from 6M or more fleet vehicles (petabytes of real-world camera footage) | Simulation plus real miles from approximately 1,500 autonomous vehicles (est.) |
| Simulation role | Secondary: Tesla uses simulation for specific scenarios, but real video is primary | Primary: approximately 20 billion simulated miles per year (est.) |
| Dojo | Custom supercomputer built specifically for video training at scale; custom D1 chip optimized for bandwidth between tiles | Uses Google Cloud TPUs (Waymo and Google share a parent company, Alphabet) |
| Dojo D1 chip | Custom 7nm chip; 362 TFLOPS at BF16/CFP8 precision (per Tesla’s AI Day 2021 presentation); 900 GB/s interconnect between chips (est.), designed for distributed video processing | Not applicable |
| Training objective | Train a neural network that maps 8 camera feeds directly to driving decisions (end-to-end or imitation learning at scale) | Train specific perception, prediction, and planning modules separately; simulation covers each |
| Advantage | Real-world data distribution — model sees actual edge cases as they happen in reality | Can generate unlimited synthetic data for any scenario; not limited by fleet size |
| Disadvantage | Cannot train on truly rare or dangerous scenarios without waiting for them to happen | Simulation fidelity gap — simulated sensor data is not perfectly identical to real sensor data |
Dojo is Tesla’s counterpart to Waymo’s access to Google TPU clusters. The D1 chip is purpose-built for the computational bottleneck Tesla faces: ingesting petabytes of video from millions of vehicles and training large neural networks on that data at high throughput. Where traditional GPU clusters can struggle with the memory bandwidth demands of distributed video training, the D1’s 900 GB/s inter-chip interconnect (est.) was designed to address that bottleneck.
Tesla’s real-data flywheel creates a different kind of compounding advantage. Every FSD mile driven by a Tesla owner generates training data. As the fleet grows, the training data grows proportionally — and crucially, it grows with the exact distribution of real-world scenarios the model will be deployed in. Waymo must engineer its simulation to match that distribution; Tesla is simply collecting it.
Section 4 — The Sim-to-Real Gap: The Unsolved Problem
The fundamental limitation of simulation-heavy approaches is the sim-to-real gap: a model trained on simulated data may not perform as well when deployed on real-world sensor inputs that differ in subtle ways from the simulation.
| Challenge | Description | Current state |
|---|---|---|
| Sensor fidelity | A simulated LIDAR point cloud must match a real LIDAR point cloud closely enough for the model to generalize from sim to real | Waymo has invested heavily in high-fidelity sensor simulation; still not perfect — models trained only on sim data underperform on real data |
| Behavior realism | Simulated pedestrians and drivers must behave like real ones | Calibrated behavior models from real data help; extreme rare behaviors are still hard to model |
| Domain randomization | Strategy: randomize sim parameters widely so model learns to be robust to any sim variation, which transfers better to the real world | Works for some scenarios; insufficient for others |
| NeRF and Gaussian splatting | New approach: reconstruct real scenes from camera video into 3D neural representations; re-render from new viewpoints to generate training data | Waymo, Nvidia, and others are using neural scene reconstruction to reduce the sim-to-real gap; promising but compute-intensive |
| UniSim and GAIA | Waabi (UniSim), Wayve (GAIA-1), and others are building neural simulators that generate photorealistic sensor data from real-world inputs | Active research area; reduces reliance on hand-engineered physics simulation |
Domain randomization — deliberately introducing variation in simulated parameters — was the first systematic strategy for sim-to-real transfer. By training on simulations with randomized lighting, texture, weather, and sensor noise, models become more robust to the specific imperfections of a particular simulator. But domain randomization alone has not closed the gap to the level required for production AV deployment in all conditions.
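Domain randomization as described above amounts to drawing a fresh set of simulator parameters for every training episode. The sketch below shows the sampling step only. The `SimParams` fields and their ranges are assumptions chosen for illustration, not values from any production simulator.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class SimParams:
    """Hypothetical per-episode simulator settings."""
    sun_elevation_deg: float   # lighting
    rain_intensity: float      # weather: 0 = dry, 1 = heaviest rain
    texture_id: int            # index into a library of surface textures
    lidar_noise_std_m: float   # sensor noise, in metres


def sample_params(rng: random.Random) -> SimParams:
    """Draw one independent, uniformly randomized parameter set."""
    return SimParams(
        sun_elevation_deg=rng.uniform(-10.0, 90.0),
        rain_intensity=rng.uniform(0.0, 1.0),
        texture_id=rng.randrange(100),
        lidar_noise_std_m=rng.uniform(0.0, 0.05),
    )


def randomized_episodes(n: int, seed: int = 0) -> list[SimParams]:
    """Generate n parameter sets; a fixed seed keeps the run reproducible."""
    rng = random.Random(seed)
    return [sample_params(rng) for _ in range(n)]


episodes = randomized_episodes(1000)
print(episodes[0])
```

The design choice that matters is the width of each range: wide enough that the real world plausibly falls inside the sampled distribution, but not so wide that the model spends capacity on conditions it will never encounter.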
The NeRF and Gaussian splatting approach represents a fundamentally different strategy: rather than building a synthetic world from first principles, capture the real world in a 3D neural representation and re-render it from any viewpoint or under any conditions. A scene recorded by a Waymo vehicle in San Francisco can be re-rendered in rain, at night, with occluded pedestrians added — without requiring the physics simulation pipeline to model those conditions from scratch.
Section 5 — Simulation as Competitive Moat
Simulation capability has become a primary competitive dimension in the AV race. The company that can simulate faster, more accurately, and at greater scale can iterate policy faster than any competitor relying more heavily on real-world testing.
| Dimension | Leader | Why it matters |
|---|---|---|
| Simulated miles per year | Waymo (approximately 20B miles est.) | More simulated miles means broader edge-case coverage, which supports safer real-world performance |
| Simulation compute | Waymo (Google TPU access) vs Tesla (Dojo) | Scale of compute determines how fast policy can be iterated |
| Real-to-sim pipeline | Waymo (Carcraft real-to-sim loop) | Faster ingestion of real-world edge cases into simulation means faster improvement |
| Neural simulation | Active race (Waabi UniSim, Wayve GAIA, Nvidia COSMOS, others) | Next frontier: a photorealistic neural simulator could substantially narrow the sim-to-real gap |
| Scenario library | Waymo (largest library built over 10 or more years) | A deep scenario library is hard to replicate — years of engineering to build |
| Data flywheel integration | Tesla (real fleet → real video → training → better model → larger fleet) | Tesla’s advantage: real data at scale; simulation is supplementary |
Nvidia COSMOS (2025): Nvidia launched COSMOS, a world foundation model for physical AI simulation, in early 2025. COSMOS generates photorealistic synthetic video for training robotics and AV systems. It is among the first general-purpose neural world simulators available as a product, potentially democratizing high-fidelity simulation for companies without Waymo’s or Tesla’s in-house simulation infrastructure. For smaller AV programs, COSMOS could reduce the barrier to entry for high-quality synthetic data generation from years of engineering investment to a compute budget.
The scenario library advantage is particularly durable. Waymo has spent more than a decade building a library of edge cases, rare events, and corner scenarios — each tagged, categorized, and continuously added to as the real-world fleet encounters new situations. A competitor entering the simulation race today would need to engineer from scratch all the edge cases that Waymo has already catalogued, in addition to building the physics simulation infrastructure. This creates a compounding moat that grows larger with each year of operation.
Section 6 — What Simulation Capability Means for the Ramp Benchmark
In the Physical AI ramp benchmark, simulation capability is a leading indicator rather than a lagging one. The ability to simulate at scale predicts future safety performance improvement rates — a company with strong simulation infrastructure today will be able to iterate policy faster and cover more edge cases in the next 12 to 24 months than a company with weaker simulation.
The benchmark implication: when evaluating AV programs, simulation metrics — simulated miles per year, real-to-sim loop speed, scenario library depth — are as important as current real-world safety statistics. A program with excellent current safety statistics but weak simulation infrastructure faces a ceiling on how fast it can continue improving. A program with strong simulation infrastructure that is still ramping real-world miles may have better long-term improvement velocity.
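One way to make this weighting concrete is a composite score in which the three simulation metrics together count as much as current safety statistics. The sketch below is purely illustrative: the metric names, the weights, and both sets of input values are hypothetical placeholders, not the benchmark's actual formula or data about any real program.

```python
# Hypothetical weights: the three simulation metrics together carry the
# same weight (3) as current real-world safety (3).
WEIGHTS = {
    "simulated_miles": 1.0,
    "real_to_sim_speed": 1.0,
    "scenario_library_depth": 1.0,
    "real_world_safety": 3.0,
}


def weighted_score(metrics: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of metrics that are already normalized to [0, 1]."""
    if metrics.keys() != weights.keys():
        raise ValueError("metrics and weights must cover the same keys")
    for name, value in metrics.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be normalized to [0, 1]")
    return sum(metrics[k] * weights[k] for k in weights) / sum(weights.values())


# Placeholder inputs, not measurements of any real program.
strong_safety_weak_sim = {
    "simulated_miles": 0.3,
    "real_to_sim_speed": 0.2,
    "scenario_library_depth": 0.3,
    "real_world_safety": 0.9,
}
strong_sim_ramping_miles = {
    "simulated_miles": 0.9,
    "real_to_sim_speed": 0.8,
    "scenario_library_depth": 0.9,
    "real_world_safety": 0.6,
}

score_a = weighted_score(strong_safety_weak_sim, WEIGHTS)
score_b = weighted_score(strong_sim_ramping_miles, WEIGHTS)
print(f"{score_a:.3f} {score_b:.3f}")
```

With these placeholder inputs the simulation-heavy program scores higher despite weaker current safety statistics, which is the pattern the paragraph above describes. A different choice of weights would change the ordering.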
The competitive picture as of mid-2026 (est.):
- Waymo holds the simulation depth advantage: decade-plus Carcraft investment, an estimated 20B simulated miles per year, and active research into neural simulation to close the sim-to-real gap
- Tesla holds the real-data scale advantage — 6M vehicle fleet generating continuous real-world video, Dojo purpose-built for processing it
- Nvidia is democratizing access via COSMOS — a neural world simulator that could allow smaller programs to generate photorealistic synthetic data without Waymo-scale infrastructure investment
The race to eliminate the sim-to-real gap via neural world models (Waabi’s UniSim, Wayve’s GAIA, Nvidia’s COSMOS, and competing efforts at Waymo, Motional, and others) is the frontier that will determine whether the simulation advantage remains concentrated in a small number of incumbents or becomes a commodity that any AV program can access.
Section 7 — About This Series
This is article 74 in the Physical AI Benchmark Series. Previous articles have covered the ramp index, the humanoid race, unit economics, global competition, HD mapping, software and OTA, consumer demand, competitive moats, Cybercab versus Model Y, safety data, Waymo Gen 6, Optimus manufacturing, scorecard snapshots, 2030 forecast scenarios, the investor framework, city expansion pipelines, Tesla FSD state approval maps, AV weather and climate constraints, the talent war, regulatory calendars, robotaxi fare pricing, humanoid deployment trackers, supply chain analysis, consumer adoption demand index, valuation and IPO analysis, the Physical AI 2026 mid-year roundup, AV unit economics cost-per-mile breakdown, the AV data flywheel comparison, AV cybersecurity attack surfaces, the Physical AI supply chain, AV fleet operations, AV insurance and liability evolution, the full lifecycle environmental cost of Physical AI, the accessibility layer for elderly and disabled users, the mapping architecture comparison, and the China AV race.
This article adds the simulation dimension: the synthetic training infrastructure that allows AV leaders to accumulate training experience faster than any real-world fleet can generate it — and the frontier of neural world models that may reshape who holds the simulation advantage over the next five years.
Note: Simulated mile estimates, fleet sizes, chip specifications, and competitive assessments are labeled “(est.)” and reflect publicly available information, company disclosures, and industry analysis where available. This article does not constitute investment advice.
Sources
- Waymo simulation and Carcraft: Waymo technology blog
- Tesla Dojo supercomputer: Tesla AI
- Nvidia COSMOS world foundation model: Nvidia
- UniSim neural closed-loop sensor simulator: Waabi research
- Simulation for autonomous driving: Stanford HAI