AI-Daily-Builder

2026-06-18

AV Simulation — How Waymo and Tesla Train on Billions of Virtual Miles

Waymo runs 20B simulated miles per year; Tesla trains on video from 6M vehicles via Dojo — simulation is the multiplier that separates AV leaders.

Article 74 in the Physical AI Benchmark Series — AV Simulation and Synthetic Data

You cannot teach an autonomous vehicle to handle a pedestrian running a red light, a tire blowing out at highway speed, or a child darting into traffic by waiting for those events to happen on real roads. Simulation is the training multiplier that lets AV companies encounter rare and dangerous scenarios billions of times in software before a real vehicle ever faces them.

Waymo runs approximately 20 billion simulated miles per year (est.). Tesla uses its Dojo supercomputer to train on video from 6 million real-world vehicles. The gap between AV leaders and followers is not just miles driven: it is the capacity to simulate, to generate synthetic training data at scale, and to close the loop between real-world edge cases and simulated training environments.

This article maps the simulation architectures, the synthetic data pipelines, and what simulation capability means for the Physical AI ramp benchmark.


Section 1 — Why Simulation Is Non-Negotiable

The fundamental problem of AV training is the long tail: the distribution of real-world driving scenarios is extremely wide, but the scenarios that matter most for safety — rare, dangerous, near-miss events — appear extremely infrequently in organic real-world data. Waiting for those events to occur on real roads is not a viable training strategy.

| Training challenge | Real-world approach | Simulation approach |
| --- | --- | --- |
| Rare but critical events | Wait for a pedestrian to run a red light; may happen once per million miles | Generate millions of synthetic red-light-running scenarios with randomized timing, speed, and vehicle positions |
| Fatal scenarios | Cannot intentionally crash a real vehicle into a cyclist | Simulate collisions in full physical fidelity; train avoidance policies without cost |
| Edge case coverage | Real fleet accumulates data organically: slow, geographically biased | Simulation can generate data for any geography, weather, time of day, traffic density |
| Policy iteration speed | Deploy new software → gather real miles → evaluate: weeks per cycle | Test new policy in simulation → evaluate in hours; iterate 100x faster |
| Corner cases (long tail) | The long tail of rare scenarios is impossibly long in real-world data | Simulation can generate targeted long-tail scenarios on demand |
| Safety | Training on truly dangerous scenarios is impossible on real roads | Simulation is safe by definition; no risk to humans or hardware |
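
To put the "once per million miles" row in numbers, here is a minimal Python sketch. It assumes events arrive at a constant rate (a Poisson model), which is a simplification, and the one-per-million figure is the illustrative value from the table, not a measured statistic.

```python
import math

# Illustrative rate from the table above: one event per million miles.
RATE_PER_MILE = 1e-6


def expected_events(miles: float, events_per_mile: float) -> float:
    """Expected number of events over `miles`, assuming a constant rate."""
    return miles * events_per_mile


def prob_at_least_one(miles: float, events_per_mile: float) -> float:
    """P(at least one event in `miles`) under a Poisson model."""
    return 1.0 - math.exp(-miles * events_per_mile)


def miles_for_confidence(events_per_mile: float, confidence: float) -> float:
    """Miles needed to see at least one event with probability `confidence`."""
    return -math.log(1.0 - confidence) / events_per_mile


if __name__ == "__main__":
    print(expected_events(1_000_000, RATE_PER_MILE))
    print(prob_at_least_one(1_000_000, RATE_PER_MILE))
    print(miles_for_confidence(RATE_PER_MILE, 0.95))
```

Under these assumptions, a fleet needs roughly three million miles to be 95% likely to see the event even once, and a model needs far more than one example to learn from.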

The core simulation principle: every mile of real-world driving can be multiplied into thousands of simulated variations with different weather, different road users, and different initial conditions. A company that simulates effectively can improve its policy network faster than one relying solely on real-world miles.
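
The multiplication can be sketched in a few lines of Python. The `Scenario` record, its field names, and the jitter ranges are hypothetical, chosen only for illustration; a production simulator tracks far more state, and this is not any company's actual pipeline.

```python
import itertools
import random
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Scenario:
    """Hypothetical, heavily simplified description of one logged scenario."""
    name: str
    weather: str
    time_of_day: str
    pedestrian_speed_mps: float
    pedestrian_delay_s: float


def grid_variations(base, weathers, times):
    """One copy of `base` per (weather, time of day) combination."""
    return [replace(base, weather=w, time_of_day=t)
            for w, t in itertools.product(weathers, times)]


def jittered_variations(base, n, seed=0):
    """`n` copies of `base` with randomized pedestrian speed and timing."""
    rng = random.Random(seed)
    return [
        replace(
            base,
            pedestrian_speed_mps=max(0.1, base.pedestrian_speed_mps + rng.uniform(-0.5, 0.5)),
            pedestrian_delay_s=base.pedestrian_delay_s + rng.uniform(-1.0, 1.0),
        )
        for _ in range(n)
    ]


def expand(base, weathers, times, jitter_per_cell, seed=0):
    """Combine the grid with per-cell jitter; each cell gets its own seed."""
    out = []
    for i, cell in enumerate(grid_variations(base, weathers, times)):
        out.extend(jittered_variations(cell, jitter_per_cell, seed=seed + i))
    return out


BASE = Scenario("ped_runs_red_light", "clear", "noon", 1.4, 0.0)
WEATHERS = ["clear", "rain", "fog", "snow"]
TIMES = ["noon", "dusk", "night"]

if __name__ == "__main__":
    print(len(expand(BASE, WEATHERS, TIMES, jitter_per_cell=100)))
```

Four weather settings, three times of day, and 100 jittered copies per cell turn one logged event into 1,200 distinct scenarios. Fixed seeds keep the set reproducible, which matters when the same scenarios are reused to compare software versions.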


Section 2 — Waymo’s Simulation Platform: Carcraft

Waymo operates Carcraft, an internal simulation platform built over more than a decade alongside its real-world AV program. Carcraft is not a supplementary tool — it is Waymo’s primary training environment.

| Attribute | Details |
| --- | --- |
| Name | Carcraft (Waymo’s internal simulation platform) |
| Scale | Approximately 20 billion simulated miles per year (est.) |
| Architecture | High-fidelity physics simulation; realistic sensor modeling (LIDAR point clouds, camera renders, radar returns); agent behavior models for other vehicles, pedestrians, cyclists |
| Sensor simulation | Waymo simulates the full sensor suite; a simulated LIDAR point cloud must be physically accurate enough for the real perception stack to process it without modification |
| Scenario generation | Real-world driving data fed back into simulation to create systematic variation of edge cases encountered on real roads |
| Agent behavior | Other vehicles and pedestrians in Waymo simulations are modeled with calibrated behavior distributions derived from real-world observations |
| Infrastructure | Runs on Google Cloud TPUs (Waymo’s Alphabet ownership enables access to massive compute); one of the largest dedicated simulation compute clusters in any industry (est.) |
| Real-to-sim loop | When a real Waymo vehicle encounters something unexpected, that scenario is automatically ingested into simulation for training and regression testing |

The real-to-sim loop is Waymo’s structural advantage: every real-world edge case becomes simulation training data within hours. A vehicle that encounters an unusual pedestrian behavior in San Francisco can trigger the generation of thousands of synthetic variations of that scenario — different speeds, different lighting, different weather — before the next software update ships.

The scale of Carcraft also enables regression testing at a level impossible for real-world programs. When Waymo ships a software update, it must pass simulation regression against tens of thousands of previously recorded scenarios before a real vehicle runs updated code. This is the simulation safety net.
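
A regression gate of this kind can be sketched as follows. The result fields, the one-meter gap threshold, and the `fake_runner` stand-in are all assumptions made for illustration; Waymo’s actual pass criteria are not public in this level of detail.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass(frozen=True)
class ScenarioResult:
    """Hypothetical outcome of replaying one recorded scenario."""
    scenario_id: str
    collided: bool
    min_gap_m: float  # closest approach to any other road user, in meters


def regression_gate(
    scenario_ids: Iterable[str],
    run_scenario: Callable[[str], ScenarioResult],
    min_gap_threshold_m: float = 1.0,  # assumed threshold, not a real spec
) -> Tuple[bool, List[str]]:
    """Replay every scenario; the build passes only if none of them fail."""
    failures = []
    for scenario_id in scenario_ids:
        result = run_scenario(scenario_id)
        if result.collided or result.min_gap_m < min_gap_threshold_m:
            failures.append(result.scenario_id)
    return (len(failures) == 0, failures)


def fake_runner(scenario_id: str) -> ScenarioResult:
    """Stand-in for a simulator call; fails one hard-coded scenario."""
    if scenario_id == "unprotected_left_017":
        return ScenarioResult(scenario_id, collided=False, min_gap_m=0.4)
    return ScenarioResult(scenario_id, collided=False, min_gap_m=2.5)


if __name__ == "__main__":
    ids = ["merge_003", "unprotected_left_017", "school_zone_042"]
    print(regression_gate(ids, fake_runner))
```

The design choice worth noting is that the gate returns the failing scenario IDs rather than a bare pass/fail, so a blocked release points directly at the scenarios that regressed.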


Section 3 — Tesla’s Approach: Real Video Plus Dojo

Tesla’s training philosophy is fundamentally different from Waymo’s. Where Waymo builds synthetic worlds, Tesla harvests the real one.

| Attribute | Tesla | Waymo |
| --- | --- | --- |
| Primary training data | Real video from 6M or more fleet vehicles (petabytes of real-world camera footage) | Simulation plus real miles from approximately 1,500 autonomous vehicles |
| Simulation role | Secondary: Tesla uses simulation for specific scenarios but real video is primary | Primary: tens of billions of simulated miles per year (est.) |
| Training compute | Dojo, a custom supercomputer built specifically for video training at scale; custom D1 chip optimized for bandwidth between tiles | Google Cloud TPUs (shared Alphabet parent) |
| Dojo D1 chip | Custom 7nm chip; 362 TFLOPS BF16/CFP8; 900 GB/s interconnect between chips (est.); designed for distributed video processing | Not applicable |
| Training objective | Train a neural network that maps 8 camera feeds directly to driving decisions (end-to-end or imitation learning at scale) | Train specific perception, prediction, and planning modules separately; simulation covers each |
| Advantage | Real-world data distribution: the model sees actual edge cases as they happen in reality | Can generate unlimited synthetic data for any scenario; not limited by fleet size |
| Disadvantage | Cannot train on truly rare or dangerous scenarios without waiting for them to happen | Simulation fidelity gap: simulated sensor data is not perfectly identical to real sensor data |

Dojo is Tesla’s answer to Waymo’s TPU cluster access. The D1 chip is purpose-built for the specific computational bottleneck Tesla faces: ingesting petabytes of video from millions of vehicles and training large neural networks on it. Where traditional GPU clusters can struggle with the memory bandwidth demands of distributed video training, the D1’s 900 GB/s inter-chip interconnect (est.) was designed to relieve that bottleneck.

Tesla’s real-data flywheel creates a different kind of compounding advantage. Every FSD mile driven by a Tesla owner generates training data. As the fleet grows, the training data grows proportionally — and crucially, it grows with the exact distribution of real-world scenarios the model will be deployed in. Waymo must engineer its simulation to match that distribution; Tesla is simply collecting it.
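
The learning objective behind this flywheel is imitation: fit a policy to what human drivers actually did. The toy Python sketch below shows only that objective, using a one-dimensional linear policy fitted by least squares to synthetic demonstrations. The observation (lateral offset), the simulated driver, and the gain of -0.3 are invented for illustration; a production system maps camera video to controls with large neural networks.

```python
import random


def make_demonstrations(n, seed=0):
    """Synthetic 'human driver' data: steer back toward the lane center.

    Observation: lateral offset from lane center in meters (hypothetical).
    Action: steering command; the -0.3 gain is an invented ground truth.
    """
    rng = random.Random(seed)
    observations = [rng.uniform(-2.0, 2.0) for _ in range(n)]
    actions = [-0.3 * x + rng.gauss(0.0, 0.01) for x in observations]
    return observations, actions


def fit_linear_policy(observations, actions):
    """Behavior cloning in its simplest form: least-squares fit of
    action = w * observation + b to the demonstrated pairs."""
    n = len(observations)
    mean_x = sum(observations) / n
    mean_y = sum(actions) / n
    var_x = sum((x - mean_x) ** 2 for x in observations)
    cov_xy = sum((x - mean_x) * (y - mean_y)
                 for x, y in zip(observations, actions))
    w = cov_xy / var_x
    b = mean_y - w * mean_x
    return w, b


if __name__ == "__main__":
    obs, acts = make_demonstrations(1000)
    print(fit_linear_policy(obs, acts))
```

The fit recovers the demonstrator’s behavior only over the range of offsets present in the data, which is the limitation noted in the table: a policy cloned from real driving has nothing to imitate in situations the fleet has not yet recorded.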


Section 4 — The Sim-to-Real Gap: The Unsolved Problem

The fundamental limitation of simulation-heavy approaches is the sim-to-real gap: a model trained on simulated data may not perform as well when deployed on real-world sensor inputs that differ in subtle ways from the simulation.

| Challenge | Description | Current state |
| --- | --- | --- |
| Sensor fidelity | A simulated LIDAR point cloud must match a real LIDAR point cloud closely enough for the model to generalize from sim to real | Waymo has invested heavily in high-fidelity sensor simulation; still not perfect, and models trained only on sim data underperform on real data |
| Behavior realism | Simulated pedestrians and drivers must behave like real ones | Calibrated behavior models from real data help; extreme rare behaviors are still hard to model |
| Domain randomization | Strategy: randomize sim parameters widely so the model learns to be robust to any sim variation, which transfers better to the real world | Works for some scenarios; insufficient for others |
| NeRF and Gaussian splatting | New approach: reconstruct real scenes from camera video into 3D neural representations; re-render from new viewpoints to generate training data | Waymo, Nvidia, and others are using neural scene reconstruction to reduce the sim-to-real gap; promising but compute-intensive |
| UniSim and GAIA | Waabi (UniSim), Wayve (GAIA), and others are building neural simulators that generate photorealistic sensor data from real-world inputs | Active research area; reduces reliance on hand-engineered physics simulation |

Domain randomization — deliberately introducing variation in simulated parameters — was the first systematic strategy for sim-to-real transfer. By training on simulations with randomized lighting, texture, weather, and sensor noise, models become more robust to the specific imperfections of a particular simulator. But domain randomization alone has not closed the gap to the level required for production AV deployment in all conditions.
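
In code, domain randomization amounts to sampling a fresh set of simulator parameters for every training episode. The sketch below is a minimal Python illustration; the parameter names, the ranges, and the Gaussian LIDAR noise model are assumptions, not values from any production simulator.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class SimParams:
    """Hypothetical per-episode simulator settings."""
    sun_intensity: float      # 0.1 (near dark) to 1.0 (full daylight)
    rain_rate_mm_h: float
    texture_id: int           # index into an assumed library of 50 textures
    lidar_noise_std_m: float  # standard deviation of range noise, in meters


def sample_params(rng):
    """Draw one randomized configuration; all ranges are illustrative."""
    return SimParams(
        sun_intensity=rng.uniform(0.1, 1.0),
        rain_rate_mm_h=rng.choice([0.0, 0.0, 2.0, 10.0]),
        texture_id=rng.randrange(50),
        lidar_noise_std_m=rng.uniform(0.0, 0.05),
    )


def noisy_lidar(true_ranges_m, params, rng):
    """Perturb ideal range readings with the sampled sensor noise."""
    return [max(0.0, r + rng.gauss(0.0, params.lidar_noise_std_m))
            for r in true_ranges_m]


def randomized_episodes(true_ranges_m, n_episodes, seed=0):
    """Yield (params, noisy readings) pairs, one per training episode."""
    rng = random.Random(seed)
    for _ in range(n_episodes):
        params = sample_params(rng)
        yield params, noisy_lidar(true_ranges_m, params, rng)


if __name__ == "__main__":
    for params, ranges in randomized_episodes([5.0, 10.0, 20.0], 3):
        print(params, ranges)
```

Because no two episodes share the same lighting, texture, or noise level, the model cannot latch onto the quirks of one fixed simulator configuration.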

The NeRF and Gaussian splatting approach represents a fundamentally different strategy: rather than building a synthetic world from first principles, capture the real world in a 3D neural representation and re-render it from any viewpoint or under any conditions. A scene recorded by a Waymo vehicle in San Francisco can be re-rendered in rain, at night, with occluded pedestrians added — without requiring the physics simulation pipeline to model those conditions from scratch.
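
The re-rendering step rests on a simple compositing rule. A NeRF-style renderer samples density and color at points along each camera ray and blends them front to back; Gaussian splatting uses an analogous alpha-compositing step over sorted splats. The Python sketch below shows the standard per-ray rule only. The sample values are made up, and a real system obtains densities and colors from a trained scene model.

```python
import math


def composite_ray(densities, colors, deltas):
    """Front-to-back emission-absorption compositing along one ray.

    densities: volume density (sigma) at each sample point
    colors:    (r, g, b) at each sample point
    deltas:    distance between consecutive samples
    Returns the pixel color and the transmittance left after the last sample.
    """
    transmittance = 1.0  # fraction of light that has not been absorbed yet
    pixel = [0.0, 0.0, 0.0]
    for sigma, color, delta in zip(densities, colors, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this segment
        weight = transmittance * alpha
        for channel in range(3):
            pixel[channel] += weight * color[channel]
        transmittance *= 1.0 - alpha
    return pixel, transmittance


if __name__ == "__main__":
    # Made-up ray: empty space, thin haze, then a dense red surface.
    densities = [0.0, 0.2, 50.0]
    colors = [(0.0, 0.0, 0.0), (0.8, 0.8, 0.8), (1.0, 0.0, 0.0)]
    deltas = [1.0, 1.0, 1.0]
    print(composite_ray(densities, colors, deltas))
```

Adding rain or an occluding pedestrian to a recorded scene means changing the densities and colors sampled along each ray, then compositing again from the desired viewpoint.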


Section 5 — Simulation as Competitive Moat

Simulation capability has become a primary competitive dimension in the AV race. The company that can simulate faster, more accurately, and at greater scale can iterate policy faster than any competitor relying more heavily on real-world testing.

| Dimension | Leader | Why it matters |
| --- | --- | --- |
| Simulated miles per year | Waymo (approximately 20B miles, est.) | More simulated miles means broader edge case coverage, which supports safer real-world performance |
| Simulation compute | Waymo (Google TPU access) vs Tesla (Dojo) | Scale of compute determines how fast policy can be iterated |
| Real-to-sim pipeline | Waymo (Carcraft real-to-sim loop) | Faster ingestion of real-world edge cases into simulation means faster improvement |
| Neural simulation | Active race (Waabi UniSim, Wayve GAIA, Nvidia COSMOS, others) | Next frontier: photorealistic neural simulation that aims to close the sim-to-real gap |
| Scenario library | Waymo (largest library, built over 10 or more years) | A deep scenario library is hard to replicate; it takes years of engineering to build |
| Data flywheel integration | Tesla (real fleet → real video → training → better model → larger fleet) | Tesla’s advantage: real data at scale; simulation is supplementary |

Nvidia COSMOS (2025): Nvidia launched COSMOS, a platform of world foundation models for physical AI simulation, in early 2025. COSMOS generates photorealistic synthetic video for training robotics and AV systems. It is among the first general-purpose neural world simulators available as a product, potentially democratizing high-fidelity simulation for companies without Waymo’s or Tesla’s in-house simulation infrastructure. For smaller AV programs, COSMOS shifts the barrier to entry for high-quality synthetic data generation from years of engineering investment to a compute budget.

The scenario library advantage is particularly durable. Waymo has spent more than a decade building a library of edge cases, rare events, and corner scenarios — each tagged, categorized, and continuously added to as the real-world fleet encounters new situations. A competitor entering the simulation race today would need to engineer from scratch all the edge cases that Waymo has already catalogued, in addition to building the physics simulation infrastructure. This creates a compounding moat that grows larger with each year of operation.


Section 6 — What Simulation Capability Means for the Ramp Benchmark

In the Physical AI ramp benchmark, simulation capability is a leading indicator rather than a lagging one. The ability to simulate at scale predicts future safety performance improvement rates — a company with strong simulation infrastructure today will be able to iterate policy faster and cover more edge cases in the next 12 to 24 months than a company with weaker simulation.

The benchmark implication: when evaluating AV programs, simulation metrics — simulated miles per year, real-to-sim loop speed, scenario library depth — are as important as current real-world safety statistics. A program with excellent current safety statistics but weak simulation infrastructure faces a ceiling on how fast it can continue improving. A program with strong simulation infrastructure that is still ramping real-world miles may have better long-term improvement velocity.
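
The "ceiling" argument can be made concrete with a toy compounding model. The Python sketch below assumes each program’s incident rate falls by a fixed percentage per month. The starting rates and improvement percentages are hypothetical inputs for illustration, not benchmark data or estimates for any real company.

```python
def incident_rate(initial_rate, monthly_improvement, months):
    """Incident rate after `months` of compounding improvement."""
    return initial_rate * (1.0 - monthly_improvement) ** months


def crossover_month(rate_a, improve_a, rate_b, improve_b, horizon_months=120):
    """First month in which program B is at least as safe as program A.

    Returns None if B does not catch up within the horizon.
    """
    for month in range(horizon_months + 1):
        if (incident_rate(rate_b, improve_b, month)
                <= incident_rate(rate_a, improve_a, month)):
            return month
    return None


if __name__ == "__main__":
    # Hypothetical inputs: A is twice as safe today but improves 1% per
    # month; B starts worse but improves 5% per month.
    print(crossover_month(rate_a=1.0, improve_a=0.01,
                          rate_b=2.0, improve_b=0.05))
```

With these made-up inputs, the faster-improving program overtakes the currently safer one in month 17, which is why improvement velocity sits alongside current statistics in the evaluation.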

The competitive picture as of mid-2026 (est.):

The race to eliminate the sim-to-real gap via neural world models (Waabi’s UniSim, Wayve’s GAIA, Nvidia’s COSMOS, and competing efforts at Waymo, Motional, and others) is the frontier that will determine whether the simulation advantage remains concentrated in a small number of incumbents or becomes a commodity that any AV program can access.


Section 7 — About This Series

This is article 74 in the Physical AI Benchmark Series. Previous articles have covered the ramp index, the humanoid race, unit economics, global competition, HD mapping, software and OTA, consumer demand, competitive moats, Cybercab versus Model Y, safety data, Waymo Gen 6, Optimus manufacturing, scorecard snapshots, 2030 forecast scenarios, the investor framework, city expansion pipelines, Tesla FSD state approval maps, AV weather and climate constraints, the talent war, regulatory calendars, robotaxi fare pricing, humanoid deployment trackers, supply chain analysis, consumer adoption demand index, valuation and IPO analysis, the Physical AI 2026 mid-year roundup, AV unit economics cost-per-mile breakdown, the AV data flywheel comparison, AV cybersecurity attack surfaces, the Physical AI supply chain, AV fleet operations, AV insurance and liability evolution, the full lifecycle environmental cost of Physical AI, the accessibility layer for elderly and disabled users, the mapping architecture comparison, and the China AV race.

This article adds the simulation dimension: the synthetic training infrastructure that allows AV leaders to accumulate training experience faster than any real-world fleet can generate it — and the frontier of neural world models that may reshape who holds the simulation advantage over the next five years.

Note: Simulated mile estimates, fleet sizes, chip specifications, and competitive assessments are labeled “(est.)” and reflect publicly available information, company disclosures, and industry analysis where available. This article does not constitute investment advice.

