AI-Daily-Builder

2026-06-18

Physical AI Simulation — Tesla Rerender vs Waymo CarCraft and the Synthetic Data Race

Tesla's neural rerender engine and Waymo's CarCraft platform represent two fundamentally different bets on how to generate synthetic training data at scale.

Article 109 in the Physical AI Benchmark Series — Physical AI Simulation Infrastructure: How Tesla’s Rerender Engine and Waymo’s Sim Platform Generate Synthetic Training Data, Test Rare Edge Cases, and Multiply the Value of Real-World Miles

Real-world driving data is expensive to collect, difficult to label, and impossible to fully control. You cannot make it rain on demand to test your model’s behavior in wet conditions. You cannot force a near-miss scenario to test emergency response. You cannot run the same intersection encounter 10,000 times to measure variance. Simulation solves this by generating synthetic driving scenarios at scale — and the quality of a company’s simulation infrastructure is a direct multiplier on training data quality and therefore on model quality.

Tesla and Waymo have built fundamentally different simulation approaches that reflect their broader architectural philosophies. Tesla’s rerender engine starts from real camera footage and reconstructs it into photorealistic synthetic variants, which shrinks the gap between simulation and reality to near zero for a vision-only system. Waymo’s CarCraft platform builds 3D world models that must simulate lidar, camera, and radar simultaneously. That is a harder physics problem, but one that produces a richer multi-sensor synthetic dataset. Understanding these approaches is necessary context for any serious analysis of where the Physical AI race stands in 2026.


Section 1 — What Simulation Does for AV Training

| Simulation function | Why it matters | Without simulation |
| --- | --- | --- |
| Rare edge case generation | Real-world data is heavily weighted toward normal driving; near-miss scenarios, unusual pedestrian behavior, and sensor failures are rare, and simulation generates them on demand | Model never sees edge cases until they occur in production; catastrophic failure risk |
| Counterfactual testing | Ask “what would have happened if the vehicle had turned left instead of right?”, a question only answerable in simulation | Cannot safely test alternative decisions with real vehicles |
| Scale beyond real-world data | Generate millions of training scenarios per day in simulation; real-world fleet generates thousands | Data-hungry models bottlenecked by real-world collection rate |
| Sensor model fidelity | Simulate exactly what lidar/camera/radar sees under different weather, lighting, and sensor-degradation conditions | Cannot train for sensor degradation without actually degrading sensors |
| Regression testing | Run every software release against thousands of simulated scenarios before deployment; catch regressions before they affect real vehicles | Every software update is a live experiment; higher risk |
| Safety-critical system validation | Regulatory bodies increasingly accept simulation as part of functional safety validation (ISO 26262, SOTIF) | Must conduct all safety validation on real roads, which is impractically slow |
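
The regression-testing function can be illustrated with a small sketch. The Python below is an assumption-laden example, not a description of Tesla's or Waymo's tooling: the scenario IDs, the pass/fail dictionaries, and the `find_regressions` helper are all hypothetical. It flags scenarios that passed on the previous release but fail on the candidate release.

```python
def find_regressions(baseline_results, candidate_results):
    """Return scenario IDs that passed on the baseline but fail on the candidate.

    Both arguments map a scenario ID to True (pass) or False (fail).
    A scenario missing from the candidate run is treated as a failure.
    """
    return sorted(
        scenario_id
        for scenario_id, passed in baseline_results.items()
        if passed and not candidate_results.get(scenario_id, False)
    )


# Hypothetical results for two software releases.
baseline = {"unprotected_left_017": True, "jaywalker_203": True, "wet_merge_044": False}
candidate = {"unprotected_left_017": True, "jaywalker_203": False, "wet_merge_044": True}

regressions = find_regressions(baseline, candidate)
print(regressions)  # ['jaywalker_203']
```

Treating a missing scenario as a failure is a deliberately conservative choice in this sketch: a release that silently skips a scenario should not pass the gate.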

The key insight is that simulation does not replace real-world data — it multiplies it. A single real-world near-miss event, properly captured and fed into a rerender or 3D reconstruction pipeline, can generate thousands of training variants: different lighting, different vehicle speeds, different pedestrian trajectories. The fleet is the seed; simulation is the multiplier.
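
The seed-and-multiplier idea can be sketched as a Cartesian product over variation axes. This is a minimal illustration under stated assumptions: the `ScenarioVariant` fields, the axis values, and the event ID are made up, and a real pipeline would also re-render sensor data for each variant, which is not shown here.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class ScenarioVariant:
    seed_event: str             # ID of the real-world event (hypothetical)
    lighting: str
    ego_speed_mps: float        # speed of the vehicle under test
    pedestrian_offset_m: float  # lateral shift applied to the pedestrian path


def expand_seed(seed_event, lightings, speeds_mps, pedestrian_offsets_m):
    """Return one variant per combination of the variation axes."""
    return [
        ScenarioVariant(seed_event, lighting, speed, offset)
        for lighting, speed, offset in product(lightings, speeds_mps, pedestrian_offsets_m)
    ]


variants = expand_seed(
    "near_miss_001",
    lightings=["noon", "dusk", "night", "overcast"],
    speeds_mps=[8.0, 10.0, 12.0, 14.0, 16.0],
    pedestrian_offsets_m=[-1.0, -0.5, 0.0, 0.5, 1.0],
)
print(len(variants))  # 4 * 5 * 5 = 100
```

Three axes with a handful of values each already yield 100 variants from one event; adding a fourth axis with ten values would reach the thousands the paragraph above describes.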


Section 2 — Tesla’s Simulation Approach: Neural Rendering and Rerender

Tesla’s simulation strategy is architecturally tied to its vision-only, no-lidar approach. Because FSD sees only through cameras, the most realistic synthetic training data is photorealistic re-rendered camera images — not abstract 3D point clouds or physics engine outputs.

| Component | What it does | Why it matters |
| --- | --- | --- |
| Rerender engine | Takes real-world video clips from the fleet and reconstructs the scene in a physics-accurate 3D representation; then re-renders the scene from different viewpoints, lighting conditions, weather, or with inserted synthetic objects (vehicles, pedestrians) | A near-miss that happened once in Phoenix can be re-rendered 10,000 times with variations, effectively multiplying one real event into thousands of training examples |
| Neural Radiance Fields (NeRF) / 3D Gaussian Splatting | Neural scene reconstruction methods that build photorealistic 3D representations from multiple camera angles; Tesla uses proprietary variants of these techniques (est.) | Allows photorealistic re-rendering at camera resolution matching exactly what FSD sees; no “sim-to-real gap” problem because the base scene is real |
| Auto-labeling pipeline | FSD itself labels the reconstructed scenes: if the model identifies a pedestrian in the original clip, that label propagates to all re-rendered variants | Reduces human labeling cost; scales label generation with model capability |
| Dojo integration | Simulated scenarios feed directly into Dojo training runs; compute and simulation are co-designed | Tight integration means faster experiment-to-model iteration cycles |
| Key advantage | No sim-to-real gap: the rendered scene is photorealistic because it starts from real sensor data; what the model trains on matches exactly what it sees in production | Traditional synthetic simulation has a “sim-to-real gap”: a model trained on synthetic visuals may behave differently on real camera images |
| Scale | Tesla claims the ability to generate billions of simulated training miles (est.); exact figures not disclosed | Orders of magnitude more data than real-world collection alone |

The architecture decision to build a rerender engine is a direct consequence of Tesla’s vision-only bet. If you use lidar, you need a lidar simulator. If you use only cameras, you can build a neural rerender engine that produces images indistinguishable from the real world — and those images are, in a meaningful sense, real, because they start from real footage. The sim-to-real gap shrinks to near zero.

The constraint is that this approach is tightly coupled to the camera-only stack. Tesla cannot use rerender to train a lidar model — there is no lidar data to rerender. This is a coherent architectural choice, but it means the entire simulation infrastructure is a bet on vision-only being sufficient for full autonomy.


Section 3 — Waymo’s Simulation Approach: CarCraft and Closed-Loop Testing

Waymo’s simulation strategy reflects its multi-sensor, HD-map-dependent architecture. CarCraft must simultaneously simulate lidar point clouds, camera images, and radar returns — a substantially harder physics problem than camera-only rerender.

| Component | What it does |
| --- | --- |
| CarCraft | Waymo’s internal simulation platform (publicly disclosed); runs millions of miles of simulated driving per day (est.); models vehicles, pedestrians, cyclists, and road geometry from HD maps |
| Scenario extraction from real drives | Real-world incidents and near-misses are extracted, anonymized, and seeded into simulation to generate variations; similar to Tesla’s rerender concept but applied to a 3D world model rather than camera images |
| Multi-sensor simulation | Must simulate lidar (3D point cloud), camera (2D image), and radar (range plus velocity) simultaneously; more complex than camera-only simulation |
| Behavior modeling | Simulates realistic behavior of other road users (drivers who cut lanes, pedestrians who jaywalk, cyclists who wobble); key differentiator vs. naive simulation |
| Closed-loop testing | The simulated vehicle’s decisions affect the simulation world; other simulated agents react to the AV’s choices |
| Scale | Waymo has disclosed running tens of millions of simulation miles per day (est.) |
| Sim-to-real challenge | Waymo’s lidar simulation must accurately model how laser pulses interact with surfaces, retroreflective materials, and glass, a harder physics problem than camera image synthesis (est.) |

The closed-loop capability is Waymo’s most important simulation advantage. In open-loop testing, the AV’s decisions do not affect what happens next in the simulation — the scenario plays out the same way regardless of what the AV does. In closed-loop testing, the simulated world responds to the AV’s actions: if the AV brakes, the simulated car behind it must respond; if the AV changes lanes, the simulated cyclist must react. This catches an entire class of failure modes — scenarios where the AV’s own behavior creates the dangerous situation — that open-loop testing cannot detect.
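
The open-loop versus closed-loop distinction can be shown with a toy one-dimensional example. The sketch below is not CarCraft: the braking rates, the 1-second reaction delay, the speeds, and the `min_gap_behind_av` helper are all assumed values chosen for illustration. An AV brakes hard while a second car follows it; in open loop the follower replays its logged constant speed, in closed loop it reacts.

```python
def min_gap_behind_av(closed_loop, initial_gap_m=25.0, dt=0.1, horizon_s=6.0):
    """Simulate an AV braking hard with a car behind it; return the smallest gap (m).

    A negative gap means the follower's position has passed the AV's position.
    """
    steps = round(horizon_s / dt)
    reaction_steps = round(1.0 / dt)      # assumed 1 s reaction time
    av_x, av_v = initial_gap_m, 15.0      # AV starts ahead; both cars at 15 m/s
    fol_x, fol_v = 0.0, 15.0
    min_gap = av_x - fol_x
    for step in range(steps):
        # AV policy under test: brake at 6 m/s^2 until stopped.
        av_v = max(0.0, av_v - 6.0 * dt)
        if closed_loop and step >= reaction_steps:
            # Reactive agent: brakes at 5 m/s^2 once it has noticed the AV braking.
            fol_v = max(0.0, fol_v - 5.0 * dt)
        # In open loop the follower keeps replaying its logged 15 m/s.
        av_x += av_v * dt
        fol_x += fol_v * dt
        min_gap = min(min_gap, av_x - fol_x)
    return min_gap


closed_gap = min_gap_behind_av(closed_loop=True)
open_gap = min_gap_behind_av(closed_loop=False)
print(round(closed_gap, 2), round(open_gap, 2))
```

With these assumed numbers the closed-loop run ends with the follower stopped roughly 6 m behind the AV, while the open-loop replay reports a large negative gap because the logged car drives straight through the braking AV, which says nothing about how a real driver would respond. Reducing `initial_gap_m` to 15 makes the closed-loop run end in rear-end contact, the kind of AV-induced failure that only a reactive simulation can surface.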

CarCraft’s scale (tens of millions of simulated miles per day, est.) is lower than Tesla’s stated billions, but the comparison is not apples-to-apples: Waymo’s simulated miles include full multi-sensor data generation at every step, which is computationally more expensive per mile than camera-only rerender.


Section 4 — NVIDIA Omniverse as Industry Infrastructure

Not every AV company can afford to build a proprietary simulation platform at the scale of CarCraft or Tesla’s rerender engine. NVIDIA’s Omniverse platform provides industry-standard simulation infrastructure that smaller AV programs and robotics startups use as their primary simulation environment.

| Dimension | What NVIDIA provides | Who uses it |
| --- | --- | --- |
| Omniverse platform | Physics-accurate simulation environment built on USD (Universal Scene Description); used for robotic simulation, AV testing, and industrial digital twins | Broadly adopted in robotics (Figure AI, Boston Dynamics, et al.); some AV companies use it for non-production simulation (est.) |
| Isaac Sim | NVIDIA’s robotics simulation platform within Omniverse; physically accurate sensor models; ROS 2 compatible | Humanoid robot development; not Tesla’s primary AV simulation (Tesla builds proprietary) |
| Drive Sim | NVIDIA’s AV-specific simulation within Omniverse; lidar/camera/radar sensor models; weather simulation; used by several AV companies (Cruise, BYD, others; est.) | Waymo primarily uses proprietary CarCraft; some OEMs use NVIDIA Drive Sim |
| Synthetic data generation | NVIDIA’s platforms can generate labeled synthetic training data at scale | Smaller AV programs and robotics startups that cannot build proprietary simulation |

NVIDIA’s simulation business is strategically important because it commoditizes the baseline simulation capability that every AV and robotics company needs. Companies that cannot afford to build a CarCraft or a rerender engine can still run large-scale simulation on Drive Sim or Isaac Sim — at a cost that scales with NVIDIA hardware. This creates a dependency that NVIDIA’s hardware roadmap can serve.

The implication for the Physical AI benchmark is that NVIDIA is not just a chip supplier to the AV industry — it is also a simulation infrastructure provider whose platform choices affect how every company in the ecosystem trains its models.


Section 5 — Simulation Benchmark Metrics

| Metric | What it measures | Tesla (est.) | Waymo (est.) |
| --- | --- | --- | --- |
| Simulated miles per day | Volume of synthetic driving experience generated | Billions/day claim (not independently verified) | Tens of millions/day (disclosed) |
| Scenario library size | Number of distinct edge-case scenarios available for training/testing | Not disclosed | Not disclosed |
| Sim-to-real fidelity | How closely simulation matches real sensor output | Very high (rerender from real data); minimal gap | High (multi-sensor physics models); some gap for rare surfaces (est.) |
| Closed-loop capability | Simulated AV decisions affect simulation world | Yes (est.) | Yes (CarCraft; disclosed) |
| Regression test coverage | Scenarios tested on each software release | Not disclosed | Not disclosed |
| Key advantage | Main strength of the approach | Neural rerender eliminates sim-to-real gap for camera; scales edge cases from real events | Multi-sensor simulation; robust closed-loop; established platform (est.) |

The most important metric is sim-to-real fidelity — how closely the simulation matches what the real vehicle actually experiences. A simulation that looks photo-realistic but does not accurately model lighting, shadows, sensor noise, or surface reflectivity will train models that behave differently in the real world. Tesla’s rerender approach scores extremely high on this dimension for camera data, because the base scene is real. Waymo’s lidar simulation must model the physics of laser pulse propagation — a harder problem, but one where inaccuracy has direct safety consequences.
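
The article does not name a specific fidelity metric. As one assumed example for lidar, a real point cloud can be compared with its simulated counterpart using a Chamfer-style distance (mean nearest-neighbour distance in both directions). The sketch below uses one common form of that distance on made-up points; it is not a metric either company is known to use, and the brute-force search is only suitable for tiny inputs.

```python
import math


def chamfer_distance(real_points, sim_points):
    """Symmetric mean nearest-neighbour distance between two 3D point sets."""

    def one_way(src, dst):
        # For each source point, find the distance to its closest destination point.
        return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)

    return one_way(real_points, sim_points) + one_way(sim_points, real_points)


# Hypothetical lidar returns (x, y, z) in metres.
real = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
perfect_sim = list(real)
offset_sim = [(0.0, 0.0, 0.1), (1.0, 0.0, 0.1), (0.0, 1.0, 0.1)]  # 10 cm height error

perfect_score = chamfer_distance(real, perfect_sim)
offset_score = chamfer_distance(real, offset_sim)
print(perfect_score, round(offset_score, 3))
```

A score of zero means the simulated cloud reproduces the real one exactly; the 10 cm offset yields a score of about 0.2 (0.1 in each direction). A lower score indicates closer agreement, though a single scalar like this cannot capture effects such as missing returns from glass or retroreflective surfaces.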


Section 6 — Strategic Implications for the Physical AI Race

Simulation infrastructure is not a differentiator that maps cleanly to market share — it is a necessary condition for the quality of the AI models that do map to market share. The simulation advantage compounds: better simulation produces better training data, which produces better models, which makes the vehicles safer, which justifies more deployment, which generates more real-world data to seed into simulation.

| Strategic dimension | Tesla position | Waymo position |
| --- | --- | --- |
| Simulation-to-model feedback loop | Fast: Dojo integration means a tight cycle from rerender to training run (est.) | Mature: CarCraft has been running at scale for years |
| Data flywheel | 6M+ FSD vehicles generating real footage that seeds the rerender engine | Smaller fleet but higher-quality sensor data per vehicle |
| Sim-to-real gap | Near zero for camera (rerender from real footage) | Low for camera; some residual for lidar rare-surface interactions (est.) |
| Multi-sensor simulation capability | Not needed (vision-only); not built (est.) | Required by architecture; built into CarCraft |
| Third-party dependency | None for primary simulation (proprietary rerender + Dojo) | None for primary simulation (CarCraft is proprietary) |
| Competitive moat | Rerender engine tied to 6M-vehicle fleet; competitors cannot replicate the data | CarCraft scale and closed-loop maturity built over 10+ years |

The bottom line for the Physical AI benchmark: simulation quality is a hidden multiplier on every other metric. A company with a 100,000-vehicle fleet and a high-fidelity simulation engine that generates 1,000 variants per real-world event is effectively collecting 100 million vehicles’ worth of training data. The company that wins the simulation race wins the training data race — and the training data race is the AI model race.
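
The arithmetic behind the 100 million figure is a straight multiplication, reproduced in the sketch below. The `variant_value` discount is a hypothetical parameter, not something stated in this article; it is included only to show how the result changes if a synthetic variant is assumed to be worth less than a real event.

```python
def effective_fleet(fleet_vehicles, variants_per_event, variant_value=1.0):
    """Vehicle-equivalents of training data, given variants generated per real event.

    variant_value is an assumed weight in [0, 1] for how much one synthetic
    variant is worth relative to one real event.
    """
    return fleet_vehicles * variants_per_event * variant_value


at_face_value = effective_fleet(100_000, 1_000)
discounted = effective_fleet(100_000, 1_000, variant_value=0.1)
print(f"{at_face_value:,.0f}")  # 100,000,000
print(f"{discounted:,.0f}")     # 10,000,000
```

Even with a heavy assumed discount on synthetic variants, the multiplier dominates the raw fleet size, which is the point the paragraph above makes.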

Tesla’s rerender approach is architecturally elegant and tightly coupled to the vision-only bet. If vision-only is sufficient for full autonomy, Tesla’s simulation infrastructure is probably the highest-fidelity synthetic data generation system in the industry. Waymo’s CarCraft is a more complex, more expensive, multi-sensor simulation that is necessary for its sensor-redundant architecture — and its closed-loop capability is a genuine advantage for catching AV-induced failure modes.

Neither approach is definitively superior — they are coherent implementations of two different bets on what the correct architecture for autonomous driving looks like.


Section 7 — What to Watch in 2026

The simulation infrastructure race will be decided by a small number of observable signals in the second half of 2026.

| Signal | What it indicates | Why it matters |
| --- | --- | --- |
| Tesla FSD improvement rate | If FSD v13 and later versions continue to improve rapidly, it validates the rerender engine’s effectiveness | The rate of model improvement is the clearest proxy for simulation data quality |
| Waymo geographic expansion speed | New city launches require simulation of new road geometry and edge cases; speed of expansion tests CarCraft’s generalization | Slow expansion despite high simulation volume would suggest sim-to-real transfer limitations |
| NVIDIA Drive Sim adoption | If major OEMs announce Drive Sim deployments, it signals that proprietary simulation is too expensive for non-hyperscalers | Commoditization of simulation capability would level the playing field below Tesla and Waymo |
| Regulatory acceptance of simulation evidence | If NHTSA or EU regulators formally accept simulation-generated evidence for safety validation, simulation investment becomes directly tied to approval timelines | Changes the ROI calculation for every AV company’s simulation budget |
| Academic NeRF/3DGS advances | Improvements in neural scene reconstruction techniques directly translate to rerender fidelity | Open-source advances could allow smaller players to close the simulation quality gap with Tesla (est.) |

The simulation infrastructure question is ultimately a question about the rate at which the Physical AI industry can accelerate beyond what real-world data alone can support. Both Tesla and Waymo have concluded that real-world miles alone are insufficient — and both have invested billions in simulation infrastructure as a result. The companies that get simulation right will train better models faster, with less risk, at lower marginal cost. Simulation is not a supporting function of the AV development process — it is the primary lever on development velocity.

Note: Figures labeled “(est.)” are directional estimates based on publicly available information as of mid-2026. Exact simulation volumes and internal platform details have not been independently verified. This article does not constitute investment advice.

