2026-06-18

Physical AI Simulation — Tesla Rerender vs Waymo CarCraft and the Synthetic Data Race

Tesla's neural rerender engine and Waymo's CarCraft platform represent two fundamentally different bets on how to generate synthetic training data at scale.

Article 109 in the Physical AI Benchmark Series — Physical AI Simulation Infrastructure: How Tesla’s Rerender Engine and Waymo’s Sim Platform Generate Synthetic Training Data, Test Rare Edge Cases, and Multiply the Value of Real-World Miles

Real-world driving data is expensive to collect, difficult to label, and impossible to fully control. You cannot make it rain on demand to test your model’s behavior in wet conditions. You cannot force a near-miss scenario to test emergency response. You cannot run the same intersection encounter 10,000 times to measure variance. Simulation solves this by generating synthetic driving scenarios at scale — and the quality of a company’s simulation infrastructure is a direct multiplier on training data quality and therefore on model quality.

Tesla and Waymo have built fundamentally different simulation approaches that reflect their broader architectural philosophies. Tesla’s rerender engine starts from real camera footage and reconstructs it into photorealistic synthetic variants — eliminating the gap between simulation and reality for a vision-only system. Waymo’s CarCraft platform builds 3D world models that must simulate lidar, camera, and radar simultaneously — a harder physics problem, but one that produces a richer multi-sensor synthetic dataset. Understanding these approaches is necessary context for any serious analysis of where the Physical AI race stands in 2026.

Section 1 — What Simulation Does for AV Training

Simulation function	Why it matters	Without simulation
Rare edge case generation	Real-world data is heavily weighted toward normal driving; near-miss scenarios, unusual pedestrian behavior, sensor failures are rare — simulation generates them on demand	Model never sees edge cases until they occur in production; catastrophic failure risk
Counterfactual testing	Ask “what would have happened if the vehicle had turned left instead of right?” — only answerable in simulation	Cannot safely test alternative decisions with real vehicles
Scale beyond real-world data	Generate millions of training scenarios per day in simulation; real-world fleet generates thousands	Data-hungry models bottlenecked by real-world collection rate
Sensor model fidelity	Simulate exactly what lidar/camera/radar sees under different weather, lighting, sensor-degradation conditions	Cannot train for sensor degradation without actually degrading sensors
Regression testing	Run every software release against thousands of simulated scenarios before deployment; catch regressions before they affect real vehicles	Every software update is a live experiment; higher risk
Safety-critical system validation	Regulatory bodies increasingly accept simulation as part of functional safety validation (ISO 26262; SOTIF, ISO 21448)	Must conduct all safety validation on real roads, which is impractically slow

The key insight is that simulation does not replace real-world data — it multiplies it. A single real-world near-miss event, properly captured and fed into a rerender or 3D reconstruction pipeline, can generate thousands of training variants: different lighting, different vehicle speeds, different pedestrian trajectories. The fleet is the seed; simulation is the multiplier.
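
The multiplication step can be illustrated with a minimal sketch. It assumes a toy `Scenario` record and a hypothetical `expand_variants` helper (neither comes from Tesla or Waymo) and simply enumerates every combination of a few perturbations around one seed event. A real rerender or 3D reconstruction pipeline perturbs far richer scene state; this only shows how the variant count grows.

```python
from dataclasses import dataclass, replace
from itertools import product


@dataclass(frozen=True)
class Scenario:
    """Hypothetical, heavily simplified description of one driving event."""
    event_id: str
    lighting: str
    ego_speed_mps: float
    pedestrian_offset_m: float


def expand_variants(seed, lightings, speed_deltas, pedestrian_offsets):
    """Apply every combination of the supplied perturbations to the seed event."""
    return [
        replace(
            seed,
            lighting=lighting,
            ego_speed_mps=seed.ego_speed_mps + delta,
            pedestrian_offset_m=offset,
        )
        for lighting, delta, offset in product(lightings, speed_deltas, pedestrian_offsets)
    ]


# One captured near-miss (illustrative values only).
seed = Scenario("near_miss_001", "day", 12.0, 0.0)
variants = expand_variants(
    seed,
    lightings=["day", "dusk", "night", "rain"],
    speed_deltas=[-2.0, -1.0, 0.0, 1.0, 2.0],
    pedestrian_offsets=[-1.0, -0.5, 0.0, 0.5, 1.0],
)
print(len(variants))  # 4 * 5 * 5 = 100 variants from one seed event
```

Three small parameter lists already yield 100 variants; adding a fourth axis with ten values would yield 1,000, which is how a single event reaches the "thousands of variants" scale.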

Section 2 — Tesla’s Simulation Approach: Neural Rendering and Rerender

Tesla’s simulation strategy is architecturally tied to its vision-only, no-lidar approach. Because FSD sees only through cameras, the most realistic synthetic training data is photorealistic re-rendered camera images — not abstract 3D point clouds or physics engine outputs.

Component	What it does	Why it matters
Rerender engine	Takes real-world video clips from the fleet and reconstructs the scene in a physics-accurate 3D representation; then re-renders the scene from different viewpoints, lighting conditions, weather, or with inserted synthetic objects (vehicles, pedestrians)	A near-miss that happened once in Phoenix can be re-rendered 10,000 times with variations — effectively multiplying one real event into thousands of training examples
Neural Radiance Fields (NeRF) / 3D Gaussian Splatting	Neural scene reconstruction methods that build photorealistic 3D representations from multiple camera angles; Tesla uses proprietary variants of these techniques (est.)	Allows photorealistic re-rendering at the camera resolution FSD sees; the “sim-to-real gap” is small because the base scene is real
Auto-labeling pipeline	FSD itself labels the reconstructed scenes — if the model identifies a pedestrian in the original clip, that label propagates to all re-rendered variants	Reduces human labeling cost; scales label generation with model capability
Dojo integration	Simulated scenarios feed directly into Dojo training runs; compute and simulation are co-designed	Tight integration means faster experiment-to-model iteration cycles
Key advantage	Near-zero sim-to-real gap for camera data: the rendered scene is photorealistic because it starts from real sensor data, so what the model trains on closely matches what it sees in production	Traditional synthetic simulation has a “sim-to-real gap”: a model trained on synthetic visuals may behave differently on real camera images
Scale	Tesla claims the ability to generate billions of simulated training miles (est.); exact figures not disclosed	Orders of magnitude more data than real-world collection alone

The architecture decision to build a rerender engine is a direct consequence of Tesla’s vision-only bet. If you use lidar, you need a lidar simulator. If you use only cameras, you can build a neural rerender engine that produces images indistinguishable from the real world — and those images are, in a meaningful sense, real, because they start from real footage. The sim-to-real gap shrinks to near zero.

The constraint is that this approach is tightly coupled to the camera-only stack. Tesla cannot use rerender to train a lidar model — there is no lidar data to rerender. This is a coherent architectural choice, but it means the entire simulation infrastructure is a bet on vision-only being sufficient for full autonomy.
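
The auto-labeling row in the component table above can be sketched as follows. The `Label`, `Clip`, and `propagate_labels` names are hypothetical and do not describe Tesla's actual pipeline. The sketch assumes each variant changes only appearance (lighting, weather) and leaves scene geometry untouched, so the original labels stay valid.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Label:
    object_id: str
    category: str


@dataclass(frozen=True)
class Clip:
    clip_id: str
    condition: str       # e.g. "original", "night", "rain"
    labels: tuple = ()   # labels produced once, on the original clip


def propagate_labels(source, conditions):
    """Create one re-rendered variant per condition, inheriting the source labels."""
    return [
        Clip(clip_id=f"{source.clip_id}:{condition}", condition=condition, labels=source.labels)
        for condition in conditions
    ]


source = Clip("clip_42", "original", (Label("ped_1", "pedestrian"), Label("veh_7", "vehicle")))
variants = propagate_labels(source, ["night", "rain", "fog"])
print(len(variants), "variants share", len(source.labels), "labels each")
```

Under that assumption the labeling work is done once per source clip rather than once per variant. Variants that insert synthetic objects would need labels for those objects as well, which this sketch does not cover.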

Section 3 — Waymo’s Simulation Approach: CarCraft and Closed-Loop Testing

Waymo’s simulation strategy reflects its multi-sensor, HD-map-dependent architecture. CarCraft must simultaneously simulate lidar point clouds, camera images, and radar returns — a substantially harder physics problem than camera-only rerender.

Component	What it does
CarCraft	Waymo’s internal simulation platform (publicly disclosed); runs millions of miles of simulated driving per day (est.); models vehicles, pedestrians, cyclists, road geometry from HD maps
Scenario extraction from real drives	Real-world incidents and near-misses are extracted, anonymized, and seeded into simulation to generate variations; similar to Tesla’s rerender concept but applied to a 3D world model rather than camera images
Multi-sensor simulation	Must simulate lidar (3D point cloud), camera (2D image), and radar (range plus velocity) simultaneously; more complex than camera-only simulation
Behavior modeling	Simulates realistic behavior of other road users (drivers who cut lanes, pedestrians who jaywalk, cyclists who wobble); key differentiator vs. naive simulation
Closed-loop testing	The simulated vehicle’s decisions affect the simulation world; other simulated agents react to the AV’s choices
Scale	Waymo has disclosed running tens of millions of simulation miles per day (est.)
Sim-to-real challenge	Waymo’s lidar simulation must accurately model how laser pulses interact with surfaces, retroreflective materials, glass — a harder physics problem than camera image synthesis (est.)

The closed-loop capability is Waymo’s most important simulation advantage. In open-loop testing, the AV’s decisions do not affect what happens next in the simulation — the scenario plays out the same way regardless of what the AV does. In closed-loop testing, the simulated world responds to the AV’s actions: if the AV brakes, the simulated car behind it must respond; if the AV changes lanes, the simulated cyclist must react. This catches an entire class of failure modes — scenarios where the AV’s own behavior creates the dangerous situation — that open-loop testing cannot detect.
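
The difference can be made concrete with a toy one-dimensional sketch. It is not CarCraft and every number in it is an assumption: an AV and a following car both travel at 15 m/s, 15 m apart, and the AV brakes from step 10 onward. In open-loop mode the follower replays a logged constant-speed trajectory. In closed-loop mode the follower brakes at 5 m/s^2 once the gap drops below 13 m.

```python
def simulate(av_decel, closed_loop, steps=80, dt=0.1):
    """Return (minimum gap in m, follower's final position in m).

    av_decel:    braking rate chosen by the AV, in m/s^2 (the decision under test)
    closed_loop: if True, the follower reacts to the shrinking gap;
                 if False, it replays its logged constant-speed trajectory
    """
    av_pos, av_speed = 15.0, 15.0     # AV starts 15 m ahead of the follower
    fol_pos, fol_speed = 0.0, 15.0
    follower_braking = False
    min_gap = av_pos - fol_pos
    for step in range(steps):
        if step >= 10:                # the AV brakes from step 10 onward
            av_speed = max(0.0, av_speed - av_decel * dt)
        if closed_loop and av_pos - fol_pos < 13.0:
            follower_braking = True   # latched: the follower then brakes to a stop
        if follower_braking:
            fol_speed = max(0.0, fol_speed - 5.0 * dt)
        av_pos += av_speed * dt
        fol_pos += fol_speed * dt
        min_gap = min(min_gap, av_pos - fol_pos)
    return min_gap, fol_pos


for decel in (3.0, 9.0):
    open_gap, open_fol = simulate(decel, closed_loop=False)
    closed_gap, closed_fol = simulate(decel, closed_loop=True)
    print(f"AV decel {decel}: open-loop follower ends at {open_fol:.1f} m, "
          f"closed-loop minimum gap {closed_gap:.1f} m")
```

In the open-loop runs the follower ends at the same position whatever the AV does, so the replay cannot say whether a given braking decision is safe. In the closed-loop runs, gentle braking (3 m/s^2) keeps the gap positive while hard braking (9 m/s^2) drives it negative, that is, a rear-end collision caused by the AV's own choice.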

CarCraft’s scale (tens of millions of simulated miles per day, est.) is lower than Tesla’s stated billions, but the comparison is not apples-to-apples: Waymo’s simulated miles include full multi-sensor data generation at every step, which is computationally more expensive per mile than camera-only rerender.

Section 4 — NVIDIA Omniverse as Industry Infrastructure

Not every AV company can afford to build a proprietary simulation platform at the scale of CarCraft or Tesla’s rerender engine. NVIDIA’s Omniverse platform provides industry-standard simulation infrastructure that smaller AV programs and robotics startups use as their primary simulation environment.

Dimension	What NVIDIA provides	Who uses it
Omniverse platform	Physics-accurate simulation environment built on USD (Universal Scene Description); used for robotic simulation, AV testing, and industrial digital twins	Broadly adopted in robotics (Figure AI, Boston Dynamics, et al.); some AV companies use it for non-production simulation (est.)
Isaac Sim	NVIDIA’s robotics simulation platform within Omniverse; physically accurate sensor models; ROS 2 compatible	Humanoid robot development; not Tesla’s primary AV simulation (Tesla builds proprietary)
Drive Sim	NVIDIA’s AV-specific simulation within Omniverse; lidar/camera/radar sensor models; weather simulation; used by several AV companies (Cruise, BYD, others — est.)	Waymo uses primarily proprietary CarCraft; some OEMs use NVIDIA Drive Sim
Synthetic data generation	NVIDIA’s platforms can generate labeled synthetic training data at scale	Smaller AV programs and robotics startups that cannot build proprietary simulation

NVIDIA’s simulation business is strategically important because it commoditizes the baseline simulation capability that every AV and robotics company needs. Companies that cannot afford to build a CarCraft or a rerender engine can still run large-scale simulation on Drive Sim or Isaac Sim — at a cost that scales with NVIDIA hardware. This creates a dependency that NVIDIA’s hardware roadmap can serve.

The implication for the Physical AI benchmark is that NVIDIA is not just a chip supplier to the AV industry — it is also a simulation infrastructure provider whose platform choices affect how every company in the ecosystem trains its models.

Section 5 — Simulation Benchmark Metrics

Metric	What it measures	Tesla (est.)	Waymo (est.)
Simulated miles per day	Volume of synthetic driving experience generated	Billions of miles claimed; per-day rate not disclosed (not independently verified)	Tens of millions/day (disclosed)
Scenario library size	Number of distinct edge-case scenarios available for training/testing	Not disclosed	Not disclosed
Sim-to-real fidelity	How closely simulation matches real sensor output	Very high (rerender from real data); minimal gap	High (multi-sensor physics models); some gap for rare surfaces (est.)
Closed-loop capability	Simulated AV decisions affect simulation world	Yes (est.)	Yes (CarCraft — disclosed)
Regression test coverage	Scenarios tested on each software release	Not disclosed	Not disclosed
Key advantage	Principal strength of each approach	Neural rerender eliminates sim-to-real gap for camera; scales edge cases from real events	Multi-sensor simulation; robust closed-loop; established platform (est.)

The most important metric is sim-to-real fidelity — how closely the simulation matches what the real vehicle actually experiences. A simulation that looks photo-realistic but does not accurately model lighting, shadows, sensor noise, or surface reflectivity will train models that behave differently in the real world. Tesla’s rerender approach scores extremely high on this dimension for camera data, because the base scene is real. Waymo’s lidar simulation must model the physics of laser pulse propagation — a harder problem, but one where inaccuracy has direct safety consequences.
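
One simple way to put a number on camera-side fidelity is an image-similarity score between a real frame and its simulated counterpart. The sketch below uses peak signal-to-noise ratio (PSNR), a generic image metric chosen here purely for illustration. The article does not say which metrics Tesla or Waymo use, and PSNR says nothing about lidar or radar fidelity. Images are assumed to be small grayscale grids of 0 to 255 values.

```python
import math


def psnr(real, rendered, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-sized grayscale images."""
    if len(real) != len(rendered) or any(len(a) != len(b) for a, b in zip(real, rendered)):
        raise ValueError("images must have the same shape")
    squared_error = sum(
        (p - q) ** 2
        for real_row, rendered_row in zip(real, rendered)
        for p, q in zip(real_row, rendered_row)
    )
    pixel_count = sum(len(row) for row in real)
    mse = squared_error / pixel_count
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


real_frame = [[100, 110], [120, 130]]
close_render = [[101, 109], [121, 129]]   # every pixel off by 1
rough_render = [[90, 120], [110, 140]]    # every pixel off by 10

print(round(psnr(real_frame, close_render), 2))  # 48.13
print(round(psnr(real_frame, rough_render), 2))  # 28.13
```

A higher score means the render is closer to the real frame. In practice the more meaningful test is behavioral: whether a model trained or evaluated on simulated data makes the same decisions on real sensor input.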

Section 6 — Strategic Implications for the Physical AI Race

Simulation infrastructure is not a differentiator that maps cleanly to market share — it is a necessary condition for the quality of the AI models that do map to market share. The simulation advantage compounds: better simulation produces better training data, which produces better models, which makes the vehicles safer, which justifies more deployment, which generates more real-world data to seed into simulation.

Strategic dimension	Tesla position	Waymo position
Simulation-to-model feedback loop	Fast — Dojo integration means tight cycle from rerender to training run (est.)	Mature — CarCraft has been running at scale for years
Data flywheel	6M+ FSD vehicles generating real footage that seeds the rerender engine	Smaller fleet but higher-quality sensor data per vehicle
Sim-to-real gap	Near zero for camera (rerender from real footage)	Low for camera; some residual for lidar rare-surface interactions (est.)
Multi-sensor simulation capability	Not needed — vision-only; not built (est.)	Required by architecture; built into CarCraft
Third-party dependency	None for primary simulation (proprietary rerender + Dojo)	None for primary simulation (CarCraft is proprietary)
Competitive moat	Rerender engine tied to 6M-vehicle fleet; competitors cannot replicate the data	CarCraft scale and closed-loop maturity built over 10+ years

The bottom line for the Physical AI benchmark: simulation quality is a hidden multiplier on every other metric. A company with a 100,000-vehicle fleet and a high-fidelity simulation engine that generates 1,000 variants per real-world event is effectively collecting 100 million vehicles’ worth of training data. The company that wins the simulation race wins the training data race — and the training data race is the AI model race.
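
The arithmetic behind that figure is a single multiplication, spelled out below with a hypothetical helper. Treat the result as an upper bound under the assumption that every variant is as informative as an independent real-world event; variants of the same event are correlated, so the true gain is likely smaller.

```python
def effective_fleet_size(real_vehicles, variants_per_event):
    """Fleet size that would be needed to match the simulated data volume."""
    return real_vehicles * variants_per_event


print(f"{effective_fleet_size(100_000, 1_000):,}")  # 100,000,000
```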

Tesla’s rerender approach is architecturally elegant and tightly coupled to the vision-only bet. If vision-only is sufficient for full autonomy, Tesla’s simulation infrastructure is probably the highest-fidelity synthetic data generation system in the industry. Waymo’s CarCraft is a more complex, more expensive, multi-sensor simulation that is necessary for its sensor-redundant architecture — and its closed-loop capability is a genuine advantage for catching AV-induced failure modes.

Neither approach is definitively superior — they are coherent implementations of two different bets on what the correct architecture for autonomous driving looks like.

Section 7 — What to Watch in 2026

The state of the simulation infrastructure race can be tracked through a small number of observable signals in the second half of 2026.

Signal	What it indicates	Why it matters
Tesla FSD improvement rate	If FSD v13 and later releases continue to improve rapidly, that supports the case that the rerender engine is effective	The rate of model improvement is the clearest available proxy for simulation data quality
Waymo geographic expansion speed	New city launches require simulation of new road geometry and edge cases; speed of expansion tests CarCraft’s generalization	Slow expansion despite high simulation volume would suggest sim-to-real transfer limitations
NVIDIA Drive Sim adoption	If major OEMs announce Drive Sim deployments, it signals that proprietary simulation is too expensive for non-hyperscalers	Commoditization of simulation capability would level the playing field below Tesla and Waymo
Regulatory acceptance of simulation evidence	If NHTSA or EU regulators formally accept simulation-generated evidence for safety validation, simulation investment becomes directly tied to approval timelines	Changes the ROI calculation for every AV company’s simulation budget
Academic NeRF/3DGS advances	Improvements in neural scene reconstruction techniques directly translate to rerender fidelity	Open-source advances could allow smaller players to close the simulation quality gap with Tesla (est.)

The simulation infrastructure question is ultimately a question about the rate at which the Physical AI industry can accelerate beyond what real-world data alone can support. Both Tesla and Waymo have concluded that real-world miles alone are insufficient — and both have invested billions in simulation infrastructure as a result. The companies that get simulation right will train better models faster, with less risk, at lower marginal cost. Simulation is not a supporting function of the AV development process — it is the primary lever on development velocity.

Note: Figures labeled “(est.)” are directional estimates based on publicly available information as of mid-2026. Exact simulation volumes and internal platform details have not been independently verified. This article does not constitute investment advice.