2026-06-18

AV Safety Metrics — Why There Is No Standard Way to Compare Tesla and Waymo

AV companies report safety using incompatible metrics. Here is what a real Physical AI Ramp Index should measure, and where the leaders stand.

Article 80 in the Physical AI Benchmark Series

One of the fundamental problems with tracking the autonomous vehicle ramp is that Tesla, Waymo, Aurora, and every other AV company report safety and performance using incompatible metrics. Waymo talks about “miles between serious incidents.” Tesla reports “FSD critical disengagement rate per 1,000 miles.” Aurora cites “miles per intervention.” The California DMV publishes disengagement reports that companies have learned to game. There is no standardized Physical AI Ramp Index, no GAAP-equivalent for AV safety reporting, and the absence of one makes it nearly impossible to answer the most important question in the industry: who is actually winning the AV race, and how fast?

Section 1 — Why Current AV Metrics Are Incomparable

The metrics each company publishes were chosen, at least in part, because they make that company look good. This is not a conspiracy — it is the rational behavior of companies in a competitive industry with no mandatory reporting standard. The result is a landscape where no two companies report the same thing.

Metric	Who uses it	The problem
Miles between disengagements	California DMV reports (all permitted AV operators)	Definition varies: Waymo counts “manual takeovers to prevent incident”, while Tesla counts “driver intervention”. The two are not equivalent
Critical disengagement rate per 1K miles	Tesla (FSD reporting)	Tesla-defined “critical” — no external audit; includes supervised driving where a human is expected to be present
Miles per intervention	Aurora, Waymo (internal)	“Intervention” has no shared definition; what counts varies by each company's protocol
Incidents per million miles	NHTSA Standing General Order (SGO)	Best-standardized external metric, but reports can lag by up to 30 days and the incident definition still has grey areas
Crash rate per 100M VMT	NHTSA/FHWA (human benchmark)	Human benchmark: approximately 1.35 crashes per million VMT (est.), or about 135 per 100M VMT; hard to compare to AV operational mix
Rider comfort score	Internal (no public reporting)	No external standard; important for commercial viability but invisible to outside observers
Availability / uptime	Internal	What percentage of rides complete without tech takeover? Not publicly reported by any company

The core problem: AV companies are incentivized to choose metrics where they look best. There is no GAAP-equivalent for AV safety reporting. A company with a genuinely poor safety record can choose to report a metric that obscures it — and the absence of a mandatory external audit standard means there is no way to verify the numbers that are published.
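The unit mismatch in the table above (per 1K miles, per million miles, per 100M VMT, miles between events) can at least be removed mechanically, even though the definitional mismatch cannot. The following is a minimal sketch of that normalization. The function names are hypothetical, and the 0.5-per-1,000-miles figure is an illustrative input, not a reported number.

```python
def rescale_rate(rate: float, from_miles: float, to_miles: float) -> float:
    """Convert 'rate events per from_miles miles' into events per to_miles miles."""
    if from_miles <= 0 or to_miles <= 0:
        raise ValueError("mileage bases must be positive")
    return rate * to_miles / from_miles


def miles_between_events(rate: float, per_miles: float) -> float:
    """Convert an event rate into the mean number of miles between events."""
    if rate <= 0:
        raise ValueError("rate must be positive to compute miles between events")
    return per_miles / rate


# The article's human benchmark, quoted per million VMT, restated per 100M VMT.
human_per_100m = rescale_rate(1.35, 1_000_000, 100_000_000)  # about 135

# Illustrative (not reported) input: 0.5 critical disengagements per 1,000 miles.
example_per_million = rescale_rate(0.5, 1_000, 1_000_000)  # about 500
example_gap = miles_between_events(0.5, 1_000)  # about 2,000 miles
```

Putting every figure on one mileage base only makes the numbers arithmetically comparable. It does nothing about the fact that "disengagement" and "intervention" mean different things at each company.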

Section 2 — What a Real Physical AI Ramp Index Should Measure

A credible index needs metrics across four dimensions: Safety, Scale, Capability Growth, and Commercial Viability. Each dimension captures something the others cannot.

Dimension A — Safety

Metric	Definition	Why it matters
Incidents per million driverless miles (NHTSA-reportable)	Crashes and collisions requiring NHTSA SGO report, normalized per million autonomous miles	Most externally auditable; NHTSA receives these within 30 days
At-fault incidents per million miles	Subset of above where investigation finds AV was at fault	Distinguishes AV error from human-caused incidents in mixed traffic
Safety-critical disengagements per 100K miles	System-initiated handoffs to human due to safety concern (not human preference)	Separates safety-critical from comfort disengagements
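As a rough illustration of how the three Dimension A metrics could be computed from raw event logs, here is a small sketch. The record types, field names, and sample data are all hypothetical; no company publishes logs in this form, and the classification flags are assumed to have been assigned upstream by an auditor or investigator.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Incident:
    nhtsa_reportable: bool  # assumed flag: meets the NHTSA reporting threshold
    av_at_fault: bool       # assumed flag: investigation attributed fault to the AV


@dataclass(frozen=True)
class Disengagement:
    system_initiated: bool  # the system handed off, not the human
    safety_related: bool    # caused by a safety concern, not comfort or preference


def per_miles(count: int, miles: float, base: float) -> float:
    """Normalize an event count to events per `base` miles."""
    if miles <= 0:
        raise ValueError("miles must be positive")
    return count * base / miles


def safety_metrics(incidents, disengagements, driverless_miles: float) -> dict:
    reportable = [i for i in incidents if i.nhtsa_reportable]
    # At-fault incidents are a subset of the reportable ones, as in the table.
    at_fault = [i for i in reportable if i.av_at_fault]
    critical = [d for d in disengagements
                if d.system_initiated and d.safety_related]
    return {
        "incidents_per_million_miles": per_miles(len(reportable), driverless_miles, 1e6),
        "at_fault_per_million_miles": per_miles(len(at_fault), driverless_miles, 1e6),
        "safety_critical_disengagements_per_100k_miles":
            per_miles(len(critical), driverless_miles, 1e5),
    }


# Made-up sample: 2M driverless miles, 3 incidents, 5 disengagements.
sample_incidents = [Incident(True, True), Incident(True, False), Incident(False, False)]
sample_disengagements = [
    Disengagement(True, True), Disengagement(True, True),
    Disengagement(True, False), Disengagement(False, True),
    Disengagement(False, False),
]
example = safety_metrics(sample_incidents, sample_disengagements, 2_000_000)
```

The arithmetic is trivial. The hard part, and the part an external standard would have to define, is who sets the boolean flags and by what rules.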

Dimension B — Scale

Metric	Definition	Why it matters
Total driverless miles accumulated (cumulative)	Miles without safety driver present	Scale of driverless exposure — the denominator that gives safety stats meaning
Weekly driverless rides (paid commercial)	Distinct from test rides; revenue-generating only	Commercial traction; requires public riders willing to pay
Active driverless fleet size	Vehicles operating without safety driver	Hardware at scale indicator
Geographic coverage (driverless ODD sq miles)	Total square miles of driverless operational design domain	Breadth vs depth tradeoff visible

Dimension C — Capability Growth

Metric	Definition	Why it matters
ODD expansion rate (cities per year)	New cities added to driverless commercial service per year	Speed of geographic expansion
Weather condition coverage	Percentage of weather events handled driverlessly (rain, fog, snow)	Capability depth in harsh conditions
Night and 24-hour operational percentage	Percentage of service hours operated at night	Proves system is not daytime-only

Dimension D — Commercial Viability

Metric	Definition	Why it matters
Revenue per driverless mile	Commercial ride revenue divided by miles (no driver cost allocated)	Unit economic health
Rides per vehicle per day	Fleet utilization	Operational efficiency
Wait time (median pickup)	Time from request to vehicle arrival	Consumer experience quality
Trip completion rate	Percentage of accepted rides completed without human takeover	Reliability from rider perspective
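The Dimension D definitions are simple enough to sketch directly. The example below assumes a hypothetical `Trip` record and made-up sample data. It also assumes that revenue per mile is computed over paid trip miles only; the table does not say whether empty repositioning miles belong in the denominator.

```python
from dataclasses import dataclass
from statistics import median


@dataclass(frozen=True)
class Trip:
    fare_usd: float
    miles: float                       # paid trip miles only (assumption)
    wait_minutes: float                # request to vehicle arrival
    completed_without_takeover: bool


def commercial_metrics(trips, fleet_size: int, days: int) -> dict:
    if not trips or fleet_size <= 0 or days <= 0:
        raise ValueError("need at least one trip and positive fleet_size and days")
    total_miles = sum(t.miles for t in trips)
    if total_miles <= 0:
        raise ValueError("total trip miles must be positive")
    return {
        "revenue_per_mile": sum(t.fare_usd for t in trips) / total_miles,
        "rides_per_vehicle_per_day": len(trips) / (fleet_size * days),
        "median_wait_minutes": median(t.wait_minutes for t in trips),
        "trip_completion_rate":
            sum(t.completed_without_takeover for t in trips) / len(trips),
    }


# Made-up sample: four accepted rides served by two vehicles in one day.
sample_trips = [
    Trip(10.0, 5.0, 4.0, True),
    Trip(20.0, 10.0, 6.0, True),
    Trip(15.0, 7.0, 8.0, True),
    Trip(5.0, 3.0, 10.0, False),
]
example = commercial_metrics(sample_trips, fleet_size=2, days=1)
```

The median is used for wait time, as in the table, because a handful of very long waits would otherwise dominate a mean.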

Section 3 — Where the Leaders Stand (Mid-2026 Estimates)

Metric	Waymo	Tesla Robotaxi	Aurora	Notes
Driverless miles accumulated (cumulative)	30M+ (est.)	Under 1M (est. — Austin launch phase)	5M (est. — freight)	Waymo lead measured in years of commercial operation
Weekly driverless rides (paid)	150,000+	Hundreds to low thousands (est.)	N/A (freight)	Waymo 100x+ ahead on rides
Active driverless fleet	1,000–1,500 (est.)	10–50 (est. — Austin geofenced)	Dozens (est. — I-45)	Different scale of deployment
NHTSA-reportable incidents per million miles	Not publicly broken out by mode	Not broken out	Not broken out	Best available external data: Waymo has reported sub-1 per million (est.) in some disclosures
Commercial cities (driverless)	4 (SF, Phoenix, LA, Austin)	1 (Austin, limited zone)	1 corridor (I-45 freight)	Waymo geographic breadth largest
24/7 operational	Yes (SF, Phoenix)	No (current)	Partial (highway schedule)	Waymo proven 24/7 capability
ODD coverage (approx.)	Hundreds of sq miles across 4 cities (est.)	10–20 sq miles (est. Austin geofence)	240-mile corridor	Very different ODD types
Trip completion rate	99%+ (est.)	Not yet at scale to report	Not applicable (freight)	Waymo high completion rate key commercial differentiator

Section 4 — The Tesla Data Problem (and Advantage)

Tesla occupies a unique and genuinely anomalous position in any Physical AI Index. The company has the largest real-world driving dataset in the world — and essentially no driverless data yet.

What Tesla reports publicly:

FSD cumulative supervised miles: approximately 5–6 billion (est.)
Disengagement rate: defined as critical intervention per 1,000 miles, supervised
Autopilot and FSD incident data via NHTSA SGO reports

What is not yet reportable:

Driverless miles: the Austin robotaxi operation is very early stage — weeks into commercial operation as of mid-2026
Ride count: too small to be statistically meaningful yet
At-fault incident rate for driverless mode: no data exists
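One way to see why a small mileage base is not yet statistically meaningful: even a fleet that hypothetically logged zero incidents can only rule out incident rates above a certain level, and that level depends entirely on the miles driven. The sketch below assumes incidents follow a Poisson process with a constant per-mile rate, which is a simplifying assumption rather than anything the companies report. The function name is hypothetical, and the mileage inputs are the rounded estimates from the Section 3 table.

```python
import math


def zero_event_upper_bound_per_million(miles: float, confidence: float = 0.95) -> float:
    """Upper confidence bound on the incident rate (per million miles)
    when zero incidents were observed over `miles` miles.

    Under a Poisson model, P(0 events) = exp(-rate * miles), so the largest
    rate still consistent with seeing nothing is -ln(1 - confidence) / miles.
    """
    if miles <= 0:
        raise ValueError("miles must be positive")
    if not 0 < confidence < 1:
        raise ValueError("confidence must be strictly between 0 and 1")
    return -math.log(1.0 - confidence) / miles * 1_000_000


# Hypothetical zero-incident records at two exposure levels.
bound_1m = zero_event_upper_bound_per_million(1_000_000)    # about 3.0 per million
bound_30m = zero_event_upper_bound_per_million(30_000_000)  # about 0.1 per million
```

Under these assumptions, a clean record over 1M miles is still consistent with a true rate of roughly 3 incidents per million miles, while the same record over 30M miles tightens the bound to about 0.1. This is the familiar "rule of three" from statistics, applied to miles.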

Tesla’s structural advantage that no index currently captures:

The FSD supervised fleet of 6M+ vehicles is the world’s largest real-world proof-of-concept for end-to-end neural net driving. Supervised miles are not equivalent to driverless miles, but the safety profile of supervised FSD on public roads — with a driver monitoring and able to intervene — is a genuine capability signal. The disengagement rate trend, improving approximately 2x per year (est.), is the leading indicator of when supervised becomes driverless. Any index that only counts driverless miles today will systematically underrate Tesla’s trajectory.
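The claimed 2x-per-year improvement can be turned into a rough time estimate. If a rate falls by a constant factor f each year, the time to get from a current rate to a target rate is log(current / target) / log(f). The sketch below works through that formula. The function name and the 100x and 1,000x gaps are hypothetical illustrations; the article does not state how large the gap between supervised and driverless performance actually is.

```python
import math


def years_to_target(current_rate: float, target_rate: float,
                    improvement_per_year: float = 2.0) -> float:
    """Years for a rate that shrinks by a constant factor each year
    to fall from current_rate to target_rate."""
    if current_rate <= 0 or target_rate <= 0:
        raise ValueError("rates must be positive")
    if improvement_per_year <= 1:
        raise ValueError("improvement_per_year must be greater than 1")
    if current_rate <= target_rate:
        return 0.0  # already at or below the target
    return math.log(current_rate / target_rate) / math.log(improvement_per_year)


# Hypothetical gaps, expressed as ratios of current rate to target rate.
years_for_100x = years_to_target(100.0, 1.0)    # about 6.6 years at 2x per year
years_for_1000x = years_to_target(1000.0, 1.0)  # about 10 years at 2x per year
```

The result is very sensitive to the assumed factor: at 3x per year, the same 1,000x gap closes in about 6.3 years. That sensitivity is why the trend estimate carries so much weight in any forecast.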

The deeper question any index must answer: is Tesla improving fast enough to close a 30M-mile gap in driverless experience before Waymo’s commercial moat becomes unassailable? The answer is not yet knowable from public data.

Section 5 — Proposed Physical AI Ramp Index Scoreboard

Based on publicly available and estimated data as of mid-2026:

Dimension	Waymo score	Tesla score	Why
Safety (driverless ops)	●●●●○	●○○○○	Waymo has 30M+ driverless miles (est.); Tesla robotaxi is weeks old
Scale (driverless)	●●●●○	●○○○○	150K+ rides per week vs hundreds to low thousands (est.); 4 cities vs 1 zone
Capability growth rate	●●●○○	●●●●○	Tesla FSD improving faster; Waymo steady but slower expansion
Commercial viability	●●●○○	●●○○○	Waymo revenue-generating; unit economics still improving; Tesla too early
Humanoid and broader robotics	○○○○○	●●●○○	Waymo is AV only; Tesla has Optimus ramp
Fleet data advantage	●●○○○	●●●●●	Tesla 6M supervised fleet; Waymo 1,000–1,500 driverless
Overall Physical AI Ramp	●●●●○	●●●○○	Waymo ahead on AV today; Tesla stronger trajectory and broader scope

Key takeaway: Waymo is winning the AV race today by every measurable driverless metric. Tesla is winning the data and trajectory race, and is the only company with a credible path to both L4 AV and humanoid robotics at scale. A one-dimensional index that only counts driverless miles today misses the full picture and systematically favors the incumbent over the challenger. A complete Physical AI Ramp Index must capture both current driverless performance and trajectory.

Section 6 — About This Series

This is article 80 in the Physical AI Benchmark Series. Previous articles have covered the ramp index, the humanoid race, unit economics, global competition, HD mapping, software and OTA updates, consumer demand, competitive moats, safety data, Waymo Gen 6, Optimus manufacturing, scorecard snapshots, 2030 forecast scenarios, the investor framework, city expansion pipelines, Tesla FSD state approval maps, AV weather and climate constraints, the talent war, regulatory calendars, robotaxi fare pricing, humanoid deployment trackers, supply chain analysis, consumer adoption demand index, valuation and IPO analysis, the Physical AI 2026 mid-year roundup, AV unit economics cost-per-mile breakdown, the AV data flywheel comparison, AV cybersecurity attack surfaces, the Physical AI supply chain, AV fleet operations, AV insurance and liability evolution, the full lifecycle environmental cost, the accessibility layer, the mapping architecture comparison, the China AV race, simulation and synthetic data training, the Physical AI investment landscape, AV urban planning city impact, autonomous trucking freight economics, the European AV competitive landscape, and the AV sensor technology debate.

This article adds the metrics layer: why current AV safety metrics are incompatible, what a real Physical AI Ramp Index should measure across four dimensions, where the leading companies stand on each dimension as of mid-2026 (est.), and why any single-dimensional index fails to capture the true competitive landscape.

Note: Statistics, fleet sizes, mile counts, and scores are labeled “(est.)” where based on industry estimates and public reporting. This article does not constitute investment advice.