AI-Daily-Builder

2026-06-18

AV Safety Metrics — Why There Is No Standard Way to Compare Tesla and Waymo

AV companies report safety using incompatible metrics. Here is what a real Physical AI Ramp Index should measure, and where the leaders stand.

Article 80 in the Physical AI Benchmark Series — AV Safety Metrics: Why There Is No Standard Way to Compare Tesla and Waymo

One of the fundamental problems with tracking the autonomous vehicle ramp is that Tesla, Waymo, Aurora, and every other AV company report safety and performance using incompatible metrics. Waymo talks about “miles between serious incidents.” Tesla reports “FSD critical disengagement rate per 1,000 miles.” Aurora cites “miles per intervention.” The California DMV publishes disengagement reports that companies have learned to game. There is no standardized Physical AI Ramp Index, no GAAP-equivalent for AV safety reporting, and its absence makes it nearly impossible to answer the most important question in the industry: who is actually winning the AV race, and how fast?


Section 1 — Why Current AV Metrics Are Incomparable

The metrics each company publishes were chosen, at least in part, because they make that company look good. This is not a conspiracy — it is the rational behavior of companies in a competitive industry with no mandatory reporting standard. The result is a landscape where no two companies report the same thing.

Metric | Who uses it | The problem
Miles between disengagements | California DMV reports (all permitted AV operators) | Definition varies: Waymo counts “manual takeovers to prevent incident”; Tesla counts “driver intervention”. The two are not equivalent
Critical disengagement rate per 1K miles | Tesla (FSD reporting) | “Critical” is defined by Tesla with no external audit; includes supervised driving where a human is expected to be present
Miles per intervention | Aurora, Waymo (internal) | “Intervention” is defined differently across companies; what counts as an intervention varies by protocol
Incidents per million miles | NHTSA Standing General Order (SGO) | Best-standardized external metric, but reports can lag by up to 30 days and the incident definition still has grey areas
Crash rate per 100M VMT | NHTSA/FHWA (human benchmark) | Human benchmark: approximately 1.35 crashes per million VMT (est.); hard to compare to AV operational mix
Rider comfort score | Internal (no public reporting) | No external standard; important for commercial viability but invisible to outside observers
Availability / uptime | Internal | What percentage of rides complete without tech takeover? Not publicly reported by any company

The core problem: AV companies are incentivized to choose metrics where they look best. There is no GAAP-equivalent for AV safety reporting. A company with a genuinely poor safety record can choose to report a metric that obscures it — and the absence of a mandatory external audit standard means there is no way to verify the numbers that are published.
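
As a rough illustration of the unit problem alone, the sketch below converts the two reporting styles mentioned above ("events per N miles" and "miles between events") into a single events-per-million-miles figure. The function names and input numbers are hypothetical placeholders, not published company data. Converting units does not make the figures comparable, because each company still defines the underlying event differently.

```python
# Minimal sketch: put differently expressed rates on one scale.
# All names and numbers here are illustrative assumptions, not real data.

def per_million_from_rate(events: float, per_miles: float) -> float:
    """Convert 'events per N miles' into events per million miles."""
    if per_miles <= 0:
        raise ValueError("per_miles must be positive")
    return events * 1_000_000 / per_miles


def per_million_from_miles_between(miles_between_events: float) -> float:
    """Convert 'miles between events' into events per million miles."""
    if miles_between_events <= 0:
        raise ValueError("miles_between_events must be positive")
    return 1_000_000 / miles_between_events


# Placeholder inputs: 0.5 events per 1,000 miles vs. one event every 20,000 miles.
rate_a = per_million_from_rate(0.5, 1_000)
rate_b = per_million_from_miles_between(20_000)
print(f"A: {rate_a:.1f} per million miles, B: {rate_b:.1f} per million miles")
```

Even on a shared scale, the two numbers only line up if "event" means the same thing in both reports, which is the gap a mandatory standard would need to close.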


Section 2 — What a Real Physical AI Ramp Index Should Measure

A credible index needs metrics across four dimensions: Safety, Scale, Capability Growth, and Commercial Viability. Each dimension captures something the others cannot.

Dimension A — Safety

Metric | Definition | Why it matters
Incidents per million driverless miles (NHTSA-reportable) | Crashes and collisions requiring an NHTSA SGO report, normalized per million autonomous miles | Most externally auditable; NHTSA receives these within 30 days
At-fault incidents per million miles | Subset of above where investigation finds AV was at fault | Distinguishes AV error from human-caused incidents in mixed traffic
Safety-critical disengagements per 100K miles | System-initiated handoffs to human due to safety concern (not human preference) | Separates safety-critical from comfort disengagements

Dimension B — Scale

Metric | Definition | Why it matters
Total driverless miles accumulated (cumulative) | Miles without safety driver present | Scale of driverless exposure: the denominator that gives safety stats meaning
Weekly driverless rides (paid commercial) | Distinct from test rides; revenue-generating only | Commercial traction; requires public riders willing to pay
Active driverless fleet size | Vehicles operating without safety driver | Hardware at scale indicator
Geographic coverage (driverless ODD sq miles) | Total square miles of driverless operational design domain | Breadth vs depth tradeoff visible
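
To make the "denominator" point concrete, here is a small sketch of how much driverless exposure is needed before a clean record says anything. It assumes incidents follow a Poisson process with a constant rate, which is a simplification, and the function names are hypothetical. Under that assumption, zero incidents over a given mileage only bounds the true rate from above.

```python
import math

# Minimal sketch, assuming incidents arrive as a Poisson process with a
# constant per-mile rate. Function names are illustrative, not a standard API.

def upper_bound_rate_per_million(miles: float, confidence: float = 0.95) -> float:
    """Upper bound on incidents per million miles after zero observed incidents."""
    if miles <= 0:
        raise ValueError("miles must be positive")
    if not 0 < confidence < 1:
        raise ValueError("confidence must be between 0 and 1")
    # P(zero incidents) = exp(-rate * miles); solve for the rate at 1 - confidence.
    return -math.log(1 - confidence) * 1_000_000 / miles


def incident_free_miles_needed(target_rate_per_million: float,
                               confidence: float = 0.95) -> float:
    """Incident-free miles needed to bound the rate below the target."""
    if target_rate_per_million <= 0:
        raise ValueError("target_rate_per_million must be positive")
    if not 0 < confidence < 1:
        raise ValueError("confidence must be between 0 and 1")
    return -math.log(1 - confidence) * 1_000_000 / target_rate_per_million


print(f"{upper_bound_rate_per_million(1_000_000):.2f} per million miles")
print(f"{incident_free_miles_needed(1.0):,.0f} miles")
```

Under these assumptions, roughly three million incident-free miles are needed to support a claim of fewer than one incident per million miles at 95% confidence, which is why a fleet with under a million driverless miles cannot yet support strong safety claims in either direction.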

Dimension C — Capability Growth

Metric | Definition | Why it matters
ODD expansion rate (cities per year) | New cities added to driverless commercial service per year | Speed of geographic expansion
Weather condition coverage | Percentage of weather events handled driverlessly (rain, fog, snow) | Capability depth in harsh conditions
Night and 24-hour operational percentage | Percentage of service hours operated at night | Proves system is not daytime-only

Dimension D — Commercial Viability

Metric | Definition | Why it matters
Revenue per driverless mile | Commercial ride revenue divided by miles (no driver cost allocated) | Unit economic health
Rides per vehicle per day | Fleet utilization | Operational efficiency
Wait time (median pickup) | Time from request to vehicle arrival | Consumer experience quality
Trip completion rate | Percentage of accepted rides completed without human takeover | Reliability from rider perspective
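
The four commercial metrics above can be computed from a handful of fleet totals. The sketch below is one possible reading of the definitions in the table; the function name, argument names, and sample numbers are all hypothetical, and counting only completed rides toward utilization is an assumption rather than something the table specifies.

```python
from statistics import median

# Minimal sketch of the Dimension D metrics. Inputs are placeholder totals
# for a reporting period, not real operator data.

def commercial_metrics(revenue_usd: float,
                       driverless_miles: float,
                       rides_accepted: int,
                       rides_completed_without_takeover: int,
                       vehicles: int,
                       days: int,
                       wait_minutes: list[float]) -> dict[str, float]:
    """Compute revenue per mile, utilization, median wait, and completion rate."""
    if min(driverless_miles, rides_accepted, vehicles, days) <= 0:
        raise ValueError("miles, rides_accepted, vehicles, and days must be positive")
    if not wait_minutes:
        raise ValueError("wait_minutes must not be empty")
    return {
        "revenue_per_mile": revenue_usd / driverless_miles,
        # Assumption: utilization counts completed rides only.
        "rides_per_vehicle_per_day": rides_completed_without_takeover / (vehicles * days),
        "median_wait_minutes": median(wait_minutes),
        "trip_completion_rate": rides_completed_without_takeover / rides_accepted,
    }


sample = commercial_metrics(
    revenue_usd=12_000.0,
    driverless_miles=8_000.0,
    rides_accepted=1_000,
    rides_completed_without_takeover=990,
    vehicles=50,
    days=2,
    wait_minutes=[3.0, 5.0, 4.0, 10.0, 6.0],
)
for name, value in sample.items():
    print(f"{name}: {value:.2f}")
```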

Section 3 — Where the Leaders Stand (Mid-2026 Estimates)

Metric | Waymo | Tesla Robotaxi | Aurora | Notes
Driverless miles accumulated (cumulative) | 30M+ (est.) | Under 1M (est.; Austin launch phase) | 5M (est.; freight) | Waymo lead measured in years of commercial operation
Weekly driverless rides (paid) | 150,000+ | Hundreds to low thousands (est.) | N/A (freight) | Waymo 100x+ ahead on rides
Active driverless fleet | 1,000–1,500 (est.) | 10–50 (est.; Austin geofenced) | Dozens (est.; I-45) | Different scale of deployment
NHTSA-reportable incidents per million miles | Not publicly broken out by mode | Not broken out | Not broken out | Best available external data: Waymo has reported sub-1 per million (est.) in some disclosures
Commercial cities (driverless) | 4 (SF, Phoenix, LA, Austin) | 1 (Austin, limited zone) | 1 corridor (I-45 freight) | Waymo geographic breadth largest
24/7 operational | Yes (SF, Phoenix) | No (current) | Partial (highway schedule) | Waymo proven 24/7 capability
ODD coverage (approx.) | Hundreds of sq miles across 4 cities (est.) | 10–20 sq miles (est. Austin geofence) | 240-mile corridor | Very different ODD types
Trip completion rate | 99%+ (est.) | Not yet at scale to report | Not applicable (freight) | Waymo high completion rate key commercial differentiator

Section 4 — The Tesla Data Problem (and Advantage)

Tesla occupies a unique and genuinely anomalous position in any Physical AI Index. The company has the largest real-world driving dataset in the world — and essentially no driverless data yet.

What Tesla reports publicly:

What is not yet reportable:

Tesla’s structural advantage that no index currently captures:

The FSD supervised fleet of 6M+ vehicles is the world’s largest real-world proof-of-concept for end-to-end neural net driving. Supervised miles are not equivalent to driverless miles, but the safety profile of supervised FSD on public roads — with a driver monitoring and able to intervene — is a genuine capability signal. The disengagement rate trend, improving approximately 2x per year (est.), is the leading indicator of when supervised becomes driverless. Any index that only counts driverless miles today will systematically underrate Tesla’s trajectory.
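
The "leading indicator" argument is an extrapolation, and it can be written down directly. The sketch below assumes the disengagement rate keeps improving by a constant factor each year (the roughly 2x figure above is itself an estimate), and uses hypothetical placeholder rates rather than reported ones. The function name is illustrative.

```python
import math

# Minimal sketch: years until a rate reaches a target under a constant
# yearly improvement factor. The constant factor is an assumption.

def years_to_target(current_rate: float,
                    target_rate: float,
                    improvement_per_year: float = 2.0) -> float:
    """Years for current_rate to fall to target_rate if it divides by the factor yearly."""
    if current_rate <= 0 or target_rate <= 0:
        raise ValueError("rates must be positive")
    if improvement_per_year <= 1:
        raise ValueError("improvement_per_year must be greater than 1")
    if target_rate >= current_rate:
        return 0.0
    return math.log(current_rate / target_rate) / math.log(improvement_per_year)


# Placeholder: an 8x gap to the target rate, closing at 2x per year.
print(f"{years_to_target(8.0, 1.0):.1f} years")
# The same gap at a slower 1.5x per year takes noticeably longer.
print(f"{years_to_target(8.0, 1.0, improvement_per_year=1.5):.1f} years")
```

The result is very sensitive to the assumed improvement factor, which is why an index would need the trend itself reported and audited, not just a single snapshot.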

The deeper question any index must answer: is Tesla improving fast enough to close a 30M-mile gap in driverless experience before Waymo’s commercial moat becomes unassailable? The answer is not yet knowable from public data.


Section 5 — Proposed Physical AI Ramp Index Scoreboard

Based on publicly available and estimated data as of mid-2026:

Dimension | Waymo score | Tesla score | Why
Safety (driverless ops) | ●●●●○ | ●○○○○ | Waymo has millions of driverless miles; Tesla robotaxi is weeks old
Scale (driverless) | ●●●●○ | ●○○○○ | 150K rides per week vs hundreds; 4 cities vs 1 zone
Capability growth rate | ●●●○○ | ●●●●○ | Tesla FSD improving faster; Waymo steady but slower expansion
Commercial viability | ●●●○○ | ●●○○○ | Waymo revenue-generating; unit economics still improving; Tesla too early
Humanoid and broader robotics | ○○○○○ | ●●●○○ | Waymo is AV only; Tesla has Optimus ramp
Fleet data advantage | ●●○○○ | ●●●●● | Tesla 6M supervised fleet; Waymo 1,000–1,500 driverless
Overall Physical AI Ramp | ●●●●○ | ●●●○○ | Waymo ahead on AV today; Tesla stronger trajectory and broader scope

Key takeaway: Waymo is winning the AV race today by every measurable driverless metric. Tesla is winning the data and trajectory race, and is the only company with a credible path to both L4 AV and humanoid robotics at scale. A one-dimensional index that only counts driverless miles today misses the full picture and systematically favors the incumbent over the challenger. A complete Physical AI Ramp Index must capture both current driverless performance and trajectory.


Section 6 — About This Series

This is article 80 in the Physical AI Benchmark Series. Previous articles have covered the ramp index, the humanoid race, unit economics, global competition, HD mapping, software and OTA updates, consumer demand, competitive moats, safety data, Waymo Gen 6, Optimus manufacturing, scorecard snapshots, 2030 forecast scenarios, the investor framework, city expansion pipelines, Tesla FSD state approval maps, AV weather and climate constraints, the talent war, regulatory calendars, robotaxi fare pricing, humanoid deployment trackers, supply chain analysis, consumer adoption demand index, valuation and IPO analysis, the Physical AI 2026 mid-year roundup, AV unit economics cost-per-mile breakdown, the AV data flywheel comparison, AV cybersecurity attack surfaces, the Physical AI supply chain, AV fleet operations, AV insurance and liability evolution, the full lifecycle environmental cost, the accessibility layer, the mapping architecture comparison, the China AV race, simulation and synthetic data training, the Physical AI investment landscape, AV urban planning city impact, autonomous trucking freight economics, the European AV competitive landscape, and the AV sensor technology debate.

This article adds the metrics layer: why current AV safety metrics are incompatible, what a real Physical AI Ramp Index should measure across four dimensions, where the leading companies stand on each dimension as of mid-2026 (est.), and why any single-dimensional index fails to capture the true competitive landscape.

Note: Statistics, fleet sizes, mile counts, and scores are labeled “(est.)” where based on industry estimates and public reporting. This article does not constitute investment advice.

