2026-06-18
AV Safety Metrics — Why There Is No Standard Way to Compare Tesla and Waymo
AV companies report safety using incompatible metrics. Here is what a real Physical AI Ramp Index should measure, and where the leaders stand.
Article 80 in the Physical AI Benchmark Series
One of the fundamental problems with tracking the autonomous vehicle ramp is that Tesla, Waymo, Aurora, and every other AV company report safety and performance using incompatible metrics. Waymo talks about “miles between serious incidents.” Tesla reports “FSD critical disengagement rate per 1,000 miles.” Aurora cites “miles per intervention.” The California DMV publishes disengagement reports that companies have learned to game. There is no standardized Physical AI Ramp Index, no GAAP-equivalent for AV safety reporting, and its absence makes it nearly impossible to answer the most important question in the industry: who is actually winning the AV race, and how fast?
Section 1 — Why Current AV Metrics Are Incomparable
The metrics each company publishes were chosen, at least in part, because they make that company look good. This is not a conspiracy — it is the rational behavior of companies in a competitive industry with no mandatory reporting standard. The result is a landscape where no two companies report the same thing.
| Metric | Who uses it | The problem |
|---|---|---|
| Miles between disengagements | California DMV reports (all permitted AV operators) | Definition varies: Waymo counts “manual takeovers to prevent incident” while Tesla counts “driver intervention,” and the two are not equivalent |
| Critical disengagement rate per 1K miles | Tesla (FSD reporting) | Tesla-defined “critical” — no external audit; includes supervised driving where a human is expected to be present |
| Miles per intervention | Aurora, Waymo (internal) | “Intervention” defined differently across companies; what counts as an intervention varies by protocol |
| Incidents per million miles | NHTSA Standing General Order (SGO) | Best-standardized external metric, but reporting carries a delay of up to 30 days; incident definition still has grey areas |
| Crash rate per 100M VMT | NHTSA/FHWA (human benchmark) | Human benchmark: approximately 1.35 crashes per million VMT (est.), or about 135 per 100M VMT; hard to compare to AV operational mix |
| Rider comfort score | Internal (no public reporting) | No external standard; important for commercial viability but invisible to outside observers |
| Availability / uptime | Internal | What percentage of rides complete without tech takeover? Not publicly reported by any company |
The core problem: AV companies are incentivized to choose metrics where they look best. There is no GAAP-equivalent for AV safety reporting. A company with a genuinely poor safety record can choose to report a metric that obscures it — and the absence of a mandatory external audit standard means there is no way to verify the numbers that are published.
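As a rough illustration of the unit problem alone, the sketch below converts three differently stated metrics to events per million miles. The function names and the first two input values are hypothetical placeholders, not reported figures; the third reuses the human benchmark estimate from the table. Putting numbers in the same unit does not make them comparable, because the event definitions behind them still differ.

```python
# Convert differently stated AV metrics to one unit: events per million miles.
# Function names are illustrative; the first two inputs are made-up placeholders.

def per_million_from_rate(events: float, per_miles: float) -> float:
    """A rate stated as `events` per `per_miles` miles, as events per million miles."""
    return events * 1_000_000 / per_miles

def per_million_from_interval(miles_between_events: float) -> float:
    """'Miles between events' or 'miles per intervention', as events per million miles."""
    return 1_000_000 / miles_between_events

examples = {
    "0.5 critical disengagements per 1K miles (hypothetical)": per_million_from_rate(0.5, 1_000),
    "20,000 miles per intervention (hypothetical)": per_million_from_interval(20_000),
    "1.35 crashes per million VMT (est., from the table)": per_million_from_rate(1.35, 1_000_000),
}

for label, rate in examples.items():
    print(f"{label}: {rate:,.2f} events per million miles")
```

The three results land orders of magnitude apart, which mostly reflects that a disengagement, an intervention, and a crash are different kinds of event.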
Section 2 — What a Real Physical AI Ramp Index Should Measure
A credible index needs metrics across four dimensions: Safety, Scale, Capability Growth, and Commercial Viability. Each dimension captures something the others cannot.
Dimension A — Safety
| Metric | Definition | Why it matters |
|---|---|---|
| Incidents per million driverless miles (NHTSA-reportable) | Crashes and collisions requiring an NHTSA SGO report, normalized per million autonomous miles | Most externally auditable; NHTSA receives these within 30 days |
| At-fault incidents per million miles | Subset of above where investigation finds AV was at fault | Distinguishes AV error from human-caused incidents in mixed traffic |
| Safety-critical disengagements per 100K miles | System-initiated handoffs to human due to safety concern (not human preference) | Separates safety-critical from comfort disengagements |
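A minimal sketch of how the three Dimension A metrics could be computed from raw counts follows. The `SafetyCounts` record, its field names, and the sample values are assumptions made for illustration; no company publishes data in this shape.

```python
from dataclasses import dataclass

@dataclass
class SafetyCounts:
    """Hypothetical raw counts for one operator over one reporting period."""
    driverless_miles: float
    reportable_incidents: int            # incidents requiring an NHTSA report
    at_fault_incidents: int              # subset where the AV was found at fault
    safety_critical_disengagements: int  # system-initiated, safety-related only

def safety_metrics(c: SafetyCounts) -> dict[str, float]:
    """Normalize the counts into the three Dimension A rates."""
    if c.driverless_miles <= 0:
        raise ValueError("driverless_miles must be positive")
    if c.at_fault_incidents > c.reportable_incidents:
        raise ValueError("at-fault incidents cannot exceed reportable incidents")
    return {
        "incidents_per_million_miles": c.reportable_incidents * 1_000_000 / c.driverless_miles,
        "at_fault_per_million_miles": c.at_fault_incidents * 1_000_000 / c.driverless_miles,
        "safety_critical_disengagements_per_100k_miles": (
            c.safety_critical_disengagements * 100_000 / c.driverless_miles
        ),
    }

# Made-up sample values, chosen only to keep the arithmetic easy to follow.
sample = SafetyCounts(
    driverless_miles=2_000_000,
    reportable_incidents=3,
    at_fault_incidents=1,
    safety_critical_disengagements=40,
)
metrics = safety_metrics(sample)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Keeping the raw counts and the mileage denominator together in one record is the point of the design: a rate published without its denominator cannot be audited.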
Dimension B — Scale
| Metric | Definition | Why it matters |
|---|---|---|
| Total driverless miles accumulated (cumulative) | Miles without safety driver present | Scale of driverless exposure — the denominator that gives safety stats meaning |
| Weekly driverless rides (paid commercial) | Distinct from test rides; revenue-generating only | Commercial traction; requires public riders willing to pay |
| Active driverless fleet size | Vehicles operating without safety driver | Hardware at scale indicator |
| Geographic coverage (driverless ODD sq miles) | Total square miles of driverless operational design domain | Breadth vs depth tradeoff visible |
Dimension C — Capability Growth
| Metric | Definition | Why it matters |
|---|---|---|
| ODD expansion rate (cities per year) | New cities added to driverless commercial service per year | Speed of geographic expansion |
| Weather condition coverage | Percentage of weather events handled driverlessly (rain, fog, snow) | Capability depth in harsh conditions |
| Night and 24-hour operational percentage | Percentage of service hours operated at night | Proves system is not daytime-only |
Dimension D — Commercial Viability
| Metric | Definition | Why it matters |
|---|---|---|
| Revenue per driverless mile | Commercial ride revenue divided by miles (no driver cost allocated) | Unit economic health |
| Rides per vehicle per day | Fleet utilization | Operational efficiency |
| Wait time (median pickup) | Time from request to vehicle arrival | Consumer experience quality |
| Trip completion rate | Percentage of accepted rides completed without human takeover | Reliability from rider perspective |
Section 3 — Where the Leaders Stand (Mid-2026 Estimates)
| Metric | Waymo | Tesla Robotaxi | Aurora | Notes |
|---|---|---|---|---|
| Driverless miles accumulated (cumulative) | 30M+ (est.) | Under 1M (est. — Austin launch phase) | 5M (est. — freight) | Waymo lead measured in years of commercial operation |
| Weekly driverless rides (paid) | 150,000+ | Hundreds to low thousands (est.) | N/A (freight) | Waymo 100x+ ahead on rides |
| Active driverless fleet | 1,000–1,500 (est.) | 10–50 (est. — Austin geofenced) | Dozens (est. — I-45) | Different scale of deployment |
| NHTSA-reportable incidents per million miles | Not publicly broken out by mode | Not broken out | Not broken out | Best available external data: Waymo has reported sub-1 per million (est.) in some disclosures |
| Commercial cities (driverless) | 4 (SF, Phoenix, LA, Austin) | 1 (Austin, limited zone) | 1 corridor (I-45 freight) | Waymo geographic breadth largest |
| 24/7 operational | Yes (SF, Phoenix) | No (current) | Partial (highway schedule) | Waymo proven 24/7 capability |
| ODD coverage (approx.) | Hundreds of sq miles across 4 cities (est.) | 10–20 sq miles (est. Austin geofence) | 240-mile corridor | Very different ODD types |
| Trip completion rate | 99%+ (est.) | Not yet at scale to report | Not applicable (freight) | Waymo high completion rate key commercial differentiator |
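The estimates in the table also imply a fleet utilization figure, one of the Dimension D metrics. The sketch below divides Waymo's estimated weekly paid rides by the estimated fleet range. It assumes every active vehicle serves paid rides all seven days, which is a simplification, and because the ride count is stated as a floor (150,000+) the result is indicative only.

```python
# Implied rides per vehicle per day from the table's Waymo estimates.
# Assumption: the whole active fleet serves paid rides seven days a week.

WEEKLY_PAID_RIDES = 150_000      # stated as "150,000+" (est.)
FLEET_SIZE_RANGE = (1_000, 1_500)  # stated as "1,000–1,500 (est.)"

def rides_per_vehicle_per_day(weekly_rides: float, fleet_size: int) -> float:
    """Average daily rides per vehicle, given weekly rides and fleet size."""
    if fleet_size <= 0:
        raise ValueError("fleet_size must be positive")
    return weekly_rides / 7 / fleet_size

# A smaller fleet means more rides per vehicle, so the range bounds swap.
high = rides_per_vehicle_per_day(WEEKLY_PAID_RIDES, FLEET_SIZE_RANGE[0])
low = rides_per_vehicle_per_day(WEEKLY_PAID_RIDES, FLEET_SIZE_RANGE[1])

print(f"Implied utilization: {low:.1f} to {high:.1f} rides per vehicle per day")
```

On these inputs the implied range is roughly 14 to 21 rides per vehicle per day. The same calculation cannot yet be run meaningfully for Tesla, since both its ride count and fleet size are rough estimates spanning an order of magnitude.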
Section 4 — The Tesla Data Problem (and Advantage)
Tesla occupies a unique and genuinely anomalous position in any Physical AI Index. The company has the largest real-world driving dataset in the world — and essentially no driverless data yet.
What Tesla reports publicly:
- FSD cumulative supervised miles: approximately 5–6 billion (est.)
- Disengagement rate: defined as critical intervention per 1,000 miles, supervised
- Autopilot and FSD incident data via NHTSA SGO reports
What is not yet reportable:
- Driverless miles: the Austin robotaxi operation is very early stage — weeks into commercial operation as of mid-2026
- Ride count: too small to be statistically meaningful yet
- At-fault incident rate for driverless mode: no data exists
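Why a small mileage base is not statistically meaningful can be made concrete. The sketch below assumes incidents follow a Poisson process with a constant rate, which is a modeling assumption and not something the source data establishes. Under that assumption, observing zero incidents over n miles only bounds the true rate at about 3/n with 95% confidence (the so-called rule of three). The function name and the mileage values are illustrative.

```python
import math

def upper_bound_rate_zero_events(miles: float, confidence: float = 0.95) -> float:
    """One-sided upper bound on the per-mile event rate after zero observed events.

    Assumes a Poisson process: P(zero events) = exp(-rate * miles).
    Solving exp(-rate * miles) = 1 - confidence gives the bound.
    """
    if miles <= 0:
        raise ValueError("miles must be positive")
    if not 0 < confidence < 1:
        raise ValueError("confidence must be between 0 and 1")
    return -math.log(1 - confidence) / miles

# Illustrative mileage bases, from an early-stage operation to a mature one.
for miles in (100_000, 1_000_000, 30_000_000):
    bound_per_million = upper_bound_rate_zero_events(miles) * 1_000_000
    print(f"{miles:>12,} incident-free miles: rate could still be up to "
          f"{bound_per_million:.2f} per million miles")
```

Under these assumptions, even a perfect record over 1 million miles cannot rule out a rate near 3 incidents per million miles, while the same record over 30 million miles bounds it near 0.1.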
Tesla’s structural advantage that no index currently captures:
The FSD supervised fleet of 6M+ vehicles is the world’s largest real-world proof-of-concept for end-to-end neural net driving. Supervised miles are not equivalent to driverless miles, but the safety profile of supervised FSD on public roads — with a driver monitoring and able to intervene — is a genuine capability signal. The disengagement rate trend, improving approximately 2x per year (est.), is the leading indicator of when supervised becomes driverless. Any index that only counts driverless miles today will systematically underrate Tesla’s trajectory.
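To see what a 2x-per-year improvement implies, the time needed to reach a target rate is log(current / target) / log(2). The sketch below applies that formula. The current and target rates are hypothetical placeholders, since the article does not give Tesla's actual disengagement rate or the threshold at which supervised becomes driverless, and it assumes the improvement factor stays constant.

```python
import math

def years_to_target(current_rate: float, target_rate: float,
                    improvement_per_year: float = 2.0) -> float:
    """Years until a rate that shrinks by a constant factor each year reaches a target."""
    if current_rate <= 0 or target_rate <= 0:
        raise ValueError("rates must be positive")
    if improvement_per_year <= 1:
        raise ValueError("improvement_per_year must be greater than 1")
    if current_rate <= target_rate:
        return 0.0  # already at or below the target
    return math.log(current_rate / target_rate) / math.log(improvement_per_year)

# Hypothetical: 1 disengagement per 1,000 miles today, target of 1 per 100,000 miles.
current = 1 / 1_000
target = 1 / 100_000
years = years_to_target(current, target)
print(f"A 100x improvement at 2x per year takes about {years:.1f} years")
```

The logarithm is why the trend matters more than the current level: each additional 10x of required improvement adds a little over three years at a 2x annual pace.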
The deeper question any index must answer: is Tesla improving fast enough to close a 30M-mile gap in driverless experience before Waymo’s commercial moat becomes unassailable? The answer is not yet knowable from public data.
Section 5 — Proposed Physical AI Ramp Index Scoreboard
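Based on publicly available and estimated data as of mid-2026, the scoreboard rates the four dimensions from Section 2 plus two rows outside them (humanoid and broader robotics, and fleet data advantage):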
| Dimension | Waymo score | Tesla score | Why |
|---|---|---|---|
| Safety (driverless ops) | ●●●●○ | ●○○○○ | Waymo has 30M+ driverless miles (est.); Tesla robotaxi is weeks old |
| Scale (driverless) | ●●●●○ | ●○○○○ | 150K rides per week vs hundreds; 4 cities vs 1 zone |
| Capability growth rate | ●●●○○ | ●●●●○ | Tesla FSD improving faster; Waymo steady but slower expansion |
| Commercial viability | ●●●○○ | ●●○○○ | Waymo revenue-generating; unit economics still improving; Tesla too early |
| Humanoid and broader robotics | ○○○○○ | ●●●○○ | Waymo is AV only; Tesla has Optimus ramp |
| Fleet data advantage | ●●○○○ | ●●●●● | Tesla 6M supervised fleet; Waymo 1,000–1,500 driverless |
| Overall Physical AI Ramp | ●●●●○ | ●●●○○ | Waymo ahead on AV today; Tesla stronger trajectory and broader scope |
Key takeaway: Waymo is winning the AV race today by every measurable driverless metric. Tesla is winning the data and trajectory race, and is the only company with a credible path to both L4 AV and humanoid robotics at scale. A one-dimensional index that only counts driverless miles today misses the full picture and systematically favors the incumbent over the challenger. A complete Physical AI Ramp Index must capture both current driverless performance and trajectory.
Section 6 — About This Series
This is article 80 in the Physical AI Benchmark Series. Previous articles have covered the ramp index, the humanoid race, unit economics, global competition, HD mapping, software and OTA updates, consumer demand, competitive moats, safety data, Waymo Gen 6, Optimus manufacturing, scorecard snapshots, 2030 forecast scenarios, the investor framework, city expansion pipelines, Tesla FSD state approval maps, AV weather and climate constraints, the talent war, regulatory calendars, robotaxi fare pricing, humanoid deployment trackers, supply chain analysis, consumer adoption demand index, valuation and IPO analysis, the Physical AI 2026 mid-year roundup, AV unit economics cost-per-mile breakdown, the AV data flywheel comparison, AV cybersecurity attack surfaces, the Physical AI supply chain, AV fleet operations, AV insurance and liability evolution, the full lifecycle environmental cost, the accessibility layer, the mapping architecture comparison, the China AV race, simulation and synthetic data training, the Physical AI investment landscape, AV urban planning city impact, autonomous trucking freight economics, the European AV competitive landscape, and the AV sensor technology debate.
This article adds the metrics layer: why current AV safety metrics are incompatible, what a real Physical AI Ramp Index should measure across four dimensions, where the leading companies stand on each dimension as of mid-2026 (est.), and why any single-dimensional index fails to capture the true competitive landscape.
Note: Statistics, fleet sizes, mile counts, and scores are labeled “(est.)” where based on industry estimates and public reporting. This article does not constitute investment advice.
Sources
- NHTSA: Standing General Order AV incident reports
- California DMV: autonomous vehicle disengagement reports
- Waymo: safety report
- Tesla: vehicle safety report
- Aurora: safety case