2026-06-18

Physical AI Safety Record — AV Crash Rates vs Human Drivers and the Ultimate Benchmark

AV crash rates vs human drivers: what California DMV, NHTSA data, and Waymo safety reports reveal about the ultimate Physical AI benchmark.

Article 120 in the Physical AI Benchmark Series

Every commercial AV argument — unit economics, regulatory approvals, rider NPS, investor valuations — ultimately rests on one foundational question: is the autonomous vehicle safer than a human driver? This is not a secondary consideration or a regulatory checkbox. It is the load-bearing premise of the entire industry. If AVs are not demonstrably safer than human drivers, the regulatory approvals narrow, the insurance economics worsen, the public trust collapses, and the investment thesis unravels. If they are demonstrably safer, every other variable in the commercial model improves simultaneously. Safety statistics are not one benchmark among many — they are the benchmark that all others depend on.

This article builds a comparative safety statistics framework using publicly available data: California DMV annual disengagement reports, Waymo’s own safety reports, NHTSA crash data, and the Tesla quarterly safety reports filed with NHTSA. The goal is to understand where AVs stand today relative to the human driver baseline, what the trend line suggests, and why the safety benchmark is structurally more complex than a simple miles-per-crash comparison.

Section 1 — The Human Driver Baseline

Before evaluating AV safety data, it is essential to establish the baseline that AVs must beat. The human driver baseline in the United States is drawn from NHTSA’s annual traffic safety facts database, which covers all road types, all conditions, all driver ages, and all impairment states. This is the full population of US driving — not a curated subset of ideal conditions.

Safety metric	Human driver US average	Notes / source
Fatal crashes per 100M miles	~1.37 fatalities per 100M miles (NHTSA 2022 data)	US national average; includes all road types, all conditions, all driver ages and impairment states
Injury crashes per 100M miles	~77 injury crashes per 100M miles (NHTSA est.)	Injury = any crash requiring medical attention
All crashes per 100M miles	~200-250 crashes per 100M miles (NHTSA est., including minor)	Property damage plus injury plus fatal combined
Impaired driving contribution	~37% of all traffic fatalities involve alcohol-impaired drivers (NHTSA)	AV eliminates this category entirely
Distracted driving contribution	~8-9% of fatal crashes involve distracted driving (NHTSA)	AV eliminates this category
Fatigue contribution	~2-3% of fatal crashes involve drowsy driving (NHTSA; likely underreported)	AV eliminates this category
Human error overall	~94% of serious crashes involve human error as a contributing factor (NHTSA estimate)	AV’s primary safety thesis: eliminate human error
Young driver risk	Drivers 16-19 have crash rates 3x higher than drivers 20+ (NHTSA)	AV provides consistent performance regardless of “age”

The 94% human error figure from NHTSA is the central justification for the AV safety thesis. If nearly all serious crashes trace back to human decision-making (impairment, distraction, fatigue, misjudgment, excessive speed), then a system that removes the human from the control loop should, in theory, eliminate most crashes. The phrase “in theory” carries significant weight: AVs must eliminate human error without introducing new failure modes of their own (sensor failures, software edge cases, adverse weather conditions, unusual road configurations). The safety benchmark is therefore not “safer than the average human” but “safer than the average human without introducing new categories of failure.”

The human driver baseline also reveals an underappreciated asymmetry: human driving performance is highly variable. The gap between a sober, alert, experienced adult driver on a dry highway in daylight and an impaired teenager on a wet road at 2 AM is enormous. An AV’s performance, by contrast, is far more consistent — the same sensor suite, software stack, and decision algorithms operate in every condition the vehicle encounters (within the design domain). This consistency advantage compounds at scale: a million AV trips carry far less performance variance than a million human trips.

Section 2 — Waymo Safety Data (Disclosed)

Waymo is the only fully driverless commercial AV operator with sufficient miles and disclosed safety data to permit a statistically meaningful comparison to the human driver baseline. As of early 2026, Waymo has accumulated 50M+ driverless commercial miles, a sample size large enough for crash and injury rate comparisons, though not yet for fatality rate comparisons, which require far more miles.

Metric	Waymo reported data	Comparison to human baseline	Confidence
Waymo One driverless miles (cumulative)	50M+ driverless commercial miles as of early 2026 (reported by Waymo)	Sufficient sample size for statistical comparison	Disclosed by Waymo
Injury-causing crashes per 100M miles (Waymo driverless)	Waymo’s 2023 safety report: significantly below human baseline across comparable urban miles	Human urban baseline: ~76 injury crashes per 100M miles (est.); Waymo well below this (reported)	Waymo safety report
Airbag-deploying / severe crash rate	Waymo reported 0 airbag-deploying crashes in driverless mode across millions of miles in the 2023 report period	Human baseline for airbag-deploying crashes: ~4-5 per 100M miles (est.)	Waymo safety report 2023
Fault in reported incidents	Of crashes reported to CA DMV, the majority have been caused by the other driver (rear-ending a stopped Waymo, running a red light into Waymo’s path)	Waymo vehicles operate conservatively; more likely to be hit than to hit	CA DMV incident reports
Disengagement rate (CA DMV report)	Waymo reports among the lowest disengagement rates of any AV company reporting to CA DMV; 2022-2023 reports show continuous improvement	Disengagement rate is a proxy for system confidence (not a direct safety metric)	CA DMV annual AV report
Police-reportable crash rate	Waymo has publicly stated its police-reportable crash rate is below the human baseline for comparable urban driving (reported)	Urban US crash rate: ~200 per 100M miles (all severity, est.)	Waymo blog disclosures
Pedestrian / cyclist incidents	No pedestrian fatalities in driverless Waymo One service as of mid-2026 (est.)	Human drivers cause ~7,500 pedestrian fatalities per year US-wide (NHTSA)	Est.; Waymo has not disclosed a fatality

The 0 airbag-deploying crashes figure from Waymo’s 2023 safety report is particularly notable. Airbag deployment is a proxy for high-severity impacts: crashes where forces were sufficient to trigger the vehicle’s own safety systems. The human baseline for airbag-deploying crashes is approximately 4-5 per 100M miles (est.). Waymo reporting zero in driverless commercial service, across millions of urban miles, is an encouraging signal, although how far it can be separated from sampling variance depends on the exact mileage covered by the report period.
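How much weight a zero count can carry depends on exposure. The following is a minimal sketch, assuming crashes arrive as a Poisson process at the midpoint of the estimated human baseline above (4.5 per 100M miles). The mileage values are hypothetical placeholders, since the report period is described only as "millions of miles", and the function name is illustrative.

```python
import math

def prob_zero_events(rate_per_100m: float, miles: float) -> float:
    """Probability of seeing zero events over `miles` if events follow a
    Poisson process with the given rate per 100 million miles."""
    expected = rate_per_100m * miles / 100e6
    return math.exp(-expected)

# Assumption: midpoint of the estimated human baseline (4-5 per 100M miles).
HUMAN_AIRBAG_RATE = 4.5

# Hypothetical exposures; the source does not state the exact mileage.
for miles in (5e6, 20e6, 50e6, 100e6):
    p = prob_zero_events(HUMAN_AIRBAG_RATE, miles)
    print(f"{miles / 1e6:>5.0f}M miles: P(zero crashes at human rate) = {p:.3f}")
```

Under these assumptions, a driver population at the human rate would still record zero such crashes fairly often over a few million miles; the chance falls below 5% only at roughly 67M miles.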

The fault-attribution pattern in California DMV incident reports is a second important signal. When Waymo vehicles are involved in collisions, the majority of incidents involve the other party as the at-fault driver: another vehicle rear-ending a stationary or slow-moving Waymo, or a driver running a red light into Waymo’s path. This pattern reflects Waymo’s conservative driving style — the system is tuned to prioritize yielding, stopping, and creating buffer space over assertive maneuvers. The trade-off is occasional disruption to traffic flow; the benefit is a dramatically reduced at-fault crash rate.

The disengagement rate data from California DMV annual reports provides a longitudinal view of system improvement. A disengagement is an event where the safety driver takes manual control, either because the system requests it or because the driver judges intervention necessary. Disengagement rate is not a direct safety metric (a system can disengage frequently but safely, or rarely but unsafely), and by definition it applies to testing with a safety driver on board rather than to driverless service. Even so, the trend toward very low disengagement rates in Waymo’s reported testing suggests a system operating well within its design domain the vast majority of the time.

Section 3 — Tesla FSD Safety Data

Tesla’s approach to safety data disclosure differs fundamentally from Waymo’s. Rather than publishing comprehensive safety reports to a government database, Tesla files quarterly safety data with NHTSA and publishes summary statistics on its website. The key distinction is that Tesla’s FSD system, including the commercially deployed “Full Self-Driving (Supervised)” mode, always operates with a human driver present who monitors the system and can intervene. This is a supervised autonomy system, not a driverless system. Comparisons to Waymo’s driverless data require this methodological caveat.

Metric	Tesla reported data	Notes
Tesla Autopilot/FSD crash rate (NHTSA)	Tesla reports quarterly safety data to NHTSA; Q4 2023: 1 crash per 5.7M miles on Autopilot vs 1 crash per 1.5M miles without Autopilot (Tesla quarterly report)	Autopilot requires an attentive human driver; not a fully driverless comparison
Comparison caveat	Autopilot and FSD supervised are not equivalent to driverless; the human is monitoring and can intervene; the safety benefit includes human override, not just AI performance	Direct comparison to Waymo driverless is methodologically complex
NHTSA investigations	Multiple NHTSA investigations into Tesla Autopilot crashes; several involving emergency vehicles (parked firetrucks, ambulances); Tesla has issued OTA updates in response	NHTSA has not determined Autopilot to be defective; investigations are ongoing
Tesla’s safety claim	Tesla cars on Autopilot crash roughly 4x less often per mile than Tesla cars driven without Autopilot (Tesla quarterly safety report framing)	Methodological debate: Autopilot activates on highways where crashes are rarer; comparison group matters
FSD driverless (Austin)	Tesla’s Austin robotaxi launch uses supervised FSD; safety record for unsupervised driverless operation not yet publicly established	This is the key data gap; will emerge as Austin scale grows
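Tesla states its figures as miles per crash, while the human baseline in Section 1 is stated as crashes per 100M miles. A small conversion sketch (the function name is illustrative) puts the Q4 2023 figures from the table into the same unit:

```python
def crashes_per_100m_miles(miles_per_crash: float) -> float:
    """Convert 'one crash per N miles' into crashes per 100 million miles."""
    return 100e6 / miles_per_crash

# Q4 2023 figures as quoted in the table above.
autopilot_on = crashes_per_100m_miles(5.7e6)   # Autopilot engaged
autopilot_off = crashes_per_100m_miles(1.5e6)  # Autopilot not engaged
ratio = autopilot_off / autopilot_on

print(f"Autopilot on:  {autopilot_on:.1f} crashes per 100M miles")   # 17.5
print(f"Autopilot off: {autopilot_off:.1f} crashes per 100M miles")  # 66.7
print(f"Ratio (off / on): {ratio:.1f}x")                             # 3.8x
```

The converted values should not be read directly against the NHTSA all-crash estimate, because the crash definitions behind the two sources differ.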

The methodological debate around Tesla’s “4x safer” claim is genuine and important. Tesla calculates this ratio by comparing crash rates per mile on Autopilot to crash rates per mile without Autopilot across its entire US fleet. The problem is selection bias: Autopilot is disproportionately activated on highways and controlled-access roads, which are inherently lower-crash-rate environments than urban streets or intersections. A driver engaging Autopilot on an interstate for a highway cruise is already in a lower-risk environment than a driver manually navigating city traffic. Controlling for route type and driving environment is essential for a valid comparison — and Tesla’s published statistics do not fully account for this.
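The selection effect can be illustrated with a toy calculation. Every number below is hypothetical and chosen only to show the mechanism: the per-road crash rates are assumed to be identical with and without Autopilot, and only the mix of highway and city miles differs.

```python
def blended_rate(rate_by_road: dict[str, float], mile_share: dict[str, float]) -> float:
    """Mileage-weighted crash rate (per 100M miles) across road types."""
    return sum(rate_by_road[road] * mile_share[road] for road in rate_by_road)

# Hypothetical: the SAME per-road crash rates apply in both driving modes.
rates = {"highway": 20.0, "city": 80.0}  # crashes per 100M miles

# Hypothetical mileage mixes: Autopilot is used mostly on highways.
autopilot_mix = {"highway": 0.95, "city": 0.05}
manual_mix = {"highway": 0.30, "city": 0.70}

on = blended_rate(rates, autopilot_mix)
off = blended_rate(rates, manual_mix)

print(f"Autopilot-on blended rate:  {on:.1f} per 100M miles")
print(f"Autopilot-off blended rate: {off:.1f} per 100M miles")
print(f"Apparent advantage: {off / on:.1f}x, with zero true per-road difference")
```

In this sketch the fleet-wide ratio comes out well above 2x even though the system adds no safety on any individual road type, which is why a comparison stratified by road type is needed before attributing the gap to the software.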

The NHTSA investigations into Tesla Autopilot crashes involving emergency vehicles (stationary firetrucks, ambulances, police cars parked on roadways) reveal a specific failure mode: the system’s detection and avoidance of stationary objects in its lane, particularly those with unusual visual signatures against complex backgrounds. Tesla has addressed multiple such incidents through OTA updates, demonstrating the software-updatable advantage of the AV architecture. But the pattern of requiring regulatory investigation before addressing a known failure mode raises questions about proactive safety disclosure.

Tesla’s Austin robotaxi service represents the emerging data frontier for FSD safety. Commercial rides in vehicles without a safety driver will generate the dataset that makes Waymo-to-Tesla driverless comparisons possible for the first time. Until that data accumulates, the Tesla safety picture remains a supervised-autonomy story, with genuine benefits over unassisted human driving, but not yet equivalent to Waymo’s driverless track record.

Section 4 — Why Safety Statistics Are the Ultimate Physical AI Benchmark

Safety statistics occupy a unique position in the AV commercial model: they are simultaneously a regulatory gate, an investor confidence signal, an insurance pricing input, a public trust driver, and the primary validation of the core technology thesis. No other single metric has this range of downstream consequences.

Mechanism	How safety data affects the ramp	Stakes
Regulatory gating	A serious at-fault AV fatality can halt a company’s entire commercial operation pending investigation; California suspended Cruise operations after a 2023 incident	One fatal incident = months to years of regulatory review; risk is asymmetric
The Cruise incident precedent (Oct 2023)	GM Cruise vehicle struck a pedestrian already hit by another vehicle, then dragged her about 20 feet; California DMV suspended Cruise’s driverless permit within weeks; Cruise never returned to commercial operation (GM announced in December 2024 that it would stop funding Cruise’s robotaxi business)	Industry-defining event: proved regulators will act swiftly on safety incidents; set the standard for all AV operators
Investor confidence	Alphabet’s continued Waymo investment depends partly on clean safety record; a major at-fault incident would affect valuation and fundraising	Waymo’s $45B+ valuation implies market believes safety record will hold
Insurance actuarial data	AV insurers need statistical data to price commercial fleet coverage accurately; more miles with clean records leads to lower insurance cost (compounding economic benefit)	Virtuous cycle: safety leads to lower insurance leads to better unit economics
Public trust formation	Public trust in AV technology is fragile; a high-profile incident anywhere (even different company) damages trust for the entire category	Industry-wide externality: Cruise incident depressed public AV acceptance for 12-18 months
The 94% human error thesis	If ~94% of crashes involve human error (NHTSA), and AVs eliminate human error, the theoretical safety ceiling is massive; but AVs must also not introduce new failure modes (adverse conditions, edge cases, sensor failures)	The safety benchmark is not “safer than average human” but “safer than average human plus no new failure modes”

The Cruise incident of October 2023 deserves particular attention because it established the regulatory risk template for the entire industry. A GM Cruise robotaxi, operating driverlessly in San Francisco, struck a pedestrian who had already been hit by another vehicle and knocked into the Cruise’s path. The Cruise vehicle then pulled over — standard post-incident behavior — and in doing so dragged the pedestrian approximately 20 feet before stopping. The Cruise vehicle’s software had not detected the pedestrian under its chassis.

California’s DMV response was swift: within weeks, Cruise’s driverless commercial permit was suspended. Cruise attempted to continue operating with safety drivers, but the combination of the DMV action, a parallel investigation revealing Cruise had not fully disclosed the incident to regulators, and the resulting loss of public and investor confidence meant its commercial service never resumed, and GM announced in December 2024 that it would stop funding Cruise’s robotaxi business. An industry leader with billions in investment, a substantial San Francisco fleet, and multi-year commercial operating experience lost its commercial service within weeks of a single serious incident and never recovered it, even though Cruise did not cause that incident in the primary sense (another vehicle hit the pedestrian first).

The Cruise precedent fundamentally changes the risk calculus for all AV operators. The asymmetry is striking: a clean safety record earns incremental regulatory trust and investor confidence. A serious at-fault incident triggers existential risk to the operating license. This asymmetry explains why Waymo’s conservative driving style — the system that yields, brakes early, and avoids assertive maneuvers even at the cost of traffic disruption — is not just a software preference but a survival strategy.

Section 5 — Safety Benchmark Scorecard: Waymo vs Tesla vs Human Baseline

Synthesizing the safety data into a comparative framework reveals where each system stands across multiple safety dimensions. The comparison is necessarily imperfect because the operational contexts differ: Waymo operates in defined urban geofences under controlled conditions, Tesla FSD operates across a broad range of US road types with a human available to intervene, and the human driver baseline covers all US roads with all driver types.

Safety dimension	Human driver	Waymo (driverless)	Tesla (supervised FSD)
Impaired driving risk	Present (~37% of fatalities)	Eliminated	Eliminated
Distracted driving risk	Present (~8-9% of fatalities)	Eliminated	Reduced (human monitors)
Fatigue risk	Present	Eliminated	Reduced
Injury crash rate per 100M miles	~76 (urban est.)	Well below human baseline (reported)	Below Autopilot-off by ~4x (Tesla claim)
Airbag-deploying crash rate	~4-5 per 100M miles (est.)	~0 in driverless commercial service (reported)	Not separately disclosed for FSD
Pedestrian fatalities	~7,500 per year US total	None in Waymo One driverless (est.)	Several NHTSA-investigated incidents under Autopilot
At-fault crash pattern	Variable by driver	Majority of incidents: other party at fault (CA DMV data)	NHTSA investigations ongoing; emergency vehicle incidents
Trend	Flat / slowly improving (US)	Improving rapidly as more miles logged	Improving with OTA updates

The scorecard reveals a clear differentiation pattern. Waymo’s driverless system has the strongest safety profile on the metrics that matter most — injury crash rate, airbag-deploying crash rate, at-fault crash pattern, and pedestrian fatality record — but operates in a narrower, more controlled environment (urban geofences in a handful of US cities). Tesla’s supervised FSD shows genuine safety improvement over unassisted human driving but the human-in-the-loop constraint means that comparisons to Waymo’s driverless data are methodologically complex.

The trend lines are arguably as important as the absolute numbers. Waymo’s safety metrics improve as miles accumulate: more miles generate more training data, which improves the model, which improves safety performance. If this compounding dynamic holds, a comparison taken today at roughly 50M miles understates the safety advantage Waymo would show at 500M miles. The human driver baseline, by contrast, has been essentially flat for years: marginal improvements from lane-departure warning systems, automatic emergency braking, and other ADAS features have not materially moved the national fatality rate per mile.

Section 6 — The Path to Demonstrably Safer-Than-Human Performance

The phrase “demonstrably safer than human” is doing significant work in AV industry discourse, and it is worth unpacking precisely what demonstration requires.

The statistical challenge is that rare events — fatalities — require enormous sample sizes for confidence. The US human driver fatality rate is approximately 1.37 per 100M miles (NHTSA 2022). To demonstrate with statistical confidence that an AV system has a fatality rate below this level, you need enough miles to expect at least several fatalities if the rate matched the human baseline. At 100M miles, the expected number of fatalities at the human rate is ~1.37 — not enough statistical power to conclude the AV rate is lower rather than simply lucky. At 500M miles, the picture becomes clearer. At 1B miles, the comparison is statistically robust.
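The arithmetic behind these mileage thresholds can be sketched with a simple Poisson model. This is an illustration under stated assumptions, not a full statistical test: fatalities are treated as independent events at a constant rate, and the function names are hypothetical.

```python
import math

HUMAN_FATALITY_RATE = 1.37  # per 100M miles, the NHTSA figure cited above

def expected_fatalities(miles: float, rate_per_100m: float = HUMAN_FATALITY_RATE) -> float:
    """Expected fatality count over `miles` at the given rate."""
    return rate_per_100m * miles / 100e6

def miles_for_zero_to_reject(alpha: float = 0.05,
                             rate_per_100m: float = HUMAN_FATALITY_RATE) -> float:
    """Fatality-free miles needed before P(zero | human rate) drops below alpha."""
    return -math.log(alpha) / rate_per_100m * 100e6

for miles in (100e6, 500e6, 1e9):
    lam = expected_fatalities(miles)
    print(f"{miles / 1e6:>6.0f}M miles: expected {lam:.2f} fatalities, "
          f"P(zero at human rate) = {math.exp(-lam):.2g}")

needed = miles_for_zero_to_reject()
print(f"Fatality-free miles needed at the 5% level: {needed / 1e6:.0f}M")
```

Under this model, roughly 219M fatality-free miles are needed before a zero count is inconsistent with the human rate at the 5% level. Any observed fatality raises the required mileage, which is why the 500M to 1B range is the more robust target.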

Waymo’s cumulative 50M+ driverless miles is approaching the zone where injury crash rate comparisons are statistically meaningful, but fatality rate comparisons remain limited by sample size. The absence of fatalities in driverless operation is a positive signal but not yet statistical proof of a lower fatality rate — it is consistent with both “the system is dramatically safer” and “the system is slightly safer and has not yet drawn the fatality lottery.” At 500M miles with zero fatalities, the latter interpretation becomes very unlikely. The trajectory is toward demonstrability.
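The same point can be expressed as the highest fatality rate that a zero-fatality record still leaves plausible. The sketch below assumes a Poisson model and uses the exact one-sided bound for zero observed events (close to the familiar "rule of three"); the function name is illustrative.

```python
import math

HUMAN_FATALITY_RATE = 1.37  # per 100M miles, the NHTSA figure cited above

def upper_bound_rate(miles: float, confidence: float = 0.95) -> float:
    """One-sided upper confidence bound on the event rate (per 100M miles)
    after observing zero events in `miles`, under a Poisson model."""
    exposure = miles / 100e6  # exposure in units of 100M miles
    return -math.log(1.0 - confidence) / exposure

for miles in (50e6, 500e6):
    bound = upper_bound_rate(miles)
    verdict = "below" if bound < HUMAN_FATALITY_RATE else "not below"
    print(f"{miles / 1e6:.0f}M fatality-free miles: rate <= {bound:.2f} per 100M "
          f"(95% bound), {verdict} the human baseline")
```

At 50M miles the bound is about 6 per 100M miles, well above the human 1.37, so a zero count cannot yet rule out a human-level rate. At 500M miles the bound falls to about 0.6, comfortably below the baseline.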

The industry trajectory in mid-2026 suggests the safety demonstration will be complete within 2-4 years for Waymo’s driverless urban operations, assuming continued fleet growth and no major at-fault incidents. Tesla’s path to driverless safety demonstration is longer because the unsupervised driverless miles are just beginning to accumulate. The convergence of these two datasets — Waymo’s mature urban driverless record and Tesla’s emerging unsupervised record — will define the state of Physical AI safety benchmarking by 2028-2030.

Note: All figures labeled “(est.)” are derived from publicly available regulatory filings, company announcements, safety reports, and industry estimates as of mid-2026. Safety statistics cited from NHTSA, Waymo, and Tesla reflect their respective published data as of the dates noted. Crash rates are directional comparisons; methodology differences between company disclosures and NHTSA population data affect direct comparisons. This article does not constitute investment or legal advice.