2026-06-18

Physical AI VRU Safety 2026 — Waymo Multi-Sensor Detection vs Tesla FSD Camera-Only Night Performance: The AV Safety Benchmark

Waymo’s LIDAR detects pedestrians at night as well as it does at noon. Tesla FSD, which is camera-only, relies on headlights and neural networks. Night VRU safety is the key battleground.

Article 197 in the Physical AI Benchmark Series — Vulnerable Road User Safety

Pedestrians, cyclists, and other vulnerable road users (VRUs) are the highest-stakes safety problem in autonomous vehicle design. A 2-ton autonomous vehicle striking a pedestrian at 25 mph carries a substantial risk of severe or fatal injury to the VRU, and that risk climbs steeply with speed. The architectural question at the heart of this benchmark is therefore not merely a performance question but a safety architecture question: does a multi-sensor approach with active LIDAR illumination provide structural safety advantages over a camera-only neural network approach in exactly the conditions (darkness, rain, occlusion) where VRU collisions are most likely to occur?

Waymo’s multi-sensor fusion stack (LIDAR + radar + camera) and Tesla FSD’s camera-only neural network represent the two dominant AV design philosophies at commercial scale in 2026. This benchmark systematically compares their VRU detection capabilities across five dimensions: VRU category risk, multi-sensor vs camera-only detection, night performance (the critical battleground), cyclist prediction, and an overall safety scorecard.

Section 1 — Why VRU Safety Is the Highest-Stakes AV Challenge

VRUs (pedestrians, cyclists, motorcyclists, children, wheelchair users, e-scooter and e-bike riders) occupy the most vulnerable position in the traffic ecosystem. In a vehicle-to-vehicle collision, crumple zones, seatbelts, and airbags distribute crash energy across two protected occupant compartments; a VRU collision exposes an unprotected human body to the full kinetic energy of a 2-ton vehicle. Because kinetic energy scales with the square of speed, a 25 mph impact carries about 6 times (25²/10² = 6.25) the kinetic energy of a 10 mph impact, a difference that can determine whether an injury is moderate or fatal.
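The kinetic-energy comparison follows directly from KE = ½mv². A minimal Python sketch of the arithmetic, assuming a 2,000 kg vehicle as a round stand-in for "2-ton" (the mass cancels out of the ratio, so the assumption does not affect it):

```python
# Kinetic energy KE = 1/2 * m * v^2, so the ratio between two speeds is (v2 / v1)^2.
MPH_TO_MS = 0.44704  # exact: 1 mph = 0.44704 m/s
VEHICLE_MASS_KG = 2000.0  # assumption: a round stand-in for a "2-ton" vehicle


def kinetic_energy_joules(mass_kg: float, speed_mph: float) -> float:
    """Kinetic energy in joules of a mass moving at speed_mph."""
    v = speed_mph * MPH_TO_MS
    return 0.5 * mass_kg * v ** 2


ke_10 = kinetic_energy_joules(VEHICLE_MASS_KG, 10)
ke_25 = kinetic_energy_joules(VEHICLE_MASS_KG, 25)
ratio = ke_25 / ke_10  # equals (25 / 10)^2 = 6.25 for any mass

print(f"10 mph: {ke_10 / 1000:.1f} kJ, 25 mph: {ke_25 / 1000:.1f} kJ, ratio: {ratio:.2f}")
```

With these inputs the 25 mph case carries roughly 125 kJ against roughly 20 kJ at 10 mph, a ratio of 6.25.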

AV systems must detect VRUs across a challenging matrix of conditions:

Low ambient light: pedestrian fatalities in human-driven traffic peak in late evening and nighttime hours (est. 75% of US pedestrian fatalities occur in dark conditions, per NHTSA traffic safety data); any AV system must match or exceed human-driver VRU detection in precisely the conditions where humans perform worst
Adverse weather: rain and fog reduce camera visibility significantly; a VRU in dark clothing in fog at night is a hard camera detection problem
Long range: at 45 mph (a typical urban arterial speed), safe braking requires est. 50–70 meters of detection range; at 25 mph in a residential zone the required detection range is shorter, but reaction time is still critical
Partial occlusion: a child stepping from behind a parked car leaves est. 0.5–1.5 seconds between first possible detection and entering the vehicle’s path, which is at or below typical human driver perception-reaction times; an AV system that slows proactively at known occlusion points, before any VRU is detected, has a structural safety advantage
Unpredictable behavior: children chasing balls into the street, intoxicated pedestrians reversing direction mid-crosswalk, cyclists swerving suddenly — AV systems must predict and yield to VRU behavior that does not follow normal traffic patterns
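The detection-range figures above are driven by stopping distance, which is the distance covered during the reaction time plus the braking distance. A minimal sketch, assuming a constant 0.7 g deceleration, a 1.5 s human perception-reaction time, and a hypothetical 0.5 s automated-system latency (none of these values come from either company):

```python
# Stopping distance = reaction distance + braking distance: d = v * t_r + v^2 / (2 * a).
# Assumptions (illustrative, not from the article): constant deceleration of 0.7 g,
# a 1.5 s human perception-reaction time, and a hypothetical 0.5 s automated-system latency.
G = 9.81  # m/s^2
MPH_TO_MS = 0.44704


def stopping_distance_m(speed_mph: float, reaction_s: float, decel_g: float = 0.7) -> float:
    """Distance travelled during the reaction time plus the braking distance, in meters."""
    v = speed_mph * MPH_TO_MS
    a = decel_g * G
    return v * reaction_s + v ** 2 / (2 * a)


for mph in (25, 35, 45):
    automated = stopping_distance_m(mph, reaction_s=0.5)
    human = stopping_distance_m(mph, reaction_s=1.5)
    print(f"{mph} mph: {automated:.0f} m (0.5 s reaction), {human:.0f} m (1.5 s reaction)")
```

Under these assumptions a 45 mph stop takes roughly 40 to 60 m, in line with the est. 50–70 m detection range once some margin is added, while a 25 mph stop takes roughly 15 to 26 m.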

VRU categories and their specific detection challenges:

VRU category	Detection challenge	Speed range	Most dangerous scenario
Pedestrians (adults)	Upright bipedal silhouette is the clearest VRU category for detection; challenge is dark clothing at night	Walking: 1.2–1.8 m/s	Crossing mid-block at night in dark clothing
Children	Shorter stature reduces detection range for camera systems; faster/less predictable movement	Walking: 1.0–1.5 m/s; running: 3–5 m/s	Running into street from between parked cars
Cyclists	Narrow profile, dynamic speed and direction changes, interaction with traffic lanes	Cycling: 4–8 m/s urban	Sudden lane swerve to avoid obstacle; at-speed intersection crossing
Motorcyclists	Small radar cross-section compared with cars; lane-splitting in CA; high speed	15–35 m/s urban/highway	Lane-splitting between traffic; sudden braking
E-scooter/e-bike riders	Faster than pedestrians, often without lights, sometimes wrong-way in bike lanes; an unusual object class for older AV perception models	4–8 m/s	Night riding without lights, wrong-way in bike lane
Wheelchair users	Low silhouette; may move in roadway when sidewalks are blocked; slower than pedestrians	0.5–1.5 m/s	Crossing at non-designated point; in roadway due to blocked sidewalk
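The speed ranges in the table translate directly into warning time. A small sketch, assuming a hypothetical 2 m lateral gap between a VRU’s starting point (for example, the edge of a parked car) and the vehicle’s path, computes how long each category needs to close that gap. Cyclists and motorcyclists are left out because their listed speeds run mostly along the lane, not across it.

```python
# Time for a VRU to cross a lateral gap into the vehicle's path: t = gap / speed.
# Speed ranges (m/s) are taken from the table above; GAP_M is a hypothetical value.
GAP_M = 2.0

SPEED_RANGES_MS = {
    "adult pedestrian (walking)": (1.2, 1.8),
    "child (walking)": (1.0, 1.5),
    "child (running)": (3.0, 5.0),
    "wheelchair user": (0.5, 1.5),
}


def time_to_enter_path_s(gap_m: float, speed_ms: float) -> float:
    """Seconds needed to cover gap_m at a constant lateral speed."""
    return gap_m / speed_ms


for category, (slow, fast) in SPEED_RANGES_MS.items():
    # The fast end of each range is the worst case: it leaves the least warning time.
    shortest = time_to_enter_path_s(GAP_M, fast)
    longest = time_to_enter_path_s(GAP_M, slow)
    print(f"{category:<28} {shortest:.2f} to {longest:.2f} s")
```

A running child closes the assumed 2 m gap in 0.40 to 0.67 s, consistent with the short est. 0.5–1.5 second window given for occlusion scenarios.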

The regulatory environment for VRU safety is tightening:

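NHTSA Standing General Order (2021): requires manufacturers and operators of automated driving systems and Level 2 driver-assistance systems to report qualifying crashes to NHTSA, within one day for crashes involving a fatality, hospital-treated injury, airbag deployment, vehicle tow-away, or a VRU, and monthly for other qualifying ADS crashes (the order has since been amended); the crash data is publicly accessible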
California DMV AV incident reporting: requires AV manufacturers to report collisions involving their AVs (property damage, injury, or death) within 10 days, and testing permit holders to file annual disengagement reports; California is a primary regulatory environment for both Waymo and Tesla AV operations
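ISO 21448 (SOTIF, Safety Of The Intended Functionality): addresses hazards caused by functional insufficiencies, such as perception limitations, rather than by system faults; failing to detect or correctly predict a VRU is a central example of such a hazard, and SOTIF compliance is becoming a de facto requirement for commercial AV deployment in regulated markets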
EU AV Regulation (expected 2025–2026): treats VRU protection as a Tier 1 safety requirement; draft regulation language emphasizes sensor redundancy for driverless operations, specifically in VRU-dense urban environments

Public scrutiny amplifies the stakes: a single AV-pedestrian incident receives disproportionate media coverage relative to equivalent human-driver incidents. One high-profile VRU incident can trigger regulatory review, fleet suspension, and a substantial loss of public confidence, as the October 2023 suspension of Cruise’s California driverless permits following a pedestrian collision in San Francisco demonstrated. The VRU safety record is therefore not only a safety metric but a measure of existential commercial risk.

Section 2 — Waymo’s VRU Detection: Multi-Sensor Fusion Advantages

VRU detection dimension	Waymo approach	Details	Safety implication
LIDAR-based VRU detection (day and night)	Active sensor: emits laser pulses and measures their time of flight, so detection accuracy does not depend on ambient light	LIDAR detects pedestrian-shaped objects (upright bipedal silhouette) with centimeter-level range precision at est. 100–300 meters; because it supplies its own illumination, its detection performance does not degrade in darkness	Night LIDAR VRU detection is Waymo’s structural safety advantage: at the hours when pedestrian fatalities are highest (late evening/night), LIDAR maintains full detection capability while camera-based systems face their largest performance gap
Radar-based VRU velocity measurement	Radar measures Doppler velocity of objects; distinguishes moving pedestrians (Doppler matches human walking speed ~1.4 m/s) from stationary objects even in zero-visibility fog	Radar penetrates rain and fog that obscures cameras; provides VRU velocity even when LIDAR and camera visibility are degraded	Radar VRU detection is particularly valuable in San Francisco’s frequent coastal fog; LIDAR (spatial) + radar (velocity) + camera (visual classification) = three independent VRU detection pathways
Camera-based VRU visual classification	Cameras provide semantic VRU information: body pose (facing toward or away?), pedestrian intent (looking at a phone vs making eye contact?), cyclist hand signals, child vs adult recognition, wheelchair classification	Camera provides behavioral context that LIDAR point clouds and radar Doppler cannot: a pedestrian at a crosswalk looking at their phone vs one looking toward the approaching vehicle is behaviorally different even if spatially identical	Camera is the VRU behavioral intent layer; LIDAR is the precise spatial position layer; radar is the velocity layer; three-sensor fusion enables reliable detection AND behavioral prediction simultaneously
Occlusion handling	Waymo’s HD map provides context for occlusion scenarios: the system knows that parked cars near a specific crosswalk can create a partial occlusion zone where a pedestrian could emerge	Map-informed occlusion awareness lets Waymo slow proactively before any sensor detects a VRU, because the map indicates that a pedestrian COULD emerge from behind a parked car at that crosswalk	HD map + LIDAR spatial awareness + behavioral prediction = an occlusion-safety system with multiple independent safeguards; at a mapped occlusion point the vehicle is already slowing before a child running out from behind a parked car becomes visible to any sensor
Cyclist and micro-mobility prediction	Waymo has trained cyclist behavior prediction models on years of commercial data from SF and Phoenix; the models predict cyclist position over a 2–5 second horizon from heading, speed, and road context	Cyclist behavior is harder to predict than pedestrian behavior (cyclists move faster and interact with traffic more dynamically); Waymo’s prediction has been trained on real urban cyclist behavior across multiple cities and years	Long training history on real commercial urban cyclists is a meaningful advantage; early AV systems struggled with urban cycling because cyclists were underrepresented in training datasets
Safety record (VRU)	Waymo’s NHTSA SGO and CA DMV incident reports show some low-speed incidents; Waymo has published safety reports citing zero life-threatening VRU injuries or fatalities in commercial driverless operations through mid-2026 (as publicly reported)	Full incident database available via NHTSA SGO and California DMV public records; media-cited incidents include a vehicle striking a cyclist’s bike without injuring the cyclist, and a vehicle stopping unexpectedly and being rear-ended by a human driver in a minor collision	Waymo’s VRU safety record in commercial driverless operations is strong relative to the human-driver baseline; however, its fleet (est. 2,500+ vehicles, est. 150,000+ rides/week) is far smaller than Tesla’s (est. 6M+ vehicles), so statistical comparison requires rate normalization, not absolute counts
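Map-informed proactive slowing can be framed as a speed cap: travel no faster than a speed from which the vehicle can still stop before reaching a mapped occlusion point. The sketch below inverts the stopping-distance relation d = v·t_r + v²/(2a) for v. It is not Waymo’s planner; the 0.7 g deceleration, the 0.5 s reaction latency, and the distances are hypothetical.

```python
import math

G = 9.81  # m/s^2
MS_TO_MPH = 1 / 0.44704


def max_speed_to_stop_within(distance_m: float, reaction_s: float, decel_g: float = 0.7) -> float:
    """Largest speed (m/s) whose stopping distance v*t_r + v^2/(2a) fits within distance_m.

    Multiplying by 2a gives v^2 + 2*a*t_r*v - 2*a*d = 0; the positive root is returned.
    """
    a = decel_g * G
    return -a * reaction_s + math.sqrt((a * reaction_s) ** 2 + 2 * a * distance_m)


# Hypothetical distances from the vehicle to a mapped occlusion point.
for distance in (5.0, 10.0, 20.0):
    v_cap = max_speed_to_stop_within(distance, reaction_s=0.5)
    print(f"{distance:4.0f} m ahead: cap about {v_cap * MS_TO_MPH:.0f} mph")
```

Under these assumptions, an occlusion point 10 m ahead caps speed at about 20 mph, and one 20 m ahead at about 30 mph.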

Section 3 — Tesla FSD’s Camera-Only VRU Detection

VRU detection dimension	Tesla approach	Details	Safety implication
Camera-only VRU detection	Tesla FSD relies exclusively on cameras for VRU detection (no LIDAR, no radar in recent Model 3/Y with Tesla Vision); FSD’s neural network must detect all VRU categories from camera input alone, across all lighting and weather conditions	Camera-based VRU detection is harder than LIDAR-based in low-light conditions: cameras need ambient or active light to create contrast between the VRU and the background; a pedestrian in dark clothing at night, illuminated only by vehicle headlights, is a harder detection problem than the same scene in daylight	Night VRU detection is the primary structural limitation of camera-only AV: est. 75% of US pedestrian fatalities occur in dark conditions (NHTSA data); a camera-only AV system must match LIDAR-equivalent VRU detection performance at night through neural network and headlight engineering alone
End-to-end VRU learning	Tesla’s end-to-end FSD neural network has been trained on fleet data drawn from est. 6 billion+ miles of supervised driving, including millions of VRU interaction scenarios across diverse geographies, lighting conditions, and weather	Scale advantage: in absolute terms, Tesla’s training data covers far more VRU scenario diversity than Waymo’s, simply because the fleet is vastly larger and includes consumer driving across all road types and times of day	Training data scale is a VRU scenario diversity advantage; quality limitation: human driver behavior in VRU scenarios is not always the safety gold standard, so training data includes human VRU detection errors as well as correct responses
Night VRU detection with active headlights	Tesla uses vehicle headlights to illuminate the road ahead for camera detection; FSD cameras are designed for low-light performance with high-sensitivity image sensors; night FSD performance has improved significantly with each neural network generation	Headlight-illuminated camera detection works well for VRUs in the forward headlight cone; challenges remain for VRUs approaching from the periphery (side streets, driveways) and in complex ambient lighting scenarios	Tesla’s night camera performance is significantly better than that of standard automotive cameras; but LIDAR actively illuminates the scene at high spatial resolution in all directions simultaneously, while headlights illuminate primarily the forward cone
Cyclist prediction	FSD has been trained on billions of human-driver responses to cyclists across the US consumer fleet; cyclist prediction is an area where FSD has demonstrated strong improvement in consumer deployment	Consumer FSD users have reported both good and poor cyclist handling; Tesla does not publish systematic cyclist interaction performance data	Consumer fleet deployment means FSD encounters vastly more cyclist scenarios per week than Waymo; this scale of cyclist scenario experience is an advantage for prediction model improvement through continuous retraining
Known FSD VRU limitations (reported)	NHTSA investigations have included probes into Tesla driver-assistance behavior near emergency-vehicle scenes (VRU-proximate environments) and highway construction zones (where workers are VRUs); a 2023 FSD v11.x recall involved crosswalk pedestrian behavior	Each NHTSA recall or investigation flagged a scenario where Tesla’s system behavior was deemed insufficient, and OTA updates addressed the reported issues; the pattern of camera-only edge cases surfacing through recalls, set against Waymo’s multi-sensor redundancy, reflects a structural architecture difference	Camera-only VRU detection requires continuous neural network improvement to address edge cases that LIDAR would handle through independent active sensing
Safety record (VRU, consumer FSD)	Tesla files with NHTSA under the Standing General Order for crashes involving its driver-assistance systems; Tesla reports indicate the majority of consumer FSD crashes involve rear-end collisions and lane-change errors, not VRU collisions; VRU-specific rates are not separately broken out in public reports	NHTSA SGO crash database is publicly available; VRU-specific analysis requires filtering by crash type; Tesla’s consumer fleet generates more crashes in absolute terms (vastly more vehicle-miles), but the relevant metric is crashes per million FSD-engaged miles, which is not separately disclosed	Tesla does not publish FSD-engaged VRU collision rate per million miles; without this rate, direct comparison to Waymo’s driverless VRU record is not methodologically valid
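Both safety-record rows make the same methodological point: absolute crash counts from fleets of very different sizes cannot be compared, and a record of zero events says less than it appears to. A minimal sketch with hypothetical mileages (neither company’s actual figures) shows per-million-mile normalization and the exact one-sided 95% Poisson upper bound for a zero-event record, which is approximately 3 divided by the miles driven (the "rule of three").

```python
import math


def rate_per_million_miles(events: int, miles: float) -> float:
    """Event rate normalized to one million miles."""
    return events / miles * 1_000_000


def zero_event_upper_bound_per_million(miles: float, confidence: float = 0.95) -> float:
    """Exact one-sided Poisson upper bound on the rate when zero events were observed.

    P(0 events | rate r) = exp(-r * miles); setting this equal to 1 - confidence
    gives r = -ln(1 - confidence) / miles, roughly 3 / miles at 95% confidence.
    """
    return -math.log(1 - confidence) / miles * 1_000_000


# Hypothetical fleets: a 100x difference in counts, but the same underlying rate.
small_fleet_rate = rate_per_million_miles(events=5, miles=50_000_000)
large_fleet_rate = rate_per_million_miles(events=500, miles=5_000_000_000)
print(f"small fleet: {small_fleet_rate:.2f}, large fleet: {large_fleet_rate:.2f} per million miles")

# Zero events in a hypothetical 10 million miles only bounds the rate from above.
bound = zero_event_upper_bound_per_million(10_000_000)
print(f"95% upper bound after 0 events: {bound:.2f} per million miles")
```

Zero events over 10 million miles is compatible with any true rate up to about 0.30 per million miles, so zero-event claims become informative only as cumulative mileage grows.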

Section 4 — Night Safety: The Critical VRU Battleground

Night is where the architectural difference between LIDAR-based and camera-only VRU detection has its largest safety implication. NHTSA pedestrian fatality data shows that approximately three-quarters of US pedestrian fatalities occur in dark conditions — making night performance the single most important dimension of VRU safety, not merely one of many dimensions.

Night VRU dimension	Waymo LIDAR-based	Tesla camera-based	Why this matters
Fundamental detection mechanism	LIDAR emits its own 905nm laser pulses; detection is independent of ambient light; a pedestrian in all-black clothing at night is detected at the same range and resolution as in daylight	Camera requires reflected light (ambient streetlights or vehicle headlights); low-ambient-light environments require high-sensitivity image sensors and neural network adaptation for sensor noise	LIDAR’s night detection is physically the same as daytime detection — no degradation; camera-based detection has a physical performance ceiling at night that LIDAR does not face
Detection range at night (est.)	LIDAR detects pedestrian-sized objects at est. 100–200 meters in nighttime conditions; radar detects moving VRUs at even longer range but with less spatial resolution	Camera-based detection range at night is limited by headlight throw distance: est. 50–100 meters for low beam, est. 150–200 meters for high beam; VRUs outside the headlight cone may not be detected until much closer	Stopping distance (reaction plus braking) is approximately 35 meters at 35 mph and approximately 55 meters at 45 mph (est.); headlight range, especially on low beam, may be insufficient at higher urban speeds for a suddenly appearing VRU
Side and rear approach at night	LIDAR’s 360-degree coverage detects VRU reflections from any direction simultaneously; a pedestrian stepping off a curb from the side is detected in all lighting conditions at any approach angle	Headlights illuminate primarily the forward cone; a pedestrian approaching from the side or rear is outside the beam and may not be visible to forward-facing cameras until they enter the forward arc	Waymo’s LIDAR provides 360-degree nighttime VRU detection; Tesla’s headlight-illuminated cameras cover primarily the forward arc, a structural detection-geometry difference
Pedestrian fatality rate context	NHTSA data: human-driver pedestrian fatalities peak in late evening/night; dark conditions (no streetlights, or insufficient headlights) are the most dangerous pedestrian collision environment; any AV system must exceed human-driver performance in this highest-risk lighting condition	Any AV VRU safety claim must specifically address night performance — the hours when pedestrian fatalities are highest are precisely where LIDAR has its largest structural performance advantage over camera-only systems	This is the single most important VRU safety dimension: the gap between LIDAR-based and camera-only night detection aligns exactly with the gap between highest-risk and lower-risk pedestrian collision hours
Weather degradation	Rain and fog degrade LIDAR at extreme densities; however radar penetrates both conditions reliably; the LIDAR + radar + camera combination provides redundant VRU detection even in poor weather	Rain and fog degrade camera visibility directly; heavy rain can obscure pedestrians to forward cameras at the distances needed for safe braking; neural networks trained on adverse weather improve performance but cannot overcome physical light-blocking by precipitation	Sensor redundancy means Waymo has backup VRU detection (radar velocity) even when primary sensors (LIDAR, camera) are degraded by weather; camera-only has no independent fallback sensor
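The radar fallback in the weather row rests on the Doppler effect, which rain and fog do not remove. A sketch of the monostatic Doppler relation f_d = 2·v_r/λ, assuming a 77 GHz carrier (common for automotive radar; the article does not state Waymo’s radar frequency) and a target velocity already compensated for the AV’s own motion:

```python
# Monostatic radar Doppler shift: f_d = 2 * v_r / wavelength, with wavelength = c / f.
# Assumptions: 77 GHz carrier; v_r is the target's radial (line-of-sight) speed
# relative to the ground, i.e. the AV's own motion has already been removed.
import math

C = 299_792_458.0  # speed of light, m/s
CARRIER_HZ = 77e9


def doppler_shift_hz(radial_speed_ms: float, carrier_hz: float = CARRIER_HZ) -> float:
    """Doppler shift in Hz for a given radial speed (2 * v_r * f / c)."""
    return 2 * radial_speed_ms * carrier_hz / C


def radial_speed_ms(speed_ms: float, angle_deg: float) -> float:
    """Component of a target's speed along the radar's line of sight."""
    return speed_ms * math.cos(math.radians(angle_deg))


# A pedestrian walking at 1.4 m/s straight toward the radar vs. at 60 degrees to its line of sight.
head_on = doppler_shift_hz(radial_speed_ms(1.4, 0.0))
oblique = doppler_shift_hz(radial_speed_ms(1.4, 60.0))
print(f"head-on: {head_on:.0f} Hz, at 60 degrees: {oblique:.0f} Hz")
```

A pedestrian walking directly toward the sensor produces a shift of roughly 720 Hz. A target moving exactly perpendicular to the line of sight produces almost no Doppler shift, so radar velocity is most informative for VRUs moving toward or away from the sensor.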

The night safety analysis converges on a single structural conclusion: LIDAR’s light-independent VRU detection operates with no performance degradation in the exact hours (late evening to midnight) when pedestrian fatalities are highest and camera-based detection faces its largest performance penalty. This is not a marginal difference — it is a safety architecture difference in the highest-consequence operating condition for VRU safety.

Section 5 — VRU Safety Benchmark Scorecard

VRU safety dimension	Waymo	Tesla FSD	Edge	2028 outlook
Night VRU detection	High: LIDAR provides light-independent VRU detection; no degradation at night vs daytime	Moderate: camera-based detection is limited by headlight range and ambient light; neural network engineering significantly mitigates but cannot match LIDAR physics	Waymo — structural LIDAR advantage in the highest-risk lighting conditions	LIDAR cost reduction and camera neural network improvements continue; gap narrows but LIDAR retains physical night detection advantage through 2028
Multi-sensor VRU redundancy	High: LIDAR (spatial) + radar (velocity) + camera (visual classification) = three independent detection pathways; a single sensor failure is compensated by the other two	Low: camera-only means a single sensor failure mode (lens contamination, glare artifact, neural network edge case) has no independent fallback for VRU detection	Waymo — three independent sensor pathways vs one	Without radar or LIDAR, Tesla has no architectural path to cross-modality sensor redundancy; this is a fundamental, not an incremental, difference
Cyclist behavior prediction	High: years of commercial driverless urban cyclist data from SF and Phoenix; prediction trained on real commercial cycling scenarios	High: billions of human-driver miles including enormous cyclist scenario diversity; scale advantage in scenario breadth	Roughly equal — different advantages: Waymo = driverless-context quality; Tesla = scale/diversity	Both improve with more data; comparison requires published per-scenario performance metrics not currently available
Child and micro-mobility detection	Strong: LIDAR detects physical objects largely independent of their height; children are detectable at ranges comparable to adults; e-scooters’ small radar cross-section is compensated by LIDAR	Training-data dependent: relies on neural network trained on child/micro-mobility scenarios; children’s shorter height is a known challenge for camera-based detection at longer range	Waymo — LIDAR size-independent detection is a structural advantage for shorter VRUs	Neural network improvements help Tesla; LIDAR size-independence remains a fundamental architecture advantage through 2028
Occlusion safety	Strong: HD map provides proactive crosswalk slow-down before any VRU is detected; LIDAR provides spatial context around occluding objects	Standard: FSD infers occlusion risk from visual scene context; no map-based proactive slow-down at known occlusion locations	Waymo — HD map + LIDAR enables proactive safety margins at known occlusion points	Tesla’s end-to-end model can learn occlusion-risk inference from training; map-based proactive slowing remains a Waymo-specific structural capability
Regulatory incident transparency	High: NHTSA SGO + California DMV incident reports publicly available; Waymo publishes annual safety reports with specific safety metrics	Moderate: NHTSA SGO reports filed; VRU-specific collision rates not separately published; FSD-engaged crash rate not disclosed	Waymo — more transparent VRU safety reporting enables external verification	Both companies face increasing regulatory requirements for VRU-specific data; mandatory VRU-rate disclosure likely by 2028
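The redundancy row rests on a probability argument: a VRU is missed only if every detection pathway misses it. A minimal sketch under the strong assumption that pathways fail independently, with hypothetical per-pathway miss probabilities (not measured values for either system):

```python
# Combined miss probability across detection pathways, assuming independent failures.
# Correlated failures (e.g. dense fog degrading LIDAR and camera together) would make the
# true combined miss probability higher than this product.
from math import prod


def all_pathways_miss(miss_probabilities: dict[str, float]) -> float:
    """Probability that every pathway misses the same VRU, under independence."""
    return prod(miss_probabilities.values())


# Hypothetical per-pathway miss probabilities for a single night-time encounter.
camera_only = {"camera": 0.01}
fused = {"lidar": 0.01, "radar": 0.05, "camera": 0.01}

print(f"camera-only: {all_pathways_miss(camera_only):.0e}")
print(f"three-sensor fusion: {all_pathways_miss(fused):.0e}")
```

The independence assumption does most of the work here; the gap between the two configurations shrinks sharply when pathway failures share a common cause.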

Overall verdict: Night VRU detection is where the multi-sensor vs camera-only architectural difference has the clearest and most consequential safety implication. LIDAR’s light-independent VRU detection is an active safety mechanism in the exact conditions where human-driver pedestrian fatalities are highest. Tesla’s camera-only approach requires the neural network to solve a fundamentally harder perception problem at night — and while FSD has improved dramatically with each version, camera-only night detection cannot match LIDAR’s physics-level night detection capability without a fundamental sensor architecture change.

For commercial driverless operations at scale, the VRU safety regulatory environment is moving toward multi-sensor redundancy requirements. The EU’s draft AV regulation and US NHTSA’s AV safety framework both emphasize sensor redundancy for fully driverless (not supervised) AV service. Tesla’s camera-only architecture may face increasing regulatory friction as VRU safety requirements tighten for driverless deployment specifically — not for supervised consumer FSD, which operates under a different regulatory tier. The structural sensor advantage belongs to Waymo in the current regulatory and safety-architecture landscape; whether Tesla’s neural network improvement trajectory can close the gap by 2028 is the key question for the next generation of this benchmark.

Sources: NHTSA Standing General Order AV crash database (nhtsa.gov); California DMV AV incident reports (dmv.ca.gov); Waymo safety report (waymo.com/safety); NHTSA pedestrian safety data (nhtsa.gov/road-safety/pedestrian-safety). All figures marked (est.) are estimates based on public disclosures, regulatory filings, and third-party reporting; they have not been independently verified.