2026-06-18
AV Emergency Vehicle Interaction — How Robotaxis Handle Police, Ambulances, and Fire Trucks
Sirens, fire trucks, police hand signals — emergency vehicle interaction is among the hardest AV edge cases and has driven real regulatory action worldwide.
Article 54 in the Physical AI Benchmark Series — Sirens, Stop Arms, and Hand Signals
Emergency vehicles create a compound detection problem that exposes some of the deepest gaps between how human drivers and autonomous systems perceive the world. A human driver hears a siren, locates its source, assesses the situation, and pulls to the right — all within seconds, drawing on spatial hearing, peripheral vision, and decades of trained instinct. An autonomous vehicle must replicate that chain through microphone arrays, camera banks, neural networks, and path-planning algorithms — and do it reliably across every shift of lighting, ambient noise, and traffic geometry.
These scenarios have moved from theoretical edge cases to documented operational failures. The San Francisco Fire Department formally complained to the California Public Utilities Commission about robotaxis, including Waymo vehicles, obstructing fire truck access on multiple occasions in 2022 and 2023. The October 2023 Cruise incident, in which a Cruise robotaxi struck a pedestrian and then dragged her about 20 feet while attempting to pull over, became the most consequential regulatory event in the short history of commercial driverless operations. It triggered the suspension of Cruise’s California driverless permits and ultimately contributed to GM’s December 2024 decision to stop funding the Cruise robotaxi program.
This article maps the technical challenge, documents the real-world record, and explains what the leading AV stacks have built to address it.
All figures marked (est.) are estimates based on published research, public company disclosures, and industry reporting. Performance figures have not been independently verified under controlled test conditions.
Section 1 — Why Emergency Vehicle Interaction Is Hard
Emergency vehicle interaction is not a single problem. It is at least six distinct detection and decision-making challenges that can occur simultaneously.
| Scenario | Human driver approach | AV challenge |
|---|---|---|
| Ambulance with siren | Hear siren → locate source → pull right | Audio detection + localization + pull-over decision |
| Fire truck approaching from behind | Mirror check + flashing red → move over | Multi-camera rear detection + light pattern recognition |
| Police officer hand signal | Eye contact + gesture reading → comply | Keypoint-based gesture classification in real time |
| School bus stop sign | Flashing red + extended arm → stop | Articulated arm detection + flashing lights in variable lighting |
| Emergency vehicle blocking lane | Navigate around with caution | Path planning around unpredictable, non-standard obstacle |
| Funeral procession | Recognize escort context → yield | Multi-vehicle convoy context recognition; jurisdiction-dependent rules |
Challenge 1: Multi-modal signal fusion. Emergency vehicles announce themselves through two independent channels: audio (siren) and visual (flashing lights). Human drivers fuse these automatically and unconsciously. An AV must run parallel detection pipelines, correlate their outputs, and avoid false positives from other loud noises or bright strobing lights (construction sites, nightclubs, or emergency vehicles several blocks away that are not on a conflicting path).
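The correlation and false-positive logic can be sketched as a gate that confirms an emergency vehicle only when both channels agree over several recent frames. This is a minimal illustration under stated assumptions, not any operator's pipeline: the class name `EmergencyFusion`, the per-frame probabilities `p_siren` and `p_lights`, and every threshold are hypothetical.

```python
from collections import deque


class EmergencyFusion:
    """Hypothetical audio-visual gate; all thresholds are illustrative assumptions."""

    def __init__(self, audio_thr=0.7, visual_thr=0.7, window=10, min_hits=6):
        self.audio_thr = audio_thr    # per-frame siren classifier cutoff
        self.visual_thr = visual_thr  # per-frame light-bar classifier cutoff
        self.min_hits = min_hits      # agreeing frames needed within the window
        self.history = deque(maxlen=window)  # oldest frames drop off automatically

    def update(self, p_siren, p_lights):
        """Record one frame; return True once both channels agree persistently."""
        agree = p_siren >= self.audio_thr and p_lights >= self.visual_thr
        self.history.append(agree)
        return sum(self.history) >= self.min_hits


# A nightclub strobe (bright lights, no siren) never confirms.
strobe = EmergencyFusion()
print(any(strobe.update(0.05, 0.95) for _ in range(20)))  # False

# A genuine approach confirms on the sixth consecutive agreeing frame.
ambulance = EmergencyFusion()
print([ambulance.update(0.9, 0.9) for _ in range(6)])
# [False, False, False, False, False, True]
```

Requiring agreement across both channels trades a few frames of latency for robustness against single-channel noise. A production system would also need a path for lights-only responses (no siren), which this sketch deliberately ignores.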
Challenge 2: Non-standard trajectory prediction. Emergency vehicles are legally permitted to run red lights, travel the wrong way, accelerate through intersections, and stop abruptly. Standard AV motion prediction models are trained on normal vehicle behavior. An AV that applies normal prediction to an emergency vehicle will generate dangerous expected-path estimates.
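One way to see the failure is at the pruning step of a multi-hypothesis predictor. In the sketch below, the helper `predict_at_red_light` and all prior values are hypothetical: an ordinary prior discards the "proceed through red" branch for a normal car as noise, while an emergency-vehicle flag re-weights that branch so the planner keeps it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hypothesis:
    name: str
    prob: float


# Hypothetical priors for an ordinary vehicle approaching a red light.
NORMAL_PRIORS = {"stop_at_red": 0.95, "turn_right": 0.04, "proceed_through_red": 0.01}

# Hypothetical priors once the agent is classified as an active emergency
# vehicle, which may legally proceed through the red light.
EMERGENCY_PRIORS = {"proceed_through_red": 0.60, "stop_at_red": 0.30, "turn_right": 0.10}


def predict_at_red_light(is_emergency, min_prob=0.02):
    """Return pruned, renormalized hypotheses, most likely first."""
    priors = EMERGENCY_PRIORS if is_emergency else NORMAL_PRIORS
    # Pruning is where a normal-behavior predictor silently discards the
    # dangerous branch: 0.01 falls below min_prob for an ordinary car.
    kept = {name: p for name, p in priors.items() if p >= min_prob}
    total = sum(kept.values())
    ranked = sorted(kept.items(), key=lambda item: item[1], reverse=True)
    return [Hypothesis(name, p / total) for name, p in ranked]


print([h.name for h in predict_at_red_light(is_emergency=False)])
# ['stop_at_red', 'turn_right']
print(predict_at_red_light(is_emergency=True)[0].name)
# proceed_through_red
```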
Challenge 3: Pull-over decision and execution. Knowing that an emergency vehicle is approaching is only half the problem. The AV must determine where to pull over — not at an intersection, not in a crosswalk, not in a bike lane, not blocking a driveway — and execute a smooth lane change under time pressure, potentially in dense traffic.
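The lane-change half of the problem is often framed as gap acceptance: is there enough room in the target lane, both ahead and behind, to merge now? The function below is a simplified, hypothetical check; `lane_change_feasible`, the 1-second time headway, and the 5 m minimum gap are assumptions for illustration, not values any operator has published.

```python
def lane_change_feasible(ego_speed, lead_gap, lead_speed, lag_gap, lag_speed,
                         headway_s=1.0, min_gap=5.0):
    """Hypothetical gap-acceptance test for a merge into the adjacent lane.

    Speeds are in m/s; gaps are bumper-to-bumper distances in meters to the
    vehicle ahead (lead) and behind (lag) in the target lane.
    """
    # Room needed ahead: ego's own headway, plus extra if the lead is slower.
    lead_needed = max(min_gap, ego_speed * headway_s)
    lead_needed += max(0.0, ego_speed - lead_speed) * headway_s
    # Room needed behind: the follower's headway, plus extra if it is faster.
    lag_needed = max(min_gap, lag_speed * headway_s)
    lag_needed += max(0.0, lag_speed - ego_speed) * headway_s
    return lead_gap >= lead_needed and lag_gap >= lag_needed


print(lane_change_feasible(10.0, 20.0, 10.0, 15.0, 10.0))  # True
print(lane_change_feasible(10.0, 20.0, 10.0, 15.0, 14.0))  # False: fast follower
```

A planner using a check like this would typically re-evaluate it every cycle and, if no gap opens in time, slow in the current lane rather than force the merge.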
Challenge 4: Police hand signals. An officer directing traffic overrides all traffic signals. This is a purely visual, real-time, human gesture recognition problem. The officer may be partially obscured, wearing different uniforms, using non-standardized gestures, and working at an intersection from any of four approach angles. No standardized dataset exists for training this capability.
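A keypoint-based approach typically starts from a pose estimator's 2D joints. The single-frame rule below is a hypothetical sketch, not a trained classifier: it labels one arm as raised or extended from shoulder, elbow, and wrist keypoints, and abstains with `unknown` when joints are low-confidence (a partially obscured officer) or the pose is ambiguous. The `(x, y, confidence)` keypoint format, the angle bands, and the 0.5 confidence cutoff are assumptions.

```python
import math


def classify_arm_pose(shoulder, elbow, wrist, min_conf=0.5):
    """Hypothetical single-frame rule on 2D keypoints given as (x, y, confidence).

    Image coordinates: y grows downward. Returns "raised", "extended",
    or "unknown" when joints are occluded or the pose is ambiguous.
    """
    if min(shoulder[2], elbow[2], wrist[2]) < min_conf:
        return "unknown"  # partially obscured officer: abstain rather than guess
    upper = math.dist(shoulder[:2], elbow[:2])
    fore = math.dist(elbow[:2], wrist[:2])
    reach = math.dist(shoulder[:2], wrist[:2])
    # A straight arm has reach close to upper + fore; a bent arm is not a clear signal.
    if upper + fore == 0 or reach / (upper + fore) < 0.9:
        return "unknown"
    dx = wrist[0] - shoulder[0]
    dy = shoulder[1] - wrist[1]  # positive when the wrist is above the shoulder
    elevation = math.degrees(math.atan2(dy, abs(dx)))
    if elevation > 60:
        return "raised"    # arm held up
    if abs(elevation) < 20:
        return "extended"  # arm held out roughly horizontally
    return "unknown"


print(classify_arm_pose((100, 100, 0.9), (100, 70, 0.9), (100, 40, 0.9)))    # raised
print(classify_arm_pose((100, 100, 0.9), (130, 100, 0.9), (160, 100, 0.9)))  # extended
print(classify_arm_pose((100, 100, 0.9), (100, 70, 0.9), (100, 40, 0.2)))    # unknown
```

Abstaining is the design choice that matters here: a downstream policy can treat `unknown` as a reason to hold position or escalate rather than act on a guess.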
Challenge 5: School bus stop arms. When a school bus activates its flashing red lights and extends its stop arm, all vehicles in both directions on an undivided road must stop in all 50 US states. The stop arm is a physical, articulated mechanical component that must be detected even in glare, rain, or low light. Detection distance matters: vehicles need time to brake smoothly.
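The detection-distance point can be made concrete with standard stopping-distance arithmetic, d = v * t + v^2 / (2 * a), where t is the perception-to-braking latency and a is a comfortable deceleration. The 0.5 s latency and 3 m/s^2 deceleration below are assumed values for illustration, not figures from any AV program.

```python
def required_detection_distance(speed_mps, latency_s=0.5, decel_mps2=3.0):
    """Distance (m) at which a stop arm must be detected to stop smoothly.

    d = v * t + v^2 / (2 * a): travel during the latency, plus braking distance
    at a constant comfortable deceleration. Defaults are assumptions.
    """
    travel_during_latency = speed_mps * latency_s
    braking_distance = speed_mps ** 2 / (2 * decel_mps2)
    return travel_during_latency + braking_distance


for mph in (25, 45):
    v = mph * 0.44704  # exact mph-to-m/s conversion
    print(mph, "mph ->", round(required_detection_distance(v), 1), "m")
```

Under these assumptions the stop arm must be recognized roughly 26 m out at 25 mph and roughly 77.5 m out at 45 mph; because braking distance grows with the square of speed, the detection range requirement climbs much faster than speed does.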
Challenge 6: Jurisdiction-dependent rules. Funeral procession right-of-way varies by state. School bus stopping rules vary by road type and lane count. The distance required when passing a stopped emergency vehicle varies by state. An AV operating across multiple jurisdictions must carry a ruleset that differs by location.
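A common way to carry location-dependent rules is a conservative default plus per-jurisdiction overrides looked up by region. In the hypothetical sketch below, the field names, the region IDs `REGION_A` and `REGION_B`, and all values are placeholders rather than actual statutes; the structural point is that an unrecognized region falls back to the most cautious behavior.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class RuleSet:
    stop_for_bus_on_divided_road_opposite_side: bool
    yield_to_funeral_procession: bool
    move_over_required_for_stopped_responders: bool


# Most conservative behavior: used whenever a region has no verified rules.
DEFAULT_RULES = RuleSet(
    stop_for_bus_on_divided_road_opposite_side=True,
    yield_to_funeral_procession=True,
    move_over_required_for_stopped_responders=True,
)

# Hypothetical overrides; real values must come from each jurisdiction's code.
OVERRIDES = {
    "REGION_A": {"stop_for_bus_on_divided_road_opposite_side": False},
    "REGION_B": {"yield_to_funeral_procession": False},
}


def rules_for(region_id):
    """Return the rule set for a region, relaxing defaults only where overridden."""
    return replace(DEFAULT_RULES, **OVERRIDES.get(region_id, {}))


print(rules_for("REGION_A").stop_for_bus_on_divided_road_opposite_side)  # False
print(rules_for("UNKNOWN") == DEFAULT_RULES)  # True
```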
Section 2 — Real-World Incidents
The emergency vehicle interaction record for commercial AV deployments contains several documented incidents that directly shaped regulatory outcomes.
| Incident | Date | What happened | Outcome |
|---|---|---|---|
| Waymo vehicles blocking fire trucks | Multiple incidents 2022–2023, San Francisco | Waymo vehicles pulled over in locations that obstructed fire truck access to incident scenes | SF Fire Dept formally complained to CPUC; became a central factor in the CPUC driverless permit debate |
| Cruise post-collision pull-over | October 2023, San Francisco | After a hit-and-run driver knocked a pedestrian into its path, a Cruise robotaxi struck her, then attempted to pull over and dragged her approximately 20 feet | California DMV suspended Cruise’s driverless permits and the CPUC suspended its deployment permit; Cruise paused driverless operations nationwide, and GM ended funding for the Cruise robotaxi program in December 2024 |
| Waymo and officer-directed traffic | 2023–2024, multiple incidents | Waymo vehicles stopped unexpectedly or behaved inconsistently when officers directed traffic manually at intersections | Led Waymo to develop improved “pull over and summon remote operator” protocol for ambiguous situations |
| General school bus detection | Ongoing | Multiple AV programs have required specific training datasets for school bus stop arm detection; state-by-state passing rules vary | No AV operator has received formal certification specifically for school bus stop arm compliance (est.) |
The October 2023 Cruise incident requires specific clarification. The sequence was: a hit-and-run driver struck a pedestrian and knocked her into the path of a Cruise robotaxi; the Cruise vehicle braked but hit her; it then attempted to pull over to the curb and moved approximately 20 feet while she was pinned underneath it. The California DMV suspended Cruise’s driverless permits, citing public safety and Cruise’s failure to promptly disclose full footage of the pull-over maneuver, and the CPUC suspended Cruise’s driverless deployment permit. That response, followed by GM’s December 2024 decision to stop funding the Cruise robotaxi program, was the most severe consequence any AV operator has faced to date. For this article, the relevant failure is the pull-over decision itself: the vehicle moved when the safe action was to stay stopped.
The Waymo fire truck incidents, while less dramatic, matter because they reveal a subtler failure mode: the AV correctly identified the need to pull over, but selected pull-over locations that blocked emergency vehicle access. Getting the pull-over decision correct requires not just detecting the emergency vehicle but reasoning about where to stop — avoiding locations that would obstruct the very vehicles being yielded to.
Section 3 — Waymo’s Emergency Vehicle Interaction System
Following the SF Fire Department incidents, Waymo disclosed and implemented a systematic multi-layer response to emergency vehicle detection.
| System component | What it does |
|---|---|
| Audio detection module | Dedicated microphone array plus neural network for siren detection; localizes siren direction (front/rear/left/right) to inform pull-over planning |
| Emergency vehicle light pattern recognition | Trained on footage of specific light bar patterns from major US fire, police, and ambulance vehicle fleets; distinguishes emergency patterns from construction, tow trucks, and other strobing sources |
| Pull-over location selection | When siren plus lights are detected → identify nearest safe pull-over point that does not block intersections, crosswalks, bike lanes, or driveways → execute lane change → stop |
| Remote assistance integration | Complex scenarios (officer hand signals, ambiguous situations) → flag for human remote operator who provides guidance in approximately 30 seconds (est.) |
| Police stop protocol | If police lights directed at Waymo vehicle → pull over → activate hazard lights → wait for remote operator or officer approach |
| Geofenced improvement loop | After incidents, Waymo conducts targeted edge-case data collection in affected zones to build new training examples |
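The light-pattern row in the table above describes a learned classifier trained on real footage. As a much simpler, hypothetical illustration of the features involved, the sketch below estimates flash rate from a per-frame brightness trace and combines it with dominant color: red or blue flashing suggests an emergency vehicle, while amber-only flashing is typical of tow trucks and construction equipment. The 1 to 4 Hz flash band, the 0.5 brightness threshold, and the color labels are assumptions.

```python
def flash_rate_hz(brightness, fps, threshold=0.5):
    """Count off-to-on transitions in a 0..1 brightness trace, per second."""
    on = [b >= threshold for b in brightness]
    rising_edges = sum(1 for prev, cur in zip(on, on[1:]) if cur and not prev)
    duration_s = len(brightness) / fps
    return rising_edges / duration_s if duration_s > 0 else 0.0


def classify_light_source(colors, brightness, fps, band_hz=(1.0, 4.0)):
    """Hypothetical rule combining flash rate with the set of dominant colors."""
    rate = flash_rate_hz(brightness, fps)
    if not (band_hz[0] <= rate <= band_hz[1]):
        return "not_emergency"  # steady lamp, or a strobe far outside the band
    if colors and colors <= {"amber", "white"}:
        return "amber_warning"  # tow truck, construction, or utility vehicle
    if colors & {"red", "blue"}:
        return "emergency_candidate"
    return "not_emergency"


fps = 30
flashing = [1.0 if i % 15 < 7 else 0.0 for i in range(60)]  # 2 s of frames
print(flash_rate_hz(flashing, fps))  # 1.5: three rising edges in 2 s
print(classify_light_source({"red", "blue"}, flashing, fps))  # emergency_candidate
print(classify_light_source({"amber"}, flashing, fps))        # amber_warning
```

The synthetic trace actually flashes at 2 Hz; counting rising edges over a short window undercounts because the trace starts with the light already on. That is one reason a real system would use longer windows or frequency-domain estimates.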
The remote assistance integration is architecturally significant. Waymo has publicly described a tiered response model in which the vehicle handles routine emergency vehicle scenarios autonomously and escalates ambiguous cases, particularly officer hand signals, to a human remote operator. This means Waymo’s driverless vehicles are not fully autonomous in the hardest emergency scenarios; they rely on real-time human assistance. The roughly 30-second escalation response is an estimate and has not been independently audited.
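The tiered model can be sketched as a small routing policy. This is a hypothetical illustration of the behavior described in the table and paragraph above, not Waymo's code: the `Action` names, the scenario strings, and the 0.9 confidence threshold are all assumptions.

```python
from enum import Enum, auto


class Action(Enum):
    HANDLE_AUTONOMOUSLY = auto()
    PULL_OVER_HAZARDS_AND_WAIT = auto()
    HOLD_AND_REQUEST_REMOTE_ASSIST = auto()


def route_scenario(kind, confidence, threshold=0.9):
    """Map a detected scenario to a response tier (hypothetical policy)."""
    if kind == "police_lights_directed_at_vehicle":
        # Police stop protocol: pull over, hazards on, wait for officer or operator.
        return Action.PULL_OVER_HAZARDS_AND_WAIT
    if kind == "officer_hand_signal":
        # Always escalated in this sketch, however confident the classifier is.
        return Action.HOLD_AND_REQUEST_REMOTE_ASSIST
    if confidence >= threshold:
        # Routine case, e.g. a clearly detected siren approaching from behind.
        return Action.HANDLE_AUTONOMOUSLY
    # Anything ambiguous goes to a human remote operator.
    return Action.HOLD_AND_REQUEST_REMOTE_ASSIST


print(route_scenario("siren_from_rear", 0.97))      # Action.HANDLE_AUTONOMOUSLY
print(route_scenario("siren_from_rear", 0.55))      # Action.HOLD_AND_REQUEST_REMOTE_ASSIST
print(route_scenario("officer_hand_signal", 0.99))  # Action.HOLD_AND_REQUEST_REMOTE_ASSIST
```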
The pull-over location selection improvement directly addressed the fire truck blocking incidents. The updated protocol includes explicit constraints: candidate pull-over locations are filtered against known intersection boundaries, crosswalk markings, bike lane designations, and fire hydrant proximity. The system must balance pulling over quickly against pulling over in a location that does not create a secondary obstruction.
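The filtering step described above can be sketched as hard constraints followed by a simple preference for the closest reachable spot. The `Candidate` fields, the 5 m hydrant clearance, and the 15 m minimum reach distance are hypothetical; a production planner would also score secondary obstruction, such as whether stopping narrows the lane the responding vehicle needs.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    distance_m: float          # distance ahead along the route
    in_intersection: bool
    on_crosswalk: bool
    in_bike_lane: bool
    blocks_driveway: bool
    hydrant_distance_m: float  # distance to the nearest fire hydrant


def choose_pull_over(candidates, min_hydrant_m=5.0, min_reach_m=15.0):
    """Return the nearest candidate passing every hard constraint, or None.

    min_hydrant_m and min_reach_m (room needed to change lanes and slow
    down) are assumed values.
    """
    legal = [
        c for c in candidates
        if not (c.in_intersection or c.on_crosswalk
                or c.in_bike_lane or c.blocks_driveway)
        and c.hydrant_distance_m >= min_hydrant_m
        and c.distance_m >= min_reach_m
    ]
    # "Pull over quickly": prefer the closest spot that survives the filter.
    return min(legal, key=lambda c: c.distance_m, default=None)


spots = [
    Candidate(10.0, False, False, False, False, 20.0),  # too close to reach
    Candidate(30.0, False, True, False, False, 20.0),   # on a crosswalk
    Candidate(40.0, False, False, False, False, 2.0),   # next to a hydrant
    Candidate(50.0, False, False, False, False, 20.0),  # clear
]
print(choose_pull_over(spots).distance_m)  # 50.0
```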
Section 4 — Tesla FSD and Emergency Vehicle Handling
Tesla FSD operates in supervised mode: in current consumer vehicles, an attentive human driver must remain ready to take over. The Cybercab driverless program, targeting commercial deployment, must meet a higher regulatory bar for emergency scenarios.
| Dimension | Detail |
|---|---|
| Audio detection | Tesla vehicles have microphones designed primarily for cabin noise cancellation and voice commands; Tesla has indicated that FSD uses audio to help detect emergency vehicle sirens, but the implementation is not publicly documented in detail (est.) |
| Visual detection | Camera-based flashing light detection; end-to-end v12 and v13 models trained on emergency vehicle clips drawn from the global fleet |
| Pull-over behavior | FSD trained to recognize emergency vehicle approach and suggest or execute pull-over; safety driver can override at any time |
| Police hand signals | Harder problem for camera-only system; FSD v13 included improvements to pedestrian and officer gesture recognition (est.) |
| No remote operator | Unlike Waymo, Tesla’s driverless Cybercab plan relies on the neural network alone, without a human remote operator in the loop |
| Regulatory requirement | For Cybercab driverless operation, emergency vehicle handling must meet regulatory standards currently under development and evaluation |
The absence of a remote operator backup creates a structural difference in how the two companies approach the hardest edge cases. Waymo can escalate an ambiguous officer hand-signal scenario to a human remote operator; Tesla’s driverless system must handle that same scenario through neural network inference alone. Whether end-to-end model training on fleet data can reach the reliability needed for police-gesture compliance without human backup is one of the unresolved questions in the Cybercab regulatory review.
Tesla’s camera-centric architecture also limits siren localization. Waymo’s dedicated microphone array can determine whether a siren is approaching from the front, rear, or side, information that directly informs whether and where to pull over. Without a comparable exterior array, a camera-led system must infer an emergency vehicle’s approach direction largely from visual cues, which works once the vehicle is in a camera’s field of view but gives little advance directional warning before it appears.
Section 5 — Regulatory Standards and What Passing Looks Like
No universal federal standard for AV emergency vehicle interaction exists as of mid-2026. The regulatory landscape is state-by-state with California functioning as the de facto most rigorous jurisdiction.
| Requirement | Current status |
|---|---|
| Pull over for emergency vehicles | Required under law in all 50 states; AVs must comply with the same statutes as human drivers |
| School bus stop arm | All 50 state laws require stopping; federal FMVSS No. 131 governs the stop-arm equipment on the bus, not driver behavior; no AV has received formal pass/fail certification specifically for stop arm compliance (est.) |
| Police traffic direction | Legal requirement to follow officer directions; no AV-specific technical standard; tested case-by-case in incident review |
| NHTSA FMVSS | Federal Motor Vehicle Safety Standards do not yet include specific emergency-vehicle-response performance requirements as of mid-2026 (est.) |
| CPUC California | Has become the most operationally rigorous AV regulatory body through active permit proceedings; alongside the California DMV, it suspended Cruise’s driverless authorization in October 2023, the most consequential AV permit suspension in the US to date |
| What passing looks like | No agreed metric exists; industry direction is that AV must match or exceed human driver emergency vehicle response rates across a defined test set |
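Since no agreed metric exists, one defensible way to operationalize "match or exceed human response rates" is to compare a pessimistic estimate of the AV's compliance rate against the human baseline, so a small test set cannot pass on luck. The sketch below uses the Wilson score lower bound, a standard binomial confidence interval chosen here purely as an illustration; it is not a regulator's method, and `human_rate` is a hypothetical input.

```python
import math


def wilson_lower_bound(successes, trials, z=1.96):
    """Lower end of the Wilson score interval for a binomial proportion."""
    if trials == 0:
        return 0.0
    p = successes / trials
    denom = 1 + z * z / trials
    centre = p + z * z / (2 * trials)
    spread = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials))
    return (centre - spread) / denom


def meets_human_baseline(successes, trials, human_rate, z=1.96):
    """Pass only if the pessimistic AV estimate still matches the human rate."""
    return wilson_lower_bound(successes, trials, z) >= human_rate


print(round(wilson_lower_bound(95, 100), 3))             # 0.888
print(meets_human_baseline(95, 100, human_rate=0.90))    # False
print(meets_human_baseline(990, 1000, human_rate=0.95))  # True
```

At 95 correct responses in 100 scenarios the observed rate is 95%, but the lower bound is about 88.8%, so this sketch would not credit the AV with beating a 90% human baseline until the test set grows.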
The CPUC’s authority over driverless permits has made California both the most attractive and the most scrutinized AV deployment market. Operators that expand geofenced operations must demonstrate emergency vehicle interaction performance to CPUC reviewers. The Cruise suspension established that a single high-profile failure can halt an operator’s driverless service statewide; the Waymo fire truck incidents established that persistent lower-severity failures can drive operational requirement changes.
The absence of a federal standard creates an uneven competitive landscape. Operators in California face the most rigorous requirements; operators in states with lighter-touch AV regulations face lower formal bars. As driverless deployment scales beyond California, NHTSA may move to set federal performance floors, likely drawing on the California record as the primary evidence base.
Figures marked (est.) should be treated as directional rather than precise.
Sources
- CPUC driverless permit proceedings, California PUC (cpuc.ca.gov)
- SF Fire Dept complaints about AV blocking, SF Examiner coverage (sfexaminer.com)
- Cruise October 2023 incident, NHTSA special investigation (nhtsa.gov/vehicle-safety/automated-vehicles)
- Waymo emergency vehicle response improvements, Waymo blog (waymo.com/blog/)