2026-06-17

AV Sensor Stack Index — Tesla Vision vs. Waymo LiDAR: The Perception Architecture Race

Comparing autonomous vehicle sensor stacks — Tesla camera-only vs. Waymo LiDAR fusion — across cost, weather resilience, and architecture trade-offs.

The sensor stack is the first fork in autonomous vehicle architecture

Before a self-driving system can plan a route or execute a maneuver, it must perceive the world around it. The sensors that feed that perception pipeline are not a commodity detail — they determine the cost, the geographic range, the weather resilience, and ultimately the safety ceiling of the entire autonomy stack.

The industry has split into two camps. Waymo, Mobileye, Cruise, and Zoox have all built systems that layer LiDAR, radar, and cameras together into redundant sensor fusion architectures. Tesla has pursued the opposite philosophy: cameras alone, no LiDAR, no ultrasonic sensors (removed in 2022), backed by the largest video training dataset in the industry.

This is the sixth article in the physical AI benchmark series. Prior articles covered operational ramp, humanoid robots, AV regulation, investment capital, and compute silicon. This article maps the perception layer — the sensor hardware, the cost implications, and the fundamental architectural arguments on both sides.

Section 1 — Master sensor stack comparison

The table below covers the five most-referenced sensor architectures in the autonomous vehicle industry as of mid-2026. Camera counts, LiDAR units, and radar modules reflect publicly available specification details or manufacturer communications.

Company	Cameras	LiDAR	Radar	Ultrasonic	Compute	Architecture
Waymo (Gen 6)	13	4 units (360° Honeycomb + short-range)	6	None stated	Custom ASIC (Waymo Driver chip)	Full sensor fusion, driverless L4
Tesla (HW4)	8 (360° + forward telephoto)	0	1 (optional, some models)	0 (removed 2022)	Dual FSD chip	Vision-only, end-to-end neural net
Mobileye (EyeQ Ultra)	Cameras + optional LiDAR	Optional	Yes	No	EyeQ Ultra ASIC	REM mapping layer, scalable fusion
Cruise (paused)	40	5	16	Yes	Custom ASIC	Maximum redundancy stack
Zoox (Amazon)	8	4	4	No	Custom compute	Bidirectional vehicle, full fusion

Reading the table: Waymo’s sixth-generation Driver is publicly specified with 13 cameras, 4 LiDAR units, and 6 radar, down from 29 cameras and 5 LiDAR units on the fifth-generation system, yet still more cameras than Tesla’s 8. The Cruise stack, with 40 cameras, 5 LiDAR units, and 16 radar modules, represented the most redundant commercial AV architecture before Cruise suspended driverless operations in late 2023. Zoox’s bidirectional vehicle design (it has no fixed front or back and operates identically in both directions) requires symmetric sensor coverage on both ends.

Tesla’s 0 LiDAR is a deliberate design choice, not a cost omission. The dual FSD chip delivers over 1,000 TOPS (Tesla’s internal benchmark methodology) for pure camera-based inference. Removing ultrasonic sensors in 2022 was a parallel step — both decisions reflect the same architectural bet that neural networks trained on camera data can achieve safety equivalent to or better than sensor fusion approaches.

Section 2 — The core debate: vision-only vs. sensor fusion

The two architectures represent genuinely different hypotheses about what makes autonomous driving safe. Neither is obviously wrong — but the assumptions are incompatible enough that companies have built entirely different engineering organizations around each one.

Tesla’s case for vision-only

Human roads are designed for human eyes. Every traffic sign, lane marking, traffic light, and pedestrian signal was engineered to be read by human eyes from a moving vehicle. If the task is to drive on roads designed for human vision, cameras are the natural sensor: they capture the same kind of visual information the road infrastructure was built to convey.

Cost arithmetic. LiDAR hardware adds roughly 500 to 5,000 USD per vehicle at current manufacturing scale, depending on the number and class of units. Camera modules cost tens of dollars each. Across millions of vehicles, this cost gap becomes the difference between a sub-30,000 USD consumer product and a hardware-amortized robotaxi service.

Fleet data scale. Tesla’s fleet of over 6 million vehicles generates a continuous stream of camera footage from real-world edge cases: unusual lane configurations, construction zones, pedestrians behaving unexpectedly, emergency vehicles, flooding, sun glare. No LiDAR-equipped fleet can match this data volume. The training advantage compounds over time — each new edge case that the fleet encounters improves the model for every other Tesla.

End-to-end learning without map dependency. Tesla’s Dojo-trained approach aims to learn a driving policy directly from video input, without relying on high-definition pre-mapped road networks. This makes the system extensible to any road — not just those that have been pre-surveyed and mapped.

Waymo’s case for sensor fusion

LiDAR provides depth the camera cannot. A camera perceives the world as a 2D projection. Estimating 3D distance from a single camera requires inference — the neural network must learn to judge depth from visual cues like apparent size, shadow, and parallax. LiDAR measures distance directly via time-of-flight laser pulses, producing a precise 3D point cloud regardless of lighting conditions. No amount of neural network training eliminates the fundamental ambiguity in monocular depth estimation.
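
The contrast between measured and inferred depth can be made concrete with two textbook formulas. The sketch below is illustrative only: the function names, the focal length, and the object sizes are assumptions chosen for the example, not values from any production system. LiDAR range follows from round-trip time of flight (distance = c × t / 2), while a single pinhole camera observes only the ratio of object height to distance.

```python
# Illustrative sketch: direct time-of-flight ranging vs. monocular scale ambiguity.
# All names and numeric values here are assumptions for the example.

SPEED_OF_LIGHT_M_S = 299_792_458.0


def lidar_range_m(round_trip_s: float) -> float:
    """Distance from a time-of-flight measurement: the pulse travels out and back."""
    return SPEED_OF_LIGHT_M_S * round_trip_s / 2.0


def projected_height_px(object_height_m: float, distance_m: float,
                        focal_length_px: float = 1000.0) -> float:
    """Pinhole projection: image height depends only on the height/distance ratio."""
    return focal_length_px * object_height_m / distance_m


if __name__ == "__main__":
    # A 400 ns round trip corresponds to roughly 60 m.
    print(f"LiDAR range: {lidar_range_m(400e-9):.2f} m")

    # A 1.5 m object at 30 m and a 3.0 m object at 60 m project to the same
    # image height, so one frame alone cannot separate size from distance.
    near = projected_height_px(1.5, 30.0)
    far = projected_height_px(3.0, 60.0)
    print(f"near: {near:.1f} px, far: {far:.1f} px")
```

Overlapping cameras and motion parallax reduce this ambiguity in practice; the example covers only the single-frame case the paragraph describes.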

Radar penetrates conditions cameras cannot. Millimeter-wave radar passes through rain, snow, and fog that degrade camera image quality. A camera behind a lens fogged by condensation or obscured by heavy rain loses meaningful signal. Radar does not. In adverse weather, radar provides the structural scene information (where large objects are and how fast they are moving) that camera systems lose.

Redundancy for driverless operation. A robotaxi with no safety driver cannot pull over and ask a human to take over. When one sensor class fails — a camera lens cracked, a LiDAR unit blocked by ice, a radar module hit by debris — the system must continue operating safely on the remaining sensors. The safety margin required for fully driverless L4 operation is higher than for L2 ADAS where a human is monitoring and can intervene.

Safety standards favor redundancy. ISO 21448 (SOTIF — Safety of the Intended Functionality) and the broader L4 regulatory environment implicitly favor architectures with multiple independent sensing modalities. Regulators can require a company to demonstrate how the system degrades gracefully when any single sensor fails. A vision-only system has no fallback for camera failure.
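
The redundancy argument can be sketched as a simple policy over which sensing modalities are still healthy. This is a hypothetical illustration, not Waymo’s or Tesla’s logic: the `Modality` enum, the `fallback_mode` function, and the mode strings are all invented for the example.

```python
# Hypothetical sketch of modality-level fallback; not any vendor's actual logic.
from enum import Enum


class Modality(Enum):
    CAMERA = "camera"
    LIDAR = "lidar"
    RADAR = "radar"


def fallback_mode(installed: set[Modality], failed: set[Modality]) -> str:
    """Pick an operating mode from the modalities that remain healthy."""
    healthy = installed - failed
    if healthy == installed:
        return "nominal"
    if healthy:
        # At least one independent modality still perceives the scene.
        return "degraded: continue on remaining sensors, then stop safely"
    # Nothing left to perceive with.
    return "no perception: immediate minimal-risk stop"


if __name__ == "__main__":
    fusion = {Modality.CAMERA, Modality.LIDAR, Modality.RADAR}
    vision_only = {Modality.CAMERA}
    camera_loss = {Modality.CAMERA}

    print("fusion:", fallback_mode(fusion, camera_loss))
    print("vision-only:", fallback_mode(vision_only, camera_loss))
```

The sketch treats a modality as failing as a whole. A real vehicle carries several cameras, so losing one camera is less severe than the whole-modality loss modeled here.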

Section 3 — Cost-per-vehicle sensor stack estimate

The cost gap between the two architectures is the most concrete number in this debate. The following estimates reflect manufacturing cost at scale — not retail or replacement cost — based on available industry data and analyst estimates as of 2026.

Component	Waymo Gen 6 (est.)	Tesla HW4 (est.)
Camera array	~200 USD	~150 USD
LiDAR (4 units)	~3,000–5,000 USD	0 USD
Radar	~300 USD	~100 USD (optional)
Compute (ASIC / FSD chip)	~500 USD	~400 USD (dual chip)
Total sensor + compute	~4,000–6,000 USD	~550–650 USD
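
The totals row can be checked by summing the component ranges. The sketch below uses only the estimates from the table; the dictionary layout and the `total_range` helper are assumptions for illustration. Tesla’s radar is entered as 0 to 100 USD because the table marks it optional.

```python
# Sum the low/high component estimates from the table above (USD, approximate).
# The data layout and helper name are illustrative assumptions.

WAYMO_GEN6 = {
    "camera_array": (200, 200),
    "lidar_4_units": (3000, 5000),
    "radar": (300, 300),
    "compute": (500, 500),
}

TESLA_HW4 = {
    "camera_array": (150, 150),
    "lidar": (0, 0),
    "radar_optional": (0, 100),  # optional on some models
    "compute": (400, 400),
}


def total_range(components: dict[str, tuple[int, int]]) -> tuple[int, int]:
    """Return (low, high) totals across all components."""
    low = sum(lo for lo, _ in components.values())
    high = sum(hi for _, hi in components.values())
    return low, high


if __name__ == "__main__":
    print("Waymo Gen 6:", total_range(WAYMO_GEN6))  # (4000, 6000)
    print("Tesla HW4:", total_range(TESLA_HW4))     # (550, 650)
```

With these inputs the Tesla components sum to 550 USD without the radar and 650 USD with it.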

What this cost gap means for business models. Waymo operates a ride-hailing service — the Waymo One robotaxi — where each vehicle generates ongoing revenue that amortizes the hardware cost over the vehicle’s operating lifetime. This model can justify a 4,000–6,000 USD sensor suite if the vehicle runs tens of thousands of miles per year and charges per ride.

Tesla’s Cybercab, targeting a below-30,000 USD price point, is not compatible with a 5,000 USD LiDAR stack. The camera-only architecture is not just a philosophical position — it is a prerequisite for the consumer vehicle business model that Tesla is building toward. A Cybercab with Waymo’s sensor stack would need to cost 35,000–40,000 USD or higher, eliminating the mass-market robotaxi proposition entirely.
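
The amortization argument reduces to simple division. The sketch below spreads the fusion-stack estimates from the table over a vehicle’s service miles. The annual mileage and service life are placeholder assumptions, not figures reported by Waymo or Tesla, and the function name is illustrative.

```python
# Per-mile hardware amortization under assumed utilization.
# ASSUMED_MILES_PER_YEAR and ASSUMED_SERVICE_YEARS are placeholders,
# not reported figures.

ASSUMED_MILES_PER_YEAR = 50_000
ASSUMED_SERVICE_YEARS = 5


def sensor_cost_per_mile(sensor_cost_usd: float,
                         miles_per_year: float = ASSUMED_MILES_PER_YEAR,
                         service_years: float = ASSUMED_SERVICE_YEARS) -> float:
    """Spread a one-time sensor-suite cost evenly over lifetime miles."""
    lifetime_miles = miles_per_year * service_years
    if lifetime_miles <= 0:
        raise ValueError("lifetime miles must be positive")
    return sensor_cost_usd / lifetime_miles


if __name__ == "__main__":
    for label, cost in [("fusion stack, low", 4_000),
                        ("fusion stack, high", 6_000)]:
        print(f"{label}: {sensor_cost_per_mile(cost):.4f} USD per mile")
```

Under these assumed inputs the suite costs about two cents per mile for a fleet operator, whereas a consumer vehicle must absorb the same hardware cost in its sticker price.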

Section 4 — Weather and geography constraints

The sensor architecture choice determines where a system can operate reliably. The following table maps conditions to expected performance characteristics — these are architectural tendencies, not guaranteed outcome measurements for any specific system.

Condition	Camera-only (Tesla)	LiDAR + Camera (Waymo)
Bright sunlight	Good	Good
Night (urban, lit)	Good (HW4 low-light optimized)	Excellent
Heavy rain	Degraded	Good (radar fallback)
Dense fog	Significantly degraded	Moderate (LiDAR scatters in fog)
Snow (road markings obscured)	Significantly degraded	Moderate
Sensor occlusion (dirt / ice on lens)	Single point of failure risk	Redundant fallback available

This table explains geography choices in the industry. Waymo launched its commercial service in Phoenix, Arizona, and expanded to San Francisco and Los Angeles, all markets with mild weather, high average annual sunshine, and minimal snow. Heavy rain events in Phoenix are infrequent. This is not a coincidence: even with LiDAR and radar, dense fog and ice remain challenging conditions for current AV systems.

Tesla chose Austin, Texas, as its first robotaxi market, with service launching in 2025. Compared with US cities at higher latitudes, Austin has hot summers, abundant sunshine, and almost no snow or ice. The vision-only architecture performs better in dry, well-lit conditions. Selecting Austin over, say, Seattle or Minneapolis is an acknowledgment of the operating envelope constraints of camera-only perception.

Neither company has demonstrated robust driverless operation at scale in heavy snow or sustained dense fog. The difference between the architectures in adverse weather is that Waymo’s system has more sensor modalities providing fallback information — but current LiDAR performance also degrades in dense fog as laser pulses scatter off water droplets before reaching objects.

Section 5 — Convergence thesis

Some analysts argue the two camps will converge over time. The argument runs in both directions.

The case for Tesla adding LiDAR. LiDAR manufacturing costs have been falling consistently since 2017. Luminar Technologies has publicly targeted a below-100 USD unit cost at volume production scale; Innoviz Technologies and Hesai have similar roadmaps. If LiDAR drops to 50–100 USD per unit — roughly the cost of a mid-range camera module — the cost argument against it weakens substantially. Tesla might adopt LiDAR for its robotaxi fleet even if it does not add it to consumer vehicles, accepting a higher hardware cost for a commercial vehicle that generates revenue rather than competing on consumer price.
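
The cost scenario in this paragraph can be worked through with the article’s own numbers. The sketch below swaps the four-unit LiDAR estimate for a hypothetical 50 to 100 USD per unit and recomputes the fusion-stack total. The non-LiDAR figures are the Section 3 estimates; the function name and structure are illustrative assumptions.

```python
# Recompute a four-LiDAR fusion stack total under a hypothetical per-unit price.
# Non-LiDAR costs reuse the Section 3 estimates (USD); names are illustrative.

NON_LIDAR_COST_USD = 200 + 300 + 500  # cameras + radar + compute
LIDAR_UNITS = 4


def fusion_stack_total(lidar_unit_cost_usd: float,
                       units: int = LIDAR_UNITS,
                       other_usd: float = NON_LIDAR_COST_USD) -> float:
    """Total sensor-plus-compute cost for a given per-unit LiDAR price."""
    return other_usd + units * lidar_unit_cost_usd


if __name__ == "__main__":
    for unit_cost in (50, 100):
        total = fusion_stack_total(unit_cost)
        print(f"LiDAR at {unit_cost} USD/unit -> total {total:.0f} USD")
```

At those prices the fusion stack lands at roughly 1,200 to 1,400 USD, against 4,000 to 6,000 USD at the Section 3 LiDAR estimate, and LiDAR would no longer be the largest line item.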

The case against Tesla adding LiDAR. Tesla’s entire training pipeline is vision-optimized. The Dojo training infrastructure processes video. The end-to-end neural network policy is trained on camera data. The fleet data collection system gathers camera footage. Adding LiDAR would not simply mean bolting a sensor onto the car — it would require rebuilding the data pipeline to ingest LiDAR point clouds, retraining models on fused data, and rearchitecting the inference stack. This is a multi-year engineering program, not a product refresh.

Waymo moving toward simpler stacks? Waymo’s sixth-generation vehicle uses fewer, more capable LiDAR units compared to earlier generations — the sensor count has been declining as individual unit performance improves. This is not the same as removing LiDAR. Waymo has shown no indication of moving toward vision-only architectures for its L4 driverless service.

The most likely outcome. Both approaches will improve on their own terms. Camera-based systems will continue to benefit from larger training datasets, better neural architectures, and higher-resolution sensors. LiDAR-based systems will benefit from lower-cost, longer-range, higher-resolution units. The cost gap will narrow but may not close entirely at the manufacturing volumes relevant for consumer vehicles.

Benchmark context: this is the sixth article in the physical AI series

The series covers physical AI from six angles:

Operational ramp metrics — production counts, deployment scale, miles driven
Humanoid robot technology — hardware generations, dexterity benchmarks, foundation model capabilities
AV safety and regulation — California DMV data, NHTSA crash reporting, state permit maps
Investment and valuation — capital flows, funding rounds, implied valuations
Compute and silicon — inference chips, training clusters, NVIDIA supply constraints
Sensor stack and perception architecture — this article

The sensor architecture question sits at the intersection of cost, safety, and scalability. It is not a question that will be resolved by a single demonstration or a single accident. It will be resolved by fleet data, regulatory decisions, and ultimately by which architecture proves capable of handling the full distribution of real-world driving conditions at acceptable cost. That resolution is still in progress.