2026-06-18

HD Maps vs Mapless — What It Costs to Map a City and Why This Gates the AV Ramp

HD maps cost millions per city and require continuous refresh — Waymo depends on them, Tesla does not. This divide determines AV expansion speed at scale.

Article 116 in the Physical AI Benchmark Series — Physical AI Mapping War: HD Maps vs Mapless, What It Costs to Map a City, Why Waymo’s Map Pipeline Gates Its City Expansion, and Why Tesla’s Mapless Bet Is the Scalability Thesis

The most consequential architectural divide in autonomous vehicles is not sensor choice or compute platform — it is whether the system requires a pre-built high-definition (HD) map of every road before it can operate, or whether it can drive anywhere using real-time perception alone. Waymo requires HD maps. Tesla does not. This single architectural decision propagates through every downstream dimension: city expansion speed, operational cost, geographic coverage ceiling, resilience to road changes, and ultimately the total addressable market each approach can realistically reach.

HD mapping is expensive, slow, and requires continuous maintenance. Building a detailed centimeter-accurate 3D model of a metropolitan area takes months of specialized vehicle deployments, significant cloud processing, and rigorous human verification. Every subsequent road change — construction zone, lane restriping, new traffic light — requires a map update before the AV fleet can safely operate in that area. For a company operating in five cities, this is a manageable engineering problem. For a company with ambitions to cover hundreds of cities across multiple continents, it becomes a structural constraint on growth rate that no engineering investment can fully eliminate: mapping a city takes time and money regardless of how efficiently it is executed.

Tesla’s mapless architecture inverts this constraint. If a neural network trained on billions of miles of human driving data can generalize well enough to handle arbitrary road configurations from camera input alone, then there is no mapping prerequisite to expansion. Deploy the software, and the car can drive. The risk is that generalization is hard — a model trained predominantly on California roads may handle unusual geometry in a city it has rarely encountered less reliably than a Waymo system with a freshly verified HD map of that same area. The resolution of this trade-off — generalization depth vs mapping breadth — is the central empirical question of the mapping war.

This article maps HD mapping infrastructure as a Physical AI benchmark dimension across six analytical sections.

Section 1 — What Is an HD Map and What Does It Contain?

An HD (high-definition) map is not a navigation map. Consumer navigation maps (Google Maps, Apple Maps, OpenStreetMap) know that a road exists, roughly where it is, and what its name is. An HD map used for autonomous driving knows the precise 3D geometry of that road at centimeter accuracy, the exact position of every lane boundary, the semantic meaning of every lane (travel lane, turn lane, merge lane), where every traffic light is mounted and what directions it governs, and dozens of additional structured attributes that a navigation system does not need but an AV does.

HD map layer	What it stores	Why AV needs it	Update frequency needed
3D geometry	Precise 3D road surface, curb locations, lane boundaries accurate to 10-20 cm	AV localization: vehicle matches lidar scan to stored map to know exactly where it is on the road	Every significant road construction event
Lane semantics	Lane type (travel/turn/merge), lane connectivity graph (which lanes connect to which), lane direction	Route planning: knowing which lane connects to which exit/turn without relying on real-time perception	Road changes, new construction
Traffic elements	Traffic light positions and phases, stop sign positions, crosswalk locations	Knowing where to look for traffic control; anticipating signal state	When traffic control changes
Speed limits	Posted speed limits per lane segment	Regulatory compliance; default speed profile when signs not visible	Regulatory updates
Road surface annotations	Speed bumps, potholes (if mapped), road material	Ride comfort optimization; surface-type awareness	Difficult to maintain; often omitted
Map freshness requirement	Construction zones, temporary lane changes, road closures must be reflected within hours to days	An AV following a stale map that says “left lane open” when it is closed creates a safety issue	Near-real-time for dynamic changes

The localization function is the most critical. An AV that knows it is “somewhere on Main Street” to GPS accuracy (3-5 meters) cannot reliably execute a lane change. An AV that knows it is in the second lane from the right, 47 meters from the next intersection, and precisely positioned within that lane can plan and execute maneuvers with the specificity required for safe operation. HD map matching provides that precision by correlating a live lidar point cloud against stored 3D geometry. This is why HD maps exist: they provide a pre-verified geometric prior that lets the AV localize to centimeter precision, something that real-time perception alone, in Waymo’s architecture, cannot fully replace.
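The map-matching idea can be illustrated with a toy example. The sketch below is not Waymo’s localizer; it is a minimal 2D illustration that assumes the map stores lane-boundary and stop-line points and that the vehicle’s coarse GPS pose is off by an unknown offset. The function names, the brute-force grid search, and all coordinates are assumptions made for illustration only.

```python
import math

def nearest_distance(point, map_points):
    """Distance from one scan point to the closest stored map point."""
    return min(math.dist(point, m) for m in map_points)

def estimate_offset(scan_points, map_points, search_m=3.0, step_m=0.1):
    """Grid-search the (dx, dy) shift that best aligns the scan with the map.

    scan_points: observed points, placed using the coarse GPS pose.
    map_points:  the same features as stored in the (hypothetical) HD map.
    Returns the shift that minimizes the summed nearest-neighbor distance.
    """
    steps = int(round(search_m / step_m))
    best_shift, best_cost = (0.0, 0.0), float("inf")
    for ix in range(-steps, steps + 1):
        for iy in range(-steps, steps + 1):
            dx, dy = ix * step_m, iy * step_m
            cost = sum(
                nearest_distance((x + dx, y + dy), map_points)
                for x, y in scan_points
            )
            if cost < best_cost:
                best_shift, best_cost = (dx, dy), cost
    return best_shift

# Hypothetical map: a lane boundary along y = 0 and a stop line at x = 10.
map_points = [(float(x), 0.0) for x in range(11)]
map_points += [(10.0, 0.5 * k) for k in range(1, 8)]

# Simulate a scan of nearby features, misplaced by a GPS error of (-1.2, +0.7) m.
gps_error = (-1.2, 0.7)
scan_points = [(x + gps_error[0], y + gps_error[1])
               for x, y in map_points if x >= 5.0]

dx, dy = estimate_offset(scan_points, map_points)
print(f"correction: dx={dx:.1f} m, dy={dy:.1f} m")
```

The search recovers the shift that cancels the simulated GPS error because the map contains features in two directions (a boundary and a stop line). With only a straight lane boundary, position along the lane would remain ambiguous, which is one reason a rich geometric prior matters for this kind of localization.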

The freshness requirement creates the operational maintenance burden. A map that was accurate when surveyed becomes progressively less accurate as roads change. In an active urban environment, road conditions change continuously: construction zones open and close, lane markings are repainted, temporary signals are installed for events, lanes are reconfigured seasonally. Each change that is not reflected in the HD map creates a divergence between what the map says is true and what is actually on the road — a divergence that the AV’s safety depends on resolving before it drives in that area.

Section 2 — What Does It Cost to Build an HD Map of a City?

HD mapping costs are not publicly disclosed by the major AV companies, but the engineering requirements constrain the cost range. The following table structures the cost components with estimates derived from available information.

Cost dimension	Estimate	Notes
Mapping vehicle fleet	Specialized vehicles with lidar arrays, cameras, GPS/IMU; $150K-500K per vehicle (est.)	Purpose-built mapping vehicles; not consumer cars
Mapping a single city (area approx. 300 sq miles, est.)	$2M-10M in direct mapping cost (est.); includes vehicle depreciation, driver labor, fuel, processing	Rough estimate; large commercial map contracts suggest this range
Annual map maintenance (same city)	$500K-3M/year (est.)	Road changes, construction zones, new developments; must be updated continuously
Processing and annotation	Significant cloud compute for processing raw lidar/camera data into HD map format; AI-assisted annotation but still human QA	Major cost component; ongoing
Waymo mapping fleet	Waymo operates dedicated mapping vehicles that continuously re-drive mapped areas in addition to the Waymo One commercial fleet (est.)	Double vehicle operation overhead
5-city network (Waymo current est.)	Approximately $10M-50M+ annually to maintain HD maps for all operational cities (est.)	Rough order of magnitude; Waymo does not disclose
Cost per new city	$3M-15M to initially map a new metro area (est.); then $1M-5M/year ongoing (est.)	Each new city requires full mapping before first commercial ride

These estimates reflect a fundamental structural cost: unlike software that can be replicated at near-zero marginal cost, HD maps require physical vehicles to drive every road, compute resources to process the resulting data, and human expertise to verify the output. This cost does not shrink significantly with scale: mapping the 50th city costs roughly as much as mapping the 5th, because the physical work of driving and processing each city’s road network is largely independent of the work already done elsewhere.
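The arithmetic behind this linear scaling can be sketched in a few lines. The example below takes the “Cost per new city” row of the table at face value ($3M-15M to map, $1M-5M per year to maintain, both est.) and assumes no scale discount at all, which is a simplification. The function name and the three-year horizon are illustrative choices, not figures from any company.

```python
# Per-city estimates taken from the cost table above, in millions of USD.
INITIAL_MAP_COST_M = (3.0, 15.0)    # one-time cost to map a new metro area (est.)
ANNUAL_MAINTENANCE_M = (1.0, 5.0)   # ongoing map upkeep per city per year (est.)

def mapping_cost_range(n_cities, years):
    """Return (low, high) total map cost in $M under a purely linear model."""
    low = n_cities * (INITIAL_MAP_COST_M[0] + years * ANNUAL_MAINTENANCE_M[0])
    high = n_cities * (INITIAL_MAP_COST_M[1] + years * ANNUAL_MAINTENANCE_M[1])
    return low, high

for n in (5, 50):
    low, high = mapping_cost_range(n, years=3)
    print(f"{n} cities over 3 years: ${low:.0f}M to ${high:.0f}M")
```

Under these assumptions, ten times the cities means ten times the map spend. Any real efficiency gains, such as shared tooling or faster annotation, would bend the line somewhat without changing its basic shape.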

The processing and annotation component deserves attention. Raw lidar data from a mapping pass is a massive point cloud — billions of 3D points per city block — that must be processed into a structured semantic representation (lanes, signs, traffic elements) before it becomes an HD map. AI-assisted processing has significantly reduced the human annotation burden compared to early HD mapping efforts, but human quality assurance remains essential because mapping errors that reach the vehicle have safety consequences. The QA pipeline is a throughput constraint on how quickly a company can process new mapping data into deployable maps.

Section 3 — How HD Maps Gate Waymo’s City Expansion

The mapping requirement creates a sequenced prerequisite chain for every new city Waymo considers entering. Unlike a software product that can be shipped to a new market by changing a configuration parameter, Waymo’s service cannot launch in a new city until that city has been mapped, the map has been verified, and regulatory approval has been obtained. Each of these steps takes time and money that cannot be fully parallelized.

Expansion step	Time required (est.)	Cost (est.)	Gating factor
City selection	1-2 months (regulatory assessment and market analysis)	Minimal	Regulatory permissiveness, weather, demand density
HD map creation	3-6 months for a metro area (est.)	$3M-10M (est.)	Mapping vehicle availability; processing pipeline throughput
Map verification and validation	2-4 months (drive validation of map accuracy)	$1M-3M (est.)	Safety validation team bandwidth
Regulatory approval	6-24 months (varies enormously by state and city)	Significant legal and regulatory staff cost	Regulator speed; political environment
Fleet deployment	1-3 months (vehicle procurement, depot setup, RAO hiring)	$5M-20M (est.)	Vehicle supply chain; depot real estate
Total time to new commercial city	12-36 months from decision to first paid ride (est.)	$15M-50M+ total investment per city (est.)	Map creation and regulation are the dominant time constraints
Tesla comparison	No mapping step required	Zero map cost	Drive anywhere the FSD model has been trained on driving behavior

The 12-36 month timeline from decision to commercial launch is the strategic constraint. If Waymo decides today to enter a new metropolitan market, a customer in that market will not be able to hail a Waymo ride for at least one year under favorable conditions, and more likely two or more years if regulatory approval is slow. This timeline is not primarily a function of Waymo’s execution quality — it reflects the irreducible time required to physically map a city, verify the map, and obtain regulatory permission to operate.
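How the step estimates relate to the 12-36 month total can be checked with a small calculation. The sketch below copies the step durations from the table and compares two bounding cases: every step run strictly in sequence, and a hypothetical best case in which regulatory approval runs alongside mapping and validation. The overlap scenario is an assumption for illustration; the article does not specify which steps can overlap.

```python
# Step durations in months, copied from the expansion table above (all est.).
STEPS = {
    "city_selection": (1, 2),
    "hd_map_creation": (3, 6),
    "map_validation": (2, 4),
    "regulatory_approval": (6, 24),
    "fleet_deployment": (1, 3),
}

def serial_months(steps):
    """Total duration if every step waits for the previous one."""
    low = sum(lo for lo, _ in steps.values())
    high = sum(hi for _, hi in steps.values())
    return low, high

def overlapped_months(steps):
    """Hypothetical best case: regulatory approval runs alongside mapping
    and validation, so only the longer of the two tracks counts."""
    def total(i):
        mapping_track = steps["hd_map_creation"][i] + steps["map_validation"][i]
        middle = max(mapping_track, steps["regulatory_approval"][i])
        return steps["city_selection"][i] + middle + steps["fleet_deployment"][i]
    return total(0), total(1)

print("serial:", serial_months(STEPS))
print("overlapped:", overlapped_months(STEPS))
```

The strictly serial case gives 13-39 months and the fully overlapped case gives 8-29 months, so the 12-36 month estimate sits between the two, consistent with partial overlap. In the slow case, regulatory approval alone outlasts mapping plus validation.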

This timeline constraint compounds across a portfolio of expansion targets. If Waymo is simultaneously pursuing ten new cities, it needs ten parallel mapping programs, ten parallel regulatory approval processes, and ten parallel fleet deployment programs. Each of these competes for the same internal resources: mapping vehicles, processing capacity, safety validation engineers, regulatory affairs staff, and capital. The operational scaling challenge is not software-like (where ten copies cost no more than one); it is hardware- and labor-intensive at every step.

Tesla faces none of this sequencing. If Tesla decides to activate FSD supervised autonomy in a new geographic market, the activation is primarily a software and regulatory decision — no mapping fleet needs to drive the roads first. The FSD model’s training data already includes driving behavior from the global Tesla fleet, which provides broad coverage of road configurations across many geographies. The question is whether that training-derived knowledge is adequate for safe operation without a map prior — an empirical question the market is currently answering in real time.

Section 4 — Tesla’s Mapless Approach: How It Works and What It Risks

Tesla’s vision-only architecture makes different bets at every layer of the AV stack. Instead of a lidar-derived HD map for localization, it uses camera-based landmark matching against a coarse map. Instead of pre-planned routes based on known lane connectivity, it infers lane structure and connectivity in real time from camera input. Instead of knowing in advance where every traffic light is, it detects traffic lights as visual objects in the scene.

Dimension	Tesla’s approach	Advantage	Risk or limitation
How localization works	Vision-based localization: cameras identify landmarks (road markings, signs, buildings) to determine position relative to a coarse map (OpenStreetMap-level); no HD map required	Can operate anywhere without pre-mapping	Less precise localization than HD map matching; depends on landmark quality
How road geometry is understood	Real-time neural network inference from cameras; road boundaries, lane markings, and geometry are detected live per frame	No map staleness; always sees current road state	Model must generalize to road configurations it has not seen; rare configurations may be mishandled
Training data requirement	Billions of miles of human driving data to train the neural net to handle edge cases; more diverse geography means more robust generalization	Tesla’s 6M+ car fleet generates enormous geographic coverage	A new city with unusual road geometry still benefits from more specific data from that geography
Coverage at scale	FSD is already active in all US states and Canada; EU rollout underway; no mapping prerequisite	True global scalability if model generalizes	Model may have regional performance variation (rural vs urban, US vs EU road style)
Map staleness problem	Eliminated by design: perception is always current	Never follows a stale map into a construction zone	Must detect all edge cases in real-time; no fallback to pre-known geometry
Waymo’s counterargument	HD maps provide a geometric prior that reduces the perceptual burden; high-confidence localization enables higher safety margins	Map-aided localization is provably more precise in mapped areas	Only works where maps exist and are fresh; fundamentally non-scalable to unmapped areas

The training data advantage deserves quantification. Tesla’s fleet of more than 6 million vehicles generates an enormous corpus of real-world driving data across essentially every road configuration that exists in the markets where Tesla sells vehicles. A driver in Phoenix encountering an unusual intersection generates data that can help train the model for every subsequent FSD user who encounters similar geometry anywhere. Waymo’s dataset, generated by its own commercial and mapping fleet, is far smaller in absolute terms but is collected under conditions specifically designed for AV training (consistent sensor configurations, structured annotation pipelines).

The question of whether scale of data compensates for quality and structure of data collection is the core empirical dispute between the two approaches. Waymo’s position is that unstructured consumer driving data, even in massive quantity, cannot substitute for carefully structured AV-specific data collection and annotation because the tail events that matter most for safety are rare enough that they require deliberate collection rather than hoping they occur in consumer driving. Tesla’s position is that the scale is so enormous that rare events are well covered, and that real-world distribution is exactly what should be trained on because it is what the model will encounter in deployment.

Section 5 — The Map Staleness Problem: A Critical Operational Failure Mode

Map staleness is not a theoretical concern for HD map-dependent systems — it is a daily operational reality in any city with active construction and road management. The following scenarios illustrate how staleness creates divergence between map state and road state, and how each architecture responds.

Staleness scenario	Impact on HD-map-dependent AV	Tesla vision-only impact
Construction zone opens overnight	AV follows old lane boundaries into construction barrier if map not updated; requires map refresh before safe operation resumes	AV perceives construction in real-time; adapts immediately (model quality determines response quality)
Temporary road closure (parade, event)	AV cannot enter closed road without map update; must be re-routed manually or wait for map refresh	AV reads closure signage and barriers in real-time
New traffic light installed	AV may not know traffic light exists at new position until map updated	AV detects traffic light via camera; no update needed
Lane restriping	Old lane markings may be painted over but still in map; AV may follow phantom lanes	AV follows current markings; no stale data
Waymo’s operational response	Dedicated map monitoring team; rapid update pipeline for known construction; some geofence restrictions while map updates are processed	No operational response needed for map staleness
Frequency of disruptive changes in a city	Urban areas can have hundreds of construction and road-change events in progress at any given time	n/a

Waymo’s operational response to staleness is a dedicated team and pipeline: construction events are monitored, mapped areas are re-driven when significant changes occur, and geofence restrictions are applied to areas awaiting map updates. This is a functioning operational model, but it adds staffing cost and latency between a road change occurring and the AV fleet being able to safely operate in that zone again. In a high-frequency construction environment — a downtown core undergoing infrastructure renovation, for example — the map update backlog can create meaningful geographic restrictions on where the fleet can operate at any given time.
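The monitoring and geofencing logic described here can be sketched as a simple data model. This is a hypothetical illustration, not Waymo’s pipeline: the `MapTile` class, the tile IDs, and the rule that any change report newer than the last verification makes a tile stale are all assumptions.

```python
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class MapTile:
    tile_id: str
    last_verified: datetime                              # last re-drive and QA pass
    change_reports: list = field(default_factory=list)   # timestamps of reported road changes

    def is_stale(self):
        """A tile is stale if any reported change is newer than its last verification."""
        return any(t > self.last_verified for t in self.change_reports)

def drivable_tiles(tiles):
    """Return IDs of tiles the fleet may enter; stale tiles are geofenced out."""
    return [t.tile_id for t in tiles if not t.is_stale()]

def apply_map_update(tile, verified_at):
    """Record a fresh verification pass, which lifts the geofence restriction."""
    tile.last_verified = verified_at

# Hypothetical example: construction is reported downtown after the last survey.
tiles = [
    MapTile("downtown-01", datetime(2026, 6, 1), [datetime(2026, 6, 10)]),
    MapTile("midtown-02", datetime(2026, 6, 5)),
]
before_update = drivable_tiles(tiles)
apply_map_update(tiles[0], datetime(2026, 6, 12))
after_update = drivable_tiles(tiles)
print(before_update, after_update)
```

The gap between the change report and the next verification pass is the latency the paragraph above describes: during that window the affected tile is simply unavailable to the fleet.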

Tesla’s vision-only approach structurally eliminates staleness as an operational problem. A road change that occurs overnight is visible to the FSD cameras the next morning exactly as it is on the ground. The system does not need to be told that the construction zone opened — it sees the barriers, reads the signage, and adapts its path planning accordingly. The risk is that novel configurations (unusual temporary signage, non-standard barrier placement, edge cases outside the training distribution) may not be handled correctly. The model quality determines whether the real-time perception response is better or worse than following a stale map. Both failure modes exist; they are qualitatively different.

Section 6 — Physical AI Mapping Benchmark Scorecard

Benchmark dimension	Waymo (HD map)	Tesla (mapless)	Benchmark implication
Geographic expansion speed	12-36 months per new city (est.)	Weeks to months (regulatory only)	Tesla has structurally faster expansion cadence
Expansion cost per city	$15M-50M+ per city (est.)	Near zero map cost per city	Tesla has dramatically lower per-city cost
Localization precision	Centimeter-accurate in mapped areas	Landmark-dependent; less precise in ambiguous environments (est.)	Waymo more precise in mapped areas
Staleness resilience	Requires map refresh for significant road changes	Real-time perception always current	Tesla has structural staleness advantage
Map staleness risk	Active operational concern; managed by dedicated team	Structurally eliminated	Tesla eliminates a full failure mode
Novel city performance	Excellent in mapped area after map is complete and verified	Depends on training data coverage of that geography	Waymo more reliable in novel-geography edge cases once mapping is complete; Tesla is the only one of the two that can operate before a map exists
Scalability ceiling	Bounded by mapping capacity and cost; linear cost with geographic expansion	Bounded by model generalization quality; potentially near-unlimited geographic scale	Tesla’s approach has a higher theoretical scalability ceiling
Operational overhead	High: mapping fleet, map maintenance team, geofence management	Lower: no mapping operation required	Waymo carries a permanent operational cost structure Tesla does not
Safety model	Defense-in-depth: map prior plus real-time perception	Perception-only: no geometric prior to fall back on	Different failure mode profiles; neither is strictly safer in all conditions

The mapping benchmark reveals a fundamental asymmetry in the growth economics of the two approaches. Waymo’s cost structure is approximately linear with geographic expansion: each new city requires a proportional investment in mapping, verification, and ongoing maintenance. Tesla’s cost structure has a different shape: the training investment is largely amortized across all geographies the fleet covers, and incremental geographic expansion has near-zero map cost. If the generalization quality of Tesla’s model reaches the safety threshold required for commercial operation without a map prior, the economic structure of AV expansion changes completely — the cost of being in 100 cities becomes not 20 times the cost of being in 5 cities, but a fraction more.
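The difference in cost shape can be made concrete with a toy comparison. The numbers below are placeholders in arbitrary units, not estimates of either company’s spending; they exist only to show how a linear cost curve and a mostly fixed, amortized cost curve respond to the same growth in city count.

```python
def linear_cost(n_cities, per_city):
    """Map-dependent model: every city adds the same cost."""
    return n_cities * per_city

def amortized_cost(n_cities, fixed, marginal_per_city):
    """Mapless model: one large shared cost plus a small per-city increment."""
    return fixed + n_cities * marginal_per_city

# Placeholder values in arbitrary cost units (assumptions, not estimates).
PER_CITY = 1.0     # linear model: cost of each mapped city
FIXED = 1.0        # amortized model: shared training investment
MARGINAL = 0.001   # amortized model: incremental cost per added city

linear_ratio = linear_cost(100, PER_CITY) / linear_cost(5, PER_CITY)
amortized_ratio = amortized_cost(100, FIXED, MARGINAL) / amortized_cost(5, FIXED, MARGINAL)
print(f"linear: {linear_ratio:.1f}x, amortized: {amortized_ratio:.2f}x")
```

The linear ratio is 20x regardless of the per-city figure chosen. The amortized ratio depends entirely on how small the marginal cost is relative to the fixed investment, which in turn depends on whether the model generalizes well enough that new cities need little city-specific work.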

The critical unknown is the generalization depth question. Waymo’s argument is that the safety requirement for AV operation is so high — orders of magnitude below human fatality rates per mile, as regulatory frameworks will likely eventually demand — that no amount of training data can substitute for the local precision that HD mapping provides, at least in the near term. Tesla’s argument is that human drivers navigate novel roads without HD maps every day, and a sufficiently trained model can match and exceed human performance everywhere without mapping. The market and the regulators will resolve this debate empirically over the next decade.

What the benchmark establishes with confidence is this: HD mapping is a gating constraint on expansion speed and geographic coverage ceiling, and mapless architectures that can achieve safety parity eliminate that constraint entirely. The mapping war is therefore not just a technical debate — it is a debate about what the scalability ceiling of autonomous vehicles ultimately is, and which companies will reach it first.

Note: All figures labeled “(est.)” are derived from publicly available information, engineering estimates, and industry reporting as of mid-2026. Waymo and Tesla do not publicly disclose detailed mapping costs or timelines; estimates are directional. This article does not constitute investment advice.