2026-06-18
HD Maps vs Mapless — What It Costs to Map a City and Why This Gates the AV Ramp
HD maps cost millions per city and require continuous refresh — Waymo depends on them, Tesla does not. This divide determines AV expansion speed at scale.
Article 116 in the Physical AI Benchmark Series — Physical AI Mapping War: HD Maps vs Mapless, What It Costs to Map a City, Why Waymo’s Map Pipeline Gates Its City Expansion, and Why Tesla’s Mapless Bet Is the Scalability Thesis
The most consequential architectural divide in autonomous vehicles is not sensor choice or compute platform — it is whether the system requires a pre-built high-definition (HD) map of every road before it can operate, or whether it can drive anywhere using real-time perception alone. Waymo requires HD maps. Tesla does not. This single architectural decision propagates through every downstream dimension: city expansion speed, operational cost, geographic coverage ceiling, resilience to road changes, and ultimately the total addressable market each approach can realistically reach.
HD mapping is expensive, slow, and requires continuous maintenance. Building a detailed centimeter-accurate 3D model of a metropolitan area takes months of specialized vehicle deployments, significant cloud processing, and rigorous human verification. Every subsequent road change — construction zone, lane restriping, new traffic light — requires a map update before the AV fleet can safely operate in that area. For a company operating in five cities, this is a manageable engineering problem. For a company with ambitions to cover hundreds of cities across multiple continents, it becomes a structural constraint on growth rate that no engineering investment can fully eliminate: mapping a city takes time and money regardless of how efficiently it is executed.
Tesla’s mapless architecture inverts this constraint. If a neural network trained on billions of miles of human driving data can generalize well enough to handle arbitrary road configurations from camera input alone, then there is no mapping prerequisite to expansion. Deploy the software, and the car can drive. The risk is that generalization is hard — a model trained predominantly on California roads may handle unusual geometry in a city it has rarely encountered less reliably than a Waymo system with a freshly verified HD map of that same area. The resolution of this trade-off — generalization depth vs mapping breadth — is the central empirical question of the mapping war.
This article examines HD mapping infrastructure as a Physical AI benchmark dimension across six sections.
Section 1 — What Is an HD Map and What Does It Contain?
An HD (high-definition) map is not a navigation map. Consumer navigation maps (Google Maps, Apple Maps, OpenStreetMap) know that a road exists, roughly where it is, and what its name is. An HD map used for autonomous driving knows the precise 3D geometry of that road at centimeter accuracy, the exact position of every lane boundary, the semantic meaning of every lane (travel lane, turn lane, merge lane), where every traffic light is mounted and what directions it governs, and dozens of additional structured attributes that a navigation system does not need but an AV does.
| HD map layer | What it stores | Why AV needs it | Update frequency needed |
|---|---|---|---|
| 3D geometry | Precise 3D road surface, curb locations, lane boundaries accurate to 10-20 cm | AV localization: vehicle matches lidar scan to stored map to know exactly where it is on the road | Every significant road construction event |
| Lane semantics | Lane type (travel/turn/merge), lane connectivity graph (which lanes connect to which), lane direction | Route planning: knowing which lane connects to which exit/turn without relying on real-time perception | Road changes, new construction |
| Traffic elements | Traffic light positions and phases, stop sign positions, crosswalk locations | Knowing where to look for traffic control; anticipating signal state | When traffic control changes |
| Speed limits | Posted speed limits per lane segment | Regulatory compliance; default speed profile when signs not visible | Regulatory updates |
| Road surface annotations | Speed bumps, potholes (if mapped), road material | Ride comfort optimization; surface-type awareness | Difficult to maintain; often omitted |
| Map freshness requirement | Construction zones, temporary lane changes, road closures must be reflected within hours to days | An AV following a stale map that says “left lane open” when it is closed creates a safety issue | Near-real-time for dynamic changes |
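The layers in the table can be pictured as a small data structure. The sketch below is illustrative only: the class names, fields, and the `reachable_lanes` helper are hypothetical and do not reflect any vendor's actual map schema. It shows how lane semantics, a lane connectivity graph, and traffic elements might sit together in one map tile.

```python
from dataclasses import dataclass, field
from enum import Enum


class LaneType(Enum):
    TRAVEL = "travel"
    TURN = "turn"
    MERGE = "merge"


@dataclass
class LaneSegment:
    lane_id: str
    lane_type: LaneType
    speed_limit_mph: float
    # Centerline as (x, y, z) points in meters; real maps store far denser geometry.
    centerline: list[tuple[float, float, float]]
    # Connectivity graph edge list: IDs of lanes that can be entered from this one.
    successors: list[str] = field(default_factory=list)


@dataclass
class TrafficLight:
    light_id: str
    position: tuple[float, float, float]
    governs_lanes: list[str]


@dataclass
class HDMapTile:
    lanes: dict[str, LaneSegment]
    traffic_lights: list[TrafficLight]

    def reachable_lanes(self, start_id: str) -> set[str]:
        """Walk the connectivity graph to find every lane reachable from start_id."""
        seen, stack = set(), [start_id]
        while stack:
            lane_id = stack.pop()
            if lane_id in seen or lane_id not in self.lanes:
                continue
            seen.add(lane_id)
            stack.extend(self.lanes[lane_id].successors)
        return seen

    def lights_for(self, lane_id: str) -> list[TrafficLight]:
        """Return the traffic lights that govern the given lane."""
        return [t for t in self.traffic_lights if lane_id in t.governs_lanes]


# Hypothetical four-lane tile: lane "a" splits into a turn lane and a through lane.
tile = HDMapTile(
    lanes={
        "a": LaneSegment("a", LaneType.TRAVEL, 35.0, [(0.0, 0.0, 0.0), (50.0, 0.0, 0.0)], ["b", "c"]),
        "b": LaneSegment("b", LaneType.TURN, 25.0, [(50.0, 0.0, 0.0), (60.0, 10.0, 0.0)]),
        "c": LaneSegment("c", LaneType.TRAVEL, 35.0, [(50.0, 0.0, 0.0), (100.0, 0.0, 0.0)]),
        "d": LaneSegment("d", LaneType.MERGE, 35.0, [(0.0, 5.0, 0.0), (50.0, 5.0, 0.0)], ["c"]),
    },
    traffic_lights=[TrafficLight("tl1", (50.0, 2.0, 5.5), ["a", "d"])],
)
print(sorted(tile.reachable_lanes("a")))
```

Because connectivity is stored explicitly, a planner can answer "which lanes lead to this exit" with a graph walk rather than by inferring it from live perception, which is the point of the lane semantics row.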
The localization function is the most critical. An AV that knows only that it is “somewhere on Main Street” to GPS accuracy (3-5 meters) cannot reliably execute a lane change. An AV that knows it is in the second lane from the right, 47 meters from the next intersection, and positioned precisely within that lane can plan and execute maneuvers with the specificity required for safe operation. HD map matching provides that precision by correlating a live lidar point cloud against stored 3D geometry. This is why HD maps exist: they provide a pre-verified geometric prior that lets the AV localize to centimeter precision, something that real-time perception alone cannot fully replace in Waymo’s architecture.
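To make the idea of map matching concrete, here is a deliberately simplified 2D sketch. It assumes the scan differs from the map only by a translation and finds that translation by brute-force grid search. The function name, the toy landmark layout, and the search parameters are all assumptions for illustration. Production systems work in 3D, estimate rotation as well, and typically use more sophisticated registration methods (ICP or NDT variants, for example); nothing here describes Waymo's actual pipeline.

```python
import math


def match_scan_to_map(scan, map_points, search_m=2.0, step_m=0.1):
    """Grid-search the (dx, dy) shift that best aligns scan points with map points."""

    def cost(dx, dy):
        # Sum of squared distances from each shifted scan point to its nearest map point.
        total = 0.0
        for sx, sy in scan:
            total += min((sx + dx - mx) ** 2 + (sy + dy - my) ** 2 for mx, my in map_points)
        return total

    steps = int(round(search_m / step_m))
    best = (math.inf, 0.0, 0.0)
    for i in range(-steps, steps + 1):
        for j in range(-steps, steps + 1):
            dx, dy = i * step_m, j * step_m
            c = cost(dx, dy)
            if c < best[0]:
                best = (c, dx, dy)
    return best[1], best[2]


# Toy "stored map": an L-shaped curb plus one pole, coordinates in meters.
map_points = (
    [(float(x), 0.0) for x in range(11)]
    + [(10.0, float(y)) for y in range(1, 11)]
    + [(3.0, 4.0)]
)

# The vehicle's GPS-based pose guess is off by a meter or so, so the scan
# lands in the wrong place when projected into the map frame.
gps_error = (1.2, -0.7)
scan = [(mx - gps_error[0], my - gps_error[1]) for mx, my in map_points]

dx, dy = match_scan_to_map(scan, map_points)
print(f"estimated correction: dx={dx:.1f} m, dy={dy:.1f} m")
```

The L-shaped layout matters: a single straight lane boundary would leave the position along the road ambiguous, which is one reason distinctive stored geometry is valuable for localization.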
The freshness requirement creates the operational maintenance burden. A map that was accurate when surveyed becomes progressively less accurate as roads change. In an active urban environment, road conditions change continuously: construction zones open and close, lane markings are repainted, temporary signals are installed for events, lanes are reconfigured seasonally. Each change that is not reflected in the HD map creates a divergence between what the map says is true and what is actually on the road — a divergence that the AV’s safety depends on resolving before it drives in that area.
Section 2 — What Does It Cost to Build an HD Map of a City?
HD mapping costs are not publicly disclosed by the major AV companies, but the engineering requirements constrain the cost range. The following table structures the cost components with estimates derived from available information.
| Cost dimension | Estimate | Notes |
|---|---|---|
| Mapping vehicle fleet | Specialized vehicles with lidar arrays, cameras, GPS/IMU; $150K-500K per vehicle (est.) | Purpose-built mapping vehicles; not consumer cars |
| Mapping a single city (area approx. 300 sq miles, est.) | $2M-10M in direct mapping cost (est.); includes vehicle depreciation, driver labor, fuel, processing | Rough estimate; large commercial map contracts suggest this range |
| Annual map maintenance (same city) | $500K-3M/year (est.) | Road changes, construction zones, new developments; must be updated continuously |
| Processing and annotation | Significant cloud compute for processing raw lidar/camera data into HD map format; AI-assisted annotation but still human QA | Major cost component; ongoing |
| Waymo mapping fleet | Waymo operates dedicated mapping vehicles that continuously re-drive mapped areas in addition to the Waymo One commercial fleet (est.) | Double vehicle operation overhead |
| 5-city network (Waymo current est.) | Approximately $10M-50M+ annually to maintain HD maps for all operational cities (est.) | Rough order of magnitude; Waymo does not disclose |
| Cost per new city | $3M-15M to initially map a new metro area (est.); then $1M-5M/year ongoing (est.) | Each new city requires full mapping before first commercial ride |
These estimates reflect a fundamental structural cost: unlike software that can be replicated at near-zero marginal cost, HD maps require physical vehicles to drive every road, compute resources to process the resulting data, and human expertise to verify the output. This cost does not shrink significantly with scale — mapping the 50th city costs roughly as much as mapping the 5th city, because the physical work of driving and processing each city’s road network is largely independent.
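The "no scale discount" claim amounts to a linear cost model. The sketch below plugs in the per-city ranges from the "Cost per new city" row; the five-year horizon and the function name are assumptions chosen for illustration, and all dollar figures remain the article's estimates.

```python
def cumulative_map_cost(n_cities, years, initial_per_city, maintenance_per_city_year):
    """Total HD-map spend in $M, assuming every city costs the same (no scale discount)."""
    return n_cities * (initial_per_city + years * maintenance_per_city_year)


# Ranges from the "Cost per new city" row, in $M (all est.).
LOW = dict(initial_per_city=3, maintenance_per_city_year=1)
HIGH = dict(initial_per_city=15, maintenance_per_city_year=5)

for n in (5, 50):
    lo = cumulative_map_cost(n, years=5, **LOW)
    hi = cumulative_map_cost(n, years=5, **HIGH)
    print(f"{n} cities over 5 years: ${lo}M-${hi}M")
```

Under these assumptions, going from 5 to 50 cities multiplies the map bill by exactly ten, which is the sense in which the cost "does not shrink significantly with scale".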
The processing and annotation component deserves attention. Raw lidar data from a mapping pass is a massive point cloud — billions of 3D points per city block — that must be processed into a structured semantic representation (lanes, signs, traffic elements) before it becomes an HD map. AI-assisted processing has significantly reduced the human annotation burden compared to early HD mapping efforts, but human quality assurance remains essential because mapping errors that reach the vehicle have safety consequences. The QA pipeline is a throughput constraint on how quickly a company can process new mapping data into deployable maps.
Section 3 — How HD Maps Gate Waymo’s City Expansion
The mapping requirement creates a sequenced prerequisite chain for every new city Waymo considers entering. Unlike a software product that can be shipped to a new market by changing a configuration parameter, Waymo’s service cannot launch in a new city until that city has been mapped, the map has been verified, and regulatory approval has been obtained. Each of these steps takes time and money that cannot be fully parallelized.
| Expansion step | Time required (est.) | Cost (est.) | Gating factor |
|---|---|---|---|
| City selection | 1-2 months (regulatory assessment and market analysis) | Minimal | Regulatory permissiveness, weather, demand density |
| HD map creation | 3-6 months for a metro area (est.) | $3M-10M (est.) | Mapping vehicle availability; processing pipeline throughput |
| Map verification and validation | 2-4 months (drive validation of map accuracy) | $1M-3M (est.) | Safety validation team bandwidth |
| Regulatory approval | 6-24 months (varies enormously by state and city) | Significant legal and regulatory staff cost | Regulator speed; political environment |
| Fleet deployment | 1-3 months (vehicle procurement, depot setup, RAO hiring) | $5M-20M (est.) | Vehicle supply chain; depot real estate |
| Total time to new commercial city | 12-36 months from decision to first paid ride (est.) | $15M-50M+ total investment per city (est.) | Map creation and regulation are the dominant time constraints |
| Tesla comparison | No mapping step required | Zero map cost | Can operate wherever the FSD model’s training data gives adequate coverage; regulatory approval still applies |
The 12-36 month timeline from decision to commercial launch is the strategic constraint. If Waymo decides today to enter a new metropolitan market, a customer in that market will not be able to hail a Waymo ride for at least one year under favorable conditions, and more likely two or more years if regulatory approval is slow. This timeline is not primarily a function of Waymo’s execution quality — it reflects the irreducible time required to physically map a city, verify the map, and obtain regulatory permission to operate.
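The step durations in the expansion table can be added up directly. This sketch treats the steps as strictly sequential, which is a simplifying assumption; the dictionary keys are hypothetical labels for the table rows.

```python
# (min_months, max_months) per step, taken from the expansion table (all est.).
STEPS = {
    "city_selection": (1, 2),
    "hd_map_creation": (3, 6),
    "map_verification": (2, 4),
    "regulatory_approval": (6, 24),
    "fleet_deployment": (1, 3),
}


def sequential_range(steps):
    """Total duration range if every step waits for the previous one to finish."""
    lo = sum(a for a, _ in steps.values())
    hi = sum(b for _, b in steps.values())
    return lo, hi


def longest_step(steps):
    """Name of the step with the largest worst-case duration."""
    return max(steps, key=lambda name: steps[name][1])


print(sequential_range(STEPS))
print(longest_step(STEPS))
```

A strictly sequential sum comes to 13-39 months, slightly above the 12-36 months in the table's total row. One reading, which is an interpretation rather than something the table states, is that the total assumes a small amount of overlap between steps. Either way, regulatory approval has the widest range and dominates the worst case.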
This timeline constraint compounds across a portfolio of expansion targets. If Waymo is simultaneously pursuing ten new cities, it needs ten parallel mapping programs, ten parallel regulatory approval processes, and ten parallel fleet deployment programs. Each of these competes for the same internal resources: mapping vehicles, processing capacity, safety validation engineers, regulatory affairs staff, and capital. The operational scaling challenge is not software-like (where ten copies cost no more than one); it is hardware and labor-intensive at every step.
Tesla avoids the mapping and map-verification stages of this sequence. If Tesla decides to activate FSD (Supervised) in a new geographic market, the activation is primarily a software and regulatory decision; no mapping fleet needs to drive the roads first. The FSD model’s training data already includes driving behavior from the global Tesla fleet, which provides broad coverage of road configurations across many geographies. The open question is whether that training-derived knowledge is adequate for safe operation without a map prior, an empirical question the market is currently answering in real time.
Section 4 — Tesla’s Mapless Approach: How It Works and What It Risks
Tesla’s vision-only architecture makes different bets at every layer of the AV stack. Instead of a lidar-derived HD map for localization, it uses camera-based landmark matching against a coarse map. Instead of pre-planned routes based on known lane connectivity, it infers lane structure and connectivity in real time from camera input. Instead of knowing in advance where every traffic light is, it detects traffic lights as visual objects in the scene.
| Dimension | Tesla’s approach | Advantage | Risk or limitation |
|---|---|---|---|
| How localization works | Vision-based localization: cameras identify landmarks (road markings, signs, buildings) to determine position relative to a coarse map (OpenStreetMap-level); no HD map required | Can operate anywhere without pre-mapping | Less precise localization than HD map matching; depends on landmark quality |
| How road geometry is understood | Real-time neural network inference from cameras; road boundaries, lane markings, and geometry are detected live per frame | No map staleness; always sees current road state | Model must generalize to road configurations it has not seen; rare configurations may be mishandled |
| Training data requirement | Billions of miles of human driving data to train the neural net to handle edge cases; more diverse geography means more robust generalization | Tesla’s 6M+ car fleet generates enormous geographic coverage | A new city with unusual road geometry still benefits from more specific data from that geography |
| Coverage at scale | FSD is already active in all US states and Canada; EU rollout underway; no mapping prerequisite | True global scalability if model generalizes | Model may have regional performance variation (rural vs urban, US vs EU road style) |
| Map staleness problem | Eliminated by design: perception is always current | Never follows a stale map into a construction zone | Must detect all edge cases in real-time; no fallback to pre-known geometry |
| Waymo’s counterargument | HD maps provide a geometric prior that reduces the perceptual burden; high-confidence localization enables higher safety margins | Map-aided localization is provably more precise in mapped areas | Only works where maps exist and are fresh; fundamentally non-scalable to unmapped areas |
The training data advantage is one of scale. Tesla’s fleet of more than 6 million vehicles generates an enormous corpus of real-world driving data across essentially every road configuration that exists in the markets where Tesla sells vehicles. A driver in Phoenix encountering an unusual intersection generates data that can train the model for every subsequent FSD user who encounters similar geometry elsewhere. Waymo’s dataset comes from its own far smaller commercial and mapping fleet, so it is much smaller in absolute terms, but it is collected under conditions specifically designed for AV training (consistent sensor configurations, structured annotation pipelines).
The question of whether scale of data compensates for quality and structure of data collection is the core empirical dispute between the two approaches. Waymo’s position is that unstructured consumer driving data, even in massive quantity, cannot substitute for carefully structured AV-specific data collection and annotation because the tail events that matter most for safety are rare enough that they require deliberate collection rather than hoping they occur in consumer driving. Tesla’s position is that the scale is so enormous that rare events are well covered, and that real-world distribution is exactly what should be trained on because it is what the model will encounter in deployment.
Section 5 — The Map Staleness Problem: A Critical Operational Failure Mode
Map staleness is not a theoretical concern for HD map-dependent systems — it is a daily operational reality in any city with active construction and road management. The following scenarios illustrate how staleness creates divergence between map state and road state, and how each architecture responds.
| Staleness scenario | Impact on HD-map-dependent AV | Tesla vision-only impact |
|---|---|---|
| Construction zone opens overnight | AV follows old lane boundaries into construction barrier if map not updated; requires map refresh before safe operation resumes | AV perceives construction in real-time; adapts immediately (model quality determines response quality) |
| Temporary road closure (parade, event) | AV cannot enter closed road without map update; must be re-routed manually or wait for map refresh | AV reads closure signage and barriers in real-time |
| New traffic light installed | AV may not know traffic light exists at new position until map updated | AV detects traffic light via camera; no update needed |
| Lane restriping | Old lane markings may be painted over but still in map; AV may follow phantom lanes | AV follows current markings; no stale data |
| Waymo’s operational response | Dedicated map monitoring team; rapid update pipeline for known construction; some geofence restrictions while map updates are processed | No operational response needed for map staleness |
| Frequency of disruptive changes in a city | A large urban area can have hundreds of construction and road-change events active at any given time | Not applicable: there is no map to go stale |
Waymo’s operational response to staleness is a dedicated team and pipeline: construction events are monitored, mapped areas are re-driven when significant changes occur, and geofence restrictions are applied to areas awaiting map updates. This is a functioning operational model, but it adds staffing cost and latency between a road change occurring and the AV fleet being able to safely operate in that zone again. In a high-frequency construction environment — a downtown core undergoing infrastructure renovation, for example — the map update backlog can create meaningful geographic restrictions on where the fleet can operate at any given time.
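The monitor, re-drive, and geofence loop described above can be sketched as a simple rule: a zone is restricted whenever a road change has been reported more recently than the zone's map was last verified. The class names, fields, and the rule itself are hypothetical simplifications for illustration, not a description of Waymo's actual tooling.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class RoadChange:
    zone_id: str
    reported_at: datetime


@dataclass
class MapZone:
    zone_id: str
    last_verified_at: datetime


def restricted_zones(zones, changes):
    """Return IDs of zones with a reported change newer than their last map verification."""
    latest_change = {}
    for ch in changes:
        prev = latest_change.get(ch.zone_id)
        if prev is None or ch.reported_at > prev:
            latest_change[ch.zone_id] = ch.reported_at
    return {
        z.zone_id
        for z in zones
        if z.zone_id in latest_change and latest_change[z.zone_id] > z.last_verified_at
    }


zones = [
    MapZone("downtown", datetime(2026, 6, 1)),
    MapZone("midtown", datetime(2026, 6, 10)),
]
changes = [
    RoadChange("downtown", datetime(2026, 6, 5)),  # reported after the last verification
    RoadChange("midtown", datetime(2026, 6, 8)),   # already covered by a later re-drive
]
print(restricted_zones(zones, changes))
```

The gap between `reported_at` and the next `last_verified_at` is the latency the paragraph describes: until the zone is re-driven and re-verified, it stays outside the operating area.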
Tesla’s vision-only approach structurally eliminates staleness as an operational problem. A road change that occurs overnight is visible to the FSD cameras the next morning exactly as it is on the ground. The system does not need to be told that the construction zone opened — it sees the barriers, reads the signage, and adapts its path planning accordingly. The risk is that novel configurations (unusual temporary signage, non-standard barrier placement, edge cases outside the training distribution) may not be handled correctly. The model quality determines whether the real-time perception response is better or worse than following a stale map. Both failure modes exist; they are qualitatively different.
Section 6 — Physical AI Mapping Benchmark Scorecard
| Benchmark dimension | Waymo (HD map) | Tesla (mapless) | Benchmark implication |
|---|---|---|---|
| Geographic expansion speed | 12-36 months per new city (est.) | Weeks to months for supervised FSD, gated by regulation only (est.) | Tesla has structurally faster expansion cadence |
| Expansion cost per city | $15M-50M+ per city (est.) | Near zero map cost per city | Tesla has dramatically lower per-city cost |
| Localization precision | Centimeter-accurate in mapped areas | Landmark-dependent; less precise in ambiguous environments (est.) | Waymo more precise in mapped areas |
| Staleness resilience | Requires map refresh for significant road changes | Real-time perception always current | Tesla has structural staleness advantage |
| Map staleness risk | Active operational concern; managed by dedicated team | Structurally eliminated | Tesla eliminates a full failure mode |
| Novel city performance | Excellent in mapped area after map is complete and verified | Depends on training data coverage of that geography | Waymo is more reliable on unusual geometry once a city is mapped and verified; Tesla is the only one of the two that can operate before mapping is complete |
| Scalability ceiling | Bounded by mapping capacity and cost; linear cost with geographic expansion | Bounded by model generalization quality; potentially near-unlimited geographic scale | Tesla’s approach has a higher theoretical scalability ceiling |
| Operational overhead | High: mapping fleet, map maintenance team, geofence management | Lower: no mapping operation required | Waymo carries a permanent operational cost structure Tesla does not |
| Safety model | Defense-in-depth: map prior plus real-time perception | Perception-only: no geometric prior to fall back on | Different failure mode profiles; neither is strictly safer in all conditions |
The mapping benchmark reveals a fundamental asymmetry in the growth economics of the two approaches. Waymo’s cost structure is approximately linear with geographic expansion: each new city requires a proportional investment in mapping, verification, and ongoing maintenance. Tesla’s cost structure has a different shape: the training investment is largely amortized across all geographies the fleet covers, and incremental geographic expansion has near-zero map cost. If the generalization quality of Tesla’s model reaches the safety threshold required for commercial operation without a map prior, the economic structure of AV expansion changes completely: being in 100 cities would no longer cost roughly 20 times as much as being in 5 cities, but only a fraction more.
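The asymmetry can be written as a rough cost model. All symbols here are assumptions introduced for illustration: $n$ is the number of cities, $c_m$ the initial mapping cost per city, $c_r$ the annual map maintenance cost per city, $T$ the number of years in operation, $F$ a fixed model-training cost, and $\varepsilon$ a small per-city incremental cost for the mapless approach.

```latex
\[
C_{\text{map}}(n) = n\,(c_m + c_r T),
\qquad
\frac{C_{\text{map}}(100)}{C_{\text{map}}(5)} = 20
\]
\[
C_{\text{mapless}}(n) = F + n\,\varepsilon,
\qquad
\frac{C_{\text{mapless}}(100)}{C_{\text{mapless}}(5)}
  = \frac{F + 100\,\varepsilon}{F + 5\,\varepsilon}
  \;\longrightarrow\; 1 \quad \text{as } \varepsilon / F \to 0
\]
```

The map-dependent ratio is fixed at 20 regardless of the per-city figures, while the mapless ratio approaches 1 only to the extent that the per-city cost really is small relative to the fixed training investment.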
The critical unknown is generalization depth. Waymo’s argument is that the safety bar for AV operation is so high (a fatality rate per mile orders of magnitude below that of human drivers, as regulatory frameworks will likely eventually demand) that no amount of training data can substitute for the local precision that HD mapping provides, at least in the near term. Tesla’s argument is that human drivers navigate novel roads without HD maps every day, and a sufficiently trained model can match and exceed human performance everywhere without mapping. The market and the regulators will resolve this debate empirically over the next decade.
What the benchmark establishes with confidence is this: HD mapping is a gating constraint on expansion speed and geographic coverage ceiling, and mapless architectures that can achieve safety parity eliminate that constraint entirely. The mapping war is therefore not just a technical debate — it is a debate about what the scalability ceiling of autonomous vehicles ultimately is, and which companies will reach it first.
Note: All figures labeled “(est.)” are derived from publicly available information, engineering estimates, and industry reporting as of mid-2026. Waymo and Tesla do not publicly disclose detailed mapping costs or timelines; estimates are directional. This article does not constitute investment advice.
Sources
- Waymo: mapping operations
- Tesla AI: FSD vision-only architecture
- McKinsey autonomous driving report: HD map cost analysis
- OpenStreetMap Foundation
- SAE International: AV mapping standards