2026-06-18
Physical AI Compute — Edge vs Cloud: Tesla FSD Chip vs Waymo Custom ASIC vs Dojo
Edge inference vs cloud training: how Tesla FSD chip, Waymo custom ASIC, and Dojo supercomputer divide AV compute across the full stack.
Article 57 in the Physical AI Benchmark Series — The Full Compute Stack
Every time a Tesla running FSD detects a pedestrian stepping off a curb, the computation behind that detection runs entirely onboard: inside a custom computer mounted behind the dashboard, drawing on the order of 100 watts (est.), with no round-trip to Tesla’s servers. Yet the neural network weights loaded onto that chip were trained on thousands of GPU-years of compute in Tesla’s cloud infrastructure. The two halves of the problem, inference and training, require fundamentally different compute architectures, and the choices each AV company has made about both will shape how they compete over the next decade.
This article maps the full compute stack: what runs onboard at the edge, what runs in the cloud, and the custom silicon each company built to win.
Section 1 — Why Edge Compute Is Non-Negotiable for AVs
The fundamental architecture of any autonomous vehicle is dictated by an irreducible physical constraint: decisions that must happen in milliseconds cannot wait for a server that is hundreds of miles away.
| Constraint | Detail |
|---|---|
| Latency requirement | An AV must perceive, plan, and actuate in under 100ms total loop (est.); a cloud round-trip adds 20–100ms of network latency alone — unacceptable for safety-critical decisions |
| Connectivity reliability | 4G/5G networks have dead zones, congestion, and outages; an AV that requires connectivity to drive safely is not deployable at commercial scale |
| Data bandwidth | 8 cameras plus LIDAR plus radar generate 1–2 TB/hour of raw sensor data (est.); at 1 TB/hour that is a sustained uplink of roughly 2.2 Gbit/s per vehicle, which current cellular networks cannot reliably deliver to every vehicle in a fleet |
| Regulatory | Most AV safety frameworks require onboard fail-operational capability — the vehicle must be able to bring itself to a safe stop without any external connection |
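The latency and bandwidth rows reduce to simple arithmetic. The sketch below uses only the table’s estimated ranges (1–2 TB/hour of sensor data, a 100 ms loop, 20–100 ms of network round-trip) in decimal units; it is a back-of-envelope check, not a model of any specific vehicle.

```python
# Back-of-envelope check of the Section 1 constraints, using the
# article's estimated ranges (not measured values).


def tb_per_hour_to_gbps(tb_per_hour: float) -> float:
    """Convert a sustained data rate in TB/hour (decimal TB) to Gbit/s."""
    bits_per_hour = tb_per_hour * 1e12 * 8
    return bits_per_hour / 3600 / 1e9


def cloud_share_of_budget(loop_budget_ms: float, round_trip_ms: float) -> float:
    """Fraction of the perceive-plan-actuate budget consumed by the network alone."""
    return round_trip_ms / loop_budget_ms


if __name__ == "__main__":
    for rate in (1.0, 2.0):
        print(f"{rate} TB/h -> {tb_per_hour_to_gbps(rate):.2f} Gbit/s sustained uplink")
    for rtt in (20.0, 100.0):
        share = cloud_share_of_budget(100.0, rtt)
        print(f"{rtt:.0f} ms round trip uses {share:.0%} of a 100 ms loop")
```

At 1 TB/hour the required sustained uplink is about 2.2 Gbit/s per vehicle, and at 2 TB/hour about 4.4 Gbit/s. Even the optimistic 20 ms round-trip consumes a fifth of the loop before any perception or planning runs.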
These constraints produce a principle that governs every serious AV engineering team: inference happens at the edge; training happens in the cloud. The vehicle runs a cloud-trained model locally, sends curated clips of edge cases back to the cloud for retraining, and receives model updates over-the-air periodically. The intelligence lives in the weights. The weights live in the cloud training pipeline. But the computation that applies those weights to every camera frame — that happens onboard, in dedicated silicon, faster than any human blink.
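The division of labor can be sketched as an onboard loop that never waits on the network. In this minimal, hypothetical sketch, the `EdgeRuntime` class, the confidence threshold used as the edge-case trigger, and the bounded queue size are illustrative assumptions rather than any company’s actual design: inference always returns locally, low-confidence frames are queued for upload, and when the queue is full the oldest clip is dropped instead of stalling the control loop.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class EdgeRuntime:
    """Hypothetical onboard loop: inference never waits on the network."""
    confidence_threshold: float = 0.6   # assumed trigger for an "edge case" clip
    max_queued_clips: int = 3           # assumed bound on on-vehicle clip storage
    upload_queue: deque = field(default_factory=deque)

    def infer(self, frame_id: int, confidence: float) -> str:
        # Stand-in for the cloud-trained network running on local silicon.
        action = "proceed" if confidence >= 0.5 else "slow_down"
        # Curate: keep only low-confidence frames for later upload.
        if confidence < self.confidence_threshold:
            if len(self.upload_queue) >= self.max_queued_clips:
                self.upload_queue.popleft()  # drop the oldest clip rather than block
            self.upload_queue.append(frame_id)
        return action

    def drain_uploads(self, connected: bool) -> list:
        # Runs opportunistically; driving never depends on this succeeding.
        if not connected:
            return []
        sent = list(self.upload_queue)
        self.upload_queue.clear()
        return sent


if __name__ == "__main__":
    rt = EdgeRuntime()
    for i, conf in enumerate([0.9, 0.4, 0.55, 0.3, 0.2, 0.95]):
        print(i, rt.infer(i, conf))
    print("uploaded:", rt.drain_uploads(connected=True))
```

Dropping the oldest clip is only one possible policy; a production system would more likely rank clips by how informative they are. The property the sketch is meant to show is that `infer` has no dependency on `drain_uploads` succeeding.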
The architecture question is therefore not whether to use edge compute — every AV must — but which edge silicon to build or buy, and how to design the cloud training infrastructure that feeds it.
Section 2 — Tesla’s Edge Compute: The FSD Chip
Tesla made the most consequential edge silicon bet in the automotive industry when it decided in 2016 to design its own neural processing hardware rather than rely on a supplier. The result is the Tesla FSD Computer, a purpose-built accelerator that runs all FSD inference onboard the vehicle.
| Component | Detail |
|---|---|
| Chip name | Tesla FSD Computer (HW3: 2019, HW4: 2023) |
| Architecture | Custom neural processing units (NPUs) designed by Tesla’s in-house silicon team, led by Pete Bannon, formerly of Apple’s chip group |
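| Chip specs | Dual-chip design. Tesla’s 2019 HW3 disclosure described each chip as 12 Arm Cortex-A72 cores, 2 NPUs, and a GPU; Tesla has not published an equivalent HW4 breakdown. HW4 is estimated at approximately 100 TOPS per chip, approximately 200 TOPS combined (est.) |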
| Power consumption | Approximately 100W total for the FSD Computer system (est.) |
| Redundancy | Dual-chip design provides hardware redundancy; fail-operational architecture means one chip can sustain operation if the other fails |
| Memory | LPDDR4 DRAM on HW3; HW4 is widely reported to use GDDR6 (est.). Memory bandwidth governs how quickly network weights and activations reach the NPUs during inference |
| What runs on it | All FSD inference: camera processing, occupancy network, neural planner, velocity controller — the complete end-to-end pipeline |
| Over-the-air updates | Model weights updated OTA via Tesla’s cellular connection; each new FSD software version pushes updated neural net weights to the chip |
| HW5 (est.) | Next-generation chip expected; likely substantially higher TOPS for FSD v14 and later models |
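The redundancy row describes two chips that can each sustain operation alone. A hypothetical arbiter for that arrangement might look like the sketch below. How Tesla actually compares and selects between the two chips’ outputs is not described here, so the policy (use agreeing plans, fall back to the surviving channel, and fall back to a minimal-risk maneuver on disagreement or double failure) is an assumption for illustration.

```python
from typing import Optional

SAFE_STOP = "minimal_risk_maneuver"  # hypothetical fallback action name


def arbitrate(plan_a: Optional[str], plan_b: Optional[str]) -> str:
    """Hypothetical arbiter for two independent compute channels.

    Each argument is the plan a channel produced this cycle, or None if that
    channel missed its deadline or reported a fault.
    """
    if plan_a is not None and plan_b is not None:
        # Both healthy: agreeing plans are used directly; on disagreement
        # this sketch chooses the conservative option.
        return plan_a if plan_a == plan_b else SAFE_STOP
    if plan_a is not None:
        return plan_a  # channel B failed: continue on A alone
    if plan_b is not None:
        return plan_b  # channel A failed: continue on B alone
    return SAFE_STOP   # both channels lost: bring the vehicle to a safe stop


if __name__ == "__main__":
    print(arbitrate("keep_lane", "keep_lane"))
    print(arbitrate(None, "keep_lane"))
    print(arbitrate(None, None))
```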
The strategic logic of designing the chip in-house is the same logic Apple applied to the M-series: when you own the neural network topology, you can co-optimize the chip architecture to accelerate the exact matrix operations your network requires. A general-purpose accelerator from NVIDIA or Qualcomm is designed to run anyone’s neural network efficiently. Tesla’s NPU is designed to run Tesla’s specific neural network as efficiently as possible. That specificity translates to better performance per watt on the target workload, which matters enormously in an electric vehicle, where every watt of compute draws down battery range and must be dissipated as heat.
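Performance per watt, and what a given TOPS budget buys per camera frame, can be estimated directly. The sketch below takes the table’s estimated 200 TOPS and 100 W for HW4; the 8-camera, 36 frames-per-second workload and the 50% sustained utilization are hypothetical assumptions, since accelerators rarely sustain their peak rating.

```python
def tops_per_watt(tops: float, watts: float) -> float:
    """Efficiency at the chip's rated peak."""
    return tops / watts


def ops_per_frame(tops: float, frames_per_second: float, utilization: float) -> float:
    """Operations available per camera frame if the accelerator sustains
    `utilization` (0..1) of its peak throughput."""
    return tops * 1e12 * utilization / frames_per_second


if __name__ == "__main__":
    # Article estimates for HW4: ~200 TOPS combined at ~100 W.
    # Hypothetical workload: 8 cameras at 36 frames/s, 50% sustained utilization.
    print(f"{tops_per_watt(200, 100):.1f} TOPS/W")
    print(f"{ops_per_frame(200, 8 * 36, 0.5) / 1e9:.0f} billion ops per frame")
```

Under those assumptions the estimate comes to 2 TOPS/W and roughly 350 billion operations per frame. A chip co-designed with its network aims to raise the utilization term, which is one place where a general-purpose accelerator running an arbitrary network can lose ground.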
The cost of this bet is execution risk. Designing a world-class inference chip requires a team with deep expertise in computer architecture, memory subsystems, and chip fabrication — a capability that even most large technology companies do not possess. Tesla has built that team, and HW4 demonstrates that it can execute. The pending test is whether HW5 can continue to track the rapid pace of neural network scaling that FSD’s increasing model complexity will demand.
Section 3 — Waymo’s Edge Compute: Custom ASIC Plus Orin
Waymo’s onboard compute problem is structurally harder than Tesla’s. Tesla’s sensor suite is cameras only — no LIDAR, no radar. Waymo’s sensor suite combines LIDAR, cameras, and radar, each generating different data types at high frequency, all of which must be processed, fused, and interpreted in real time. The result is a more complex onboard compute stack that draws more power and occupies more space.
| Component | Detail |
|---|---|
| Primary inference chip | Waymo has designed custom ASICs for sensor processing; LIDAR point cloud processing at 10–20 Hz requires dedicated hardware; NVIDIA Orin SoC is used for general neural network inference (est.) |
| LIDAR processing | A 360-degree LIDAR point cloud at high frequency requires dedicated compute for point cloud segmentation and object detection; sparse, irregular point data maps less efficiently onto general-purpose GPU architectures than dense camera images do (est.) |
| Sensor fusion | Fusing LIDAR, camera, and radar data streams in real time is significantly more compute-intensive than camera-only processing; the fusion step must run before the neural network planner can operate |
| HD map localization | Matching a live LIDAR point cloud against a stored HD map in real time requires additional dedicated compute beyond the perception pipeline |
| Total onboard compute | Significantly more than Tesla (est.) due to LIDAR and radar processing requirements; Waymo has not publicly disclosed TOPS figures |
| Power consumption | Higher than Tesla (est.) due to LIDAR hardware plus radar hardware plus additional compute; thermal management is a recognized engineering challenge |
| Gen 6 vehicle | Waymo’s purpose-built Gen 6 vehicle integrates sensor and compute hardware from the ground up, reducing the retrofit overhead that characterized earlier generations |
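Matching a live LIDAR scan against a stored map is a registration problem. Waymo has not published its localization method; the sketch below uses a textbook stand-in, 2D point-to-point iterative closest point (ICP), to show the shape of the computation: pair each scan point with its nearest map point, solve in closed form for the rotation and translation that best align the pairs, apply it, and repeat. Brute-force nearest-neighbour search keeps it self-contained; real systems use spatial indexes and 3D points.

```python
import math
import random
from typing import List, Tuple

Point = Tuple[float, float]


def apply_pose(theta: float, tx: float, ty: float, pts: List[Point]) -> List[Point]:
    """Rotate points by theta (radians) about the origin, then translate."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y + tx, s * x + c * y + ty) for x, y in pts]


def best_rigid_transform(src: List[Point], dst: List[Point]) -> Tuple[float, float, float]:
    """Closed-form rotation + translation minimizing sum |R p + t - q|^2 over
    paired points (the 2D case of the Kabsch/Umeyama solution)."""
    n = len(src)
    sx = sum(p[0] for p in src) / n
    sy = sum(p[1] for p in src) / n
    dx = sum(q[0] for q in dst) / n
    dy = sum(q[1] for q in dst) / n
    num = den = 0.0
    for (px, py), (qx, qy) in zip(src, dst):
        ax, ay = px - sx, py - sy
        bx, by = qx - dx, qy - dy
        num += ax * by - ay * bx  # cross terms, proportional to sin(theta)
        den += ax * bx + ay * by  # dot terms, proportional to cos(theta)
    theta = math.atan2(num, den)
    c, s = math.cos(theta), math.sin(theta)
    return theta, dx - (c * sx - s * sy), dy - (s * sx + c * sy)


def nearest(p: Point, cloud: List[Point]) -> Point:
    # Brute-force nearest neighbour; real systems use a k-d tree or voxel grid.
    return min(cloud, key=lambda q: (q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2)


def mean_sq_error(scan: List[Point], map_pts: List[Point]) -> float:
    """Mean squared distance from each scan point to its nearest map point."""
    total = 0.0
    for p in scan:
        q = nearest(p, map_pts)
        total += (q[0] - p[0]) ** 2 + (q[1] - p[1]) ** 2
    return total / len(scan)


def icp(scan: List[Point], map_pts: List[Point], iterations: int = 20):
    """Point-to-point ICP: returns the aligned scan and the accumulated pose."""
    current = list(scan)
    theta_tot, tx_tot, ty_tot = 0.0, 0.0, 0.0
    for _ in range(iterations):
        matches = [nearest(p, map_pts) for p in current]
        theta, tx, ty = best_rigid_transform(current, matches)
        current = apply_pose(theta, tx, ty, current)
        # Compose with the pose so far: p -> R_step (R_tot p + t_tot) + t_step.
        c, s = math.cos(theta), math.sin(theta)
        tx_tot, ty_tot = c * tx_tot - s * ty_tot + tx, s * tx_tot + c * ty_tot + ty
        theta_tot += theta
    return current, (theta_tot, tx_tot, ty_tot)


if __name__ == "__main__":
    random.seed(0)
    map_pts = [(random.uniform(0, 20), random.uniform(0, 20)) for _ in range(150)]
    scan = apply_pose(0.05, 0.4, -0.3, map_pts[:80])  # hypothetical pose error
    aligned, pose = icp(scan, map_pts)
    print("before:", mean_sq_error(scan, map_pts), "after:", mean_sq_error(aligned, map_pts))
    print("correction (theta, tx, ty):", pose)
```

Each iteration cannot increase the mean squared residual, because the closed-form step is optimal for the current pairs and re-pairing with nearest neighbours can only shorten distances. ICP can still settle into a wrong local alignment if the initial pose guess is poor, which is why localization pipelines typically seed it from odometry or the previous pose.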
The architectural contrast is telling. Tesla’s edge compute is an inference accelerator: a chip optimized to run one neural network as fast and efficiently as possible. Waymo’s edge compute is a full signal processing pipeline: custom hardware for point cloud processing, a general-purpose SoC for neural inference, and additional compute for map matching, with each stage feeding the next. The richer sensor suite gives Waymo more raw information per decision cycle. The cost is higher system complexity, more power draw, and a compute stack that is harder to upgrade through OTA software updates alone.
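The serial nature of that pipeline shows up directly in latency. In this sketch the front-end sensor stages run concurrently, so fusion waits for the slowest of them, while the stages after fusion add up in sequence. The stage names, the ordering after fusion, and every millisecond value are hypothetical; Waymo has not published them.

```python
# Hypothetical stage latencies in milliseconds; Waymo has not published these.
FRONT_END_MS = {"lidar_segmentation": 15.0, "camera_perception": 20.0, "radar_tracking": 5.0}
BACK_END_MS = {"sensor_fusion": 10.0, "map_localization": 8.0, "planner": 25.0}


def pipeline_latency_ms(front_end: dict, back_end: dict) -> float:
    """Front-end sensor stages run concurrently, so fusion waits for the slowest;
    the stages after fusion run in sequence, so their latencies add."""
    return max(front_end.values()) + sum(back_end.values())


def headroom_ms(front_end: dict, back_end: dict, budget_ms: float = 100.0) -> float:
    """Time left in the perceive-plan-actuate budget after the pipeline runs."""
    return budget_ms - pipeline_latency_ms(front_end, back_end)


if __name__ == "__main__":
    print("end-to-end:", pipeline_latency_ms(FRONT_END_MS, BACK_END_MS), "ms")
    print("headroom:", headroom_ms(FRONT_END_MS, BACK_END_MS), "ms")
```

With these placeholder numbers the loop takes 63 ms, leaving 37 ms of headroom against a 100 ms budget; any extra latency in a post-fusion stage comes straight out of that headroom.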
Section 4 — Cloud Training: Dojo vs Google TPU
The edge compute determines what the car can do today. The cloud training infrastructure determines how fast the car improves tomorrow.
| Dimension | Tesla Dojo | Waymo (Google TPU) |
|---|---|---|
| Training hardware | Custom Dojo D1 chip, assembled into training tiles and ExaPOD clusters; Tesla quoted each D1 at 362 TFLOPS (BF16/CFP8) with 10 TB/s of on-chip bandwidth, and a 3,000-chip ExaPOD at about 1.1 EFLOPS (Tesla AI Day 2021) | Google TPU v4/v5 pods; Waymo is an Alphabet company and has access to Google’s TPU fleet |
| Cluster scale | Tesla targeting approximately 1 ExaFLOP of AI training compute (est., late 2025); Dojo 2 announced for further scaling | Google’s TPU fleet is among the largest AI training clusters in the world; Waymo has effectively unlimited on-demand access (est.) |
| Training data pipeline | Approximately 6 million FSD-capable Tesla vehicles generate clips via shadow mode; clips flagged by the network as edge cases are prioritized for upload and labeling | Dedicated mapping vehicles plus a robotaxi fleet of approximately 1,500 vehicles; smaller dataset but higher proportion of fully driverless miles |
| Training objective | Imitation learning from human driver video (FSD v12+): minimize divergence between neural net output and what a human driver would do | Multi-task training across object detection, occupancy prediction, trajectory prediction, and behavior prediction; separate models or multi-task architecture (est.) |
| Model update frequency | FSD updates every few weeks via OTA; each update retrains on accumulated edge case data | Waymo does not disclose update frequency; continuous improvement model (est.) |
| Key advantage | End-to-end control of the training pipeline; faster iteration; no cloud vendor dependency or egress cost | On-demand scale to Google’s full TPU capacity; no capital expenditure on training hardware |
| Key risk | Custom silicon is a concentrated bet; if Dojo underperforms relative to NVIDIA alternatives, training throughput falls behind | Little hardware risk, since Google TPU is production-proven at scale; the main risk is data volume relative to Tesla |
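The imitation-learning objective in the table reduces, in its simplest form, to behavior cloning: fit a policy so its output matches the action a human took in the same situation, here by minimizing mean squared error. The sketch uses a two-feature linear policy and hand-made demonstrations; the features, the synthetic human rule, and the learning rate are hypothetical, and Tesla’s actual loss and network are not public.

```python
def predict(weights, features):
    """Linear policy: steering command as a weighted sum of features."""
    return sum(w * x for w, x in zip(weights, features))


def mse_loss(weights, demos):
    """Mean squared gap between the policy's output and the human's action."""
    return sum((predict(weights, x) - y) ** 2 for x, y in demos) / len(demos)


def gradient_step(weights, demos, lr):
    """One step of full-batch gradient descent on mse_loss."""
    grads = [0.0] * len(weights)
    for x, y in demos:
        err = predict(weights, x) - y
        for i, xi in enumerate(x):
            grads[i] += 2.0 * err * xi / len(demos)
    return [w - lr * g for w, g in zip(weights, grads)]


# Hypothetical demonstrations: (road curvature, lateral offset) -> human steering.
# The synthetic "human" follows steering = 0.5 * curvature - 0.2 * offset.
DEMOS = [((c, o), 0.5 * c - 0.2 * o) for c, o in [(0.1, 0.0), (0.0, 0.5), (0.2, -0.3), (-0.1, 0.2)]]


def train(demos, steps=500, lr=1.0):
    weights = [0.0, 0.0]
    for _ in range(steps):
        weights = gradient_step(weights, demos, lr)
    return weights


if __name__ == "__main__":
    w = train(DEMOS)
    print("learned weights:", w, "loss:", mse_loss(w, DEMOS))
```

Gradient descent recovers the synthetic human rule (weights near 0.5 and -0.2) because the demonstrations are noise-free and linear; the point is the shape of the objective, not the model.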
The Dojo bet deserves particular scrutiny because it is unusual even by the standards of a company willing to design its own inference chip. Building a custom training supercomputer requires a completely different engineering discipline than building a custom inference chip. The optimization targets are different (throughput at cluster scale versus latency at chip level), the cooling and power infrastructure is different (megawatts versus watts), and the software stack is different (distributed training frameworks versus embedded inference engines). Tesla is attempting to win at both ends of the compute stack simultaneously, a wager that no other AV company has publicly made.
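The megawatts-versus-watts contrast can be checked against figures Tesla presented at AI Day 2021: 362 TFLOPS (BF16/CFP8) and a 400 W TDP per D1 chip, and 3,000 D1 chips per ExaPOD. The sketch multiplies these peak figures out and compares chip power alone with the article’s estimated 100 W onboard computer; it ignores host CPUs, networking, and cooling, all of which would raise the real facility draw.

```python
# Figures Tesla presented at AI Day 2021 (peak ratings, not sustained).
D1_PEAK_TFLOPS_BF16 = 362.0
D1_TDP_WATTS = 400.0
EXAPOD_D1_CHIPS = 3000
FSD_COMPUTER_WATTS_EST = 100.0  # the article's estimate for the onboard computer


def peak_eflops(chips: int, tflops_per_chip: float) -> float:
    return chips * tflops_per_chip / 1e6  # 1 EFLOPS = 1e6 TFLOPS


def chip_power_megawatts(chips: int, watts_per_chip: float) -> float:
    # Chip TDP only; hosts, interconnect, and cooling would add to this.
    return chips * watts_per_chip / 1e6


if __name__ == "__main__":
    print(f"ExaPOD peak: {peak_eflops(EXAPOD_D1_CHIPS, D1_PEAK_TFLOPS_BF16):.2f} EFLOPS")
    mw = chip_power_megawatts(EXAPOD_D1_CHIPS, D1_TDP_WATTS)
    ratio = mw * 1e6 / FSD_COMPUTER_WATTS_EST
    print(f"D1 chips alone: {mw:.1f} MW, about {ratio:,.0f}x the onboard computer")
```

The result is about 1.09 EFLOPS of peak compute and about 1.2 MW for the D1 chips alone, roughly 12,000 times the onboard computer’s estimated draw.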
The Google TPU advantage available to Waymo is the inverse of this bet. Waymo does not carry the capital expenditure of building its own training hardware. When a new model architecture requires twice the training compute, Waymo schedules more TPU time. When training demand is low, it does not pay for idle racks. The flexibility is substantial. The dependency is real: Alphabet controls the training infrastructure, and any strategic divergence between Waymo and Google could become a supply chain problem. In practice, because Waymo is majority-owned by Alphabet, this risk is low.
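The idle-rack argument is a utilization question. This sketch compares a fixed hourly cost for owned capacity with a per-used-hour price for on-demand capacity; the dollar figures are hypothetical, and how Alphabet actually charges Waymo internally for TPU time is not public.

```python
def breakeven_utilization(owned_cost_per_hour: float, rented_cost_per_used_hour: float) -> float:
    """Owned capacity costs the same every hour; on-demand capacity is billed
    only for hours used. Owning wins once utilization exceeds owned / rented."""
    return owned_cost_per_hour / rented_cost_per_used_hour


def cheaper_option(utilization: float, owned_cost_per_hour: float,
                   rented_cost_per_used_hour: float) -> str:
    rented = utilization * rented_cost_per_used_hour  # expected cost per wall-clock hour
    return "own" if owned_cost_per_hour < rented else "rent"


if __name__ == "__main__":
    # Hypothetical: $60/hour amortized for owned racks, $100 per used hour on demand.
    print("breakeven utilization:", breakeven_utilization(60.0, 100.0))
    for u in (0.4, 0.9):
        print(u, cheaper_option(u, 60.0, 100.0))
```

With owned capacity at a hypothetical $60 per hour and on-demand capacity at $100 per used hour, owning only pays off above 60% sustained utilization. A company whose training demand is bursty sits below that line more often, which is the flexibility the paragraph describes.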
Section 5 — The Fleet Data Loop: How Training and Deployment Connect
The compute architecture — edge inference chip, cloud training cluster — exists in service of a data flywheel that determines how quickly each system improves.
Fleet vehicles run edge inference
→ curated interesting clips uploaded to cloud
→ cloud training on new data (Dojo / Google TPU)
→ improved model weights produced
→ OTA update pushed to fleet
→ fleet performance improves
→ better data clips → more effective next training cycle
| Flywheel component | Tesla | Waymo |
|---|---|---|
| Data volume | Approximately 6 million FSD-capable vehicles; tens of millions of fleet miles per week | Approximately 1,500 vehicles; 150,000+ driverless rides per week |
| Data quality | Mostly supervised miles (human driver present); human interventions mark genuine edge cases | Fully driverless miles; no human driver to bail out — every decision is system-generated |
| Upload bandwidth | Cellular connection; selective upload of clips flagged as unusual by the onboard network | Dedicated upload from known garage and depot locations (est.) |
| Training throughput | Dojo scales with capital investment; Tesla controls the pace | Google TPU scales on-demand; Waymo can surge capacity without new hardware |
| Deployment latency | Staged OTA rollout to approximately 6 million vehicles, typically over days to weeks (est.) | OTA to approximately 1,500 vehicles within hours (est.) |
| Compounding rate | More vehicles generate more edge cases; data volume compounds with fleet size | More driverless miles generate harder edge cases; data quality compounds with operational confidence |
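The volume side of the compounding row can be made concrete with a rare-event count. The sketch treats a particular edge case as a Poisson process with a fixed per-mile rate; the rate, the 20 million weekly Tesla miles (the low end of "tens of millions"), and the assumed 5 miles per Waymo ride are hypothetical inputs, not reported figures.

```python
import math


def expected_events(miles: float, events_per_mile: float) -> float:
    return miles * events_per_mile


def prob_at_least_one(expected: float) -> float:
    # Poisson assumption: rare events arrive independently at a constant rate.
    return 1.0 - math.exp(-expected)


# Hypothetical inputs (not reported figures).
RATE = 1e-6                   # one occurrence per million miles
TESLA_MILES = 20e6            # low end of "tens of millions" of weekly fleet miles
WAYMO_MILES = 150_000 * 5     # 150,000 rides/week at an assumed 5 miles per ride

if __name__ == "__main__":
    for name, miles in (("Tesla", TESLA_MILES), ("Waymo", WAYMO_MILES)):
        lam = expected_events(miles, RATE)
        print(f"{name}: {lam:.2f} expected/week, P(at least one) = {prob_at_least_one(lam):.2f}")
```

Under these assumptions Tesla’s fleet would expect about 20 occurrences of the event per week, while Waymo’s would expect about 0.75, roughly a 53% chance of seeing it at all. The calculation captures only volume; it says nothing about the quality advantage of driverless miles, which is the other half of the asymmetry.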
The asymmetry in this flywheel is the central strategic tension of the AV industry. Tesla has an enormous data volume advantage: roughly six million vehicles versus fifteen hundred. Waymo has a data quality advantage: every mile in its dataset was driven without a human ready to take over, so the system’s own decisions, including its mistakes, are fully represented. A Tesla clip in which a human intervened and corrected the system is informative about what the system got wrong, but it cannot show what would have happened had the human not intervened.
Whether data volume or data quality matters more is not yet empirically settled. Tesla’s imitation learning approach from FSD v12 onward treats recorded human driving, including interventions, as the training signal: what the human did becomes the label. Waymo’s closed-loop approach treats the system’s own behavior as the primary source of both training signal and safety validation. Both are defensible engineering choices. The answer will most likely come from safety records measured over billions of miles.
Sources: Tesla FSD Computer and Dojo specifications, tesla.com/AI (Tesla AI Day 2021, 2022); NVIDIA Orin SoC for automotive, nvidia.com/en-us/self-driving-cars/drive-orin/; Google Cloud TPU fleet documentation, cloud.google.com/tpu; Waymo technology overview, waymo.com/waymo-driver/. All figures marked (est.) are estimates derived from public company materials, industry reporting, and analyst research. They have not been independently verified and should be treated as directional. This article does not constitute investment advice.
Sources
- Tesla FSD Computer HW4 specs: Tesla AI Day 2022
- Tesla Dojo supercomputer: Tesla AI infrastructure
- NVIDIA Orin SoC automotive compute: NVIDIA
- Google TPU fleet: Google Cloud