AI-Daily-Builder

2026-06-18

Physical AI Compute — Edge vs Cloud: Tesla FSD Chip vs Waymo Custom ASIC vs Dojo

Edge inference vs cloud training: how Tesla FSD chip, Waymo custom ASIC, and Dojo supercomputer divide AV compute across the full stack.

Article 57 in the Physical AI Benchmark Series — The Full Compute Stack

Every time a Tesla running FSD detects a pedestrian stepping off a curb, the compute that enables that detection runs entirely onboard: inside a custom chip mounted behind the dashboard, drawing about 100 watts, with no round trip to Tesla’s servers. And yet the neural network weights loaded onto that chip were trained on thousands of GPU-years of compute in Tesla’s cloud infrastructure. The two halves of the problem, inference and training, require fundamentally different compute architectures, and the choices each AV company has made about both halves will shape how they compete over the next decade.

This article maps the full compute stack: what runs onboard at the edge, what runs in the cloud, and the custom silicon each company built to win.


Section 1 — Why Edge Compute Is Non-Negotiable for AVs

The fundamental architecture of any autonomous vehicle is dictated by an irreducible physical constraint: decisions that must happen in milliseconds cannot wait for a server that is hundreds of miles away.

| Constraint | Detail |
|---|---|
| Latency requirement | An AV must perceive, plan, and actuate within a total loop of under 100ms (est.); a cloud round trip adds 20–100ms of network latency alone, which is unacceptable for safety-critical decisions |
| Connectivity reliability | 4G/5G networks have dead zones, congestion, and outages; an AV that needs connectivity to drive safely cannot be deployed at commercial scale |
| Data bandwidth | 8 cameras plus LIDAR plus radar generate 1–2 TB/hour of raw sensor data (est.); streaming all of it to the cloud in real time is not feasible on any current wireless standard |
| Regulatory | Most AV safety frameworks require onboard fail-operational capability: the vehicle must be able to bring itself to a safe stop without any external connection |
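The latency and bandwidth rows are back-of-envelope arithmetic. A minimal Python sketch, assuming hypothetical per-stage latencies (the stage names and millisecond values are illustrative, not disclosed figures from any AV company) and the table's estimated 100ms loop budget:

```python
# Back-of-envelope latency and bandwidth checks.
# All stage latencies below are hypothetical illustrations, not measured AV figures.

LOOP_BUDGET_MS = 100.0  # the article's estimated perceive-plan-actuate budget

# Hypothetical onboard stage latencies in milliseconds.
ONBOARD_STAGES_MS = {
    "perception": 45.0,
    "planning": 25.0,
    "actuation": 15.0,
}


def loop_latency_ms(stages: dict[str, float], network_rtt_ms: float = 0.0) -> float:
    """Total loop latency: onboard stages plus any network round trip."""
    return sum(stages.values()) + network_rtt_ms


def fits_budget(stages: dict[str, float], network_rtt_ms: float = 0.0) -> bool:
    """True if the full loop completes within the latency budget."""
    return loop_latency_ms(stages, network_rtt_ms) <= LOOP_BUDGET_MS


def sustained_gbps(terabytes_per_hour: float) -> float:
    """Convert a raw sensor data rate in TB/hour (decimal units) to Gbit/s."""
    bits_per_hour = terabytes_per_hour * 1e12 * 8
    return bits_per_hour / 3600 / 1e9


if __name__ == "__main__":
    for rtt in (0.0, 20.0, 100.0):
        total = loop_latency_ms(ONBOARD_STAGES_MS, rtt)
        ok = fits_budget(ONBOARD_STAGES_MS, rtt)
        print(f"RTT {rtt:5.1f} ms -> loop {total:6.1f} ms, within budget: {ok}")
    for rate in (1.0, 2.0):
        print(f"{rate} TB/hour = {sustained_gbps(rate):.2f} Gbit/s sustained")
```

With these assumed stage times the onboard loop uses 85ms, so even a best-case 20ms round trip breaks the budget. On the bandwidth side, 1 TB/hour works out to about 2.2 Gbit/s of sustained uplink per vehicle, and 2 TB/hour to about 4.4 Gbit/s.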

These constraints produce a principle that governs every serious AV engineering team: inference happens at the edge; training happens in the cloud. The vehicle runs a cloud-trained model locally, sends curated clips of edge cases back to the cloud for retraining, and receives model updates over-the-air periodically. The intelligence lives in the weights. The weights live in the cloud training pipeline. But the computation that applies those weights to every camera frame — that happens onboard, in dedicated silicon, faster than any human blink.

The architecture question is therefore not whether to use edge compute — every AV must — but which edge silicon to build or buy, and how to design the cloud training infrastructure that feeds it.


Section 2 — Tesla’s Edge Compute: The FSD Chip

Tesla made the most consequential edge silicon bet in the automotive industry when it decided in 2016 to design its own neural processing hardware rather than rely on a supplier. The result is the Tesla FSD Computer, a purpose-built accelerator that runs every FSD inference task onboard every Tesla with the capability enabled.

| Component | Detail |
|---|---|
| Chip name | Tesla FSD Computer (HW3: 2019, HW4: 2023) |
| Architecture | Custom neural processing units (NPUs) designed by Tesla’s in-house silicon team, led by Pete Bannon, formerly of Apple’s chip group |
| HW4 specs | Dual-chip design; each chip carries 12 ARM Cortex-A77 cores, 2 NPUs, and a GPU; approximately 100 TOPS per chip, approximately 200 TOPS combined (est.) |
| Power consumption | Approximately 100W total for the FSD Computer system (est.) |
| Redundancy | Dual-chip design provides hardware redundancy; the fail-operational architecture means one chip can sustain operation if the other fails |
| Memory | LPDDR4 DRAM on HW3; HW4 reportedly moved to GDDR6 for higher bandwidth when fetching neural network weights during inference |
| What runs on it | All FSD inference: camera processing, occupancy network, neural planner, velocity controller; the complete end-to-end pipeline |
| Over-the-air updates | Model weights are updated OTA over Wi-Fi or Tesla’s cellular connection; each new FSD software version pushes updated neural network weights to the chip |
| HW5 (est.) | Next-generation chip expected; likely substantially higher TOPS for FSD v14 and later models |
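Redundancy has a cost that simple arithmetic makes visible: if the system must survive the loss of either chip, the safety-critical workload has to fit on one chip, not two. A minimal sketch, assuming the table’s estimated ~100 TOPS per chip and hypothetical workload sizes; the `Chip` type and `fail_operational` check are illustrative, not Tesla’s actual failover logic:

```python
from dataclasses import dataclass


@dataclass
class Chip:
    name: str
    peak_tops: float  # assumed peak throughput, e.g. the article's ~100 TOPS estimate
    healthy: bool = True


def available_tops(chips: list[Chip]) -> float:
    """Combined peak TOPS of all healthy chips."""
    return sum(c.peak_tops for c in chips if c.healthy)


def fail_operational(chips: list[Chip], workload_tops: float) -> bool:
    """True if the workload still fits after losing any single chip."""
    for lost in chips:
        survivors = [c for c in chips if c is not lost]
        if available_tops(survivors) < workload_tops:
            return False
    return True


if __name__ == "__main__":
    fsd = [Chip("chip_a", 100.0), Chip("chip_b", 100.0)]
    for workload in (80.0, 150.0):
        print(
            f"workload {workload:.0f} TOPS: combined {available_tops(fsd):.0f} TOPS, "
            f"fail-operational: {fail_operational(fsd, workload)}"
        )
```

Under this simplified model, a 150 TOPS workload fits inside the 200 TOPS combined budget but is not fail-operational: the guaranteed budget is one chip, not two.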

The strategic logic of designing the chip in-house is the same logic Apple applied to the M-series: when you own the neural network topology, you can co-optimize the chip architecture to accelerate the exact matrix operations your network requires. A general-purpose automotive accelerator from NVIDIA or Qualcomm is designed to run anyone’s neural network efficiently. Tesla’s NPU is designed to run Tesla’s specific neural network as efficiently as physically possible. That specificity translates into better performance per watt on a given task, which matters enormously in a vehicle where power is constrained and thermal management affects passenger comfort.
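The per-watt argument largely comes down to utilization: a chip rarely sustains its peak TOPS, and the fraction it does sustain depends on how well its datapath matches the network. A rough sketch, using hypothetical utilization figures and a hypothetical 500 GOP-per-frame network (none of these are Tesla, NVIDIA, or Qualcomm numbers):

```python
def frames_per_second(peak_tops: float, utilization: float, gops_per_frame: float) -> float:
    """Sustained frames/s: effective ops per second divided by ops per frame."""
    effective_ops_per_s = peak_tops * 1e12 * utilization
    return effective_ops_per_s / (gops_per_frame * 1e9)


def frames_per_joule(fps: float, watts: float) -> float:
    """Energy efficiency: frames processed per joule of electrical energy."""
    return fps / watts


if __name__ == "__main__":
    # Hypothetical: same 100 TOPS peak and 100 W, a 500 GOP-per-frame network.
    for label, util in (("generic accelerator", 0.3), ("co-designed NPU", 0.6)):
        fps = frames_per_second(100.0, util, 500.0)
        print(f"{label}: {fps:.0f} frames/s, {frames_per_joule(fps, 100.0):.2f} frames/J")
```

Under these assumptions, doubling utilization doubles frames per joule at identical peak TOPS and power, which is one concrete sense in which co-design improves performance per watt.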

The cost of this bet is execution risk. Designing a world-class inference chip requires a team with deep expertise in computer architecture, memory subsystems, and chip fabrication — a capability that even most large technology companies do not possess. Tesla has built that team, and HW4 demonstrates that it can execute. The pending test is whether HW5 can continue to track the rapid pace of neural network scaling that FSD’s increasing model complexity will demand.


Section 3 — Waymo’s Edge Compute: Custom ASIC Plus Orin

Waymo’s onboard compute problem is structurally harder than Tesla’s. Tesla’s sensor suite is cameras only — no LIDAR, no radar. Waymo’s sensor suite combines LIDAR, cameras, and radar, each generating different data types at high frequency, all of which must be processed, fused, and interpreted in real time. The result is a more complex onboard compute stack that draws more power and occupies more space.

| Component | Detail |
|---|---|
| Primary inference chip | Waymo has designed custom ASICs for sensor processing, since LIDAR point cloud processing at 10–20 Hz requires dedicated hardware; an NVIDIA Orin SoC is used for general neural network inference (est.) |
| LIDAR processing | A 360-degree LIDAR point cloud at high frequency requires dedicated compute for point cloud segmentation and object detection; this sparse workload maps less efficiently onto general-purpose GPU architectures than dense camera processing does |
| Sensor fusion | Fusing LIDAR, camera, and radar data streams in real time is significantly more compute-intensive than camera-only processing; the fusion step must run before the neural network planner can operate |
| HD map localization | Matching a live LIDAR point cloud against a stored HD map in real time requires additional dedicated compute beyond the perception pipeline |
| Total onboard compute | Significantly more than Tesla (est.) due to LIDAR and radar processing requirements; Waymo has not publicly disclosed TOPS figures |
| Power consumption | Higher than Tesla (est.) due to LIDAR and radar hardware plus additional compute; thermal management is a recognized engineering challenge |
| Gen 6 vehicle | Waymo’s purpose-built Gen 6 vehicle integrates sensor and compute hardware from the ground up, reducing the retrofit overhead that characterized earlier generations |
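One concrete piece of the fusion problem is time alignment: each sensor samples at its own rate, so every LIDAR sweep must be paired with the camera and radar samples closest to it before fusion can start. A minimal sketch, assuming hypothetical sample rates (10 Hz LIDAR, 30 Hz camera, 20 Hz radar) and a 20ms tolerance; Waymo has not disclosed how its pipeline does this, and the `align` helper is illustrative:

```python
from bisect import bisect_left


def nearest(timestamps: list[float], t: float) -> float:
    """Closest timestamp to t in a sorted, non-empty list."""
    i = bisect_left(timestamps, t)
    candidates = timestamps[max(i - 1, 0): i + 1]
    return min(candidates, key=lambda s: abs(s - t))


def align(
    lidar: list[float],
    camera: list[float],
    radar: list[float],
    tol_s: float,
) -> list[tuple[float, float, float]]:
    """Pair each LIDAR sweep with the nearest camera and radar samples within tol_s seconds."""
    fused = []
    for t in lidar:
        cam = nearest(camera, t)
        rad = nearest(radar, t)
        if abs(cam - t) <= tol_s and abs(rad - t) <= tol_s:
            fused.append((t, cam, rad))
    return fused


if __name__ == "__main__":
    lidar = [0.0, 0.1, 0.2]               # 10 Hz sweeps
    camera = [k / 30 for k in range(6)]   # 30 Hz; the frame at t=0.2 s is missing
    radar = [k / 20 for k in range(5)]    # 20 Hz
    for t, cam, rad in align(lidar, camera, radar, tol_s=0.02):
        print(f"lidar {t:.3f}s  camera {cam:.3f}s  radar {rad:.3f}s")
```

Here the sweep at 0.2s is dropped because the nearest camera frame is 33ms away. A production pipeline would also compensate for vehicle motion within each sweep; this sketch shows only nearest-timestamp pairing.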

The architectural contrast is telling. Tesla’s edge compute is an inference accelerator: a chip optimized to run one neural network as fast and efficiently as possible. Waymo’s edge compute is a full signal processing pipeline: custom hardware for point cloud processing, general-purpose SoC for neural inference, and additional compute for map matching — each stage feeding the next. The additional compute gives Waymo more raw sensor information per decision cycle. The cost is higher system complexity, more power draw, and a compute stack that is harder to upgrade incrementally via OTA software updates alone.
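The map-matching stage can be illustrated with a toy version of scan-to-map registration: shift the live scan over a small grid of candidate offsets and keep the one whose points land closest to the stored map. This is a brute-force 2D translation-only sketch with hypothetical landmark points, not Waymo’s method; production localizers typically use much more efficient registration (ICP or NDT variants, solving for rotation as well as translation) over far larger point clouds:

```python
import math

Point = tuple[float, float]


def mean_nn_distance(scan: list[Point], map_points: list[Point], dx: float, dy: float) -> float:
    """Mean distance from each shifted scan point to its nearest map point."""
    total = 0.0
    for x, y in scan:
        px, py = x + dx, y + dy
        total += min(math.hypot(px - mx, py - my) for mx, my in map_points)
    return total / len(scan)


def localize(scan: list[Point], map_points: list[Point], search: float = 1.0, step: float = 0.25) -> Point:
    """Grid-search the (dx, dy) correction that best aligns the scan with the map."""
    n = int(round(search / step))
    best_score, best_offset = math.inf, (0.0, 0.0)
    for i in range(-n, n + 1):
        for j in range(-n, n + 1):
            dx, dy = i * step, j * step
            score = mean_nn_distance(scan, map_points, dx, dy)
            if score < best_score:
                best_score, best_offset = score, (dx, dy)
    return best_offset


if __name__ == "__main__":
    # Hypothetical landmarks from a stored map, e.g. building corners and a pole.
    hd_map = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0), (2.0, 5.0)]
    # Live scan as seen from a pose estimate that is off by (-0.5, +0.25) metres.
    live_scan = [(x - 0.5, y + 0.25) for x, y in hd_map]
    print("estimated correction:", localize(live_scan, hd_map))
```

The search recovers the correction (0.5, -0.25). Its cost grows with scan size, map size, and search area, which hints at why this stage gets dedicated compute.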


Section 4 — Cloud Training: Dojo vs Google TPU

The edge compute determines what the car can do today. The cloud training infrastructure determines how fast the car improves tomorrow.

|  | Tesla Dojo | Waymo (Google TPU) |
|---|---|---|
| Training hardware | Custom Dojo D1 chip in an ExaPOD cluster; Tesla quoted each D1 chip at approximately 362 TFLOPS (BF16/CFP8) with 10 TB/s of on-chip bandwidth | Google TPU v4/v5 pods; Waymo is an Alphabet company and has access to Google’s full TPU fleet |
| Cluster scale | Tesla targeting approximately 1 ExaFLOP of AI training compute (est., late 2025); Dojo 2 announced for further scaling | Google’s TPU fleet is among the largest AI training clusters in the world; Waymo has effectively unlimited on-demand access (est.) |
| Training data pipeline | Approximately 6 million FSD-capable Tesla vehicles generate clips via shadow mode; clips the network flags as edge cases are prioritized for upload and labeling | Dedicated mapping vehicles plus a robotaxi fleet of approximately 1,500 vehicles; a smaller dataset but a higher proportion of fully driverless miles |
| Training objective | Imitation learning from human driving video (FSD v12+): minimize the divergence between the neural network’s output and what a human driver would do | Multi-task training across object detection, occupancy prediction, trajectory prediction, and behavior prediction, using separate models or a multi-task architecture (est.) |
| Model update frequency | FSD updates every few weeks via OTA; each update retrains on accumulated edge case data | Waymo does not disclose update frequency; continuous improvement model (est.) |
| Key advantage | End-to-end control of the training pipeline; faster iteration; no cloud vendor dependency or egress cost | On-demand scale to Google’s full TPU capacity; no capital expenditure on training hardware |
| Key risk | Custom silicon is a concentrated bet; if Dojo underperforms relative to NVIDIA alternatives, training throughput falls behind | No hardware risk, since Google TPU is production-proven at scale; the risk is data volume relative to Tesla |
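Two levers dominate cluster-scale training throughput: peak FLOP/s and the fraction of it a training job actually sustains. A rough sketch, assuming (hypothetically) a dense transformer-style model for which the widely used approximation of 6 × parameters × tokens training FLOPs applies, and taking the table’s estimated 1 ExaFLOP as the cluster’s peak; the model size is illustrative, since neither company publishes one:

```python
SECONDS_PER_DAY = 86_400


def training_flops(params: float, tokens: float) -> float:
    """Common dense-transformer rule of thumb: about 6 FLOPs per parameter per training token."""
    return 6.0 * params * tokens


def training_days(total_flops: float, cluster_peak_flops: float, utilization: float) -> float:
    """Wall-clock days at a given sustained fraction of the cluster's peak FLOP/s."""
    return total_flops / (cluster_peak_flops * utilization) / SECONDS_PER_DAY


if __name__ == "__main__":
    # Hypothetical model: 10B parameters trained on 2T tokens.
    flops = training_flops(10e9, 2e12)
    for util in (0.2, 0.4):
        days = training_days(flops, 1e18, util)  # 1e18 FLOP/s = 1 ExaFLOP peak
        print(f"utilization {util:.0%}: {days:.1f} days")
```

Halving sustained utilization doubles wall-clock time, so a custom cluster that matches a rival on peak FLOP/s but sustains a lower fraction of it still loses throughput in practice. That is one concrete form of the Dojo risk listed in the table.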

The Dojo bet deserves particular scrutiny because it is unusual even by the standards of a company willing to design its own inference chip. Building a custom training supercomputer requires a completely different engineering discipline than building a custom inference chip. The optimization targets are different (throughput at cluster scale versus latency at chip level), the cooling and power infrastructure is different (megawatts versus watts), and the software stack is different (distributed training frameworks versus embedded inference engines). Tesla is attempting to win at both ends of the compute stack simultaneously — a wager that no other AV company has made.

The Google TPU advantage available to Waymo is the inverse of this bet. Waymo carries no capital expenditure for training hardware. When a new model architecture requires twice the training compute, Waymo schedules more TPU time; when training demand is low, it does not pay for idle racks. The flexibility is substantial. The dependency is also real: Alphabet controls the training infrastructure, and any strategic divergence between Waymo and Google could become a supply chain problem. In practice, because Waymo is a majority-owned Alphabet subsidiary, this risk is low.
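The flexibility argument is ultimately a utilization argument. A rough sketch comparing owned and on-demand capacity under entirely hypothetical prices (no figures here come from Tesla, Google, or Waymo, and straight-line amortization is an assumption):

```python
HOURS_PER_YEAR = 8_760


def owned_cost_per_year(capex: float, years: float, opex_per_year: float) -> float:
    """Straight-line amortized capex plus running costs; paid whether or not the cluster is busy."""
    return capex / years + opex_per_year


def rented_cost_per_year(rate_per_hour: float, busy_hours: float) -> float:
    """On-demand cost: pay only for hours actually used."""
    return rate_per_hour * busy_hours


def break_even_utilization(capex: float, years: float, opex_per_year: float, rate_per_hour: float) -> float:
    """Fraction of the year the capacity must be busy before owning beats renting."""
    return owned_cost_per_year(capex, years, opex_per_year) / (rate_per_hour * HOURS_PER_YEAR)


if __name__ == "__main__":
    # Entirely hypothetical prices for equivalent capacity.
    capex, years, opex, rate = 500e6, 4.0, 50e6, 50_000.0
    print(f"owned: ${owned_cost_per_year(capex, years, opex) / 1e6:.0f}M/year")
    for util in (0.2, 0.8):
        cost = rented_cost_per_year(rate, util * HOURS_PER_YEAR)
        print(f"rented at {util:.0%} busy: ${cost / 1e6:.0f}M/year")
    print(f"break-even utilization: {break_even_utilization(capex, years, opex, rate):.0%}")
```

With these made-up numbers, on-demand capacity is cheaper whenever the cluster would be busy less than about 40% of the year, and owned hardware amortizes better above that.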


Section 5 — The Fleet Data Loop: How Training and Deployment Connect

The compute architecture — edge inference chip, cloud training cluster — exists in service of a data flywheel that determines how quickly each system improves.

Fleet vehicles run edge inference
    → curated interesting clips uploaded to cloud
    → cloud training on new data (Dojo / Google TPU)
    → improved model weights produced
    → OTA update pushed to fleet
    → fleet performance improves
    → better data clips → more effective next training cycle

| Flywheel component | Tesla | Waymo |
|---|---|---|
| Data volume | Approximately 6 million FSD-capable vehicles; tens of millions of fleet miles per week | Approximately 1,500 vehicles; 150,000+ driverless rides per week |
| Data quality | Mostly supervised miles (human driver present); human interventions mark genuine edge cases | Fully driverless miles with no human driver to take over; every decision is system-generated |
| Upload bandwidth | Cellular connection; selective upload of clips the onboard network flags as unusual | Dedicated upload from known garage and depot locations (est.) |
| Training throughput | Dojo scales with capital investment; Tesla controls the pace | Google TPU scales on demand; Waymo can surge capacity without new hardware |
| Deployment latency | OTA to approximately 6 million vehicles within days of a new model release | OTA to approximately 1,500 vehicles within hours |
| Compounding rate | More vehicles generate more edge cases; data volume compounds with fleet size | More driverless miles generate harder edge cases; data quality compounds with operational confidence |
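Both fleets face the same upload question: which clips are worth the bandwidth. A minimal greedy sketch, assuming a hypothetical priority score that combines an onboard uncertainty estimate with a human-intervention flag; neither company has published its trigger logic, and the `Clip` fields and scoring rule are illustrative:

```python
from dataclasses import dataclass


@dataclass
class Clip:
    clip_id: str
    size_mb: float
    uncertainty: float  # hypothetical onboard uncertainty score in [0, 1]
    intervention: bool  # True if a human driver took over during the clip


def priority(clip: Clip) -> float:
    """Hypothetical scoring: any intervention outranks any uncertainty score alone."""
    return clip.uncertainty + (1.0 if clip.intervention else 0.0)


def select_for_upload(clips: list[Clip], budget_mb: float) -> list[str]:
    """Greedy selection: take clips in priority order while they fit in the upload budget."""
    chosen: list[str] = []
    used_mb = 0.0
    for clip in sorted(clips, key=priority, reverse=True):
        if used_mb + clip.size_mb <= budget_mb:
            chosen.append(clip.clip_id)
            used_mb += clip.size_mb
    return chosen


if __name__ == "__main__":
    clips = [
        Clip("a", 200.0, 0.2, False),
        Clip("b", 300.0, 0.9, False),
        Clip("c", 250.0, 0.4, True),
        Clip("d", 100.0, 0.1, False),
    ]
    print(select_for_upload(clips, budget_mb=600.0))
```

With a 600 MB budget this uploads the intervention clip `c` and the high-uncertainty clip `b`. Greedy selection is a simplification: maximizing total priority under a byte budget is a knapsack problem, and greedy ordering does not always find the optimum.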

The asymmetry in this flywheel is the central strategic tension of the AV industry. Tesla has an enormous data volume advantage — six million vehicles versus fifteen hundred. But Waymo has a data quality advantage: every mile in its dataset was driven without a human ready to take over, meaning the system’s own decisions (including its mistakes) are fully represented. A Tesla dataset clip where a human intervened and corrected the system is informative about what the system got wrong. But the system never learns what would have happened if the human had not intervened.

Whether data volume or data quality matters more is not yet empirically settled. Tesla’s imitation learning approach from FSD v12 onward treats human driving as the training signal, so when a driver intervenes, the correction itself becomes the label. Waymo’s closed-loop approach treats the system’s own behavior as the primary source of both training signal and safety validation. Both are defensible engineering choices. The answer will most likely come from safety records measured over billions of miles.
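In behavior-cloning terms, the human’s trajectory is the label and training minimizes the gap between it and the network’s output. A minimal sketch of that objective as a weighted mean squared error, with one (x, y) waypoint per sample for brevity; the heavier weight on intervention samples is a hypothetical choice to show how corrections could be emphasized, not a documented Tesla setting:

```python
Waypoint = tuple[float, float]


def weighted_bc_loss(
    predicted: list[Waypoint],
    human: list[Waypoint],
    weights: list[float],
) -> float:
    """Weighted mean squared error between predicted and human-driven waypoints."""
    num = 0.0
    den = 0.0
    for (px, py), (hx, hy), w in zip(predicted, human, weights):
        num += w * ((px - hx) ** 2 + (py - hy) ** 2)
        den += w
    return num / den


if __name__ == "__main__":
    predicted = [(1.0, 2.0), (0.0, 0.0)]
    human = [(1.0, 2.0), (1.0, 1.0)]  # second sample: the human steered differently
    uniform = [1.0, 1.0]
    upweighted = [1.0, 3.0]           # hypothetical: intervention samples count 3x
    print(weighted_bc_loss(predicted, human, uniform))     # 1.0
    print(weighted_bc_loss(predicted, human, upweighted))  # 1.5
```

This objective only scores the network against what the human actually did; it carries no information about what would have happened had the human not intervened.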


Sources: Tesla FSD Computer and Dojo specifications: tesla.com/AI (Tesla AI Day 2021, 2022); NVIDIA Orin SoC for automotive: nvidia.com/en-us/self-driving-cars/drive-orin/; Google Cloud TPU fleet documentation: cloud.google.com/tpu; Waymo technology overview: waymo.com/waymo-driver/. All figures marked (est.) are estimates derived from public company materials, industry reporting, and analyst research. They have not been independently verified and should be treated as directional. This article does not constitute investment advice.

