2026-05-22 — views

Decart raises $300M Series B at $4B — Nvidia joins the world-models bet

Read this because Decart's pitch is the full stack: inference-optimization software (DOS) under real-time world models (Lucy, Oasis). The bet isn't the $4B valuation — it's that the team selling 8x token throughput also builds the world model that needs it. Vertical integration as moat.

Israeli AI startup Decart raised $300M Series B at $4B (May 18), led by Radical Ventures, Nvidia participating. DOS stack hits 1,600 tokens/sec — 8x average.

Decart — an Israeli AI startup (Tel Aviv + San Francisco) founded in 2023 — raised a $300M Series B at a $4B valuation (announced May 18), led by Radical Ventures with Nvidia participating.

The company

Founded by Unit 8200 alumni Dean Leitersdorf (CEO) and Moshe Shalev, Decart sells two things that reinforce each other:

DOS — an inference-optimization stack that wrings more throughput out of existing GPUs. DOS 2.0 reportedly hits 1,600 tokens/sec — about 8x the industry average.
Real-time world models — Lucy (real-time video world model) and Oasis (physical-AI simulation), generative systems that produce interactive video/worlds on the fly.

The backer list is its own signal: alongside Radical and Nvidia, it reportedly includes Michael Eisner, Andrej Karpathy, the Yamauchi family (Nintendo), Sequoia, and Benchmark.

The actual bet: optimization + world models, same team

Most startups pick a layer. Decart is doing both — and that’s the thesis, not a distraction.

Real-time world models are brutally inference-hungry: generating interactive video frame-by-frame at low latency is one of the most demanding workloads in generative AI. Decart’s answer is to own the optimization layer (DOS) underneath its own world models (Lucy, Oasis). The 8x-throughput software isn’t a side business — it’s what makes the real-time world model economically viable.
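Here is a back-of-envelope sketch of why frame-by-frame generation is so demanding. An interactive model must deliver every frame before the next display tick, because each frame depends on the user's latest input and cannot be rendered ahead. The 20 fps target, the 60-second session, and the per-frame generation times below are hypothetical placeholders, not Decart's published figures.

```python
# Back-of-envelope latency budget for frame-by-frame interactive generation.
# All numbers are hypothetical placeholders, not Decart's published specs.


def frame_budget_ms(fps: float) -> float:
    """Wall-clock time available to generate one frame at a target frame rate."""
    return 1000.0 / fps


def frames_needed(fps: float, session_seconds: float) -> int:
    """Frames an interactive session must produce sequentially, one per tick.

    Each frame is conditioned on the user's latest action, so frames
    cannot be generated ahead of time or batched like an offline clip.
    """
    return round(fps * session_seconds)


def meets_realtime(gen_ms_per_frame: float, fps: float) -> bool:
    """True if each frame is ready before the next display tick."""
    return gen_ms_per_frame <= frame_budget_ms(fps)


if __name__ == "__main__":
    fps = 20.0  # hypothetical target frame rate
    print(f"budget per frame: {frame_budget_ms(fps):.1f} ms")
    print(f"frames in a 60 s session: {frames_needed(fps, 60)}")
    # A model needing 120 ms/frame could still render a clip offline,
    # but it cannot keep up with an interactive session at this frame rate.
    for gen_ms in (40.0, 120.0):
        verdict = "real-time" if meets_realtime(gen_ms, fps) else "too slow"
        print(f"{gen_ms} ms/frame -> {verdict}")
```

The sketch checks a per-frame deadline rather than total throughput. An offline generator can spend longer per frame and still finish the job, but an interactive one that misses the deadline simply stalls.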

That vertical integration is the moat claim: a competitor building world models on commodity inference pays the full compute bill; Decart’s stack is tuned end-to-end for its own workload.

Why Nvidia’s participation matters

Nvidia’s participation is notable because DOS makes existing GPUs do more, which on the surface should reduce GPU demand. Nvidia’s logic runs the other way: efficiency software that unlocks new workloads (real-time world models) expands the total addressable compute market. Cheaper-per-token compute doesn’t shrink GPU sales if it makes entirely new product categories viable. Nvidia is betting on demand elasticity, meaning that demand for tokens grows faster than the cost per token falls.
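A toy model makes the elasticity argument concrete. It rests on two hypothetical assumptions: the serving price per token falls in proportion to per-GPU throughput, and token demand has a constant price elasticity e. The 200 tokens/sec baseline is simply 1,600 / 8, used as a placeholder. None of these parameters come from Decart or Nvidia.

```python
# Toy model of the demand-elasticity argument.
# Hypothetical assumptions, not data from the article:
#   * serving price per token falls in proportion to per-GPU throughput,
#   * token demand has constant price elasticity: Q = A * price ** -e.


def gpu_hours(tokens_per_sec: float, elasticity: float,
              demand_scale: float = 1.0, gpu_cost_per_hour: float = 1.0) -> float:
    """GPU-hours needed to serve the tokens demanded at a given per-GPU throughput."""
    tokens_per_gpu_hour = tokens_per_sec * 3600
    price_per_token = gpu_cost_per_hour / tokens_per_gpu_hour
    tokens_demanded = demand_scale * price_per_token ** -elasticity
    return tokens_demanded / tokens_per_gpu_hour


def gpu_demand_ratio(speedup: float, elasticity: float,
                     baseline_tps: float = 200.0) -> float:
    """Total GPU-hours after a throughput speedup, relative to before."""
    before = gpu_hours(baseline_tps, elasticity)
    after = gpu_hours(baseline_tps * speedup, elasticity)
    return after / before


if __name__ == "__main__":
    # An 8x speedup shrinks total GPU demand only if demand is inelastic (e < 1).
    for e in (0.5, 1.0, 1.5):
        print(f"elasticity {e}: GPU-hours x{gpu_demand_ratio(8.0, e):.2f}")
```

Algebraically the ratio reduces to speedup ** (e - 1). With an 8x speedup, GPU-hours fall to about a third at e = 0.5, stay flat at e = 1, and nearly triple at e = 1.5. In these terms, Nvidia is betting that demand for real-time world models sits at e > 1.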

Why it matters

World models are the next frontier after LLMs + video gen. Real-time, interactive, physically-grounded generation (Lucy/Oasis) is the bridge from “generate a clip” to “simulate a world” — the substrate for robotics sim, gaming, and embodied AI.
Inference efficiency is now fundable as a standalone edge. A $4B valuation that rests partly on “we serve 8x more tokens per GPU” suggests investors treat serving cost as a defensible moat, not just an ops detail. The same theme runs through the Anthropic–Maia 200 inference talks.
Israeli deep-tech + ex-8200 founders remain a magnet for tier-1 capital even in a selective funding environment.

Practitioner note

The number to interrogate is “1,600 tokens/sec, 8x industry average.” Throughput claims are specific to the workload and the model size. Before extrapolating, ask: which model, which batch size, which precision, which hardware, and whether the “industry average” was measured under the same conditions. The headline may hold, but the comparison base matters.
World models ≠ video generation. Lucy generates interactive worlds (you can act and the model responds), which is categorically harder than text-to-video. If you’re evaluating the space, separate “real-time interactive” from “offline clip generation” — they’re different cost and capability regimes.
Optimization-as-moat is the transferable lesson. Whether or not you care about world models, the structural insight applies: when inference cost gates your product, owning the serving-efficiency layer is leverage. Decart productized that into a $4B story.
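The practitioner checks above can be sketched as a small record that carries its measurement conditions with it. This is a hypothetical helper, not a published Decart or benchmark API. ThroughputClaim, the example setup, and the $2/GPU-hour price are placeholders, and the 200 tokens/sec baseline is simply 1,600 / 8. The helper refuses to compute a speedup across different setups. It also converts throughput into serving cost, which is where the moat argument applies.

```python
# Hypothetical sketch: record the conditions behind a throughput number and
# refuse to compute a speedup unless both claims share the same setup.
from dataclasses import dataclass


@dataclass(frozen=True)
class ThroughputClaim:
    model: str            # which model (and size)
    batch_size: int       # concurrent sequences during the measurement
    precision: str        # e.g. "fp16", "fp8"
    hardware: str         # GPU type and count
    tokens_per_sec: float

    def setup(self) -> tuple:
        return (self.model, self.batch_size, self.precision, self.hardware)


def speedup(claim: ThroughputClaim, baseline: ThroughputClaim) -> float:
    """Ratio of two throughput numbers, only if measured under the same setup."""
    if claim.setup() != baseline.setup():
        raise ValueError(f"not comparable: {claim.setup()} vs {baseline.setup()}")
    return claim.tokens_per_sec / baseline.tokens_per_sec


def cost_per_million_tokens(tokens_per_sec: float, gpu_cost_per_hour: float) -> float:
    """Serving cost per 1M tokens for one GPU running at full utilization."""
    return gpu_cost_per_hour / (tokens_per_sec * 3600) * 1_000_000


if __name__ == "__main__":
    # Placeholder setup and price, not Decart's actual benchmark conditions.
    setup = dict(model="example-8b", batch_size=1, precision="fp16", hardware="1x H100")
    optimized = ThroughputClaim(**setup, tokens_per_sec=1600)
    baseline = ThroughputClaim(**setup, tokens_per_sec=200)
    print(speedup(optimized, baseline))  # 8.0
    for c in (baseline, optimized):
        cost = cost_per_million_tokens(c.tokens_per_sec, gpu_cost_per_hour=2.0)
        print(f"{c.tokens_per_sec} tok/s -> ${cost:.2f} per 1M tokens")
```

Strict equality on the setup tuple is deliberately blunt. A real comparison would also pin down factors such as sequence length and utilization. The cost function shows the mechanics behind the moat claim: at a fixed GPU price, cost per token scales with 1 / throughput, so an 8x throughput gain means an 8x cut in serving cost.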

The under-considered angle: the smart money is converging on “the bottleneck is inference, and whoever owns inference efficiency owns the next product category.” Anthropic diversifying silicon for cheaper inference, Decart raising at a $4B valuation on 8x token throughput, Nvidia funding the efficiency software that “should” hurt GPU sales: three stories, one thesis. The 2026 AI edge isn’t a bigger model; it’s serving the model cheaply enough to do something new with it.