2026-05-28 — views

Nvidia, AMD, and CoreWeave all backed Tensormesh — KV-cache reuse becomes an inference primitive

Read this because three rivals (Nvidia, AMD, and CoreWeave) co-investing is the tell: KV-cache reuse (don't recompute what you already computed) is being treated as a foundational, neutral layer of the inference stack. The economics of the inference era, in one round.

Tensormesh raised $20M from Nvidia, AMD, and CoreWeave and shipped Tensormesh Inference, a productized KV-cache reuse layer that claims up to 10x lower latency and GPU cost.

On May 27, 2026, Tensormesh announced $20M in new funding, a seed extension that brings total funding to $24.5M, from a notable set of backers: AMD Ventures, CoreWeave, and NVentures (Nvidia’s venture arm), plus Valley Capital Partners and Laude Ventures. Alongside the round, the company announced general availability of Tensormesh Inference.

The problem: paying twice for the same compute

Tensormesh’s pitch targets what it calls enterprises’ most expensive AI problem — recomputing what GPUs have already processed. In transformer inference, the KV cache (the key/value tensors a model builds while reading a prompt) is normally discarded between requests, so shared context — a long system prompt, a document, a conversation history — gets recomputed from scratch every time. Tensormesh stores and reuses those computed results, eliminating the redundant work and claiming up to 10x reductions in latency and GPU spend.
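To make the idea of reuse concrete, here is a minimal sketch of prefix-based reuse, one common form of KV caching across requests. It is a toy under stated assumptions, not Tensormesh’s implementation: `ToyPrefixCache`, `BLOCK`, and the placeholder value standing in for real key/value tensors are all hypothetical, and "compute" is simulated by counting tokens.

```python
from dataclasses import dataclass, field

BLOCK = 4  # tokens per cached block (hypothetical value for illustration)


@dataclass
class ToyPrefixCache:
    """Toy stand-in for a KV cache: maps a token prefix to a placeholder."""
    blocks: dict = field(default_factory=dict)
    computed_tokens: int = 0  # tokens that needed prefill work
    reused_tokens: int = 0    # tokens whose work was skipped via the cache

    def prefill(self, tokens):
        """Process a prompt and return how many leading tokens were reused."""
        # A block is reusable only if the whole prefix up to and including
        # that block has been seen before.
        pos = 0
        while pos + BLOCK <= len(tokens):
            if tuple(tokens[: pos + BLOCK]) not in self.blocks:
                break
            pos += BLOCK
        self.reused_tokens += pos
        # Everything after the matched prefix must be computed.
        self.computed_tokens += len(tokens) - pos
        # Store every full block of this prompt for later requests.
        p = pos
        while p + BLOCK <= len(tokens):
            self.blocks[tuple(tokens[: p + BLOCK])] = "kv-placeholder"
            p += BLOCK
        return pos
```

With an 8-token shared system prompt, the first request computes every token, while a second request with the same system prompt and a different question only computes its unique suffix. Without a cache, both requests would pay for the full prompt.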

It ships with a Cost Savings Dashboard that makes the financial impact visible in real time: it tracks the cache hit rate (cached prompt tokens as a share of total prompt tokens) and converts it into a continuously updated dollar figure, rather than asking teams to take the savings on faith.
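The arithmetic behind such a dashboard can be sketched in a few lines. This is an illustration only, not the product’s actual formula: the function names and the per-token price are hypothetical, and the sketch assumes that cost scales linearly with the prompt tokens actually computed and that storing the cache is free.

```python
def cache_hit_rate(cached_prompt_tokens: int, total_prompt_tokens: int) -> float:
    """Share of prompt tokens served from cache, between 0 and 1."""
    if total_prompt_tokens <= 0:
        return 0.0
    return cached_prompt_tokens / total_prompt_tokens


def estimated_savings_usd(cached_prompt_tokens: int,
                          usd_per_million_prefill_tokens: float) -> float:
    """Dollar value of the skipped prompt processing.

    Assumption: linear cost per prompt token; cache storage cost ignored.
    """
    return cached_prompt_tokens / 1_000_000 * usd_per_million_prefill_tokens


def ideal_prefill_reduction(hit_rate: float) -> float:
    """Upper bound on the prompt-processing reduction factor: 1 / (1 - h).

    If a fraction h of prompt tokens is skipped, (1 - h) of the work remains.
    """
    if not 0.0 <= hit_rate < 1.0:
        raise ValueError("hit_rate must be in [0, 1)")
    return 1.0 / (1.0 - hit_rate)
```

Under these assumptions, a 90% hit rate caps the reduction in prompt processing at roughly 10x, while a 50% hit rate caps it at 2x. The bound covers prompt processing only; generating new output tokens is not reduced by reusing cached prompt state.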

Why the cap table is the story

The striking part isn’t the round size — it’s who wrote the checks. Nvidia, AMD, and CoreWeave are competitors across silicon and cloud, and they co-invested in the same startup. That kind of alignment signals a shared conviction: KV-cache reuse is a foundational layer of the inference stack, not a winner-take-all product any one of them wants to own exclusively. Each benefits if inference gets cheaper and stickier on its hardware — so a neutral, productized caching layer is a rising tide. The funds go to product, hardware-level integrations with all three, and open-source contributions.

Why it matters

This is the “year of inference” economics crystallized into one round. As models move from demos to production, the cost center shifts from training to serving, and the cheapest token is the one you never recompute. Caching, routing, and quantization are becoming the real margin levers, and capital is now flowing to the plumbing between models and users, not just to the models themselves. A startup turning KV caching into enterprise infrastructure, funded by three companies whose silicon and cloud capacity it runs on, is a clear marker of where the value is migrating.

Practitioner note

If you serve LLMs at any volume, measure your cache hit rate before you do anything else: it’s the best single predictor of how much a reuse layer can save you. Workloads with heavy shared prefixes (RAG over the same corpus, long fixed system prompts, multi-turn chat) are where 10x-class wins are plausible; workloads with mostly unique prompts will see far less. And weigh the isolation question: a cache that reuses computed state across requests needs hard tenant boundaries, or you risk one user’s context bleeding into another’s. The savings can be real, but “shared cache” and “data isolation” pull against each other, and the balance is something you must configure deliberately, not inherit by default.
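One way to picture the isolation tradeoff is to put the tenant into the cache key itself. The sketch below is a hypothetical illustration, not a description of any vendor’s design: `cache_key` and its `share_across_tenants` flag are invented names, and a real deployment would need more than key scoping (for example, access control on the cache store).

```python
import hashlib


def cache_key(tenant_id: str, prefix_tokens: list[int],
              share_across_tenants: bool = False) -> str:
    """Build a lookup key for a cached prompt prefix.

    By default the tenant is part of the key, so one tenant's cached state
    can never be returned for another tenant's request. Sharing is an
    explicit opt-in, intended for content that is safe to share, such as
    a public system prompt.
    """
    scope = "shared" if share_across_tenants else f"tenant:{tenant_id}"
    h = hashlib.sha256()
    h.update(scope.encode("utf-8"))
    h.update(b"\x00")  # separator between the scope and the token ids
    h.update(",".join(map(str, prefix_tokens)).encode("ascii"))
    return h.hexdigest()
```

With the default, two tenants sending an identical prefix get different keys, so neither can hit the other’s cached entry; the price is a lower hit rate, because identical prefixes are computed once per tenant. Opting in to sharing recovers those hits for the prefixes you have judged safe.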

The under-considered angle

If the inference layer commoditizes around shared primitives like KV caching, the competitive frontier moves up a level — to who orchestrates caching, routing, and quantization together into the cheapest reliable token. That’s the same thesis playing out in model routing: the model is increasingly a swappable backend, and the durable businesses are the ones optimizing the delivery of intelligence. Tensormesh, backed by the hardware vendors themselves, is a bet that this plumbing layer is large enough to be a company — and central enough that even rivals want a seat.