Cloudflare Infire — disaggregated LLM inference beats vLLM by 20%, Unweight cuts model size 22%
Cloudflare Infire (Rust) uses disaggregated prefill/decode to beat vLLM 0.10 by 20% on H100s. Unweight achieves 15–22% lossless model weight compression.
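The key idea behind disaggregated serving is splitting the compute-bound prefill pass (processing the whole prompt) from the bandwidth-bound decode loop (generating one token at a time), so each can run on separately sized worker pools. The sketch below is a conceptual illustration only, not Infire's actual code; all types and names are hypothetical placeholders.

```rust
// Conceptual sketch of disaggregated prefill/decode (hypothetical types,
// not Cloudflare Infire's implementation). A prefill worker builds a KV
// cache per request and hands it over a channel to a decode worker.

use std::sync::mpsc;
use std::thread;

/// Stand-in for the per-request KV cache produced during prefill.
struct KvCache {
    request_id: u64,
    prompt_len: usize,
}

/// Prefill stage: compute-heavy pass over the entire prompt.
fn prefill(request_id: u64, prompt: &str) -> KvCache {
    // A real engine would run the transformer over all prompt tokens here.
    KvCache { request_id, prompt_len: prompt.split_whitespace().count() }
}

/// Decode stage: memory-bandwidth-bound, one token per step, reusing the cache.
fn decode(cache: &KvCache, max_new_tokens: usize) -> Vec<String> {
    (0..max_new_tokens)
        .map(|i| format!("tok{}_{}", cache.request_id, cache.prompt_len + i))
        .collect()
}

fn main() {
    let (tx, rx) = mpsc::channel::<KvCache>();

    // Prefill worker: in a disaggregated design this would run on a GPU
    // pool sized for prompt processing throughput.
    let prefill_worker = thread::spawn(move || {
        for (id, prompt) in [(1u64, "why is the sky blue"), (2, "write a haiku")] {
            tx.send(prefill(id, prompt)).unwrap();
        }
    });

    // Decode worker: a separate pool optimized for token-by-token generation.
    let decode_worker = thread::spawn(move || {
        for cache in rx {
            let tokens = decode(&cache, 4);
            println!("request {} -> {:?}", cache.request_id, tokens);
        }
    });

    prefill_worker.join().unwrap();
    decode_worker.join().unwrap();
}
```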
vLLM v0.20.0 ships with 752 commits from 320 contributors: CUDA 13, PyTorch 2.11, Transformers v5, and Python 3.14 support, FlashAttention 4 as the default backend, and 2-bit KV cache quantization.