Cloudflare Infire — disaggregated LLM inference beats vLLM by 20%, Unweight cuts model size up to 22%
Cloudflare Infire (Rust) uses disaggregated prefill/decode to beat vLLM 0.10 by 20% on H100s. Unweight achieves 15–22% lossless model weight compression.
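Disaggregated serving splits the compute-bound prefill phase (one batched pass over the whole prompt) from the memory-bound decode phase (one token per step) onto separate workers, shipping the KV cache between them. A minimal Python sketch of the idea — all class and method names here are hypothetical illustrations, not Infire's actual architecture or API:

```python
from dataclasses import dataclass

@dataclass
class KVCache:
    # In a real engine this is per-layer key/value tensors on GPU;
    # here it is just the token sequence the cache covers.
    tokens: list

class PrefillWorker:
    """Compute-bound: processes the entire prompt in one batched pass."""
    def run(self, prompt_tokens):
        # A single forward pass over all prompt tokens builds the KV cache.
        return KVCache(tokens=list(prompt_tokens))

class DecodeWorker:
    """Memory-bound: generates one token at a time against the cache."""
    def run(self, cache, steps):
        out = []
        for _ in range(steps):
            nxt = len(cache.tokens)  # stand-in for a real model forward step
            cache.tokens.append(nxt)
            out.append(nxt)
        return out

prefill, decode = PrefillWorker(), DecodeWorker()
cache = prefill.run([101, 102, 103])    # prefill node builds the KV cache
generated = decode.run(cache, steps=4)  # cache is handed to a decode node
```

Separating the two phases lets each worker pool be sized and batched for its own bottleneck, which is where the claimed throughput win over a monolithic engine comes from.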
Jules (Gemini 3 Pro) is now in global public beta with a GitHub label Action and a Jules Tools CLI — the first asynchronous, GitHub-native coding agent to rival Claude Code.
Agent 365 GA at $15/user: per-agent Entra IDs, Defender MCP blocking. Agent Framework 1.0 is the open-source multi-agent baseline with A2A and MCP interop.
GR00T N1.7 VLA moves to commercial early access, removing research-only limits. Jensen Huang previewed GR00T N2 claiming 2× task success vs. current top VLAs.
Copilot in VS gains cloud agent sessions, profile-level custom agents, .claude/skills/ loading, and a Debugger agent that reproduces issues at runtime.
Cursor released a TypeScript SDK giving programmatic access to the same runtime, harness, and models powering its desktop, CLI, and web apps.
Mistral Medium 3.5: 128B dense, 256K context, 77.6% SWE-Bench Verified. Vibe gets cloud remote agents; Le Chat gets a Work Mode.
vLLM v0.20.0: 752 commits, 320 contributors. CUDA 13, PyTorch 2.11, Transformers v5, Python 3.14, FlashAttention 4 default, 2-bit KV cache.
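A 2-bit KV cache stores each cached key/value entry with only four quantization levels per group, cutting cache memory roughly 8× versus FP16. A rough NumPy sketch of per-group asymmetric 2-bit quantization — illustrative only, not vLLM's actual kernel or layout (which also packs four 2-bit values per byte):

```python
import numpy as np

def quantize_2bit(x, group=8):
    """Per-group asymmetric quantization to 4 levels (2 bits) per value."""
    x = x.reshape(-1, group)
    lo = x.min(axis=1, keepdims=True)
    hi = x.max(axis=1, keepdims=True)
    scale = (hi - lo) / 3.0            # 2 bits -> integer levels 0..3
    scale[scale == 0] = 1.0            # guard against constant groups
    q = np.round((x - lo) / scale).astype(np.uint8)
    return q, scale, lo

def dequantize_2bit(q, scale, lo):
    return q * scale + lo

x = np.linspace(-1.0, 1.0, 32)
q, scale, lo = quantize_2bit(x)
xr = dequantize_2bit(q, scale, lo).reshape(-1)
max_err = np.abs(x - xr).max()        # bounded by scale / 2 per group
```

The per-group scale and offset keep the worst-case error proportional to each group's dynamic range, which is why grouped low-bit schemes degrade quality far less than a single global scale would.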
Cursor 3.2 introduces a /multitask command that spawns parallel async subagents, expands worktrees in the Agents Window, and adds multi-root workspaces.
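Spawning parallel async subagents is, at its core, a fan-out/fan-in over independent tasks. A toy asyncio sketch of the pattern — the names are hypothetical and this is not Cursor's implementation:

```python
import asyncio

async def subagent(task_id, description):
    """Stand-in for one agent working in its own isolated worktree."""
    await asyncio.sleep(0.01 * task_id)  # simulate independent work
    return f"task {task_id}: {description} done"

async def multitask(descriptions):
    # Fan out: one subagent per task description, all run concurrently.
    jobs = [subagent(i, d) for i, d in enumerate(descriptions)]
    # Fan in: gather returns results in submission order,
    # regardless of the order in which the subagents finish.
    return await asyncio.gather(*jobs)

results = asyncio.run(multitask(["fix lint", "write tests", "update docs"]))
```

Running each subagent against its own worktree (as the Agents Window does) is what makes the fan-out safe: concurrent edits never touch the same checkout.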
GPT-5.5 (codename Spud) shipped to ChatGPT and Codex on Apr 23 and to the API on Apr 24. OpenAI cites 82.7% on Terminal-Bench 2.0 and FrontierMath gains over Opus 4.7.
Claude Design (research preview) generates prototypes, slides, and one-pagers from natural language, and reads codebases to apply design systems.
Claude Opus 4.7 ships to Claude products, API, Bedrock, Vertex, and Microsoft Foundry. Better coding, ~3× higher vision resolution, same pricing.