AI-Daily-Builder

2026-05-21

Anthropic Code with Claude London: agent platform grows up — Dreaming, Outcomes, Finance

Read this because: the theme is a shift from "better model" to "reliable autonomy." Outcomes (a grader loop that scores agent runs) and Dreaming (scheduled memory curation) are the infrastructure for agents you can leave running unattended, which is the real enterprise blocker, not model IQ.

At Code with Claude London, Anthropic shipped Dreaming, Outcomes, multi-agent orchestration, a 10-agent Claude Finance suite, and Small Business integrations.

Anthropic took its Code with Claude developer event to London (May 20-21) and used it to ship the parts of the agent platform that matter for production — not a new flagship model, but the reliability scaffolding around agents.

The 5 agent features

| Feature | What it does |
| --- | --- |
| Dreaming (research preview) | A scheduled process that reviews past agent sessions + memory stores, extracts patterns, and curates long-term memory |
| Outcomes (public beta) | A grader loop that scores an agent’s runs against defined success criteria, closing the “did the agent actually succeed?” gap |
| Multi-agent orchestration | Coordinating multiple specialized agents on one task |
| Claude Finance | A suite of 10 finance-specific agents |
| Add-ins | Extending Claude into existing application surfaces |

Plus Claude for Small Business — pre-built integrations with QuickBooks, PayPal, HubSpot, Canva, Docusign, Google Workspace, and Microsoft 365 — packaging agent capability for non-technical operators.

All of this runs on Claude Opus 4.7, the model that took the coding-benchmark lead earlier this spring (roughly 13% ahead of Opus 4.6 on a 93-task coding suite).
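To make the "grader loop" idea behind Outcomes concrete, here is a minimal sketch in plain Python. It is not Anthropic's API: `Criterion`, `grade_run`, `run_with_grader`, and `fake_agent` are hypothetical names invented for illustration, and the retry-until-pass policy is an assumption about how such a loop might be used.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Criterion:
    """One success criterion: a name plus a check on the run's output."""
    name: str
    check: Callable[[str], bool]


def grade_run(output: str, criteria: list[Criterion]) -> dict:
    """Score one run: fraction of criteria met, plus per-criterion results."""
    results = {c.name: c.check(output) for c in criteria}
    score = sum(results.values()) / len(results) if results else 0.0
    passed = bool(results) and all(results.values())
    return {"score": score, "passed": passed, "results": results}


def run_with_grader(agent: Callable[[str, int], str], task: str,
                    criteria: list[Criterion], max_attempts: int = 3) -> list[dict]:
    """Run the agent, grade each attempt, and stop at the first passing run."""
    history = []
    for attempt in range(1, max_attempts + 1):
        report = grade_run(agent(task, attempt), criteria)
        history.append(report)
        if report["passed"]:
            break
    return history


# Hypothetical stand-in for a real agent call: fails once, then succeeds.
def fake_agent(task: str, attempt: int) -> str:
    return "draft" if attempt == 1 else "Invoice total: 42 USD"


criteria = [
    Criterion("mentions_total", lambda out: "total" in out.lower()),
    Criterion("has_currency", lambda out: "USD" in out),
]
history = run_with_grader(fake_agent, "Summarize the invoice", criteria)
```

The point of the sketch is the separation of concerns: the agent produces output, and an independent grader decides whether that output met the stated criteria, so "did the agent actually succeed?" gets an explicit, recorded answer for every run.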

The real theme: autonomy reliability, not model IQ

The under-appreciated shift: Anthropic isn’t selling a smarter model here — it’s selling the infrastructure that makes agents trustworthy enough to leave running unattended.

Together, Outcomes and Dreaming target the gap between “demo that works once” and “agent you can deploy in production and walk away from.”
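As an illustration of what scheduled memory curation in the style of Dreaming could look like, the sketch below reviews notes from past sessions and promotes only the ones that recur. Everything here is an assumption for illustration: `normalize`, `curate_memory`, the `min_sessions` threshold, and the note format are hypothetical, and a real system would run a step like this from a scheduler rather than inline.

```python
from collections import Counter


def normalize(note: str) -> str:
    """Lowercase and collapse whitespace so near-duplicate notes match."""
    return " ".join(note.lower().split())


def curate_memory(sessions: list[list[str]], memory: set[str],
                  min_sessions: int = 2) -> set[str]:
    """Promote notes seen in at least `min_sessions` distinct sessions."""
    counts = Counter()
    for session in sessions:
        # Count each note at most once per session.
        for note in {normalize(n) for n in session}:
            counts[note] += 1
    recurring = {note for note, seen in counts.items() if seen >= min_sessions}
    return {normalize(m) for m in memory} | recurring


past_sessions = [
    ["Use ISO dates", "User prefers CSV"],
    ["use iso dates ", "Retry on timeout"],
    ["Retry on  timeout"],
]
long_term = curate_memory(past_sessions, {"always cite sources"})
```

In this toy run, "use iso dates" and "retry on timeout" each appear in two sessions and are promoted to long-term memory, while the one-off "user prefers csv" is left out.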

Why this matters

Practitioner note

For builders shipping on Claude:

The under-considered angle: the agent platform war is being won on reliability tooling, not model benchmarks. Outcomes and Dreaming are unglamorous — graders and memory curation don’t make headlines like a new model does. But they’re exactly what converts agent demos into deployed, unattended production systems. The lab that makes agents boring and reliable first wins the enterprise, regardless of who tops the next benchmark.

