Bernstein: a deterministic Python orchestrator that runs 44 CLI coding agents in parallel git worktrees with an HMAC-chained audit log


Bernstein is an Apache-2.0 Python scheduler that decomposes a goal, spawns CLI coding agents (Claude Code, Codex, Gemini CLI, +40 more) into isolated git worktrees, verifies each diff against tests/lint/types, and merges only what passes, all coordinated by plain Python.

pipx install bernstein

What it is

Bernstein is an open-source (Apache-2.0) Python tool that orchestrates other CLI coding agents instead of being one itself. You give it a goal; one LLM call decomposes that goal into tasks with owned files and completion signals, and from there the scheduler is plain deterministic Python. It spawns agents — Claude Code, OpenAI Codex, GitHub Copilot CLI, Google Gemini CLI, Cursor, Aider, Continue, and others (44 adapters as of v2.7.0, plus a generic --prompt wrapper) — each into its own git worktree, runs them in parallel, then a “janitor” stage checks concrete signals (tests pass, lint clean, types correct) before any work merges to main. Failed tasks retry or get routed to a different model.
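The janitor step described above can be pictured as a plain exit-code check. The following is a minimal sketch under stated assumptions, not Bernstein's actual implementation: `run_gates`, `may_merge`, `GateResult`, and the gate names are hypothetical, and the demo commands are stand-ins that only return an exit code.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class GateResult:
    name: str
    passed: bool


def run_gates(gates: dict[str, list[str]], cwd: str = ".") -> list[GateResult]:
    """Run each gate command in `cwd`; a gate passes only on exit code 0."""
    results = []
    for name, cmd in gates.items():
        proc = subprocess.run(cmd, cwd=cwd, capture_output=True, text=True)
        results.append(GateResult(name, proc.returncode == 0))
    return results


def may_merge(results: list[GateResult]) -> bool:
    """Allow a merge only if at least one gate ran and every gate passed."""
    return bool(results) and all(r.passed for r in results)


# Stand-in commands (assumption): each one just exits with a fixed code.
# In a real project these would be the test, lint, and type-check commands.
DEMO_GATES = {
    "tests": [sys.executable, "-c", "raise SystemExit(0)"],
    "lint": [sys.executable, "-c", "raise SystemExit(0)"],
    "types": [sys.executable, "-c", "raise SystemExit(1)"],
}

if __name__ == "__main__":
    results = run_gates(DEMO_GATES)
    for r in results:
        print(f"{r.name}: {'pass' if r.passed else 'fail'}")
    print("merge" if may_merge(results) else "hold")
```

The point of the sketch is that the merge decision is a boolean over concrete signals, with no model call involved; a failed gate would be where a retry or reroute is triggered.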

What sets it apart in the crowded agent-framework space is audit-grade determinism: scheduling decisions spend zero LLM tokens, every step is replayable, and each decision is written to an HMAC-SHA256 chained audit log at .sdd/audit/YYYY-MM-DD.jsonl. The README leans into compliance, signed agent cards, per-artifact lineage, and air-gapped/on-prem deploy as the differentiators.
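To make the idea of an HMAC-chained log concrete, here is a minimal sketch, assuming each JSONL record stores the previous record's MAC and its own MAC over that value plus the payload. The field names (`payload`, `prev`, `mac`), the `GENESIS` value, and the helper functions are hypothetical; the source does not describe Bernstein's actual record schema or key handling.

```python
import hashlib
import hmac
import json

GENESIS = "0" * 64  # assumed starting value for the first record's "prev"


def _mac(key: bytes, prev: str, payload: dict) -> str:
    """HMAC-SHA256 over the previous MAC plus a canonical JSON payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hmac.new(key, (prev + body).encode("utf-8"), hashlib.sha256).hexdigest()


def append_record(log: list[str], key: bytes, payload: dict) -> None:
    """Append one JSONL line whose MAC also covers the previous line's MAC."""
    prev = json.loads(log[-1])["mac"] if log else GENESIS
    record = {"payload": payload, "prev": prev, "mac": _mac(key, prev, payload)}
    log.append(json.dumps(record, sort_keys=True))


def verify_chain(log: list[str], key: bytes) -> bool:
    """Recompute every MAC in order; any edit, removal, or reorder fails."""
    prev = GENESIS
    for line in log:
        record = json.loads(line)
        expected = _mac(key, prev, record["payload"])
        stored = str(record["mac"])
        if record["prev"] != prev:
            return False
        if not hmac.compare_digest(stored.encode("utf-8"), expected.encode("utf-8")):
            return False
        prev = stored
    return True
```

Because each MAC depends on the one before it, changing or deleting an earlier line invalidates every later line, which is what makes such a log tamper-evident to anyone holding the key.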

Install and run

pipx install bernstein            # also: pip install bernstein (Python >= 3.12)
cd your-project
bernstein init                    # creates the .sdd/ workspace
bernstein -g "Add rate limiting"  # agents spawn, work in parallel, verify, exit
bernstein live                    # optional TUI dashboard to watch progress

A representative run summary from the docs:

[manager] decomposed into 4 tasks
[agent-1] claude-sonnet: src/auth/middleware.py  (done, 2m 14s)
[agent-2] codex:         tests/test_auth.py      (done, 1m 58s)
[verify]  all gates pass. merging to main.

When to use it

Reach for Bernstein when you have a parallelizable change across several files, you want more than one model/agent in the mix, and you need a verifiable record of who-changed-what (regulated codebases, forward-deployed/air-gapped engineering, or just cost-conscious teams that don’t want an LLM burning tokens on coordination). It is explicitly not the right tool for chatting with a single pair-programmer, and not for non-coding LLM workflows.

Property	Value
Latest release	v2.7.0 (2026-05-24)
Language / license	Python, Apache-2.0
Author	Alex Chernysh (sipyourdrink-ltd)
Install	pipx install bernstein
Agents supported	44 CLI adapters + generic --prompt wrapper

Caveat

It is a wrapper, not a self-contained agent: the underlying agent CLIs must already be installed and authenticated on the machine, and isolation is built on git worktree, so a non-git project simply won’t run. There is no SaaS option by design — it is on-premise only.

Practitioner note: the highest-leverage knob is task decomposition, not agent choice. Because Bernstein assigns files to tasks deterministically and isolates each in a worktree, a goal phrased so tasks own disjoint files merges cleanly and in parallel; a goal that forces two agents to edit the same module will serialize or generate merge friction no audit log can smooth over.
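One way to sanity-check a decomposition before running it is to look for files claimed by more than one task. This is a small sketch, assuming a plan can be represented as a mapping from task name to the set of files it owns; `file_conflicts` and the task names are hypothetical and not part of Bernstein.

```python
from itertools import combinations


def file_conflicts(tasks: dict[str, set[str]]) -> dict[tuple[str, str], set[str]]:
    """Return every pair of tasks that claims at least one common file."""
    conflicts = {}
    for a, b in combinations(sorted(tasks), 2):
        shared = tasks[a] & tasks[b]
        if shared:
            conflicts[(a, b)] = shared
    return conflicts


if __name__ == "__main__":
    plan = {
        "middleware": {"src/auth/middleware.py"},
        "tests": {"tests/test_auth.py"},
        "limits": {"src/auth/middleware.py", "src/auth/limits.py"},
    }
    for pair, files in file_conflicts(plan).items():
        print(pair, sorted(files))
```

An empty result means the tasks own disjoint files and can merge independently; any reported pair is a candidate for rephrasing the goal or splitting the shared module first.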

Under-considered angle: most “multi-agent” marketing implies an LLM is reasoning about coordination, which adds cost, latency, and non-reproducibility. Bernstein’s quiet bet is the opposite — that orchestration should be boring, deterministic code, and the model’s only judgment call is the one upfront decomposition. That framing turns “multi-agent” from a reliability liability into an auditable build step, and it is a useful lens for evaluating any orchestrator: ask how many of its scheduling decisions are model calls versus code, because every model-in-the-loop decision is a place your run can silently diverge between two executions.