arxiv-scout — surfaces high-signal cs.AI papers every morning

Pulls overnight cs.AI / cs.LG submissions, ranks by builder-relevance (shipped code, reproducible results), and writes a 5-paper digest with practitioner notes.

cp .claude/agents/arxiv-scout.md ~/.claude/agents/

What this agent does

arxiv-scout runs at 06:00 ET daily. It reads the previous 18 hours of cs.AI and cs.LG submissions, ranks them on three axes (reproducibility, applicability, novelty), and produces a 5-paper digest under /papers/.
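The 18-hour lookback can be made concrete with a small sketch. It assumes "ET" means the America/New_York zone and that the window ends exactly at the 06:00 run time; the function name `submission_window` is hypothetical and not part of the agent.

```python
from datetime import date, datetime, timedelta, timezone
from zoneinfo import ZoneInfo

# Assumption: "ET" in the README refers to the America/New_York time zone.
ET = ZoneInfo("America/New_York")
LOOKBACK = timedelta(hours=18)  # lookback length stated in the README


def submission_window(run_date: date) -> tuple[datetime, datetime]:
    """Return (start, end) in UTC for the run scheduled on run_date at 06:00 ET."""
    end_local = datetime(run_date.year, run_date.month, run_date.day, 6, 0, tzinfo=ET)
    end_utc = end_local.astimezone(timezone.utc)
    # Subtract in UTC so the window is always 18 elapsed hours,
    # even across a daylight-saving transition.
    return end_utc - LOOKBACK, end_utc


start, end = submission_window(date(2024, 1, 15))
print(start.isoformat(), end.isoformat())
```

Under these assumptions the window opens at 12:00 ET on the previous day, so it covers the afternoon and overnight submissions.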

Ranking heuristics

Axis	Signal
Reproducibility	Public code repo, license type, README has a runnable command, quantitative results stated in the abstract
Applicability	Mentions production-friendly inference (vLLM, llama.cpp, MLX, TensorRT-LLM); evaluated at ≤70B params; or shows a deployment pattern
Novelty	Not a re-derivation of arXiv work from the past 30 days; not a survey unless field-defining

Papers scoring below 8.0 are logged but not published. The operator can review the rejection bin in the PR description.
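A minimal sketch of the publish cutoff, under stated assumptions: the README gives the three axes, the 8.0 threshold, and the 5-paper digest size, but not how axis scores are combined. The unweighted mean, the 0 to 10 scale, and the names `ScoredPaper` and `build_digest` are all hypothetical.

```python
from dataclasses import dataclass

PUBLISH_THRESHOLD = 8.0  # cutoff stated in the README
DIGEST_SIZE = 5          # digest length stated in the README


@dataclass(frozen=True)
class ScoredPaper:
    arxiv_id: str
    reproducibility: float
    applicability: float
    novelty: float

    @property
    def score(self) -> float:
        # ASSUMPTION: unweighted mean of the three axes, each on a 0-10 scale.
        # The README does not say how the axes are combined.
        return (self.reproducibility + self.applicability + self.novelty) / 3


def build_digest(papers: list[ScoredPaper]) -> tuple[list[ScoredPaper], list[ScoredPaper]]:
    """Return (digest, rejection_bin), both sorted by score, highest first."""
    ranked = sorted(papers, key=lambda p: p.score, reverse=True)
    accepted = [p for p in ranked if p.score >= PUBLISH_THRESHOLD]
    rejected = [p for p in ranked if p.score < PUBLISH_THRESHOLD]
    # ASSUMPTION: papers above the cutoff but outside the top 5 are simply
    # left out of the digest; the README does not say what happens to them.
    return accepted[:DIGEST_SIZE], rejected


papers = [
    ScoredPaper("paper-a", 9.0, 9.0, 9.0),
    ScoredPaper("paper-b", 8.0, 8.0, 8.0),
    ScoredPaper("paper-c", 7.0, 8.0, 8.0),
]
digest, rejection_bin = build_digest(papers)
print([p.arxiv_id for p in digest], [p.arxiv_id for p in rejection_bin])
```

The rejection bin is returned alongside the digest so that it can be written into the PR description for operator review.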

Why this beats raw arXiv firehose

Builders don’t need every cs.AI or cs.LG paper. They need the ~3% that ship code and report an applicable result. arxiv-scout surfaces those automatically, replacing about 30 minutes of daily manual triage.

Failure modes

Author affiliation gaming. Some labs post the same result to arXiv twice with different framing. The scout dedupes by abstract-embedding similarity (>0.92) over a 30-day window.
Code repos that don’t run. The scout checks the README for an entry-point command; if none is found, the paper loses 1.5 points but isn’t disqualified.
Translated rewrites. A small fraction of arXiv submissions are translations of prior conference work. The scout cross-references them against papers accepted at NeurIPS, ICML, ICLR, and ACL to avoid double-counting.
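The first two safeguards above can be sketched as follows. The 0.92 similarity threshold, the 30-day window, and the 1.5-point penalty come from the list; everything else is an assumption made for illustration: cosine similarity as the similarity measure, embeddings as plain lists of floats (the embedding model is not specified), the README check reduced to a boolean, and the hypothetical names `is_duplicate` and `apply_readme_penalty`.

```python
import math
from datetime import date, timedelta

SIMILARITY_THRESHOLD = 0.92          # stated in the list above
DEDUPE_WINDOW = timedelta(days=30)   # stated in the list above
MISSING_ENTRYPOINT_PENALTY = 1.5     # stated in the list above


def cosine_similarity(a: list[float], b: list[float]) -> float:
    # ASSUMPTION: "similarity" means cosine similarity between embeddings.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def is_duplicate(
    embedding: list[float],
    submitted: date,
    seen: list[tuple[list[float], date]],
) -> bool:
    """True if an earlier paper within the window has a near-identical abstract."""
    for other_embedding, other_date in seen:
        age = submitted - other_date
        in_window = timedelta(0) <= age <= DEDUPE_WINDOW
        if in_window and cosine_similarity(embedding, other_embedding) > SIMILARITY_THRESHOLD:
            return True
    return False


def apply_readme_penalty(score: float, readme_has_entrypoint: bool) -> float:
    # How the entry-point command is detected is not specified in the README,
    # so the check is passed in as a boolean here.
    return score if readme_has_entrypoint else score - MISSING_ENTRYPOINT_PENALTY


seen = [([1.0, 0.05], date(2024, 3, 10))]
print(is_duplicate([1.0, 0.0], date(2024, 3, 30), seen))
print(apply_readme_penalty(8.5, readme_has_entrypoint=False))
```

Because the penalty is subtracted rather than applied as a hard filter, a strong paper with a weak README can still clear the publish cutoff.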