arxiv-scout — surfaces high-signal cs.AI papers every morning
Pulls overnight cs.AI / cs.LG submissions, ranks by builder-relevance (shipped code, reproducible results), and writes a 5-paper digest with practitioner notes.
cp .claude/agents/arxiv-scout.md ~/.claude/agents/
What this agent does
arxiv-scout runs at 06:00 ET daily. It reads the previous 18 hours of cs.AI and cs.LG submissions, ranks them on three axes (reproducibility, applicability, novelty), and produces a 5-paper digest under /papers/.
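The 18-hour lookback can be sketched as a small helper. This is an illustration rather than the agent's actual code: the function name `submission_window` is hypothetical, and it assumes "ET" means the `America/New_York` zone. The window is computed in UTC so that it always spans 18 elapsed hours, even across a daylight-saving change.

```python
from datetime import date, datetime, timedelta, timezone
from zoneinfo import ZoneInfo

# Assumption: "ET" in the schedule refers to America/New_York.
ET = ZoneInfo("America/New_York")


def submission_window(run_date: date, hours: int = 18) -> tuple[datetime, datetime]:
    """Return (start, end) in UTC for the run that fires at 06:00 ET on run_date."""
    run_local = datetime(run_date.year, run_date.month, run_date.day, 6, 0, tzinfo=ET)
    end = run_local.astimezone(timezone.utc)
    # Subtract in UTC so the window is exactly `hours` of elapsed time.
    start = end - timedelta(hours=hours)
    return start, end


if __name__ == "__main__":
    start, end = submission_window(date(2024, 1, 15))
    print(start.isoformat(), "->", end.isoformat())
```

Submissions with a timestamp in the half-open interval from `start` to `end` would then be candidates for ranking.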
Ranking heuristics
| Axis | Signal |
|---|---|
| Reproducibility | Public code repo, license type, README has runnable command, claimed numbers in abstract |
| Applicability | Mentions production-friendly inference (vLLM, llama.cpp, MLX, TensorRT-LLM); evaluated at ≤70B params; or shows a deployment pattern |
| Novelty | Not a re-derivation of arXiv work posted in the past 30 days; not a survey unless field-defining |
Papers scoring under 8.0 are logged but not published. The operator can review the rejection bin in the PR description.
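A minimal sketch of the publish/reject split, assuming each paper already carries a single combined score. How the three axes are weighted into that score is not specified here, so the sketch takes the score as given. `ScoredPaper` and `split_digest` are hypothetical names, and the example IDs are placeholders.

```python
from dataclasses import dataclass

PUBLISH_THRESHOLD = 8.0  # papers below this are logged, not published
DIGEST_SIZE = 5          # the digest holds five papers


@dataclass
class ScoredPaper:
    arxiv_id: str
    score: float  # combined score; the weighting of the axes is an assumption left out here


def split_digest(papers: list[ScoredPaper]) -> tuple[list[ScoredPaper], list[ScoredPaper]]:
    """Return (digest, rejection_bin), both sorted by descending score."""
    ranked = sorted(papers, key=lambda p: p.score, reverse=True)
    eligible = [p for p in ranked if p.score >= PUBLISH_THRESHOLD]
    rejected = [p for p in ranked if p.score < PUBLISH_THRESHOLD]
    return eligible[:DIGEST_SIZE], rejected


if __name__ == "__main__":
    sample = [ScoredPaper("paper-a", 9.1), ScoredPaper("paper-b", 7.4), ScoredPaper("paper-c", 8.0)]
    digest, rejected = split_digest(sample)
    print([p.arxiv_id for p in digest], [p.arxiv_id for p in rejected])
```

What happens to papers that clear the threshold but fall outside the top five is left open; this sketch simply omits them from both lists.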
Why this beats the raw arXiv firehose
Builders don’t need every cs.LG paper. They need the ~3% that have shipped code and an applicable result. arxiv-scout surfaces those automatically rather than requiring 30 minutes of daily triage.
Failure modes
- Author affiliation gaming. Some labs post the same result to arXiv twice with different framing. The scout deduplicates by abstract-embedding similarity (>0.92) over a 30-day window.
- Code repos that don’t run. The scout checks the README for an entry-point command; if one is missing, the paper loses 1.5 score points but isn’t disqualified.
- Translated rewrites. A small fraction of arXiv submissions are translations of prior conference work. The scout cross-references NeurIPS/ICML/ICLR/ACL accepted papers to avoid double-counting.
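The first two mitigations above can be sketched in a few lines. This is a hedged illustration, not the scout's implementation: it assumes the similarity measure is cosine similarity and that abstract embeddings are plain lists of floats from some embedding model (not named here). The function names are hypothetical.

```python
import math
from datetime import date, timedelta

SIMILARITY_THRESHOLD = 0.92       # abstracts more similar than this count as duplicates
WINDOW_DAYS = 30                  # lookback window for duplicate detection
MISSING_ENTRYPOINT_PENALTY = 1.5  # score points lost when the README has no entry-point command


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def is_duplicate(embedding: list[float], submitted: date,
                 seen: list[tuple[list[float], date]]) -> bool:
    """True if any paper seen within the window is above the similarity threshold."""
    cutoff = submitted - timedelta(days=WINDOW_DAYS)
    return any(
        cutoff <= seen_date <= submitted
        and cosine_similarity(embedding, seen_emb) > SIMILARITY_THRESHOLD
        for seen_emb, seen_date in seen
    )


def apply_readme_penalty(score: float, has_entry_point: bool) -> float:
    """Penalize, but do not disqualify, papers whose README lacks a runnable command."""
    return score if has_entry_point else score - MISSING_ENTRYPOINT_PENALTY
```

A linear scan like this is fine for a 30-day window of a couple of categories; a vector index would only matter at much larger volumes.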