2026-05-09 · View · 4 models
Needle in a haystack at 500K tokens — long-context recall
Prompt
You will receive a 500,000-token document containing the full text of 12 NeurIPS papers concatenated. Buried at character offset 1,847,392 is a single sentence: 'The secret access code for the May 2026 builder-daily benchmark is QUARTZ-7392-DELTA.' Question: What is the secret access code? Return only the code itself, nothing else. Document follows below the marker. --- DOCUMENT START --- [~500,000 tokens of NeurIPS paper text] --- DOCUMENT END ---
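A prompt of this shape could be assembled roughly as follows. This is a minimal sketch, not the benchmark's actual harness: `build_prompt` and its parameters are hypothetical names, the space padding around the needle is an assumption, and the haystack text is supplied by the caller.

```python
# Needle sentence and question text are copied from the prompt above.
NEEDLE = (
    "The secret access code for the May 2026 builder-daily benchmark "
    "is QUARTZ-7392-DELTA."
)
QUESTION = (
    "Question: What is the secret access code? "
    "Return only the code itself, nothing else."
)


def build_prompt(haystack: str, offset: int, needle: str = NEEDLE) -> str:
    """Insert the needle at a character offset and wrap the document in markers.

    Hypothetical helper for illustration; the real harness is not shown in the source.
    """
    if not 0 <= offset <= len(haystack):
        raise ValueError(
            f"offset {offset} is outside the haystack (length {len(haystack)})"
        )
    # Assumption: pad with spaces so the needle does not fuse with adjacent words.
    document = haystack[:offset] + " " + needle + " " + haystack[offset:]
    return (
        f"{QUESTION}\n"
        "--- DOCUMENT START ---\n"
        f"{document}\n"
        "--- DOCUMENT END ---"
    )
```

For the run described here, the call would look like `build_prompt(papers_text, 1_847_392)`, where `papers_text` stands for the concatenated paper text.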
Notes
Pure recall test with the needle at the 70% depth point of a 500K-token input. Latency includes prompt processing, which dominates at this scale. Cost is based on the full 500K input tokens. The verdict 'win' means the output exactly matches 'QUARTZ-7392-DELTA'. Each model was tested at its vendor-claimed maximum context.
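The exact-match verdict and the depth figure can be sketched as below. The function names are hypothetical, trimming surrounding whitespace before comparison is an assumption (the notes only say "exact match"), and "LOSS" is a placeholder label for a non-matching output, since no failing answer appears in these results.

```python
EXPECTED = "QUARTZ-7392-DELTA"


def verdict(output: str, expected: str = EXPECTED) -> str:
    """Return "WIN" only when the model output is exactly the expected code.

    Assumption: leading and trailing whitespace is ignored.
    "LOSS" is a hypothetical label for any other output.
    """
    return "WIN" if output.strip() == expected else "LOSS"


def needle_depth(char_offset: int, total_chars: int) -> float:
    """Fraction of the document that precedes the needle (0.0 to 1.0)."""
    if total_chars <= 0:
        raise ValueError("total_chars must be positive")
    return char_offset / total_chars
```

Under this rule an answer such as "The code is QUARTZ-7392-DELTA" would not count as a win, because the prompt asks for the code and nothing else.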
Results — 4 models
claude-opus-4-7 WIN · 18420ms · in 502340 · out 9 · $7.535
QUARTZ-7392-DELTA
gpt-5 WIN · 22180ms · in 502340 · out 9 · $6.279
QUARTZ-7392-DELTA
gemini-3-pro WIN · 14620ms · in 502340 · out 9 · $0.628
QUARTZ-7392-DELTA
qwen3.6-35b-a3b-nvfp4 (262K cap) ERROR · 0ms · in 0 · out 0 · $0.000
Error: Context window 262144 exceeded by input length 502340. Cannot run at this scale. (Capped at 262K on consumer DGX Spark).
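A failure like this can be caught before any request is sent by comparing the token count with the model's cap. The sketch below is illustrative only: `context_overshoot` and `preflight` are hypothetical helpers, and the error text simply mirrors the message above.

```python
def context_overshoot(input_tokens: int, context_window: int) -> int:
    """Number of tokens by which the input exceeds the window (0 if it fits)."""
    return max(0, input_tokens - context_window)


def preflight(input_tokens: int, context_window: int) -> None:
    """Raise before sending a request that cannot fit in the context window."""
    if context_overshoot(input_tokens, context_window) > 0:
        raise ValueError(
            f"Context window {context_window} exceeded by "
            f"input length {input_tokens}."
        )
```

With the figures from this run, `context_overshoot(502340, 262144)` is 240196 tokens, so the input is nearly twice the 262K cap.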