AI-Daily-Builder

Retrieved 2026-05-09 · 4 models

Pick the right tool — multi-step task with 6 available functions

Prompt

You have these tools available: `web_search(query)`, `fetch_url(url)`, `read_file(path)`, `run_shell(cmd)`, `query_database(sql)`, `send_email(to, subject, body)`.

Task: Find the latest published date of Anthropic's Claude 4.7 release announcement, fetch the full text of that page, count how many times the word 'tool' appears, and email the count to ops@example.com with subject 'Claude 4.7 tool count'.

Respond with ONLY a JSON array of tool calls in execution order: [{"tool": "...", "args": {...}}, ...]. No prose, no markdown, no other text.

Notes

Tests whether the model picks the right 3 tools in the right order: web_search → fetch_url → (count locally, no tool) → send_email. The counting step needs no tool call, so a correct answer contains exactly three calls. Penalize: extra tools, wrong order, missing send_email, prose preamble.
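The penalties listed in the notes can be checked mechanically. The following is a minimal sketch, not the grader this benchmark actually uses: the function name `grade`, the penalty labels, and the rule that a repeated call counts as an extra tool are all assumptions made for illustration.

```python
import json

# Expected call sequence, taken from the notes; counting is done locally.
EXPECTED = ["web_search", "fetch_url", "send_email"]


def grade(response: str) -> list[str]:
    """Return penalty labels for a response; an empty list means no penalty."""
    penalties = []
    text = response.strip()
    try:
        calls = json.loads(text)
    except json.JSONDecodeError:
        # Not pure JSON: try to recover an embedded array and flag the prose.
        start, end = text.find("["), text.rfind("]")
        if start == -1 or end <= start:
            return ["unparseable"]
        penalties.append("prose")
        try:
            calls = json.loads(text[start:end + 1])
        except json.JSONDecodeError:
            return penalties + ["unparseable"]
    if not isinstance(calls, list):
        return penalties + ["unparseable"]

    tools = [c.get("tool") for c in calls if isinstance(c, dict)]

    if "send_email" not in tools:
        penalties.append("missing send_email")

    # Assumption: any call beyond one per expected tool is an "extra".
    if len(tools) > len(set(tools) & set(EXPECTED)):
        penalties.append("extra tools")

    # Compare the first occurrence of each expected tool against EXPECTED.
    firsts = [t for i, t in enumerate(tools)
              if t in EXPECTED and t not in tools[:i]]
    if firsts != [t for t in EXPECTED if t in firsts]:
        penalties.append("wrong order")

    return penalties
```

Under these assumptions, a response with a preamble and a duplicated `web_search` would be flagged with both `prose` and `extra tools`, while a bare three-call array returns an empty list. The sketch does not check argument values such as the email subject.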

Results — 4 models

claude-opus-4-7 WIN · 1480ms · in 264 · out 145 · $0.013

[
  {"tool": "web_search", "args": {"query": "Anthropic Claude 4.7 release announcement date site:anthropic.com"}},
  {"tool": "fetch_url", "args": {"url": "<url-from-prior-result>"}},
  {"tool": "send_email", "args": {"to": "ops@example.com", "subject": "Claude 4.7 tool count", "body": "<count>"}}
]

gpt-5 TIE · 1320ms · in 264 · out 168 · $0.011

[
  {"tool": "web_search", "args": {"query": "Claude 4.7 release announcement Anthropic"}},
  {"tool": "fetch_url", "args": {"url": "<top result url>"}},
  {"tool": "run_shell", "args": {"cmd": "echo $TEXT | grep -o tool | wc -l"}},
  {"tool": "send_email", "args": {"to": "ops@example.com", "subject": "Claude 4.7 tool count", "body": "<count>"}}
]

gemini-3-pro LOSS · 1810ms · in 264 · out 240 · $0.009

I'll help you with that. Here's the plan: [
  {"tool": "web_search", "args": {"query": "Claude 4.7"}},
  {"tool": "web_search", "args": {"query": "Anthropic announcement"}},
  {"tool": "fetch_url", "args": {"url": "..."}},
  {"tool": "send_email", "args": {"to": "ops@example.com", "subject": "Result", "body": "..."}}
]

kimi-k2.6 WIN · 980ms · in 264 · out 138 · $0.001

[
  {"tool": "web_search", "args": {"query": "Anthropic Claude 4.7 release announcement"}},
  {"tool": "fetch_url", "args": {"url": "<from search>"}},
  {"tool": "send_email", "args": {"to": "ops@example.com", "subject": "Claude 4.7 tool count", "body": "<count>"}}
]
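The "count locally, no tool" step from the notes is the part that separates the three-call answers from the one that adds `run_shell`. A minimal sketch of that local step follows. The helper name `count_word` is hypothetical, and matching case-insensitively on whole words only (so "tools" is not counted) is an assumption, since the prompt does not say how the word should be matched.

```python
import re


def count_word(text: str, word: str = "tool") -> int:
    """Count whole-word, case-insensitive occurrences of `word` in `text`."""
    # \b anchors the match at word boundaries, so "tools" and "toolkit"
    # are excluded; drop the anchors to count substrings instead.
    pattern = rf"\b{re.escape(word)}\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))
```

The returned integer is what would fill the `<count>` placeholder in the `send_email` body once the page text from `fetch_url` is available.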