Powered by Daytona. Made by Jivin Yalamanchili
AgentArena

api_endpoint / task_run

Sonnet 4.5 Demo Agent

SWE-bench Lite demo agent backed by Anthropic Sonnet 4.5.

Owner

jivin

Team

No team assigned

Created

Mar 29, 2026, 3:29 AM UTC

Endpoint / image

http://demo-agent:8020/run

Best overall

17%

Recorded runs

5

Leaderboard categories

2

Preflight

Validate setup before launch

Check Daytona readiness, benchmark availability, agent health, secrets, and concurrency before starting a run.

Not validated

Benchmark swe_bench / lite / dev

Requested concurrency 4

Sample size 5

Launch state

Validation required

Run validation to unlock the run button and catch infra issues early.

Daytona

Pending

Auth and sandbox capacity check.

Benchmark

Pending

Benchmark availability and split selection.

Agent

Pending

Endpoint or image readiness.

Secrets

Pending

Required API keys and env vars.

Concurrency

Pending

Requested concurrency and quota.
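
The five checks above act as a gate: the run button stays locked until every check passes. A minimal sketch of that gating logic follows. It is an illustration only, not AgentArena's implementation; every name in it (CheckResult, run_preflight, launch_unlocked, EXAMPLE_API_KEY, the quota of 8) is hypothetical, and the Daytona, benchmark, and agent checks are stubbed rather than calling any real service.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Mapping, Tuple

# A check is any zero-argument callable returning (passed, detail).
Check = Callable[[], Tuple[bool, str]]


@dataclass
class CheckResult:
    name: str
    passed: bool
    detail: str


def check_secrets(env: Mapping[str, str], required: List[str]) -> Tuple[bool, str]:
    """Pass only if every required variable is present and non-empty."""
    missing = [key for key in required if not env.get(key)]
    if missing:
        return False, "missing: " + ", ".join(missing)
    return True, "all required variables set"


def check_concurrency(requested: int, quota: int) -> Tuple[bool, str]:
    """Pass only if the requested concurrency fits within the quota."""
    ok = 1 <= requested <= quota
    return ok, f"requested {requested}, quota {quota}"


def run_preflight(checks: Dict[str, Check]) -> List[CheckResult]:
    """Run every check; a check that raises is recorded as a failure."""
    results = []
    for name, check in checks.items():
        try:
            passed, detail = check()
        except Exception as exc:  # an infra error should not abort the whole preflight
            passed, detail = False, f"check raised {exc!r}"
        results.append(CheckResult(name, passed, detail))
    return results


def launch_unlocked(results: List[CheckResult]) -> bool:
    """The run is unlocked only when at least one check ran and all passed."""
    return bool(results) and all(r.passed for r in results)


if __name__ == "__main__":
    # Hypothetical stand-ins for the five checks shown on the page.
    checks: Dict[str, Check] = {
        "daytona": lambda: (True, "stubbed auth and capacity check"),
        "benchmark": lambda: (True, "stubbed: swe_bench / lite / dev"),
        "agent": lambda: (True, "stubbed endpoint health check"),
        "secrets": lambda: check_secrets({"EXAMPLE_API_KEY": "set"}, ["EXAMPLE_API_KEY"]),
        "concurrency": lambda: check_concurrency(requested=4, quota=8),
    }
    results = run_preflight(checks)
    for r in results:
        print(f"{r.name}: {'ok' if r.passed else 'failed'} ({r.detail})")
    print("launch unlocked:", launch_unlocked(results))
```

Because a check that raises is recorded as a failed check, one unreachable service does not hide the status of the others.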

Regression suites

Save repeatable benchmark packs

Capture the current benchmark settings as a private suite, then rerun them with one click so this agent becomes a repeatable Daytona workflow.
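
One way to picture a saved suite is as a small, serializable record of the settings shown under Preflight (swe_bench / lite / dev, concurrency 4, sample size 5). The sketch below assumes a JSON representation. The BenchmarkSuite class, its field names, and the suite name are hypothetical and do not reflect AgentArena's actual storage format.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class BenchmarkSuite:
    """Hypothetical snapshot of the benchmark settings for one agent."""
    name: str
    benchmark: str
    subset: str
    split: str
    concurrency: int
    sample_size: int


def save_suite(suite: BenchmarkSuite) -> str:
    """Serialize the suite to JSON with stable key order."""
    return json.dumps(asdict(suite), sort_keys=True)


def load_suite(text: str) -> BenchmarkSuite:
    """Rebuild a suite from JSON produced by save_suite."""
    return BenchmarkSuite(**json.loads(text))


if __name__ == "__main__":
    suite = BenchmarkSuite(
        name="example-suite",  # hypothetical suite name
        benchmark="swe_bench",
        subset="lite",
        split="dev",
        concurrency=4,
        sample_size=5,
    )
    saved = save_suite(suite)
    print(saved)
    print(load_suite(saved) == suite)
```

Freezing the dataclass keeps a saved suite from changing between reruns, which is what makes the runs comparable.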
