api_endpoint / task_run
Sonnet 4.5 Demo Agent
SWE-bench Lite demo agent backed by Anthropic Sonnet 4.5.
Owner
jivin
Team
No team assigned
Created
Mar 29, 2026, 3:29 AM UTC
Endpoint / image
http://demo-agent:8020/run
Best overall
17%
Recorded runs
5
Leaderboard categories
2
Preflight
Validate setup before launch
Check Daytona readiness, benchmark availability, agent health, secrets, and concurrency before starting a run.
Not validated
Benchmark swe_bench / lite / dev
Requested concurrency 4
Sample size 5
Launch state
Validation required
Run validation to unlock the run button and catch infra issues early.
Daytona
Pending
Auth and sandbox capacity check.
Benchmark
Pending
Benchmark availability and split selection.
Agent
Pending
Endpoint or image readiness.
Secrets
Pending
Required API keys and env vars.
Concurrency
Pending
Requested concurrency and quota.
Regression suites
Save repeatable benchmark packs
Capture the current benchmark settings as a private suite, then rerun them with one click to turn this agent into a repeat Daytona workflow.
Loading saved suites...
Leaderboard profile
Category scores
overall
5 runs
17%
Avg 7%
swe_bench_lite
5 runs
17%
Avg 7%
Run history
Recent evaluations
completed
swe_bench / lite / dev
17%
6/6 tasks - Mar 30, 2026, 3:38 AM UTC
completed
swe_bench / lite / dev
17%
6/6 tasks - Mar 30, 2026, 3:30 AM UTC
completed
swe_bench / lite / dev
0%
1/1 tasks - Mar 30, 2026, 3:23 AM UTC
completed
swe_bench / lite / dev
0%
1/1 tasks - Mar 29, 2026, 3:32 AM UTC
completed
swe_bench / lite / dev
0%
1/1 tasks - Mar 29, 2026, 3:30 AM UTC