Claude Sonnet 4.5 Surpasses GPT-5 in Coding Benchmark
Anthropic Releases Claude Sonnet 4.5 with Superior Coding Capabilities
Anthropic has launched Claude Sonnet 4.5, its latest AI model, which sets new standards in coding performance and autonomous task handling. Released on September 29, the update shows significant gains over both its predecessors and competing models such as GPT-5.
Benchmark Performance Breakthrough
The model achieved leading results on the SWE-bench Verified coding benchmark, sustaining autonomous operation for over 30 hours—a dramatic increase from Claude Opus 4's previous 7-hour limit. Key improvements include:
- 0% error rate in code editing (down from 9%)
- 61.4% score on the OSWorld benchmark (up 19.2 percentage points from Sonnet 4)
- Enhanced performance in finance, law, medicine, and STEM fields
Technical Advancements
Claude Sonnet 4.5 introduces several ecosystem improvements:
- Checkpoint feature: Allows saving progress during development
- Enhanced API tools: Supports longer sequence tasks through context editing
- Direct code execution: Integrated into Claude apps for workflow simplification
The model maintains competitive pricing at $3 per million input tokens and $15 per million output tokens, matching Sonnet 4's rates.
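At those rates, per-request cost is simple arithmetic. The sketch below uses the article's published prices; the token counts are hypothetical, chosen only for illustration:

```python
# Rates from the article: $3 per million input tokens,
# $15 per million output tokens (same as Sonnet 4).
INPUT_RATE = 3.00 / 1_000_000
OUTPUT_RATE = 15.00 / 1_000_000

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of a single API call at Sonnet 4.5 rates."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# Hypothetical example: a 20k-token prompt producing a 4k-token completion
print(f"${request_cost(20_000, 4_000):.3f}")  # → $0.120
```

Because output tokens cost 5× input tokens, long completions dominate the bill even when prompts are large.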
Safety and Industry Impact
Anthropic describes this as its "most aligned cutting-edge model," with improved defenses against prompt injection and reduced risky behaviors. The release coincides with growing demand for AI agents, challenging competitors like GPT-5 and Gemini 2.5 Pro.
The company also launched the Claude Agent SDK, enabling developers to create custom AI agents using natural language instructions.
Key Points:
- Outperforms GPT-5 in coding benchmarks
- Enables true "production-ready" application development
- Maintains competitive pricing structure
- Introduces innovative checkpoint feature
- Enhanced safety features for enterprise use