Claude Sonnet 4.5 Surpasses GPT-5 in Coding Benchmark

Anthropic Releases Claude Sonnet 4.5 with Superior Coding Capabilities

Anthropic has launched Claude Sonnet 4.5, its latest AI model that sets new standards in coding performance and autonomous task handling. Released on September 29, this update demonstrates significant improvements over previous versions and competing models like GPT-5.

Benchmark Performance Breakthrough

The model achieved leading results on the SWE-bench Verified coding benchmark, with autonomous operation lasting over 30 hours, a dramatic increase from Claude Opus 4's previous 7-hour limit. Key improvements include:

  • 0% error rate in code editing (down from 9%)
  • 61.4% score on the OSWorld benchmark (up 19.2 percentage points from Sonnet 4)
  • Enhanced performance in finance, law, medicine, and STEM fields

Technical Advancements

Claude Sonnet 4.5 introduces several ecosystem improvements:

  1. Checkpoint feature: Allows saving progress during development
  2. Enhanced API tools: Supports longer sequence tasks through context editing
  3. Direct code execution: Integrated into Claude apps for workflow simplification
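Anthropic has not published the internals of context editing, but its purpose, letting long-running tasks fit within a finite context window, resembles a familiar pattern: dropping the oldest conversation turns once a token budget is exceeded. The sketch below is purely illustrative; the function name and the rough 4-characters-per-token estimate are assumptions, not Anthropic's implementation:

```python
def trim_context(messages: list[dict], max_tokens: int) -> list[dict]:
    """Drop the oldest turns until the estimated token count fits the budget.

    Token counts are approximated as len(text) // 4; a real system
    would use the model's actual tokenizer.
    """
    def est(msg: dict) -> int:
        return len(msg["content"]) // 4

    trimmed = list(messages)
    while len(trimmed) > 1 and sum(est(m) for m in trimmed) > max_tokens:
        trimmed.pop(0)  # discard the oldest turn first
    return trimmed

history = [
    {"role": "user", "content": "x" * 400},       # ~100 tokens
    {"role": "assistant", "content": "y" * 400},  # ~100 tokens
    {"role": "user", "content": "z" * 40},        # ~10 tokens
]
print(len(trim_context(history, 150)))  # the oldest turn is dropped
```

The key design point is that recent turns are preserved at the expense of older ones, which is what allows an agent to keep working past the nominal context limit.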

The model maintains competitive pricing at $3 per million input tokens and $15 per million output tokens, matching Sonnet 4's rates.
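At those published rates, per-request cost is simple arithmetic. A minimal sketch (the token counts in the example are hypothetical):

```python
# Rates from Anthropic's published Sonnet 4.5 pricing:
# $3 per million input tokens, $15 per million output tokens.
INPUT_RATE = 3.00 / 1_000_000    # USD per input token
OUTPUT_RATE = 15.00 / 1_000_000  # USD per output token

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of a single request."""
    return input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE

# Example: a 10k-token prompt producing a 2k-token completion.
print(f"${estimate_cost(10_000, 2_000):.3f}")  # → $0.060
```

Note the 5x asymmetry: output tokens dominate cost for generation-heavy workloads such as code synthesis.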

Safety and Industry Impact

Anthropic describes this as their "most aligned cutting-edge model," with improved defenses against prompt injection and reduced risky behaviors. The release coincides with growing demand for AI agents, challenging competitors like GPT-5 and Gemini 2.5 Pro.

The company also launched the Claude Agent SDK, enabling developers to create custom AI agents using natural language instructions.

Key Points:

  • Outperforms GPT-5 in coding benchmarks
  • Enables true "production-ready" application development
  • Maintains competitive pricing structure
  • Introduces innovative checkpoint feature
  • Enhanced safety features for enterprise use