Anthropic Claims 'Best Coding Model in the World' With Claude Sonnet 4.5

Anthropic has unveiled Claude Sonnet 4.5, which it claims is the best coding model in the world. The model scored 77.2% on the SWE-bench Verified software engineering benchmark, rising to 82% with parallel test-time compute. Anthropic says it can work autonomously for more than 30 hours on complex tasks and shows improved reasoning and mathematical capabilities. It outperformed the top models from both OpenAI and Google, as well as Anthropic's own Claude Opus 4.1. The model also scored 61.4% on the OSWorld benchmark, up from 42.2% just four months earlier. The release additionally gives developers new features such as checkpoints and a refreshed terminal interface. Although Anthropic emphasizes improved alignment and safety in version 4.5, the model was swiftly jailbroken by a prompt engineer. With OpenAI and Google also releasing new models, competition over AI coding capabilities is intensifying.
