AI Models Compete in High School Math Exam: DouBao and YuanBao Triumph

With the annual college entrance exams looming, mathematics remains one of the most daunting subjects for students. But how would artificial intelligence fare under the same pressure? A recent competition put six leading AI models to the test using real exam questions from China's 2025 New Curriculum Standard I Volume.

The participants included DouBao (ByteDance), YuanBao (Tencent), Tongyi (Alibaba), WenXin X1Turbo (Baidu), DeepSeek (Shendu Qiusuo), and o3 (OpenAI). The exam consisted of 14 objective questions worth 73 total points, covering single-choice, multiple-choice, and fill-in-the-blank formats.


To ensure fairness, all models answered without system prompts or internet access—each had just one attempt. The results surprised many observers. DouBao and YuanBao emerged as joint champions with identical scores of 68 points, demonstrating remarkable problem-solving skills. DeepSeek followed closely with 63 points, while Tongyi scored 62. WenXin X1Turbo and o3 trailed significantly, with o3 managing only 34 points—less than half the top scorers' marks.


Breaking down the performance by question type reveals fascinating patterns:

  • Single-choice questions (35 points possible): DouBao, Tongyi, and YuanBao achieved perfect scores; DeepSeek lost five points due to two errors; OpenAI's o3 struggled most severely, answering only half correctly
  • Multiple-choice questions: DouBao, DeepSeek, and YuanBao demonstrated flawless accuracy, while Tongyi answered quickly but made critical judgment errors

The competition not only tested computational abilities but also highlighted how different AI systems approach complex reasoning tasks. While some models excelled at formula application and logical deduction, others faltered when facing China's distinctive exam format; o3's underwhelming performance in particular suggests that Western-developed AI may need localization adjustments.

Compared to previous years' benchmarks, the results show measurable progress in AI mathematical capabilities. Models now handle nuanced problems more effectively, though consistency and contextual understanding still leave room for improvement.

What does this mean for education? As AI continues mastering academic challenges once thought uniquely human, schools must rethink how to assess true learning versus rote calculation. These digital contestants aren't just solving equations—they're reshaping our understanding of intelligence itself.

Key Points

  1. Six major AI models competed using authentic Chinese high school math exam questions
  2. ByteDance's DouBao and Tencent's YuanBao tied for first place with 68/73 points
  3. OpenAI's o3 performed weakest at just 34 points—struggling with localized content
  4. Performance varied by question type: single-choice questions tripped up DeepSeek and o3, while multiple-choice caught out Tongyi
  5. Results demonstrate significant year-over-year improvements in AI mathematical reasoning
