AI Models Compete in High School Math Exam: DouBao and YuanBao Triumph

With the annual college entrance exams looming, mathematics remains one of the most daunting subjects for students. But how would artificial intelligence fare under the same pressure? A recent competition put six leading AI models to the test using real exam questions from China's 2025 New Curriculum Standard I Volume.

The participants included DouBao (ByteDance), YuanBao (Tencent), Tongyi (Alibaba), WenXin X1Turbo (Baidu), DeepSeek (Shendu Qiusuo), and o3 (OpenAI). The exam consisted of 14 objective questions worth 73 total points, covering single-choice, multiple-choice, and fill-in-the-blank formats.


To ensure fairness, all models answered without system prompts or internet access—each had just one attempt. The results surprised many observers. DouBao and YuanBao emerged as joint champions with identical scores of 68 points, demonstrating remarkable problem-solving skills. DeepSeek followed closely with 63 points, while Tongyi scored 62. WenXin X1Turbo and o3 trailed significantly, with o3 managing only 34 points—less than half the top scorers' marks.


Breaking down the performance by question type reveals fascinating patterns:

  • Single-choice questions (35 points possible): DouBao, Tongyi, and YuanBao achieved perfect scores; DeepSeek lost five points due to two errors; OpenAI's o3 struggled most severely, answering only half correctly
  • Multiple-choice questions: DouBao, DeepSeek, and YuanBao demonstrated flawless accuracy, while Tongyi answered quickly but made critical judgment errors

The competition not only tested computational abilities but also highlighted how different AI systems approach complex reasoning tasks. While some models excelled at formula application and logical deduction, others faltered when facing China's distinctive exam format; o3's underwhelming performance in particular suggests that Western-developed AI may need localization adjustments.

Compared to previous years' benchmarks, the results show measurable progress in AI mathematical capabilities. Models now handle nuanced problems more effectively, though consistency and contextual understanding still leave room for improvement.

What does this mean for education? As AI continues mastering academic challenges once thought uniquely human, schools must rethink how to assess true learning versus rote calculation. These digital contestants aren't just solving equations—they're reshaping our understanding of intelligence itself.

Key Points

  1. Six major AI models competed using authentic Chinese high school math exam questions
  2. ByteDance's DouBao and Tencent's YuanBao tied for first place with 68/73 points
  3. OpenAI's o3 performed weakest at just 34 points—struggling with localized content
  4. Performance varied by question type: single-choice questions tripped up DeepSeek and o3, while multiple-choice caught out Tongyi
  5. Results demonstrate significant year-over-year improvements in AI mathematical reasoning
