
Shanghai AI Lab Takes on OpenAI with LLaMA-O1: The Ultimate Math Showdown

The open-source community just got a seismic upgrade! Shanghai AI Lab has unleashed their version of OpenAI's Olympiad-crushing tool, LLaMA-O1. That's right, they’ve recreated the o1 project — and it’s open-source, baby! This isn’t just another AI toy; we’re talking Monte Carlo Tree Search, self-play reinforcement learning, and a brainy dual-strategy architecture straight from AlphaGo Zero. The AI scene is buzzing, and for good reason.


The Genius Behind the Madness

Before OpenAI even put out their o1 series, Shanghai AI Lab was already knee-deep in Monte Carlo Tree Search, looking to boost the mathematical prowess of large models. But once o1 dropped, they cranked it up a notch, focusing their laser beams on one thing: mathematical Olympiad problems. Their mission? To build an open-source rival to OpenAI's Strawberry project.

The squad used a clever pairwise optimization strategy to level up the LLaMA model. Instead of just slapping absolute scores on individual answers, they compared the relative merits of two answers. And guess what? It paid off, big time! In the killer AIME2024 benchmark test, their optimized model nailed 8 out of 30 questions — while the original LLaMA-3.1-8B-Instruct limped in with only 2. The only things that could beat it? OpenAI’s closed-source o1-preview and o1-mini. Move over, corporate overlords!
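To make the pairwise idea concrete, here is a minimal sketch of a Bradley–Terry-style pairwise preference loss — the standard way to train on "answer A beats answer B" comparisons rather than absolute scores. This is an illustration of the general technique, not Shanghai AI Lab's actual training code; the function name and `beta` parameter are our own.

```python
import math

def pairwise_preference_loss(score_chosen: float, score_rejected: float,
                             beta: float = 1.0) -> float:
    """Penalize the model when the preferred answer does not outscore
    the rejected one. Loss = -log(sigmoid(beta * margin))."""
    margin = beta * (score_chosen - score_rejected)
    # log1p(exp(-x)) is a numerically stable form of -log(sigmoid(x))
    return math.log1p(math.exp(-margin))

# A wider margin in favor of the preferred answer yields a lower loss.
print(pairwise_preference_loss(2.0, 0.5))  # small loss: preference satisfied
print(pairwise_preference_loss(0.5, 2.0))  # large loss: preference violated
```

The key property: the loss only cares about the *gap* between the two answers, so the model learns a ranking rather than trying to predict an absolute quality score.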


Cracking the Code: AlphaGo Zero Architecture

By the end of October, the Shanghai AI Lab team had some serious bragging rights. They managed to recreate OpenAI's o1, using the AlphaGo Zero playbook. Their model now flexes some next-level thinking, learning through interaction with a search tree — no need for manual labeling. And, in true open-source fashion, within a week, they flung the doors wide open for the world to see.

What’s inside this LLaMA-O1 treasure chest?

  • Pre-training datasets
  • Pre-trained models
  • Reinforcement learning training code

The dataset, dubbed OpenLongCoT-Pretrain, is a beast. It’s packed with over 100,000 long thought chains. Each one breaks down a full mathematical problem-solving process — reasoning content, scoring results, problem descriptions, graphics, calculations, the whole nine yards. After soaking up this data, the model spits out complex thought chains like it’s nothing.
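For readers wondering what "a full problem-solving process" might look like as a data record, here is a hypothetical schema sketch. The actual field names and layout of the released OpenLongCoT-Pretrain dataset may differ — this only illustrates the kind of structure the article describes (problem, reasoning chain, per-step scores, answer).

```python
from dataclasses import dataclass

@dataclass
class LongCoTRecord:
    """Hypothetical shape of one long chain-of-thought record."""
    problem: str            # problem description
    reasoning_steps: list   # ordered chain-of-thought segments
    step_scores: list       # evaluation score attached to each step
    final_answer: str

    def is_consistent(self) -> bool:
        # every reasoning step should carry a score
        return len(self.reasoning_steps) == len(self.step_scores)

record = LongCoTRecord(
    problem="Find all integer solutions of x^2 - y^2 = 2024.",
    reasoning_steps=["Factor as (x - y)(x + y).",
                     "Enumerate factor pairs of 2024 with matching parity."],
    step_scores=[0.9, 0.8],
    final_answer="Solutions come from factor pairs of equal parity.",
)
print(record.is_consistent())  # True
```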


The Magic Behind LLaMA-O1

Despite its name, LLaMA-O1’s pre-trained model actually rides on Google’s Gemma2. Developers can then fire up the reinforcement learning engine. Here’s how the magic happens:

  • Monte Carlo Tree Search handles self-play to generate experience.
  • Experiences get stored in a prioritized replay buffer.
  • Batches of data are pulled from the buffer for training.
  • Model parameters and experience priorities are updated constantly.

The process is supercharged with cutting-edge tech like LoRA for fine-tuning, PPO for policy optimization, and prioritized experience replay to make sure every second counts during training.

Who’s Behind the Curtain?

Here’s where it gets mysterious: the code for LLaMA-O1 was dropped under a GitHub account named SimpleBerry. No flashy descriptions. No tell-all profiles. Just a name and some serious code. From what we can dig up, SimpleBerry appears to be some kind of research lab, but its exact focus? Total mystery.

The Competition: Enter O1-Journey

LLaMA-O1 isn’t the only game in town. Shanghai Jiao Tong University is also in the race with their O1-Journey project. In early October, they dropped their first progress report, showcasing the Journey Learning paradigm — a slick combo of search and learning for solving math problems. The team is packed with undergrads and PhD students from the GAIR Lab, and they’re being mentored by some heavy hitters like Associate Professor Liu Pengfei and Sloan Prize winner Li Yuanzhi.

Want the full nerdy details? Check out their papers here and here.


Summary

  1. Shanghai AI Lab has open-sourced LLaMA-O1, a recreated version of OpenAI’s Olympiad tool.
  2. The project uses Monte Carlo Tree Search, self-play reinforcement learning, and the AlphaGo Zero architecture.
  3. The team made huge strides in solving mathematical Olympiad problems, outperforming most commercial solutions.
  4. The LLaMA-O1 model is trained on Google’s Gemma2 with advanced reinforcement learning techniques.
  5. There’s a rival project, O1-Journey, from Shanghai Jiao Tong University, which is also making waves in AI-powered math reasoning.
