AI Trading Showdown: DeepSeek Outperforms Gemini in Market Test

AI Models Face Off in Real-Market Trading Challenge

Financial research lab nof1 has run an experiment called Alpha Arena, pitting six major AI models against each other in live trading on the decentralized exchange Hyperliquid. Each model received $10,000 in real funds and traded under identical conditions to test its financial decision-making.

The Competitors and Results

The participating models included:

  • GPT-5
  • Gemini 2.5 Pro
  • Grok-4
  • Claude Sonnet 4.5
  • DeepSeek V3.1
  • Qwen3 Max

The results revealed stark differences in performance:

  • DeepSeek V3.1 and Grok-4 tied for top position with returns exceeding 14%
  • Gemini 2.5 Pro suffered catastrophic losses of 42.57%, the worst performance recorded

The other models delivered mixed results, with none matching the top performers' success.
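
To put those percentages in dollar terms, here is a minimal Python sketch. It assumes each reported return applies to the full $10,000 starting balance at the time the results were recorded; the function name and rounding are illustrative choices, not anything published by nof1.

```python
STARTING_CAPITAL = 10_000.00  # dollars per model, as reported in the article


def ending_balance(return_pct: float, start: float = STARTING_CAPITAL) -> float:
    """Account value after applying a percentage return to the starting capital."""
    return round(start * (1 + return_pct / 100), 2)


# Gemini 2.5 Pro: reported loss of 42.57%
gemini_balance = ending_balance(-42.57)

# Top performers: reported as "exceeding 14%", so 14% is only a lower bound
top_floor = ending_balance(14.0)

print(f"Gemini 2.5 Pro: about ${gemini_balance:,.2f}")
print(f"DeepSeek V3.1 / Grok-4: more than ${top_floor:,.2f}")
```

Under that assumption, the Gemini 2.5 Pro account would be left with roughly $5,743, while the two leading accounts would each hold more than $11,400.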

Beyond Simple Competition

The Alpha Arena project aims to evaluate more than just raw profitability. According to nof1 researchers, the primary objectives include:

  1. Assessing strategy stability under market volatility
  2. Testing risk response mechanisms across different model architectures
  3. Establishing benchmarks for AI-driven quantitative trading systems
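
The article does not say how nof1 measures strategy stability. One common measure in quantitative trading is maximum drawdown, the largest peak-to-trough fall in account value. The Python sketch below shows that calculation on made-up account values; both the choice of metric and the numbers are assumptions for illustration, not Alpha Arena data or methodology.

```python
def max_drawdown(equity: list[float]) -> float:
    """Largest peak-to-trough decline, expressed as a fraction of the peak."""
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)                    # highest value seen so far
        worst = max(worst, (peak - value) / peak)  # decline from that high
    return worst


# Hypothetical account values in dollars, NOT data from the experiment
example_equity = [10_000, 11_000, 9_900, 10_500, 8_800, 9_500]

print(f"Maximum drawdown: {max_drawdown(example_equity):.1%}")
```

Two accounts can finish with the same return while one of them passed through a much deeper drawdown along the way, which is why a stability metric is usually reported alongside raw profit.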

The experiment demonstrates how large language models are evolving beyond text processing into complex financial applications.

Implications for Financial AI

The successful performance of certain models suggests promising applications for:

  • Automated portfolio management
  • Real-time trading algorithms
  • Risk assessment systems

The dramatic failure of Gemini 2.5 Pro also underscores the importance of robust testing before deploying AI systems with real capital.

The financial sector continues to show strong interest in AI solutions that can process market data faster and more comprehensively than human traders.

Key Points:

  • DeepSeek V3.1 and Grok-4 achieved returns of over 14% in the live trading test
  • Gemini 2.5 Pro lost 42.57% of its allocated capital
  • Experiment conducted with $10,000 in real funds per model on the Hyperliquid exchange
  • The study highlights both the potential and the risks of AI-driven financial systems
