Microsoft Azure ND GB300 Breaks AI Inference Record

Microsoft has announced a new artificial intelligence performance record for its Azure ND GB300 v6 virtual machines. The system processed 1.1 million tokens per second during inference on Meta's Llama 2 70B model, a new industry high.

Unprecedented Hardware Configuration

The record-breaking performance comes from Microsoft's collaboration with NVIDIA and uses the NVIDIA Blackwell Ultra GPU architecture. The NVIDIA GB300 NVL72 rack-scale system that hosts the Azure ND GB300 v6 virtual machines features:

  • 72 NVIDIA Blackwell Ultra GPUs
  • 36 NVIDIA Grace CPUs
  • Single-rack architecture optimized for inference workloads

The system boasts significant improvements over previous generations, including:

  • 50% increase in GPU memory
  • 16% increase in thermal design power (TDP)

Performance Validation and Results

Microsoft conducted rigorous testing to verify the system's capabilities:

  • Ran the Llama 2 70B model at FP4 precision
  • Used 18 ND GB300 v6 virtual machines within one NVIDIA GB300 NVL72 domain
  • Employed NVIDIA TensorRT-LLM as the inference engine

The tests demonstrated remarkable results:

  • Each GPU processed approximately 15,200 tokens per second
  • Total system performance reached the unprecedented 1.1 million tokens per second mark
  • Performance represents a 27% improvement over previous NVIDIA GB200 systems
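The per-GPU and total figures above can be cross-checked with simple arithmetic. The following is a minimal sketch, not Microsoft's benchmark methodology: it assumes the 72 GPUs of the NVL72 domain are spread evenly across the 18 virtual machines and that the 27% gain applies to aggregate throughput. The function names are hypothetical and introduced only for illustration.

```python
# Figures quoted in the article (all approximate).
TOKENS_PER_SEC_PER_GPU = 15_200  # per-GPU throughput
NUM_VMS = 18                     # ND GB300 v6 VMs used in the test
GPUS_IN_DOMAIN = 72              # assumption: GPUs in one GB300 NVL72 domain
GAIN_OVER_GB200 = 0.27           # quoted improvement over GB200


def gpus_per_vm(total_gpus: int, num_vms: int) -> int:
    """GPUs per VM, assuming an even split across VMs."""
    if total_gpus % num_vms != 0:
        raise ValueError("GPUs do not divide evenly across VMs")
    return total_gpus // num_vms


def aggregate_throughput(per_gpu: int, num_gpus: int) -> int:
    """Total tokens per second if every GPU sustains the same rate."""
    return per_gpu * num_gpus


def implied_baseline(total: float, gain: float) -> float:
    """Throughput of the earlier system implied by a fractional gain."""
    return total / (1.0 + gain)


if __name__ == "__main__":
    total = aggregate_throughput(TOKENS_PER_SEC_PER_GPU, GPUS_IN_DOMAIN)
    print("GPUs per VM:", gpus_per_vm(GPUS_IN_DOMAIN, NUM_VMS))
    print("Aggregate tokens/s:", total)
    print("Implied GB200 tokens/s:", round(implied_baseline(total, GAIN_OVER_GB200)))
```

Under these assumptions the aggregate comes to 1,094,400 tokens per second, which rounds to the quoted 1.1 million. The implied GB200 figure is a derived estimate, not a number reported in the article.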

The results have been independently verified by Signal65, a respected performance benchmarking company.

Industry Implications and Expert Commentary

Russ Fellows, VP of Labs at Signal65, highlighted the significance of this achievement:

"This milestone not only broke through the barrier of one million tokens per second but also achieved it on a platform that meets the dynamic usage and data governance needs of modern enterprises."

The new system shows exceptional efficiency gains:

  • Nearly 10x improvement in inference performance compared to NVIDIA H100 systems
  • 2.5x better rack-level power efficiency than previous generations
  • Only 17% increase in power specifications despite significant performance gains

The breakthrough demonstrates Microsoft's continued leadership in enterprise-scale AI solutions.

Key Points:

🚀 Achieved industry-record 1.1 million tokens/second inference speed
💻 Powered by 72 Blackwell Ultra GPUs + 36 Grace CPUs
📈 Delivers 27% better performance than previous generation
⚡ Offers nearly 10x improvement over H100 systems
🌱 Maintains enterprise-grade data governance and dynamic usage capabilities
