Microsoft Azure ND GB300 Breaks AI Inference Record
Microsoft has announced a major artificial intelligence performance milestone with its Azure ND GB300 v6 virtual machines. The system set a new industry record by processing 1.1 million tokens per second during inference on Meta's Llama 2 70B model.

Unprecedented Hardware Configuration
The record-breaking performance comes from Microsoft's collaboration with NVIDIA, built on the NVIDIA Blackwell Ultra GPU architecture. Each NVIDIA GB300 NVL72 rack-scale system underlying the ND GB300 v6 VMs features:
- 72 NVIDIA Blackwell Ultra GPUs
- 36 NVIDIA Grace CPUs
- A single rack-wide NVLink domain optimized for inference workloads
The system boasts significant improvements over previous generations, including:
- 50% increase in GPU memory
- 16% increase in thermal design power (TDP)
Performance Validation and Results
Microsoft conducted rigorous testing to verify the system's capabilities:
- Ran the Llama 2 70B model at FP4 precision
- Utilized 18 ND GB300 v6 virtual machines within a single NVIDIA GB300 NVL72 domain
- Employed NVIDIA TensorRT-LLM as the inference engine
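The test setup above can be sanity-checked with a quick partitioning calculation. Note that the even split of GPUs across VMs is an assumption for illustration, not a detail stated in the announcement:

```python
# Hedged sketch: partitioning one GB300 NVL72 rack (72 GPUs) across
# the 18 VMs used in the benchmark, assuming an even split.
TOTAL_GPUS = 72      # Blackwell Ultra GPUs per NVL72 rack
VMS_PER_RACK = 18    # ND GB300 v6 VMs used in the test

assert TOTAL_GPUS % VMS_PER_RACK == 0, "even split assumed"
gpus_per_vm = TOTAL_GPUS // VMS_PER_RACK
print(f"Implied GPUs per ND GB300 v6 VM: {gpus_per_vm}")  # 4
```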
The tests demonstrated remarkable results:
- Each GPU processed approximately 15,200 tokens per second
- Total system performance reached the unprecedented 1.1 million tokens per second mark
- Performance represents a 27% improvement over previous NVIDIA GB200 systems
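The arithmetic behind the headline figure can be sketched as follows. The per-GPU rate and GPU count are from the announcement; the GB200 baseline is merely implied here from the stated 27% gain, not an official measurement:

```python
# Back-of-the-envelope check of the published throughput figures.
PER_GPU_TOKENS_PER_SEC = 15_200   # reported per-GPU throughput
GPUS_PER_RACK = 72                # Blackwell Ultra GPUs in one NVL72 domain

aggregate = PER_GPU_TOKENS_PER_SEC * GPUS_PER_RACK
# ~1.09 million tokens/s, consistent with the rounded 1.1M headline
print(f"Aggregate throughput: {aggregate:,} tokens/s")

# A 27% gain over GB200 implies a baseline of roughly:
implied_gb200 = aggregate / 1.27
print(f"Implied GB200 baseline: {implied_gb200:,.0f} tokens/s")
```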
The results have been independently verified by Signal65, a respected performance benchmarking company.
Industry Implications and Expert Commentary
Russ Fellows, Vice President of Labs at Signal65, highlighted the significance of this achievement:
"This milestone not only broke through the barrier of one million tokens per second but also achieved it on a platform that meets the dynamic usage and data governance needs of modern enterprises."
The new system shows exceptional efficiency gains:
- Nearly 10x improvement in inference performance compared to NVIDIA H100 systems
- 2.5x better rack-level power efficiency than previous generations
- Only 17% increase in power specifications despite significant performance gains
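The efficiency figures above can be related with a rough calculation. Note that this compares against H100, while the 2.5x rack-level figure uses a different (previous-generation) baseline; both inputs are the article's round numbers, not measured values:

```python
# Hedged sketch: implied performance-per-watt gain versus H100,
# using the article's stated round figures.
perf_gain = 10.0    # ~10x inference performance vs H100 systems
power_gain = 1.17   # 17% higher power specification

perf_per_watt_gain = perf_gain / power_gain
print(f"Implied perf/watt improvement vs H100: ~{perf_per_watt_gain:.1f}x")
```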
The breakthrough demonstrates Microsoft's continued leadership in enterprise-scale AI solutions.
Key Points:
🚀 Achieved an industry-record 1.1 million tokens/second inference speed
💻 Powered by 72 Blackwell Ultra GPUs + 36 Grace CPUs
📈 Delivers 27% better performance than the previous GB200 generation
⚡ Offers nearly 10x improvement over H100 systems
🌱 Maintains enterprise-grade data governance and dynamic usage capabilities