GPT-4o Unveils Singing Feature in Major Voice Mode Upgrade

OpenAI has significantly upgraded GPT-4o's voice capabilities, introducing a singing function that pushes the boundaries of AI interaction. The advanced voice mode now processes audio directly rather than converting speech to text first, bringing average response times down to about 320 milliseconds, which is comparable to human response times in conversation.

Singing Breakthrough with Room for Improvement

Users can now ask GPT-4o to sing songs through voice commands, including some copyrighted material. The AI generates melodies and lyrics on demand, though early tests reveal limitations with complex musical passages. "The performance isn't quite concert-ready," admits one tester, noting occasional stiffness in high notes.

Emotional Intelligence Upgrade

Beyond singing, GPT-4o demonstrates remarkable emotional range. It can laugh, cry, and adopt specific character voices - imagine requesting a Shakespearean monologue or your favorite cartoon character's tone. This emotional flexibility opens doors for education and entertainment applications.

Technical Advancements

The system's end-to-end audio processing represents a major technical leap. Traditional voice assistants like Siri use separate components for speech recognition and generation, creating noticeable delays. GPT-4o's unified approach enables more natural conversations where users can interrupt freely.
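To make the latency argument concrete, here is a minimal Python sketch of why a cascaded pipeline tends to feel slower than a single end-to-end model: each separate stage runs in sequence, so the delays add up. The stage names and the per-stage numbers for the cascaded design are hypothetical placeholders chosen for illustration, not measurements of Siri, GPT-4o, or any other product. Only the 320 ms figure comes from the article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    latency_ms: float  # illustrative value, not a measurement


def total_latency_ms(stages):
    """Stages run one after another, so their delays simply add up."""
    return sum(stage.latency_ms for stage in stages)


# Cascaded design: separate components, each adding its own delay.
# All three numbers below are hypothetical placeholders.
cascaded = [
    Stage("speech_to_text", 300.0),
    Stage("text_model", 700.0),
    Stage("text_to_speech", 400.0),
]

# End-to-end design: one model maps audio in to audio out.
# 320.0 is the response-time figure quoted in the article.
end_to_end = [Stage("audio_to_audio_model", 320.0)]

cascaded_ms = total_latency_ms(cascaded)
end_to_end_ms = total_latency_ms(end_to_end)
print(f"cascaded: {cascaded_ms:.0f} ms, end-to-end: {end_to_end_ms:.0f} ms")
```

The sketch only models additive delay. It says nothing about interruption handling or audio quality, which depend on details the article does not describe.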

Copyright Challenges Emerge

OpenAI has implemented safeguards against copyright infringement, but some users report successfully prompting copyrighted song performances. This gray area raises questions about AI's role in creative content generation and intellectual property protection.

Future Potential

While the singing feature needs polish, its introduction signals OpenAI's commitment to multimodal AI development. The technology could revolutionize language learning through interactive singing exercises or create personalized audiobook narration with emotional depth.

Key Points

  1. GPT-4o's new singing function expands AI creative capabilities despite current quality limitations
  2. Direct audio processing brings average response times to about 320ms for fluid conversations
  3. Advanced emotional expression enables laughter, crying and character voices
  4. Copyright concerns emerge as users bypass some content restrictions
  5. Technology shows promise for education and entertainment applications
