Skip to main content

Doubao Unveils Advanced Visual Understanding Model

Doubao Unveils Advanced Visual Understanding Model

At the Volcano Engine FORCE Power Conference on December 18, 2024, Volcano Engine announced a comprehensive upgrade to the Doubao large model family, introducing a groundbreaking visual understanding model.

image

Tan Dai, the president of Volcano Engine, highlighted that the daily token usage of the Doubao large model has surged to over 4 trillion tokens, a remarkable 33-fold increase since its launch in May. This significant growth underscores the model's widespread adoption across various application scenarios.

image

The newly launched visual understanding model enables users to input both text and image questions simultaneously. This capability enhances the model's understanding and allows it to provide accurate responses, simplifying the application development process and unlocking the potential of large models in diverse scenarios.

The visual understanding model is equipped with advanced content recognition capabilities. It can identify basic elements such as object categories and shapes in images, understand relationships between objects, spatial layouts, and the overall meaning of scenes. For instance, it can recognize shadows and apply natural knowledge to interpret visual data effectively.

image

Additionally, the model exhibits stronger understanding and reasoning abilities, allowing for better content recognition and facilitating complex logical calculations based on identified text and image information. This includes chart reasoning and physical reasoning, enhancing its application in analytical tasks.

image

Furthermore, the visual understanding model features refined visual description capabilities, enabling it to generate detailed descriptions of content presented in images. This functionality can support various forms of creative writing, including image creation and image poetry.

image

The visual understanding model holds promising application prospects in numerous fields such as education, tourism, and e-commerce. In education, for example, the model can assist students in optimizing essays and enhancing their scientific knowledge. In tourism, it can provide translations of foreign menus and explanations of architectural sites for travelers. In the realm of e-commerce, it can help merchants highlight product features, thus improving advertising effectiveness.

The usage cost of the visual understanding model is notably affordable, priced at 0.003 yuan per thousand tokens, which is 85% lower than the industry average. This pricing allows the processing of up to 284 images at 720P for every yuan spent, marking a significant advancement in visual understanding technology. Additionally, Volcano Engine offers up to 15,000 initial traffic supports for enterprises and developers, facilitating better utilization of this innovative technology.

image

During the conference, Volcano Engine not only launched the visual understanding model but also upgraded several other models. The comprehensive task handling capability of the Doubao general model pro has improved by 32% since May, with notable enhancements in reasoning, instruction following, coding, and mathematics. Furthermore, the Doubao video generation model is set to be available for external service in January 2025, with enterprises encouraged to make reservations for its use.

image

To further enhance enterprises' information acquisition and search recommendation capabilities, Volcano Engine introduced a comprehensive AI search service. This service aims to help businesses connect information effectively with user needs, thus facilitating the intelligent transformation of various industries.

Key Points

  1. The daily token usage of the Doubao large model has reached 4 trillion, a 33-fold increase since May.
  2. The newly launched visual understanding model supports simultaneous input of text and images, applicable in education, tourism, and e-commerce.
  3. The usage cost is only 0.003 yuan per thousand tokens, significantly lower than the industry average.

Enjoyed this article?

Subscribe to our newsletter for the latest AI news, product reviews, and project recommendations delivered to your inbox weekly.

Weekly digestFree foreverUnsubscribe anytime

Related Articles

News

Bumble's New AI Tools Help You Shine Online

Dating app Bumble rolled out smart new features this week to help users put their best foot forward. An AI profile coach offers personalized tips to polish your bio, while a photo advisor helps pick your most flattering shots. The moves aim to boost matches by reducing awkward first impressions—because let's face it, writing about yourself is hard. While competitors race to add similar tech, privacy concerns linger as apps dig deeper into our personal data.

February 27, 2026
dating appsAI technologyonline privacy
Anthropic Bolsters AI Ambitions with Vercept Acquisition
News

Anthropic Bolsters AI Ambitions with Vercept Acquisition

AI powerhouse Anthropic has snapped up Seattle-based startup Vercept in a strategic move to strengthen its Claude Code ecosystem. While some founders transition to Anthropic, others voice disappointment over the product shutdown. The deal highlights the fierce competition for top AI talent as major players race to dominate emerging technologies.

February 26, 2026
AnthropicAI acquisitionsdeveloper tools
News

Wayve Drives Off with $1 Billion for AI-Powered Autonomous Cars

London-based AI startup Wayve just secured a massive $1.05 billion investment, led by SoftBank with backing from NVIDIA and Microsoft. The company's unique approach to self-driving technology - which mimics human learning rather than relying on expensive sensors - could revolutionize how cars navigate city streets. This funding marks a major vote of confidence in European AI innovation and signals growing excitement about 'embodied AI' applications.

February 25, 2026
autonomous vehiclesAI startupsSoftBank
News

Spotify's New AI Feature Turns Your Mood Into Music

Spotify is revolutionizing how we discover music with its new AI Playlist feature. Premium subscribers in select countries can now create personalized playlists simply by describing their mood or activity - no more endless scrolling. The tool understands complex requests like 'retro jogging tracks with an 80s neon vibe' and continuously improves results based on feedback. This innovation comes as Spotify increasingly bets on AI to stay ahead in the competitive streaming market.

February 24, 2026
Spotifymusic streamingAI technology
China's GLM-5 AI Model Breaks New Ground with Domestic Chip Support
News

China's GLM-5 AI Model Breaks New Ground with Domestic Chip Support

Zhipu Technology's GLM-5 AI model has made waves with its latest upgrades, now fully supporting seven major Chinese chip platforms. The model boasts a staggering 744 billion parameters and leads globally in programming agent capabilities. While user demand temporarily overwhelmed servers, the company has responded with compensation measures. Key innovations include a dynamic attention mechanism and new reinforcement learning algorithms that significantly boost performance.

February 23, 2026
AI innovationChinese techmachine learning
AI Lights Up Spring Festival Gala with Record-Breaking 1.9 Billion Interactions
News

AI Lights Up Spring Festival Gala with Record-Breaking 1.9 Billion Interactions

The 2026 Spring Festival Gala made history by integrating AI technology like never before. Doubao's AI-powered features enabled viewers to generate over 50 million festive profile pictures and 100 million digital greetings, while backstage, the Seedance 2.0 model transformed stage visuals with breathtaking precision. Behind the scenes, ByteDance's computing infrastructure handled an unprecedented 63.3 billion tokens per minute at peak moments.

February 17, 2026
AI innovationSpring Festival GalaDoubao