Doubao Unveils Advanced Visual Understanding Model
date
Dec 19, 2024
damn
language
en
status
Published
type
News
image
https://www.ai-damn.com/1734566789309-6387012948949028804791943.png
slug
doubao-unveils-advanced-visual-understanding-model-1734566806662
tags
Doubao
Visual Understanding
AI Technology
Volcano Engine
Machine Learning
summary
At the Volcano Engine FORCE Power Conference, Doubao introduced a significant upgrade to its large model family, featuring a new visual understanding model that allows simultaneous text and image queries. This innovation is set to enhance applications across various sectors, including education and e-commerce, while offering cost-effective usage for developers.
Doubao Unveils Advanced Visual Understanding Model
At the Volcano Engine FORCE Power Conference on December 18, 2024, Volcano Engine announced a comprehensive upgrade to the Doubao large model family, introducing a groundbreaking visual understanding model.
Tan Dai, the president of Volcano Engine, highlighted that the daily token usage of the Doubao large model has surged to over 4 trillion tokens, a remarkable 33-fold increase since its launch in May. This significant growth underscores the model's widespread adoption across various application scenarios.
The newly launched visual understanding model enables users to input both text and image questions simultaneously. This capability enhances the model's understanding and allows it to provide accurate responses, simplifying the application development process and unlocking the potential of large models in diverse scenarios.
The visual understanding model is equipped with advanced content recognition capabilities. It can identify basic elements such as object categories and shapes in images, understand relationships between objects, spatial layouts, and the overall meaning of scenes. For instance, it can recognize shadows and apply natural knowledge to interpret visual data effectively.
Additionally, the model exhibits stronger understanding and reasoning abilities, allowing for better content recognition and facilitating complex logical calculations based on identified text and image information. This includes chart reasoning and physical reasoning, enhancing its application in analytical tasks.
Furthermore, the visual understanding model features refined visual description capabilities, enabling it to generate detailed descriptions of content presented in images. This functionality can support various forms of creative writing, including image creation and image poetry.
The visual understanding model holds promising application prospects in numerous fields such as education, tourism, and e-commerce. In education, for example, the model can assist students in optimizing essays and enhancing their scientific knowledge. In tourism, it can provide translations of foreign menus and explanations of architectural sites for travelers. In the realm of e-commerce, it can help merchants highlight product features, thus improving advertising effectiveness.
The usage cost of the visual understanding model is notably affordable, priced at 0.003 yuan per thousand tokens, which is 85% lower than the industry average. This pricing allows the processing of up to 284 images at 720P for every yuan spent, marking a significant advancement in visual understanding technology. Additionally, Volcano Engine offers up to 15,000 initial traffic supports for enterprises and developers, facilitating better utilization of this innovative technology.
During the conference, Volcano Engine not only launched the visual understanding model but also upgraded several other models. The comprehensive task handling capability of the Doubao general model pro has improved by 32% since May, with notable enhancements in reasoning, instruction following, coding, and mathematics. Furthermore, the Doubao video generation model is set to be available for external service in January 2025, with enterprises encouraged to make reservations for its use.
To further enhance enterprises' information acquisition and search recommendation capabilities, Volcano Engine introduced a comprehensive AI search service. This service aims to help businesses connect information effectively with user needs, thus facilitating the intelligent transformation of various industries.
Key Points
- The daily token usage of the Doubao large model has reached 4 trillion, a 33-fold increase since May.
- The newly launched visual understanding model supports simultaneous input of text and images, applicable in education, tourism, and e-commerce.
- The usage cost is only 0.003 yuan per thousand tokens, significantly lower than the industry average.