Shanghai AI Lab Launches First Video-to-Web Benchmark
The Shanghai Artificial Intelligence Laboratory has launched IWR-Bench, the first evaluation framework designed to assess how well large language models can transform video demonstrations into functional web code. The benchmark addresses a critical gap in evaluating multimodal AI systems' capabilities for dynamic web reconstruction.
Breaking New Ground in AI Evaluation
Unlike traditional image-to-code tasks, IWR-Bench presents models with videos capturing complete user interactions, along with all necessary static webpage resources. The system then evaluates how accurately models recreate the observed dynamic behaviors across a range of complexity levels, from basic web browsing to sophisticated applications such as the 2048 game and flight booking systems.
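To make the task format concrete, here is a hypothetical sketch of what a task instance and a model's submission might look like. The field names and schema below are assumptions for illustration, not the benchmark's published specification.

```typescript
// Hypothetical sketch of an IWR-Bench task instance and submission; the
// field names and schema are assumptions, not the benchmark's actual spec.
interface IwrTask {
  id: string;
  video: string;              // screen recording of the full user interaction
  staticResources: string[];  // anonymized assets (images, fonts, icons) given to the model
  complexity: "basic" | "intermediate" | "advanced"; // e.g. browsing vs. 2048 vs. flight booking
}

interface IwrSubmission {
  taskId: string;
  html: string;       // the reconstructed page
  css: string;
  javascript: string; // must reproduce the observed dynamic behavior
}
```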

Surprising Performance Gaps Revealed
Initial testing of 28 leading AI models yielded sobering results:
- GPT-5 emerged as the top performer, with an overall score of just 36.35/100
  - Interaction Function Score (IFS): 24.39%
  - Visual Fidelity Score (VFS): 64.25%
The significant disparity between visual restoration (64.25%) and functional accuracy (24.39%) highlights fundamental challenges in translating observed behaviors into working code logic.
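The article does not state how the overall score is computed. As a back-of-the-envelope check, the reported numbers happen to be consistent with a fixed 70/30 weighting of the two sub-scores; the weighting below is a guess for illustration, not the published formula.

```typescript
// Hypothetical: 0.7 * IFS + 0.3 * VFS reproduces the reported 36.35 overall
// score for GPT-5, but IWR-Bench's actual aggregation is not given here.
const ifs = 24.39; // Interaction Function Score (%)
const vfs = 64.25; // Visual Fidelity Score (%)

const overall = 0.7 * ifs + 0.3 * vfs;
console.log(overall.toFixed(2)); // "36.35"
```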
Innovative Evaluation Methodology
The benchmark employs several novel assessment techniques:
- Proxy-based automated testing verifies interactive functionality (see the sketch after this list)
- Complete but anonymized static resources force visual matching rather than semantic shortcuts
- Temporal understanding tests track state changes across video frames
- Multi-dimensional scoring evaluates both appearance and functionality
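As a rough illustration of the first technique, proxy-based verification drives the reconstructed page like a user would and asserts on observable state. The sketch below uses Playwright with a 2048-style example; the "#score" selector and the pass criterion are assumptions, and the benchmark's actual harness may work quite differently.

```typescript
import { chromium } from "playwright";

// Minimal sketch of proxy-based functional testing: replay an interaction
// observed in the video and check that the page reacts. The "#score"
// selector and the pass criterion are hypothetical.
async function arrowKeyChangesScore(pageUrl: string): Promise<boolean> {
  const browser = await chromium.launch();
  const page = await browser.newPage();
  await page.goto(pageUrl);

  const before = await page.textContent("#score");
  await page.keyboard.press("ArrowLeft"); // the action seen in the demo video
  const after = await page.textContent("#score");

  await browser.close();
  return before !== after; // did the interaction produce a visible state change?
}
```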

Technical Challenges Identified
The research uncovered four major hurdles for current AI systems:
- Temporal understanding: Extracting key events from continuous video frames
- Logical abstraction: Converting behaviors into programming concepts like event listeners
- Resource matching: Correctly associating anonymized files with visual elements
- Code generation: Producing structurally sound HTML/CSS/JavaScript
The findings suggest that even advanced multimodal models struggle with the causal reasoning and state management required for dynamic web reconstruction, as the sketch below illustrates.
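To make the logical-abstraction challenge concrete: a model must watch key presses cause tile movement and score changes in the video, then infer something like the event listener and state update below. This is illustrative TypeScript only; the simplified scoring rule and element ID are assumptions, not benchmark reference code.

```typescript
// Illustrative only: the kind of event-listener and state logic a model must
// infer from watching a 2048-style interaction.
type Direction = "left" | "right" | "up" | "down";

let score = 0; // page state that must persist across interactions

function applyMove(direction: Direction): void {
  // A real 2048 reconstruction would slide and merge tiles here; this stub
  // only shows that an observed key press must map to a state change plus
  // a DOM update.
  score += 4; // placeholder reward, not the real game rule
  const scoreEl = document.querySelector("#score");
  if (scoreEl) scoreEl.textContent = String(score);
}

document.addEventListener("keydown", (event) => {
  const keyToDirection: Record<string, Direction> = {
    ArrowLeft: "left",
    ArrowRight: "right",
    ArrowUp: "up",
    ArrowDown: "down",
  };
  const direction = keyToDirection[event.key];
  if (direction) applyMove(direction);
});
```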

Industry Implications
The benchmark's creators emphasize its dual significance:
- Research value: Provides new metrics for evaluating dynamic understanding capabilities
- Practical potential: Could eventually lower barriers to front-end development as the technology matures

However, the researchers caution that high benchmark scores would not immediately translate to production-ready tools, noting critical gaps in handling performance optimization, security, and edge cases.
Key Points:
- First specialized benchmark for video-to-webpage conversion unveiled
- GPT-5 leads but scores just 36.35/100 overall
- Models show strong visual restoration (64%) but weak interaction logic (24%)
- Reveals fundamental gaps in temporal reasoning and state management
- Could shape future "what you see is what you get" development tools



