ReaderLM v2: Advanced HTML to Markdown Conversion

Product Introduction
ReaderLM v2 is a small language model developed by Jina AI for converting HTML to Markdown and JSON. It has 1.5 billion parameters and supports up to 512,000 tokens of combined input and output, which lets it handle long web pages. It is aimed at developers, content creators, and researchers who need to extract text and structured data from web pages.
Key Features
- HTML to Markdown Conversion: Converts HTML content to Markdown while preserving the information in the source page.
- Direct HTML to JSON Generation: Extracts specific data from HTML according to a user-defined JSON schema, which streamlines data cleaning and extraction.
- Multi-Language Support: Supports 29 languages, including English and Chinese.
- Long Text Handling: Processes up to 512,000 tokens of combined input and output, and is designed to reduce the quality degradation that tends to affect long texts.
- Advanced Training Paradigm: Uses a new training approach and higher-quality training data than its predecessor, which improves its Markdown syntax generation and its handling of complex elements.
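To make the two conversion modes above more concrete, here is a minimal sketch of how an input might be prepared before it is sent to the model. Everything in it is an assumption for illustration: the helper names `clean_html` and `build_prompt`, the pre-cleaning patterns, and the instruction wording are hypothetical and are not taken from the product page. The actual prompt format should be checked against the model's own documentation.

```python
import json
import re
from typing import Optional

# Hypothetical pre-cleaning patterns (assumptions, not from the product page).
_NOISE_PATTERNS = (
    re.compile(r"<script\b[^>]*>.*?</script\s*>", re.IGNORECASE | re.DOTALL),
    re.compile(r"<style\b[^>]*>.*?</style\s*>", re.IGNORECASE | re.DOTALL),
    re.compile(r"<!--.*?-->", re.DOTALL),
)


def clean_html(html: str) -> str:
    """Remove scripts, styles, and comments so fewer tokens are spent on noise."""
    for pattern in _NOISE_PATTERNS:
        html = pattern.sub("", html)
    return html.strip()


def build_prompt(html: str, schema: Optional[dict] = None) -> str:
    """Return a Markdown-conversion prompt, or a JSON-extraction prompt if a schema is given.

    The instruction wording is illustrative only, not the model's official format.
    """
    cleaned = clean_html(html)
    if schema is None:
        return f"Convert the following HTML to Markdown.\n\nHTML:\n{cleaned}"
    return (
        "Extract the requested data from the following HTML and return JSON "
        "that conforms to the schema.\n\n"
        f"HTML:\n{cleaned}\n\n"
        f"JSON schema:\n{json.dumps(schema, indent=2)}"
    )


page = (
    "<html><head><style>p{color:red}</style></head>"
    "<body><!-- ad --><h1>Title</h1><script>track()</script></body></html>"
)
schema = {"type": "object", "properties": {"title": {"type": "string"}}}

print(build_prompt(page))          # Markdown mode
print(build_prompt(page, schema))  # JSON mode, driven by the schema
```

Passing a schema switches the same cleaned input from Markdown conversion to schema-driven JSON extraction, which mirrors the first two features in the list.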
Product Data
- Model Parameters: 1.5 billion
- Token Limit: Up to 512,000 tokens of combined input and output
- Supported Languages: 29 languages
- Primary Functions: HTML to Markdown and HTML to JSON conversion
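Because the 512,000-token limit covers input and output together, a long page leaves less room for the generated Markdown or JSON. The sketch below shows that arithmetic. The four-characters-per-token ratio is an assumed rule of thumb, not the model's real tokenizer, and the function names are hypothetical.

```python
import math

MAX_COMBINED_TOKENS = 512_000  # combined input/output limit from the product data


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate; chars_per_token is an assumed heuristic, not a tokenizer."""
    return math.ceil(len(text) / chars_per_token)


def output_budget(html: str, limit: int = MAX_COMBINED_TOKENS) -> int:
    """Tokens left for the model's output after the input is counted (never negative)."""
    return max(0, limit - estimate_tokens(html))


html = "<p>hello</p>" * 1000  # 12,000 characters of sample input
print(estimate_tokens(html), output_budget(html))
```

For an accurate count, the estimate would be replaced by the length of the input after encoding it with the model's own tokenizer.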