ByteDance Launches Revolutionary 1.58-bit FLUX Model

Artificial intelligence (AI) driven text-to-image (T2I) generation models have garnered attention for their exceptional capabilities. Notable examples include DALL-E 3 and Adobe Firefly 3, which show significant potential in a variety of real-world applications. However, these models typically comprise billions of parameters and require substantial memory, making deployment on resource-constrained platforms, such as mobile devices, a significant challenge.

To tackle these challenges, researchers from ByteDance and POSTECH explored extremely low-bit quantization of T2I models. Among the candidates, FLUX.1-dev was chosen for its public availability and strong performance. The researchers applied a 1.58-bit quantization method that compresses the vision transformer weights in the FLUX model to just three values: {-1, 0, +1}. The approach relies solely on self-supervision from the FLUX.1-dev model itself and requires no access to image data, distinguishing it from BitNet b1.58, which trains a large language model from scratch.
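As a rough illustration of what such ternarization involves, the sketch below rounds a weight tensor to {-1, 0, +1} with a per-tensor scale. It assumes an absmean-style scale in the spirit of BitNet b1.58; the exact calibration used for 1.58-bit FLUX is not spelled out here, and the function names are hypothetical.

```python
import torch

def quantize_ternary(w: torch.Tensor, eps: float = 1e-8):
    """Round a weight tensor to {-1, 0, +1} with a per-tensor scale.

    Illustrative absmean-style scheme (as in BitNet b1.58); the exact
    scaling used for 1.58-bit FLUX may differ.
    """
    scale = w.abs().mean().clamp(min=eps)      # per-tensor absmean scale
    w_q = (w / scale).round().clamp(-1, 1)     # ternary codes in {-1, 0, +1}
    return w_q.to(torch.int8), scale

def dequantize_ternary(w_q: torch.Tensor, scale: torch.Tensor):
    """Recover an approximate full-precision tensor for computation."""
    return w_q.to(torch.float32) * scale

# Example: quantize one linear layer's weight matrix
w = torch.randn(64, 64)
w_q, s = quantize_ternary(w)
w_hat = dequantize_ternary(w_q, s)
```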


Implementing this quantization reduced the model's storage requirements by 7.7 times. The 1.58-bit weights are stored as 2-bit signed integers, down from conventional 16-bit precision; this approaches the theoretical 8x reduction, with the small gap attributable to the fraction of parameters kept at higher precision. To improve inference efficiency, the researchers also developed a custom kernel optimized for low-bit computation, cutting inference memory usage by more than 5.1 times and improving latency.
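To see where storage savings of this kind come from, here is a sketch of how ternary codes can be packed into 2-bit fields, four weights per byte. The paper's actual storage layout and custom kernel are not reproduced here, so the helpers below are hypothetical illustrations of the general idea.

```python
import torch

def pack_ternary(w_q: torch.Tensor) -> torch.Tensor:
    """Pack ternary codes {-1, 0, +1} into 2-bit fields, four per byte."""
    codes = (w_q.flatten() + 1).to(torch.uint8)        # map to {0, 1, 2}
    pad = (-codes.numel()) % 4                         # pad to a multiple of 4
    if pad:
        codes = torch.cat([codes, codes.new_zeros(pad)])
    codes = codes.view(-1, 4)
    return (codes[:, 0]
            | (codes[:, 1] << 2)
            | (codes[:, 2] << 4)
            | (codes[:, 3] << 6))

def unpack_ternary(packed: torch.Tensor, numel: int) -> torch.Tensor:
    """Inverse of pack_ternary: recover int8 codes in {-1, 0, +1}."""
    codes = torch.stack([(packed >> s) & 0x3 for s in (0, 2, 4, 6)], dim=1)
    return codes.flatten()[:numel].to(torch.int8) - 1

# Example: round-trip a small ternary tensor
w_q = torch.randint(-1, 2, (10,), dtype=torch.int8)
packed = pack_ternary(w_q)
assert torch.equal(unpack_ternary(packed, w_q.numel()), w_q)
```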

Benchmark evaluations on GenEval and T2I CompBench showed that 1.58-bit FLUX greatly improves computational efficiency while maintaining generation quality comparable to the full-precision FLUX model. The researchers quantized 99.5% of the vision transformer's parameters (11.9 billion in total) to 1.58 bits, substantially lowering storage requirements, and the quantized model performed on par with the original FLUX across both benchmarks. Notably, 1.58-bit FLUX delivered marked inference speedups on lower-performance GPUs such as the L20 and A10.


In conclusion, 1.58-bit FLUX represents a significant step toward deploying high-quality T2I models on devices with tight memory and latency budgets. Although limitations remain, notably in achieving further speedups and in rendering fine detail in high-resolution images, the model's efficiency gains and reduced resource consumption should offer valuable insights for future research in the field.

Key Improvements

  1. Model Compression: Storage space reduced by 7.7 times.
  2. Memory Optimization: Inference memory usage decreased by over 5.1 times.
  3. Performance Retention: Maintained performance comparable to the full-precision FLUX model in benchmarks.
  4. No Image Data Required: Quantization does not rely on image data, utilizing the model's self-supervision.
  5. Custom Kernel: An optimized kernel for low-bit computation enhances inference efficiency.

For more details, visit the Project Page, Paper Link, and Model Link.
