Meta Unveils OMol25 Dataset and UMA Model for AI-Driven Chemistry
Meta has released OMol25, its largest open dataset for molecular research, together with UMA (Universal Model for Atoms), a machine-learning model for predicting chemical properties at the atomic level. Meta presents both as tools for fields ranging from pharmaceuticals to renewable energy.
The OMol25 Dataset: A Molecular Treasure Trove
OMol25 contains more than 100 million high-accuracy density functional theory (DFT) calculations, which dwarfs existing public datasets. Generating it took more than 6 billion CPU core-hours, and it spans:
- Small organic compounds
- Biomolecules (proteins, DNA fragments)
- Metal complexes and electrolytes
For each calculation, the dataset provides energies, atomic forces, charge distributions, and orbital information. Researchers can access OMol25 through the Hugging Face platform.
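The quantities listed above are stored per calculation, so it can help to picture one record as a small data structure. The sketch below is a hypothetical Python record, not the actual OMol25 schema or file format: the class name, field names, and units are assumptions for illustration. It also shows two sanity checks that such records commonly support: every per-atom field has one entry per atom, and the per-atom partial charges add up to the molecule's net charge.

```python
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class MolecularSnapshot:
    """One DFT calculation. Hypothetical field names, not the OMol25 schema."""
    elements: List[str]           # chemical symbol per atom
    positions: List[Vec3]         # Cartesian coordinates (assumed angstrom)
    total_charge: int             # net molecular charge
    energy: float                 # total energy (assumed eV)
    forces: List[Vec3]            # per-atom force vectors (assumed eV/angstrom)
    partial_charges: List[float]  # per-atom charges from a population analysis

    def validate(self, charge_tol: float = 1e-3) -> None:
        """Raise ValueError if the record is internally inconsistent."""
        n = len(self.elements)
        # Every per-atom field needs exactly one entry per atom.
        for name in ("positions", "forces", "partial_charges"):
            count = len(getattr(self, name))
            if count != n:
                raise ValueError(f"{name} has {count} entries, expected {n}")
        # Partial charges should add up to the net molecular charge.
        if abs(sum(self.partial_charges) - self.total_charge) > charge_tol:
            raise ValueError("partial charges do not sum to the total charge")

    def max_force(self) -> float:
        """Largest per-atom force magnitude, a common outlier check."""
        return max((fx * fx + fy * fy + fz * fz) ** 0.5 for fx, fy, fz in self.forces)


# Placeholder values for a water molecule, not taken from OMol25.
water = MolecularSnapshot(
    elements=["O", "H", "H"],
    positions=[(0.0, 0.0, 0.0), (0.757, 0.586, 0.0), (-0.757, 0.586, 0.0)],
    total_charge=0,
    energy=-2000.0,
    forces=[(0.0, 0.3, 0.0), (0.0, -0.15, 0.0), (0.0, -0.15, 0.0)],
    partial_charges=[-0.8, 0.4, 0.4],
)
water.validate()
print(f"max force: {water.max_force():.2f}")
```

A real loader would read these values from the distributed files rather than constructing them by hand; the checks are the part that carries over.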

UMA: The Atomic-Level Predictor
The companion model, UMA, takes a different approach from traditional computational chemistry. Rather than requiring a specialized model for each task, it offers:
- Atomic-level property prediction
- 1000x faster calculations than conventional methods
- Generalizability across drug discovery and materials science
UMA is built on a graph neural network with a "mixture of linear experts" (MoLE) architecture. Meta reports that it matches the accuracy of specialized models while remaining computationally efficient, and that calculations that used to take days can now finish in seconds.
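One way to see why a mixture of *linear* experts can stay cheap: if each expert is a linear layer and the mixing weights depend only on system-level inputs (for example total charge, spin, or the requested task), then the weighted sum of the expert outputs equals a single linear layer whose weights are the weighted sum of the expert weights. The sketch below assumes exactly that setup. The router, layer sizes, and feature choices are hypothetical toy values, and this is an illustration of the general idea rather than UMA's implementation.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Dense matrix-vector product."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def softmax(logits: Vector) -> Vector:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mixing_coefficients(router: Matrix, system_features: Vector) -> Vector:
    """One weight per expert, computed once from system-level features."""
    return softmax(matvec(router, system_features))


def merge_experts(experts: List[Matrix], alphas: Vector) -> Matrix:
    """Collapse the experts into one weight matrix: W = sum_k alpha_k * W_k."""
    rows, cols = len(experts[0]), len(experts[0][0])
    return [[sum(a * w[i][j] for a, w in zip(alphas, experts)) for j in range(cols)]
            for i in range(rows)]


def mix_outputs(experts: List[Matrix], alphas: Vector, x: Vector) -> Vector:
    """Reference path: run every expert on x and blend the outputs."""
    outs = [matvec(w, x) for w in experts]
    return [sum(a * out[i] for a, out in zip(alphas, outs)) for i in range(len(outs[0]))]


# Hypothetical toy parameters: 2 experts, 2-dim atom features, 2 system features.
experts = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.0, 2.0], [2.0, 0.0]],
]
router = [[0.5, -1.0], [-0.5, 1.0]]   # rows = experts, cols = system features
system_features = [1.0, 0.0]          # placeholder, e.g. [charge, spin]
atom_features = [[1.0, 2.0], [3.0, -1.0]]

alphas = mixing_coefficients(router, system_features)
w_merged = merge_experts(experts, alphas)  # done once per system
merged_out = [matvec(w_merged, x) for x in atom_features]
reference_out = [mix_outputs(experts, alphas, x) for x in atom_features]
print(alphas)
```

Because the merge happens once per system, the per-atom cost afterwards is that of a single linear layer no matter how many experts exist. The trade-off in this setup is that the mixing cannot vary from atom to atom within one system.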
Accelerating Discovery
This technology enables researchers to:
- Rapidly screen thousands of molecular candidates
- Evaluate drug or battery material potential before synthesis
- Explore novel chemical spaces with Adjoint Sampling, a new AI technique that generates plausible molecular structures without being trained on existing structure data
Adjoint Sampling draws on stochastic optimal control theory and, according to Meta, is particularly effective for molecules with flexible components. All models and code are available on Hugging Face and GitHub.
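The sampling technique described above learns to generate structures using ideas from stochastic control, which is too involved to reproduce in a short sketch. The underlying problem, drawing structures from a Boltzmann distribution p(x) ∝ exp(-E(x)/kT) when only an energy function is available and no example structures exist, can be illustrated with random-walk Metropolis Monte Carlo, a much older and different technique. The sketch below samples a single torsion angle as a stand-in for a flexible part of a molecule. The torsion potential, barrier height, and temperature are hypothetical; in practice a model such as UMA would supply the energies.

```python
import math
import random
from typing import Callable, List, Tuple


def metropolis_sample(energy: Callable[[float], float], x0: float, n_steps: int,
                      kT: float = 1.0, step: float = 1.0,
                      seed: int = 0) -> Tuple[List[float], float]:
    """Random-walk Metropolis targeting p(x) proportional to exp(-energy(x) / kT).

    Returns the chain (one entry per step) and the acceptance rate.
    """
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    samples: List[float] = []
    accepted = 0
    for _ in range(n_steps):
        x_new = x + rng.uniform(-step, step)  # symmetric proposal
        e_new = energy(x_new)
        # Metropolis rule: accept with probability min(1, exp(-(E_new - E_old) / kT)).
        if e_new <= e or rng.random() < math.exp(-(e_new - e) / kT):
            x, e = x_new, e_new
            accepted += 1
        samples.append(x)
    return samples, accepted / n_steps


def torsion_energy(phi: float, barrier: float = 1.0) -> float:
    """Hypothetical threefold torsion potential with minima at 60, 180 and 300 degrees."""
    return barrier * (1.0 + math.cos(3.0 * phi))


def wrap_angle(phi: float) -> float:
    """Map an angle into the interval [-pi, pi]."""
    return math.atan2(math.sin(phi), math.cos(phi))


chain, acceptance = metropolis_sample(torsion_energy, x0=math.pi, n_steps=20000, kT=0.6)
angles = [wrap_angle(p) for p in chain[1000:]]  # drop burn-in, wrap to one period
print(f"acceptance rate: {acceptance:.2f}")
```

Random-walk Metropolis can mix slowly between energy wells when barriers are high relative to kT, and it needs a fresh chain for every new system; these limitations are one motivation for learned samplers.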
Current Limitations and Future Directions
The system still has some constraints:
- Limited coverage of polymers and certain metal compounds
- Room for improvement in predicting charges and long-range interactions
These gaps present opportunities for future research collaborations.
Key Points
- OMol25 contains 100M+ molecular data points - the largest public chemistry dataset
- UMA predicts atomic properties 1000x faster than traditional methods
- Adjoint Sampling generates molecular structures without training data
- Applications span drug discovery, battery tech, and catalyst development



