Apple recently made waves in the AI world by releasing its new open-source DCLM models on Hugging Face. Developed as part of the DataComp for Language Models (DCLM) project, these models have been described as the best-performing open-source models currently available. The project is led by a team of researchers from Apple, the University of Washington, Tel Aviv University, and Toyota Research Institute, and it aims to design high-quality datasets for training AI models, particularly in the multimodal domain.
The key to the success of these models lies in the data curation process. The researchers used a standardized framework with fixed model architectures, training code, hyperparameters, and evaluations, so the training data was the only thing that changed between experiments. This let them compare data curation strategies directly and identify the one that produced the most performant models. The resulting dataset, DCLM-Baseline, was used to train two new decoder-only transformer English language models from scratch, one with 7 billion parameters and one with 1.4 billion.
The larger 7-billion-parameter model, trained on 2.5 trillion tokens, has shown impressive performance across various benchmarks. It reaches 63.7% 5-shot accuracy on MMLU, outperforming the previous state of the art among open-data language models while using 40% less compute for training. The smaller 1.4-billion-parameter model, trained jointly with Toyota Research Institute on 2.6 trillion tokens, also delivers strong performance, scoring 41.9% on the 5-shot MMLU test.
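For readers unfamiliar with the metric, "5-shot accuracy" means the model sees five solved example questions in its prompt before each test question, and the score is the fraction of test questions it answers correctly. The sketch below illustrates that idea only. It is not the evaluation code used by the DCLM team: the prompt layout, the toy questions, and the `predict` callback (a stand-in for a real model call) are all assumptions made for illustration.

```python
from typing import Callable, Optional, Sequence

# Hypothetical item layout: (question, four answer options, correct letter).
Item = tuple[str, Sequence[str], str]
LETTERS = "ABCD"


def format_item(question: str, options: Sequence[str],
                answer: Optional[str] = None) -> str:
    """Render one multiple-choice question; leave the answer blank if unknown."""
    lines = [question]
    lines += [f"{LETTERS[i]}. {opt}" for i, opt in enumerate(options)]
    lines.append(f"Answer: {answer}" if answer is not None else "Answer:")
    return "\n".join(lines)


def build_prompt(shots: Sequence[Item], test_item: Item) -> str:
    """Prepend the solved examples (the 'shots') to the unsolved test question."""
    solved = [format_item(q, opts, ans) for q, opts, ans in shots]
    question, options, _ = test_item
    return "\n\n".join(solved + [format_item(question, options)])


def accuracy(predict: Callable[[str], str], shots: Sequence[Item],
             test_items: Sequence[Item]) -> float:
    """Fraction of test items where the predicted letter matches the key."""
    correct = sum(
        predict(build_prompt(shots, item)).strip() == item[2]
        for item in test_items
    )
    return correct / len(test_items)


# Toy data, invented for this example (not real MMLU questions).
SHOTS: list[Item] = [
    ("What is 1 + 1?", ["2", "3", "4", "5"], "A"),
    ("What is 2 + 3?", ["4", "5", "6", "7"], "B"),
    ("What is 4 - 1?", ["1", "2", "3", "4"], "C"),
    ("What is 2 * 4?", ["2", "4", "6", "8"], "D"),
    ("What is 9 - 4?", ["5", "6", "7", "8"], "A"),
]
TEST_ITEMS: list[Item] = [
    ("What is 3 + 3?", ["6", "7", "8", "9"], "A"),
    ("What is 5 + 2?", ["6", "7", "8", "9"], "B"),
]

if __name__ == "__main__":
    # A placeholder "model" that always answers A; a real run would query an LLM.
    always_a = lambda prompt: "A"
    print(build_prompt(SHOTS, TEST_ITEMS[0]))
    print("accuracy:", accuracy(always_a, SHOTS, TEST_ITEMS))
```

With five shots, every prompt contains five answered questions followed by one open question, so a reported figure such as 63.7% is the share of open questions answered correctly under that setup.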
Apple has released the larger model under its Sample Code License, while the smaller model is available under the Apache 2.0 license, which allows commercial use, distribution, and modification. An instruction-tuned version of the larger model is also available on Hugging Face.
While these models represent a significant advancement in AI research, they are not intended for use on Apple devices and may exhibit biases or produce harmful responses. This early research highlights the importance of data curation in training language models and provides a foundation for further work in the field. As the AI landscape continues to evolve, the DCLM models showcase Apple's commitment to innovation in artificial intelligence.