US export controls on advanced semiconductors were intended to slow China’s progress on artificial intelligence, but they may have inadvertently spurred innovation. Unable to rely solely on the latest hardware, companies like Hangzhou-based DeepSeek have been forced to find creative solutions to do more with less.
Moreover, China is pursuing an open source strategy and is emerging as one of the largest providers of powerful, fully open AI models in the world.
This month, DeepSeek released its R1 model, using advanced techniques such as pure reinforcement learning to create a model that is not only among the most capable in the world but is also fully open source, available for anyone to review, modify, and build upon.
DeepSeek-R1 shows that China is not out of the AI race and, in fact, may yet dominate global AI development with its surprising open source strategy. By releasing competitive open source models, Chinese companies can increase their global influence and potentially shape international AI standards and practices. Open source projects also attract global talent and resources to contribute to Chinese AI development. The strategy further enables China to extend its technological reach to developing countries, potentially incorporating its artificial intelligence systems – and by extension, its values and norms – into the global digital infrastructure.
DeepSeek-R1’s performance is comparable to leading OpenAI reasoning models across a variety of tasks, including mathematics, coding, and complex reasoning. For example, on the AIME 2024 math benchmark, DeepSeek-R1 scored 79.8% compared to OpenAI-o1’s 79.2%. On the MATH-500 benchmark, DeepSeek-R1 achieved 97.3% versus o1’s 96.4%. In coding tasks, DeepSeek-R1 reached the 96.3rd percentile on Codeforces, while o1 reached the 96.6th percentile – although it’s important to note that benchmark results can be imperfect and shouldn’t be over-interpreted.
But what is most remarkable is that DeepSeek was able to achieve this mainly through innovation and not by relying on the latest computer chips.
DeepSeek introduced MLA (multi-head latent attention), which reduces the memory consumed by attention to just 5-13% of what the commonly used MHA (multi-head attention) architecture requires. MHA is a technique widely used in AI to process multiple streams of information simultaneously, but it is memory intensive.
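To make the idea concrete, here is a minimal PyTorch-style sketch of the general principle behind latent attention: instead of caching full per-head keys and values for every generated token, the model caches a much smaller latent vector and re-expands it on the fly. All names, layer sizes, and the latent dimension below are illustrative assumptions, not DeepSeek's actual implementation, and the causal mask is omitted for brevity.

```python
import torch
import torch.nn as nn

class LatentKVAttention(nn.Module):
    """Sketch only: cache a small latent per token instead of full per-head
    keys/values, then expand the latent into keys and values when needed.
    Dimensions are made up; causal masking is omitted for brevity."""
    def __init__(self, d_model=4096, n_heads=32, d_latent=512):
        super().__init__()
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        self.q_proj = nn.Linear(d_model, d_model)
        self.kv_down = nn.Linear(d_model, d_latent)   # compress token -> latent
        self.k_up = nn.Linear(d_latent, d_model)      # expand latent -> keys
        self.v_up = nn.Linear(d_latent, d_model)      # expand latent -> values
        self.out = nn.Linear(d_model, d_model)

    def forward(self, x, latent_cache=None):
        B, T, _ = x.shape
        latent = self.kv_down(x)                      # (B, T, d_latent): this is what gets cached
        if latent_cache is not None:
            latent = torch.cat([latent_cache, latent], dim=1)
        q = self.q_proj(x).view(B, T, self.n_heads, self.d_head).transpose(1, 2)
        k = self.k_up(latent).view(B, -1, self.n_heads, self.d_head).transpose(1, 2)
        v = self.v_up(latent).view(B, -1, self.n_heads, self.d_head).transpose(1, 2)
        attn = torch.softmax(q @ k.transpose(-2, -1) / self.d_head ** 0.5, dim=-1)
        y = (attn @ v).transpose(1, 2).reshape(B, T, -1)
        return self.out(y), latent                    # return latent so the caller can cache it
```

Because only the small latent is stored per token, the memory that grows with sequence length shrinks roughly in proportion to the latent size, which is the intuition behind the 5-13% figure above.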
To make their model even more efficient, DeepSeek created the DeepSeekMoESparse structure. “MoE” stands for Mixture-of-Experts, meaning that the model uses only a small subset of its components (or “experts”) for each task, rather than running the entire system. The “sparse” part refers to how only necessary experts are activated, saving computing power and reducing costs.
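A toy sketch of sparse expert routing shows how this saves compute: a small gating network scores all experts, but each token is actually processed by only the top-k of them, so the unselected experts never run. The expert count, layer sizes, and k below are hypothetical and far smaller than anything in DeepSeek-R1; they are only meant to illustrate the mechanism.

```python
import torch
import torch.nn as nn

class SparseMoE(nn.Module):
    """Sketch only: route each token to its top-k experts and skip the rest.
    Sizes and k are illustrative, not DeepSeek's configuration."""
    def __init__(self, d_model=1024, d_hidden=4096, n_experts=64, k=2):
        super().__init__()
        self.k = k
        self.gate = nn.Linear(d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(), nn.Linear(d_hidden, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                      # x: (n_tokens, d_model)
        scores = self.gate(x)                  # (n_tokens, n_experts)
        weights, idx = scores.topk(self.k, dim=-1)
        weights = torch.softmax(weights, dim=-1)
        out = torch.zeros_like(x)
        # Only the selected experts execute for each token; the rest stay idle.
        for slot in range(self.k):
            for e in idx[:, slot].unique():
                mask = idx[:, slot] == e
                w = weights[mask, slot].unsqueeze(-1)
                out[mask] += w * self.experts[int(e)](x[mask])
        return out
```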
The DeepSeek-R1 architecture has 671 billion parameters, but only 37 billion are activated during operation, demonstrating incredible computational efficiency. The company has published a comprehensive technical report on GitHub, providing transparency into the model’s architecture and training process. The accompanying open-source code includes the model’s architecture, training pipeline, and related components, enabling researchers to fully understand and replicate its design.
These innovations allow DeepSeek’s model to be powerful and significantly more affordable than its competitors. This has already triggered a price war over inference – the cost of actually running models – in China, and it is likely to spread to the rest of the world.
DeepSeek charges a fraction of what OpenAI charges for o1 API access. This dramatic reduction in costs could potentially democratize access to advanced AI capabilities, allowing smaller organizations and individual researchers to use powerful AI tools that were previously out of reach.
DeepSeek has also pioneered the distillation of its large model capabilities into smaller, more efficient models. These distilled models, ranging from 1.5B to 70B parameters, are also open source, providing the research community with powerful and efficient tools for further innovation.
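As a rough illustration of what distillation means, the classic recipe trains a small "student" model to imitate the softened output distribution of a large "teacher" model. DeepSeek's released distilled models were reportedly fine-tuned on reasoning data generated by R1 rather than through this exact loss, so treat the snippet below purely as a sketch of the concept; the function name and temperature value are arbitrary.

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Sketch only: classic logit-matching distillation. The student is pushed
    to match the teacher's softened probability distribution over tokens."""
    t = temperature
    teacher_probs = F.softmax(teacher_logits / t, dim=-1)
    student_log_probs = F.log_softmax(student_logits / t, dim=-1)
    # KL divergence between student and teacher, scaled by T^2 as in the
    # standard distillation recipe.
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean") * t * t
```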
By making their models freely available for commercial use, distillation and modification, DeepSeek is building goodwill within the global AI community and potentially setting new standards for transparency in AI development.
DeepSeek was founded by Liang Wenfeng, 40, one of China’s leading quantitative investors. His hedge fund, High-Flyer, funds the company’s AI research.
In a rare interview with Chinese media, DeepSeek founder Liang issued a warning aimed at OpenAI: “In the face of disruptive technologies, closed-source moats are temporary. Even OpenAI’s closed-source approach can’t stop others from catching up.”
DeepSeek is part of a growing trend of Chinese companies contributing to the global open-source artificial intelligence movement, countering perceptions that China’s tech sector is largely focused on imitation rather than innovation.
In September, China’s Alibaba unveiled over 100 new open-source AI models as part of the Qwen 2.5 family, which support over 29 languages. Chinese search giant Baidu has the Ernie series, Zhipu AI has the GLM series, and MiniMax has the MiniMax-01 family, all offering competitive performance at significantly lower cost than the US flagship models.
As China continues to invest in and promote the development of open source AI while navigating the challenges posed by export controls, the global technology landscape is likely to see further shifts in power dynamics, cooperation patterns, and innovation trajectories. The success of this strategy could position China as a leading force in shaping the future of AI, with far-reaching consequences for technological progress, economic competition, and geopolitical influence.