Jonathan K Kummerfeld
The knee-jerk reaction to the release of Chinese company DeepSeek’s AI chatbot mistakenly assumes it gives China an enduring lead in artificial intelligence development and misses key ways it could drive demand for AI hardware. The DeepSeek model was unveiled at the end of January, offering an AI chatbot competitive with o1, the leading model from US company OpenAI that drives ChatGPT.
DeepSeek’s model offered major advances in how it uses hardware, including requiring far fewer and less powerful chips than other models, and in its learning efficiency, making it much cheaper to create.
The announcement dominated the international media cycle and commentators frequently suggested that the arrival of DeepSeek would dramatically cut demand for AI chips.
This dramatic reaction misses four ways DeepSeek’s innovation could actually expand demand for AI hardware. First, by cutting the resources needed to train a model, more companies will be able to train models for their own needs and avoid paying a premium for access to the big tech models. Second, the big tech companies could combine the more efficient training with larger resources to further improve performance. Third, researchers will be able to expand the number of experiments they run without needing more resources. Fourth, OpenAI and other leading model providers could expand their range of models, switching from one generic model, essentially the jack-of-all-trades we have now, to a variety of more specialised models, for example one optimised for scientists and another made for writers.
Researchers around the world have been exploring ways to improve the performance of AI models. Innovations in the core ideas are widely published, allowing researchers to build on each other’s work.
DeepSeek has brought together and extended a range of these ideas, with its key advances in hardware use and in the way learning works.
DeepSeek uses hardware more efficiently. When training these large models, so many computers are involved that communication between them can become a bottleneck: computers sit idle, wasting time while waiting for data to arrive. DeepSeek developed new ways to do calculations and communication at the same time, avoiding that downtime. It has also brought innovation to how learning works. All large language models today go through three phases of learning.
First, the language model learns from vast amounts of text, attempting to predict the next word and getting updated if it makes a mistake. It then learns from a much smaller set of specific examples that teach it to communicate with users conversationally. Finally, the language model learns by generating output, being judged, and adjusting in response.
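The three phases can be sketched with a toy model. This is an illustrative sketch only: the bigram counter below stands in for a real neural network, and the function names are hypothetical labels for the three phases, not DeepSeek's or OpenAI's actual training code.

```python
from collections import defaultdict

class ToyModel:
    """A trivial stand-in for a language model: counts word-to-word transitions."""

    def __init__(self):
        self.counts = defaultdict(lambda: defaultdict(int))

    def update(self, prev_word, next_word):
        self.counts[prev_word][next_word] += 1

    def predict(self, prev_word):
        options = self.counts.get(prev_word)
        if not options:
            return None
        return max(options, key=options.get)

def pretrain(model, corpus):
    # Phase 1: learn next-word prediction from large amounts of raw text.
    for sentence in corpus:
        words = sentence.split()
        for prev, nxt in zip(words, words[1:]):
            model.update(prev, nxt)

def instruction_tune(model, examples):
    # Phase 2: a much smaller set of (prompt, response) pairs
    # teaches conversational behaviour.
    for prompt, response in examples:
        model.update(prompt, response)

def preference_tune(model, prompt, candidates, score):
    # Phase 3: generate several outputs, judge them, and reinforce the best one.
    best = max(candidates, key=score)
    model.update(prompt, best)
    return best
```

In a real system each phase updates billions of neural-network weights rather than a count table, but the shape of the pipeline is the same: bulk text first, curated examples second, judged outputs last.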
In this last phase, there is no single correct answer at each step of learning. Instead, the model learns that one output is better or worse than another.
DeepSeek's method compares a large set of outputs in the last phase of learning, an approach effective enough to let the second and third phases be much shorter while achieving the same results.
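Judging outputs only against each other can be sketched as a simple normalisation step: each output's score is measured relative to the group's average. This is a generic illustration of relative scoring, assuming a mean-and-spread normalisation; it is not a reproduction of DeepSeek's published implementation.

```python
from statistics import mean, pstdev

def group_advantages(rewards):
    """Turn raw judge scores for one group of outputs into relative advantages.

    No single 'correct answer' is needed: an output is rewarded only for
    being better than the other outputs sampled for the same prompt.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    if sigma == 0:
        # All outputs judged equally: no learning signal from this group.
        return [0.0 for _ in rewards]
    # Above-average outputs get positive advantage, below-average negative.
    return [(r - mu) / sigma for r in rewards]
```

The model would then be nudged toward outputs with positive advantage and away from those with negative advantage, which is how comparing a large set of outputs substitutes for having a ground-truth answer.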
Combined, these improvements dramatically improve efficiency.