DeepSeek's $1.6B Development: Debunking the Affordability Myth

DeepSeek's new chatbot boasts an impressive introduction: "Hi, I was created so you can ask anything and get an answer that might even surprise you." This AI, a product of the Chinese startup DeepSeek, has quickly become a major player, even contributing to a significant drop in NVIDIA's stock price.

Image: ensigame.com

DeepSeek's success stems from its innovative architecture and training methods. Key technologies include:

Multi-token Prediction (MTP): Instead of predicting one token at a time, MTP trains the model to predict several upcoming tokens at once, boosting accuracy and efficiency.
Mixture of Experts (MoE): DeepSeek V3 uses 256 expert networks and activates only eight of them for each token, which accelerates training and improves performance.
Multi-head Latent Attention (MLA): MLA compresses the attention keys and values into a compact latent representation, so the model can keep track of the crucial details in its input while using far less memory.
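
To make the Mixture of Experts idea concrete, here is a minimal sketch of top-k routing with 256 experts and eight active per token, the figures quoted above. Everything else is an assumption for illustration: the softmax gating, the random router scores, the scalar toy "experts", and the names route_token and moe_output are hypothetical and are not DeepSeek's actual router.

```python
import math
import random

NUM_EXPERTS = 256  # expert networks, per the article
TOP_K = 8          # experts activated per token, per the article


def route_token(scores, k=TOP_K):
    """Return (expert_index, weight) pairs for the k highest-scoring experts.

    The weights are a softmax over the selected scores only, so they sum to 1.
    This is generic top-k gating, assumed here for illustration.
    """
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    peak = max(scores[i] for i in top)  # subtract the max for numerical stability
    exps = [math.exp(scores[i] - peak) for i in top]
    total = sum(exps)
    return [(i, e / total) for i, e in zip(top, exps)]


def moe_output(token, experts, scores, k=TOP_K):
    """Run only the selected experts and blend their outputs by router weight."""
    return sum(w * experts[i](token) for i, w in route_token(scores, k))


# Toy setup: each "expert" is a scalar function and the router scores are random.
rng = random.Random(0)
experts = [lambda x, s=rng.uniform(0.5, 1.5): s * x for _ in range(NUM_EXPERTS)]
scores = [rng.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]

chosen = route_token(scores)
print(len(chosen), "of", NUM_EXPERTS, "experts active")
print("blended output:", moe_output(2.0, experts, scores))
```

The point of the design is that only eight of the 256 expert functions are ever called for a given token, so compute per token stays small even though the total parameter count is large.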

DeepSeek initially claimed a remarkably low training cost of just $6 million for DeepSeek V3, using only 2048 GPUs. However, an analysis by SemiAnalysis points to a far more extensive infrastructure: approximately 50,000 Nvidia Hopper GPUs (including 10,000 H800s, 10,000 H100s, and additional H20s) spread across multiple data centers. It estimates the total server investment at roughly $1.6 billion and operating expenses at approximately $944 million.

DeepSeek, a subsidiary of the Chinese hedge fund High-Flyer, owns its data centers, which gives it direct control over optimization and lets it put new ideas into practice faster. Being self-funded also makes the company more flexible and quicker to make decisions. It recruits top talent, primarily from Chinese universities, and some researchers earn over $1.3 million annually.

The $6 million training cost claim appears to be a significant understatement: it covers only the GPU time used for pre-training and excludes research, refinement, data processing, and infrastructure. DeepSeek's actual investment in AI development exceeds $500 million. Even so, its lean structure lets it innovate more efficiently than larger, more bureaucratic companies.
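
The gap between the headline figure and the infrastructure estimate is easier to see as arithmetic. The sketch below uses the dollar and GPU counts quoted in this article. The GPU-hour total and the $2 per GPU-hour rental rate are assumptions from outside the article (they are the figures given in DeepSeek's V3 technical report), included only to show how a number close to $6 million can arise from a single training run.

```python
# Figures quoted in the article (USD unless noted).
HEADLINE_TRAINING_COST = 6e6   # claimed training cost for DeepSeek V3
SERVER_CAPEX = 1.6e9           # estimated total server investment
OPERATING_COST = 944e6         # estimated operating expenses
GPUS_CLAIMED = 2048            # GPUs cited for the V3 training run
GPUS_ESTIMATED = 50_000        # estimated size of the whole GPU fleet

# Assumptions NOT from the article: taken from DeepSeek's V3 technical report.
GPU_HOURS = 2.788e6            # H800 GPU-hours for the training run
RENTAL_RATE = 2.0              # assumed rental price per GPU-hour


def rental_cost(gpu_hours, rate_per_hour):
    """Cost of a run priced as rented GPU time, ignoring everything else."""
    return gpu_hours * rate_per_hour


run_cost = rental_cost(GPU_HOURS, RENTAL_RATE)
capex_per_gpu = SERVER_CAPEX / GPUS_ESTIMATED
fleet_share = GPUS_CLAIMED / GPUS_ESTIMATED
headline_vs_capex = HEADLINE_TRAINING_COST / SERVER_CAPEX

print(f"Run priced as rented GPU time: ${run_cost:,.0f}")
print(f"Implied server spend per GPU:  ${capex_per_gpu:,.0f}")
print(f"Share of fleet used in run:    {fleet_share:.2%}")
print(f"Headline cost vs server capex: {headline_vs_capex:.3%}")
```

Pricing a run as rented GPU time leaves out hardware ownership, staff, failed experiments, and data work, which is why the two totals differ by orders of magnitude without contradicting each other.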

DeepSeek's success highlights the potential of well-funded independent AI companies to compete with industry giants. While its "revolutionary budget" claim is exaggerated, its results clearly rest on substantial investment, technological breakthroughs, and a strong team. The contrast in reported training costs is stark: DeepSeek's R1 cost $5 million, while ChatGPT-4 cost $100 million. Even when the wider investment is taken into account, DeepSeek is still cheaper than its competitors.