DeepSeek AI's Low-Cost Models Suspected of Using OpenAI Data, Sparking Irony Online

The emergence of DeepSeek AI models from China has sparked intense debate and concern in the U.S. tech industry, particularly after Donald Trump labeled it a "wake-up call." The introduction of DeepSeek's R1 model, touted as a significantly cheaper alternative to Western AI offerings like ChatGPT, led to a dramatic $600 billion drop in Nvidia's market value. Nvidia, a dominant force in the GPU market critical for AI model operations, saw its shares plummet by 16.86%, the largest single-day loss of market value for any company in Wall Street history. Other tech giants like Microsoft, Meta Platforms, and Google's parent company Alphabet experienced declines ranging from 2.1% to 4.2%, while AI server manufacturer Dell Technologies dropped by 8.7%.

DeepSeek's R1 model, built on the open-source DeepSeek-V3, reportedly requires less computing power and was trained at a fraction of the cost of its Western rivals, estimated at just $6 million. This has raised questions about the massive investments U.S. companies are making in AI, causing investor jitters. DeepSeek's chatbot app quickly became the most downloaded free app in the U.S., fueled by discussions about its cost-effectiveness and capabilities.

Amid these developments, Bloomberg reported that OpenAI and Microsoft are investigating whether DeepSeek used OpenAI's API to feed the outputs of OpenAI's models into its own, a practice known as distillation. This technique involves training a smaller model on the outputs of a larger one; using OpenAI's outputs to develop competing models violates its terms of service. OpenAI emphasized its commitment to protecting its intellectual property and working with the U.S. government to safeguard advanced AI models from adversarial use.
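For readers unfamiliar with the term, the general idea of distillation can be shown with a toy example. The sketch below is purely illustrative and makes no claim about how DeepSeek or OpenAI actually train their models: the three-way "teacher" distribution, the `distill` function, and the learning-rate and step values are all hypothetical. A small "student" starts with no preference among three possible answers and is nudged, step by step, toward the probabilities a "teacher" assigns to them.

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """How far distribution q is from distribution p (0 means identical)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def distill(teacher_probs, steps=500, lr=0.5):
    """Train student scores so the student's probabilities match the teacher's.

    Hypothetical toy setup: the student is just one score per answer,
    updated by plain gradient descent on the cross-entropy loss.
    """
    student_logits = [0.0] * len(teacher_probs)  # student starts undecided
    for _ in range(steps):
        student_probs = softmax(student_logits)
        # Gradient of cross-entropy w.r.t. each score is (student - teacher).
        student_logits = [
            z - lr * (q - p)
            for z, q, p in zip(student_logits, student_probs, teacher_probs)
        ]
    return softmax(student_logits)

# Assumed teacher output for one prompt with three candidate answers.
teacher_probs = softmax([2.0, 1.0, 0.1])
student_probs = distill(teacher_probs)
print("gap before training:", kl_divergence(teacher_probs, [1 / 3] * 3))
print("gap after training: ", kl_divergence(teacher_probs, student_probs))
```

In practice, distillation through an API typically works from generated text rather than raw probabilities, and at a vastly larger scale, but the principle is the same: the student learns from what the teacher produces rather than from original training data.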

David Sacks, President Trump's AI czar, highlighted the issue on Fox News, suggesting that U.S. AI companies would take steps to prevent such distillation in the coming months. The irony of the situation was not lost on observers, with tech PR professional and writer Ed Zitron pointing out OpenAI's own history of using copyrighted internet content to train ChatGPT.

In January 2024, OpenAI acknowledged the necessity of using copyrighted materials to train large language models like ChatGPT. In a submission to the UK's House of Lords communications and digital select committee, OpenAI argued that excluding copyrighted materials would severely limit the capabilities of modern AI systems. This stance has fueled ongoing debates about the ethics and legality of training AI on copyrighted content, highlighted by lawsuits such as the one filed by The New York Times against OpenAI and Microsoft in December 2023 for "unlawful use" of its content. OpenAI responded by defending the practice as "fair use" and calling the lawsuit without merit.

The controversy extends beyond news organizations. In September 2023, 17 authors, including George R. R. Martin, filed a lawsuit alleging "systematic theft on a mass scale." Additionally, in August 2023, District Judge Beryl Howell upheld a U.S. Copyright Office ruling that AI-generated art cannot be copyrighted, emphasizing the importance of human creativity in copyright law.

DeepSeek is accused of using OpenAI's models to train a competing model through distillation. Image credit: Andrey Rudakov/Bloomberg via Getty Images.