I know a technology event has really hit the mainstream when my brother-in-law asks me about it. “What’s this about some Chinese AI thing called DeepSeek?” he asked me recently with a quizzical look. I don’t think the AI technology aspect of DeepSeek was what sparked this question, since he doesn’t know anything (or care) about the details of AI. I think what probably triggered his interest was the same thing that triggered the interest of lots of non-tech types: the fact that news about DeepSeek’s AI advancements caused US stock markets to suddenly go into free fall. Nvidia — the chip-maker that is one of the most valuable stocks in the world — lost as much market value in a single day (~$600 billion) as the gross domestic product of a medium-sized country.
Was this justified? Not really. Suffice it to say that most traders and brokerage firms don’t exactly have a nuanced understanding of AI. Also, at its peak Nvidia was trading for about 50 times its projected earnings (it has been as high as 77 times recently), and about 30 times its projected revenue. Those are eye-popping numbers — by comparison, Apple trades for about 35 times its projected earnings and 10 times future revenue — and any time a stock is selling for that kind of valuation, even the slightest bump in the road will cause a massive selloff. Traders who invest in these kinds of stocks are a little like people who have drunk 45 cups of coffee — they are extremely nervous, and the finger that is perpetually hovering over the “sell” button is on a hair trigger.
I should point out up front that I’m not here to give you the technical nitty-gritty behind DeepSeek’s announcement, for two reasons: #1) I don’t really understand it on the kind of granular level that would make my comments worthwhile for those who do understand it, and #2) There are lots of other places you can find this sort of thing, including a great overview by Ben Thompson in his newsletter Stratechery. But for those of you who aren’t already experts in this area, the 10,000-foot view is that DeepSeek — a Chinese company run by Liang Wenfeng, who started an AI-powered hedge fund and then branched out into building AI models as a side hustle — built an engine that is competitive with, or possibly better than, leading LLMs like Claude and GPT-4, but at a fraction of the cost.
Note: This is a version of my Torment Nexus newsletter, which I send out via Ghost, the open-source publishing platform. You can see other issues and sign up here.
How small a fraction? That’s a little murky. Some of the initial reports said that DeepSeek trained its model for $6 million — a truly gobsmacking number, since Meta’s Llama (to take one example) cost about ten times that. However, DeepSeek itself admits that this was just the cost of training its model, not the entire cost of buying all the hardware and ingesting all the data required to make it effective (data that OpenAI claims was partially poached from its own output — something many find hugely ironic, given the lawsuits accusing OpenAI of doing effectively the same thing). Some estimates are that DeepSeek spent about $500 million on hardware and other costs — which is still probably an order of magnitude lower than what some companies have spent.
Note: In case you are a first-time reader, or you forgot that you signed up for this newsletter, this is The Torment Nexus. You can find out more about me and this newsletter in this post. This newsletter survives solely on your contributions, so please sign up for a paying subscription or visit my Patreon, which you can find here. I also publish a daily email newsletter of odd or interesting links called When The Going Gets Weird, which is here.
This is an open source story
For me at least, the cost of DeepSeek’s AI engine is not the most interesting aspect of the story. Yes, the company somehow did what Claude and GPT-4 and others have done, but a lot cheaper, and that is definitely interesting. It widens the competitive landscape, which is good in the long run. But to me, the most interesting thing is that DeepSeek makes almost all of its work open source — anyone can download the model, and read the research papers that Wenfeng and his team have written about its development. Hugging Face, the open-source repository for AI models, said it already has 500 models available that were based on DeepSeek’s. In short, DeepSeek has done exactly what OpenAI said it was going to do when it was founded, but never actually did.
OpenAI made a lot of promises when it was created by Sam Altman and a small team that included Elon Musk. The whole idea of making research into AI open and available to everyone helped give the company its name — a name that seems cruelly ironic now. The company was also supposed to be a nonprofit originally, but eventually decided that building an AI engine costs too much money, and that it had to create a structure with a nonprofit at the center, surrounded by a for-profit entity that could raise the billions of dollars required to fund AI development. After surviving an attempted coup by OpenAI’s other founders and board members in 2023, Altman is now in the process of dismantling that structure so that OpenAI can become a for-profit company, and Musk is suing him and OpenAI for not following through on that original promise (OpenAI says Musk wanted to take control of the company himself, and that he agreed it needed to be a for-profit).
There are many reasons why companies decide to release their work as open source, rather than trying to control every aspect of it. Altman says that one of the reasons he chose not to make OpenAI actually open is that he was concerned about the risk that allowing anyone to pursue AI research might produce something that would destroy humanity. The company wrote that it was “aware that some researchers have the technical capacity to reproduce and open source our results,” but that keeping them secret would give “the AI community more time to have a discussion about the implications of such systems.” In other words, OpenAI isn’t open for our own good! That’s a nice story, and there might even be some truth to it, but it’s also true that controlling a technology often makes it a lot easier to cash in on that technology, which is what Altman and OpenAI appear to be in the process of doing as quickly as possible.
China didn’t win — open did
Some of those who release their work as open source do so because they are underdogs in an industry, and being open allows them to compete with larger companies. This is likely part (if not all) of the reason that Meta decided to open-source its Llama AI engine when it first released it back in 2023 — OpenAI and Microsoft and others such as Anthropic were clearly already ahead, so trying to develop a proprietary engine and catch up was a non-starter. But releasing Llama as open source gave it a leg up, by making it easier for developers and others to use it and build on it. In fact, this decision appears to have benefited DeepSeek as well, since its AI engine was reportedly trained in part on the outputs from Meta’s model, as well as OpenAI’s and others.
I should note that open-source purists will argue that neither DeepSeek nor Llama is technically truly open source. While anyone can download the models, read all of the technical documentation, and use the models to build other things, no one can recreate the DeepSeek or Llama models, because their respective owners haven’t provided the data sets they used to train their engines (they provide what are known as the “weights,” the numerical parameters a model learns during training, but not the training data itself).
While it’s possible that Wenfeng decided to make DeepSeek open source for mercenary reasons, that’s not the way he describes it. According to the Wall Street Journal, he says he wanted DeepSeek to break the monopoly of big tech companies. “For technologists, having others follow your work gives a great sense of accomplishment,” Wenfeng said in an interview last year (transcript here). “Open source is more of a cultural behavior rather than a commercial one, and contributing to it earns us respect.” But Wenfeng said this approach also creates a competitive advantage in a different way: “Even OpenAI’s closed source approach can’t prevent others from catching up,” he said. “So we anchor our value in our team — our colleagues grow through this process, accumulate know-how, and form an organization and culture capable of innovation.”
Yann LeCun, one of the pioneers of modern artificial intelligence and also the chief AI scientist at Meta, responded to some of the hysteria around DeepSeek being a Chinese company by noting that the real message is not that China’s AI is surpassing that of the US, but rather that “open source models are surpassing proprietary ones.” LeCun said in a series of posts on Threads and on his LinkedIn account that DeepSeek has “profited from open research and open source … they came up with new ideas and built them on top of other people’s work. Because their work is published and open source, everyone can profit from it. That is the power of open research and open source.” Marc Andreessen, co-founder of the Silicon Valley VC firm Andreessen Horowitz, posted on X that DeepSeek’s R1 engine is “one of the most amazing and impressive breakthroughs I’ve ever seen — and as open source, a profound gift to the world.”
Come get some irony
I think it’s also worth noting that DeepSeek’s success contains a big helping of irony. Part of the reason Nvidia’s share price got hit so hard when the DeepSeek news came out is that Wenfeng and his small team managed to build a competitive AI engine without access to the latest and greatest (and most expensive) Nvidia chips — which everyone has assumed would be required for any kind of AI advancement, hence the billions of dollars that OpenAI and others have raised to fund their work, and the US export controls on selling those chips to foreign adversaries like China. But DeepSeek didn’t need them to create what it did, nor did it need Nvidia’s proprietary software platform, known as CUDA, which many in the AI industry have also assumed was mandatory.
If you’re interested in the technical details of what this meant for DeepSeek and how it got around those restrictions, I encourage you to read Ben Thompson’s overview in Stratechery. My reading of it is that because of the constraints it was forced to operate under, DeepSeek had to think more creatively about the problem, and as a result developed different ways of making its engine work, some of which resulted in advances in AI training technology. And because its work is open source, OpenAI and other companies can now make use of those same advancements (I don’t have room to get into it here, but this is part of the reason why some believe the Nvidia selloff was overdone). Of course, whatever the US giants come up with won’t be shared.
As Thompson put it: “Somehow we’ve ended up in a situation where the leading U.S. labs, including OpenAI and Anthropic (which hasn’t yet released a reasoning model), are keeping everything close to their vest, while this Chinese lab is probably the world leader in open source models, and certainly in open-source reasoning models.” How things will shake out in the future for DeepSeek, or for the pursuit of human-like AGI (artificial general intelligence) as a whole, remains to be seen, but it’s refreshing to see anyone — even a Chinese company — stirring up what was looking a lot like an AI oligarchy, controlled by a few mega-corporations for their own interests.
Got any thoughts or comments? Feel free to leave them here, post them on Substack or on my website, or reach me on Twitter, Threads, Bluesky or Mastodon. And thanks for being a reader.