DeepSeek - How a Chinese AI Startup Shook Silicon Valley | Patrick Boyle On Finance podcast

00:00

Last week, DeepSeek, a Chinese AI company, released a new reasoning model that turned out to be comparable to models made by firms like OpenAI, Google, Meta, and Anthropic. The big difference is that DeepSeek claims to have built their model at a fraction of the cost big tech firms have spent.

00:22

A US export ban on Nvidia's best AI chips means that DeepSeek has done what many thought was impossible: building and training one of the most impressive AI models using the outdated chips available in China. The new model was announced in a white paper in December and

00:42

released a few weeks ago. Over the weekend, people started paying a lot of attention to how good the new model was, and on Monday when the stock market opened, the NASDAQ opened down around 3.5%, with NVIDIA declining by 17%. While 17% sounds like a big number, to put that in context, this was a $600 billion decline in market cap, which is more than the entire market cap of ExxonMobil, which not so long ago was the biggest company in the world.

01:16

While NVIDIA and the US mega cap tech companies got most of the news coverage, they weren't the only stocks hit. The realisation that a reasoning model could be built on such a tight budget raised doubts about the scale of spending we've seen from big tech. While big tech suddenly seemed less insulated from competition than they were previously believed to be, their price declines on Monday were quite

01:43

modest. The hardest hit stocks other than Nvidia were the ones expected to benefit most from the emerging data center economy. Utilities like Constellation Energy, who announced a few months ago that they were reopening Three Mile Island to power Microsoft data centres, along with other electrical utilities near data center hotspots, all fell hard.

02:08

Just a few days earlier, the Stargate project had been announced with huge fanfare, where plans for up to 20 large AI data centers were announced in the United States with an initial investment of $100 billion and plans for up to $500 billion by 2029. GE Vernova, an energy equipment manufacturer, Eaton, a power management company, Oracle, who just announced a huge data center investment, and Broadcom, who sell advanced networking equipment to data centers, all

02:45

got hit in the sell-off. Energy, commodities, copper and mining stocks were all hit too. There are a few things that you could read into the sell-off. One is just that investors now believe that less AI infrastructure will be needed, but the sell-off could instead be telling us that the infrastructure just won't be as concentrated as was expected, with a few huge data centres owned and run by the mega cap

03:12

tech companies. The emergence of DeepSeek caused investors to question whether AI will be a winner-takes-all business model like a lot of tech innovation has been in the past, or if it's easily replicated and we'll instead see lots of different models run in smaller data centers all over the world. The idea up until about a week ago was that someone would achieve a huge lead in AI, a bit like Google did in search, and own the market.

03:43

Google became the dominant search engine back in around 2002 and is still dominant today. The question is whether AI will work out like search or not.

03:55

The first AI reasoning model, known as o1, was released by OpenAI last September. It was different to the prior models because it used a chain-of-thought approach to solving complex problems by breaking a big problem down to its constituent parts, then testing a number of approaches to solving each part in the background before presenting an answer along with the chain of logic that led to that answer to the user. It not only gave better answers, but users got to

04:28

see how the model thinks and decide if they agree with it or not. As soon as o1 was released, competitors were rushing to catch up, with Google releasing a competing model a few months later in December.

04:43

The thing is, though, that Alibaba, the Chinese tech giant, had actually beaten Google by releasing their reasoning model called QwQ ahead of Google. Not only did Alibaba get there faster, but they published the model under an open license, meaning that anyone could dig through it to see how it works. This is very different to OpenAI, who, despite the name of the company, keep the workings of their model secret.

05:13

So let's look at whether we should believe that DeepSeek built their model for $5.6 million, what Chinese competition means for big tech, for NVIDIA, and for the future of AI. And is this a Sputnik moment? DeepSeek is an interesting AI company in that it isn't part of a huge tech firm, nor is it VC funded. It was originally part of a Chinese quantitative hedge fund called High-Flyer, and was spun out as a separate unit in 2023.

05:47

DeepSeek has released a number of models since then, making the code open source under the MIT license, which puts very few restrictions on reuse, allowing users to modify the code even for proprietary commercial use. Despite its low cost, DeepSeek's scores on AI performance benchmarks show that it's as good as, if not better than, the latest cutting edge models from

06:15

the top US firms. It's almost as good as OpenAI's o1 model in the Artificial Analysis Quality Index, an independent AI analysis ranking, and it beats Google, Anthropic, and Meta's models. They released a large language model in December called V3 and then a reasoning model called R1 on the 20th of January, both of which got positive reviews in industry publications like SemiAnalysis.

06:45

An Economist article a few days later on how China's AI labs were significantly better than anyone outside of China was giving them credit for got a lot of attention over the weekend, and by Monday morning people were questioning how necessary it was to have access to Nvidia's most expensive chips. NVIDIA gets to sell its H100 chips at a 1000% markup because of the belief that if you use the second best chip, you've no chance of ever catching up in AI.

07:21

The emergence of DeepSeek changed the AI CapEx narrative. Being a Chinese model, DeepSeek does appear to be heavily censored, avoiding topics that are considered politically sensitive for the government of China.

07:36

Users have, of course, had fun trying to trick it into discussing the Tiananmen Square massacre, the independence of Taiwan, and into making comparisons between Xi Jinping and Winnie the Pooh. This is not so different to the way that Grok appears to be hard-coded to speak well of Elon Musk, praising his relatively slender build. DeepSeek is not the only Chinese AI model.

08:04

Alibaba, Tencent, ByteDance, and Moonshot all have models that are slowly catching up with US peers, most importantly by beating them in cost efficiency. Because of the US export restrictions that were placed on advanced AI chips, Chinese AI companies were forced to innovate with more efficient algorithms, architecture, and training strategies.

08:31

According to the DeepSeek white paper, their model was trained using NVIDIA H800 GPUs, which are similar to the H100 but specifically tailored for the Chinese market to comply with US export restrictions. According to Reuters, the main thing NVIDIA changed in the H800 was that it reduced the chip-to-chip data transfer rate to around half that of the H100. In October 2023, the US government banned the export of

09:05

H800s as well. Despite having access to worse chips, DeepSeek managed to complete training in just two months at a cost of $5.6 million, a fraction of the sums reportedly spent by OpenAI, Google, and Meta.

09:23

Another reason that China was slow to develop AI chat models, according to The Economist, is that they worried about how censors in China would react to models that might hallucinate and either provide incorrect information or come out with politically dangerous statements that could get the developers in trouble. The Chinese authorities eventually issued regulations to foster the AI industry and models started to be built, usually based on Meta's open

09:55

source Llama model. The US chip restrictions are likely responsible for the efficiency of DeepSeek's model, which didn't come from one huge innovation, but instead from a series of small improvements which, when combined, made a massive difference. The DeepSeek white paper explains a lot of the technical details, like how they used 8-bit floating-point numbers instead of 16-bit to speed up training and save

10:23

memory. The problem with doing that is that you can lose a lot of detail, and so they then used other smart techniques to keep the training accurate. DeepSeek used a mixture-of-experts model, which means that rather than training one large model, they trained tens of smaller ones on more specific data that then get switched on or off as needed. A lot of their focus was on reducing communication overhead, both between nodes and within

10:54

nodes. The server farm was reconfigured to let individual chips speak to each other more efficiently. After DeepSeek's LLM was trained, it was then fine-tuned on output from the reasoning model, learning how to mimic its quality at a lower cost. A lot of this reminds me of older coders who I've worked with who learned how to write software on much simpler computers. The capacity constraints meant that they wrote very efficient

11:25

code. They were often scornful of the bloated code written by younger programmers who never had to worry about efficiency. Chinese AI engineers, faced with less efficient GPUs, focused on more efficient code and found smart ways of working around the constraints. Thanks to the efficiencies they found, it cost around $5.6 million to train the new model, or about one-tenth of what it cost Meta to

11:53

train their Llama model. That $5.6 million price tag has been getting a lot of attention, but if you read the technical document, this was just the cost of training, and DeepSeek are clear that this wasn't the overall cost of development. In order to reach the point of training the model, they had to spend possibly hundreds of millions of dollars working out how to get there and how to build the necessary

12:19

infrastructure. And once they knew what to do, they then spent $5.6 million on compute. So the overall cost was much higher, but still significantly lower than the amount being spent by major US AI companies. The thing is that now that DeepSeek have shown the way, these efficiencies will significantly reduce the cost to those who follow in their footsteps. But that still doesn't mean that you can do the same and build an advanced AI model with $6

12:52

million. OpenAI are now saying that they have found evidence that DeepSeek used their proprietary models to train DeepSeek, having told the Financial Times that they had seen some evidence of distillation, which they suspect came from China. Distillation is a technique to get better performance on smaller models by using the outputs from larger ones, allowing them to achieve similar results on specific tasks at a much lower cost.
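As a rough illustration of the distillation idea described above, here is a minimal sketch in plain Python. It assumes the classic "soft label" formulation, where a small student model is scored on how closely its output probabilities match those of a large teacher. The logits, the temperature value, and the function names are all hypothetical, and nothing here is taken from how DeepSeek or OpenAI actually train their models.

```python
import math

def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; a higher temperature gives a softer distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence between the teacher's softened distribution and the student's."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

# Hypothetical scores that a large model assigns to three candidate answers.
teacher = [4.0, 1.0, 0.5]
close_student = [3.8, 1.1, 0.4]  # a student that already mimics the teacher well
far_student = [0.5, 4.0, 1.0]    # a student that disagrees with the teacher

loss_close = distillation_loss(teacher, close_student)
loss_far = distillation_loss(teacher, far_student)
print(f"close student loss: {loss_close:.4f}")
print(f"far student loss:   {loss_far:.4f}")
```

Training the student means nudging its weights to push this loss down, so the smaller model inherits behaviour that the larger one was expensive to teach.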

13:24

Many have pointed out that this is the pot calling the kettle black, as OpenAI have already been accused of building ChatGPT by using online content that they didn't have the rights to. OpenAI is in fact the subject of multiple lawsuits, including one from the New York Times, who claimed that OpenAI built ChatGPT in part by downloading millions of their articles

13:49

without permission. People across China are, of course, cheering the success of DeepSeek and its founder, who have made this great achievement in the face of US tech restrictions. There are all sorts of memes doing the rounds about the shock waves this sent through Silicon Valley and Wall Street. As Angela Zhang wrote in the FT,

14:10

the inconvenient truth for U.S. policy makers is that strict export controls forced Chinese tech companies to become more self-reliant, spurring breakthroughs that might not have occurred otherwise. She says that this episode lays bare the limits of technology sanctions, which may deliver short-term disruptions, but their impact diminishes over time as other countries innovate and adapt. The rise of DeepSeek is a reminder that constraints can

14:42

sometimes fuel innovation. With unlimited access to money, Meta has so far spent more on GPUs than the US government spent on the entire Manhattan Project when adjusted for inflation. OpenAI has been burning through more than $5 billion per year, and it's projected that by 2029 they'll be spending almost $40 billion a year. The DeepSeek story, more than anything else, breaks the AI CapEx narrative, where the biggest firms need to fight for resources, which are mostly NVIDIA GPUs.

15:18

The belief was that the company that could spend the most was most likely to win the AI race. Jensen Huang of NVIDIA recently said on an earnings call that he expected the data center building frenzy to last at least up until the end of the decade. Up until now, Nvidia's looked like a money printing machine where they can sell their highest performing chips to the highest bidder at a massive markup.

15:47

The market has had really high expectations of NVIDIA, and NVIDIA has managed to surpass them both in terms of sales growth and profitability. Just last week, Sam Altman announced the Stargate project, where he secured a $500 billion commitment to building an AI data centre empire thanks to backing from SoftBank, Oracle and an Abu Dhabi government

16:14

fund. You have to imagine that the news about DeepSeek made a few Silicon Valley investors nervous this week, as they've been piling money into AI at an unprecedented rate. It's worth noting that no one got to see the prices of the firms most impacted by the announcement on Monday, like OpenAI and Anthropic, as they are all privately held and not actively traded.

16:41

As someone pointed out on Twitter, all of Silicon Valley's next big things of the last 15 years, like NFTs, Web3, the metaverse, and virtual reality, have been utterly rejected by the market, and now they're all in on generative AI and desperately need it to work. Monday will not have been a fun day in Silicon Valley, despite the cheerful tweets that they published. Monday's shock by no means was an indication that investment in AI is drying up.

17:14

This Thursday it was announced that SoftBank is in talks to invest as much as $25 billion into OpenAI, and this is on top of the money they've already committed to Stargate. According to the FT, SoftBank could spend more than $40 billion on its partnership with OpenAI. Elliott Management, on the other hand, wrote in a recent letter to investors that the artificial intelligence boom and high equity market valuations seen today are signs of investors acting like a crowd of sports bettors.

17:50

So not everyone is a believer. Artificial General Intelligence, or AGI, is a type of artificial intelligence that matches or surpasses human cognitive capabilities across a wide range of tasks. This contrasts with narrow AI, which is limited to specific tasks. For the last few years, Big Tech has been warning us of the dangers of AGI while working as hard as they can to achieve it.

18:20

I've mostly been skeptical about both their warnings and their claims that they're close to achieving AGI. I think it's mostly a marketing scheme where they get a lot of attention by claiming to be on the verge of discovering something really dangerous, which might also be really profitable. We're told that these tech bros are the only people who can be trusted with this dangerous technology when, based on the news, it's not clear that they can even be trusted with their

18:50

own system. Now that the AI race is heating up, it's not obvious that we'll hear a whole lot more about AI safety, especially if leading models are open source, widely available, and can be modified by users. Throughout Monday morning, DeepSeek experienced outages which they said were caused by high traffic, and they temporarily limited registrations. Even so, it quickly became the most downloaded free app on Apple's App Store, overtaking ChatGPT.

19:25

As with other Chinese apps, U.S. politicians have been quick to raise security and privacy concerns, and both the US Navy and Congress banned employees from downloading the app on their phones. But luckily they still have TikTok, so they should be fine. At present, there's no reason to expect DeepSeek to be the long-term winner.

19:49

Firstly because it's too much of a security risk in countries that are worried about Chinese influence, but mostly because it's way too early to know who the overall winner will be or even how many winners there'll be. It's quite possible that in different parts of the world, different AI models will be used because countries just don't

20:10

trust each other's technology. If we go back to the late 1990s, it was very clear at the time that the Internet and e-commerce would be huge, but most of the big dot-com companies of the time have disappeared. If you'd bought all of the Internet companies in 1999, you would have owned winners like eBay and Amazon, but you would also have had a bunch of other companies that have since failed, so that you would have been better off just buying a diversified index fund.

20:41

That's despite being entirely right about the growth of the Internet. This is what makes technology investing so difficult. First mover advantage, which we all get excited about, often doesn't matter in the long run. The early search engines all fell into irrelevance when Google was released. There were companies like Myspace and Friendster, which were exactly the same as Facebook but didn't catch on as well.

21:09

Going back further, a ton of money went into railroad stocks in the 1840s and they almost all went bust after building out way too much infrastructure. There's no reason to think that any of the leading AI companies today will even be around in a decade. Most of them are burning money today with no real path to profitability.

21:32

Now, if you wanted to bet on Internet growth in the 1990s but thought it was safer to bet on infrastructure stocks rather than the riskier dot-com companies, you would have invested in companies like Cisco, Corning, JDS Uniphase, and Lucent, none of which turned out to be great investments. You can be right about the growth of a company or a sector, but what you pay for a stock still matters.

21:59

And back then, some of these companies may have been great, but you were paying too much for growth that never came. Sun Microsystems was an infrastructure play in the late 90s. They marketed themselves as the dot in dot-com at the time to highlight their central role in the growth of the Internet. At its peak, the company was valued at 10 times revenues. The bubble popped in early 2000 and the Internet stocks all got crushed.

22:29

Most of the dot-com stocks were a website and a business plan with no earnings whatsoever, but the infrastructure companies were real businesses. In 2002, after the crash, Scott McNealy, the cofounder of Sun Microsystems, gave an interview to BusinessWeek where he asked what investors were thinking. He said: at 10 times revenues, to give you a 10 year payback, I have to pay you 100% of revenues for 10 straight years in dividends. That assumes that I can get that

23:03

by my shareholders. That assumes I have zero cost of goods sold, which is very hard for a computer company. That assumes zero expenses, which is really hard with 39,000 employees. That assumes I pay no taxes, which is very hard. And that assumes that you pay no taxes on your dividends, which is kind of illegal. And that assumes with zero R&D for the next 10 years, I can maintain the current revenue run rate. Now, having thought through that, would any of you like to

23:37

buy my stock at $64? Do you realize how ridiculous those basic assumptions are, he asked. He went on to say: you don't need any transparency. You don't need any footnotes. What were you thinking? Now, NVIDIA is an amazing and hugely profitable company that transformed itself from a maker of video game graphics cards to the biggest company in the world over the last few years.

24:05

To quote Jim Reid at Deutsche Bank, it's gone from last 12 month earnings of around $4 billion two years ago to around $63 billion in the last quarterly release. He points out that for context, this is around half the total earnings made by listed stocks in each of the UK, Germany and France over the last 12 months. And while they're not really growing, NVIDIA is forecast to continue to see significant

24:36

earnings growth. While the stock fell 17% on Monday, which sounds like a big deal, that just took it back to its stock price from October, so not such a big deal for long-term investors. The problem is, as Reid points out, that the AI industry is embryonic, and it's almost impossible to know how it will develop or what competition current winners might face, even if you fully believe in its potential to drive future productivity.

25:07

While Sun Microsystems was trading at 10 times revenues in 1999, NVIDIA is trading at 27 times revenues today, meaning it really is priced for perfection. All of the big tech stocks are very expensive today, but it's not the same as during the dot-com bubble, in that the big US tech stocks are very profitable and have been responsible for most of the earnings growth in the S&P 500, and their growth has vaulted the United States ahead of the rest of the world. It's not just big tech that's

25:44

expensive, though. All large cap U.S. stocks are expensive. Costco, the US retailer, has a higher P/E ratio than Amazon, Microsoft, or Meta. Investors probably shouldn't be overly optimistic about further multiple expansion. The fact that DeepSeek was able to build a reasoning model with Nvidia's older, slower chips suggests that the door's open to other competitors, not just in building AI models, but also in building chips.
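To make the price-to-revenue comparison above concrete, here is a small sketch of the payback arithmetic from the McNealy quote. It assumes flat revenue and that a fixed share of revenue is handed to shareholders every year. The function name and the 25% payout figure are hypothetical and chosen only for illustration; they are not estimates for Sun or NVIDIA.

```python
def payback_years(price_to_revenue, payout_share_of_revenue):
    """Years of flat revenue needed to return the purchase price to shareholders.

    price_to_revenue: market value divided by annual revenue (for example 10 or 27).
    payout_share_of_revenue: fraction of each year's revenue paid out (1.0 = all of it).
    """
    if payout_share_of_revenue <= 0:
        raise ValueError("payout share must be positive")
    return price_to_revenue / payout_share_of_revenue

# McNealy's impossible best case: every dollar of revenue goes out as dividends.
ten_times_best_case = payback_years(10, 1.0)
twenty_seven_times_best_case = payback_years(27, 1.0)

# Hypothetical case: only a quarter of revenue reaches shareholders.
ten_times_quarter_payout = payback_years(10, 0.25)

print(ten_times_best_case, twenty_seven_times_best_case, ten_times_quarter_payout)
```

Without revenue growth, even the best case takes as many years as the multiple itself, which is why a high multiple only makes sense if a lot of growth actually arrives.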

26:17

It's worth noting that NVIDIA H100 chips aren't just used in data centres; they're also used to make handbags. But the handbags are subject to export control, so do be careful with that. If you look at some of the big tech stocks on Monday, when the market was panicking about DeepSeek, you'll see that they didn't really decline very much. In fact, I think Apple was even up a bit.

26:43

Most of the big tech stocks are at or near their highs, and DeepSeek's efficiency might actually be good for big tech, as it means that they might not need to spend nearly as much on building their AI models as was previously expected. And it also means that we might not be in a winner-takes-all model like it was for Internet search. Even if AI training no longer requires spending as much on NVIDIA chips as was being

27:12

planned for, it's not all bad news for NVIDIA, as reasoning models, which break down a problem and solve it step by step on the fly, work quite differently to large language models. Reasoning models get better the more processing power you throw at them. This is a process called inference-time compute. So you don't just need a big data center to train the models anymore, you also need one to run them, as the more processing power these models have access to, the smarter they get.

27:46

The same chips needed for training AI are also used in inference data centres. Now, DeepSeek is more efficient at inference than the other models too, and can use the cheaper NVIDIA chips, but it's still smarter when it has more processing power, so there's still a reason to believe that there's plenty of demand for NVIDIA chips as long as the tech

28:08

doesn't change again. As a last point, as this video might be getting a bit too long, you've possibly heard a lot of people talking about Jevons paradox this week, an economic idea that's mostly applied to energy usage. Jevons paradox is that improvements in efficiency lead to more use of a resource, not less. A good example would be that as car engines have become more efficient over time, instead of us using less fuel, we've just got more powerful and bigger cars than ever before.

28:42

Another example would be that better home insulation in Europe led to people installing much bigger windows in new houses, offsetting the efficiency gains. The reason people are talking about Jevons paradox and AI is that Satya Nadella, the CEO of Microsoft, posted on Twitter in reaction to the DeepSeek model: Jevons paradox strikes again. As AI gets more efficient and accessible, we'll see its use skyrocket, turning it into a commodity we just can't get enough of.

29:16

Now, whether this paradox applies to AI or not really depends on how much demand there is. If adoption is being held back by price, then efficiency gains should lead to greater use. I can see this being the case for businesses who have commercial uses for AI, and the cheaper it is, the more commercial uses they might find.

29:39

But today, most people who are using AI tools like ChatGPT are using the free versions, and so their usage isn't really being held back by the cost because they're not paying anything. So how much more are they likely to use it? I guess the question is whether AI will become a service we all value and pay for, or if it'll end up like e-mail where most people just want the basic or free version.
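Whether Jevons paradox kicks in comes down to how strongly demand responds to price. The sketch below assumes a constant price elasticity of demand and that the cost of an AI service falls in proportion to the efficiency gain. Both are simplifying assumptions, and the function name and numbers are hypothetical rather than measured values.

```python
def resource_use_multiplier(efficiency_gain, price_elasticity):
    """Factor by which total resource use changes after an efficiency gain.

    efficiency_gain: 2.0 means each unit of service needs half the compute
        and, by assumption, costs half as much.
    price_elasticity: percentage rise in demand for each 1% fall in price.
    """
    # Demand for the service grows as its price falls.
    service_demand = efficiency_gain ** price_elasticity
    # Each unit of service now needs less of the underlying resource.
    return service_demand / efficiency_gain

# Price-sensitive business users (elasticity above 1): total compute use rises.
elastic_case = resource_use_multiplier(2.0, 1.5)
# Free-tier users who barely respond to price (elasticity below 1): it falls.
inelastic_case = resource_use_multiplier(2.0, 0.5)

print(f"elastic demand:   x{elastic_case:.2f}")
print(f"inelastic demand: x{inelastic_case:.2f}")
```

Under these assumptions, total chip demand only grows when the elasticity is above 1, which is why it matters how many users are actually being held back by price.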

30:06

A recent US survey found that 80% of American businesses said that they don't use AI because it's either difficult to use or irrelevant to their line of business. That could, of course, all change, but that is the situation right now and it is nothing to do with the cost of the tech. We shouldn't overstate the market's reaction to DeepSeek. On Monday, the NASDAQ fell about 3%, which is a normal bad day and by no means a panic.

30:37

The appearance of DeepSeek isn't so much about whether China will catch up in AI or not. It's more about how easy it is to catch up and if any of these AI companies have a defensible moat. If the models are easily replicated, it'll be difficult to charge a lot for them and it just becomes a commodity business, like selling mobile phone contracts. One way or another, the AI bubble, if there is one, may still pop, but it didn't pop this week. Thanks for tuning into this week's podcast.

31:12

If you found it interesting, please send a link to a friend as there's no podcast algorithm and they just grow by word of mouth. Have a great day and talk to you again soon. Bye.

Transcript source: Provided by creator in RSS feed
