Mon. 01/27 – Why DeepSeek Has Stunned Silicon Valley (And Wall Street)

Jan 27, 2025 · 16 min

Episode description

It’s one of those days where there’s only one story. Maybe you saw that tech stocks got obliterated today. I’m here to tell you why. It’s solely because of DeepSeek and Chinese AI tech generally. How this tech is making people think twice about the AI boom, what DeepSeek did that is different and how this could affect all of Silicon Valley.

Transcript

Welcome to the Techmeme Ride Home for Monday, January 27th, 2025. I'm Brian McCullough. Today, it's one of those days where there's only one story. Maybe you saw that tech stocks got obliterated today. I'm here to tell you why. It's solely because of DeepSeek and Chinese AI tech generally. How this tech is making people think twice about the AI boom, what DeepSeek did that is different, and how this could affect all of Silicon Valley. Here's what you missed today in the world of tech.

Here's why the stock market is having a bit of a crash this morning. NVIDIA down more than 8%, Meta and Microsoft both down, ASML down almost 10%, Japanese chip companies falling, crypto also falling. It's all because of DeepSeek. We spoke about DeepSeek last month with Simon Willison. To sum it up most succinctly, DeepSeek was apparently able to train an AI model at 3% of the cost of cutting-edge models from the likes of OpenAI. So why do you need to buy 100,000 H100s from NVIDIA when maybe you only need 3,000?

See, a lot of the eye-popping CapEx spending from the likes of every tech player in the world was predicated on the idea of scale. The only way to get smarter AI was to throw more and more compute at it, which meant more and more GPUs and more and more data centers. I mean, this was the whole premise behind the Stargate announcement. But here's what this is making people think:

What if that is no longer true? Then all of this spending could be pulled back all at once and thus crash. But not only that, DeepSeek has jumped to the top of the App Store charts. It's suddenly seeing rapid adoption in the AI community. DeepSeek is Chinese tech, and not only that, it's open-source tech, not proprietary. If this cheaper tech, which is open source, is just as good, then that would mean that the ginormous valuations for the likes of OpenAI and Anthropic and the rest might not be warranted, suggesting a bubble in VC funding could pop.

Quote: While it remains to be seen if DeepSeek will prove to be a viable, cheaper alternative in the long term, initial worries are centered on whether US tech giants' pricing power is being threatened and if their massive AI spending needs re-evaluation, said Yeap Jun Rong of IG Asia. That a small and efficient AI model emerged from China, which has been subject to escalating US trade sanctions on advanced NVIDIA chips, is also challenging the effectiveness of such measures, end quote.

Certainly U.S. tech players seem to be taking this seriously. Marc Andreessen called DeepSeek, quote, one of the most amazing and impressive breakthroughs. And Meta has reportedly set up four war rooms to analyze DeepSeek's tech, two focusing on how High-Flyer cut training costs and one on what data High-Flyer might have used. But back to the stock market fallout, quoting the Financial Times. It's DeepSeek for sure, said one Tokyo-based fund manager of the selling on Monday, adding that investors were rapidly assessing whether hardware spending on AI could ultimately be a lot lower than current estimates.

AI investment by large-cap U.S. tech companies hit $224 billion last year, according to UBS, which expects the total to reach $280 billion this year. OpenAI and SoftBank announced last week a plan to invest $500 billion over the next four years in AI infrastructure, end quote. That's a ton of very stimulative spending in the economy that could, again, potentially dry up if the status quo is upended.

Again, who or what is DeepSeek, the AI lab whose models are crashing the stock market and roiling Silicon Valley? DeepSeek is a Chinese AI lab

that started as a deep learning research branch of the Chinese quant hedge fund High-Flyer. They've released several different models, all of which seem to be just as capable as the highest-end AI models produced by the recent flurry of Western AI startups. Again, crucially, while all their models seem to be cutting edge, the costs in terms of money and compute needed to train them are believed to be a fraction of what Western models cost. One model reportedly cost $6 million to train, as opposed to the hundreds of millions of dollars that have become table stakes for other AI tech.

Now, this has not been without controversy. The assumption is that these Chinese models, along with others from the likes of ByteDance that have shown similar cost-versus-performance improvements, were able to make this breakthrough because U.S.-led export controls over GPUs and other technology spurred DeepSeek to innovate and release its models without the latest chips. In other words, they engineered their way around the roadblocks put up to slow them down, necessity being the mother of invention, or at least of innovation around efficiency in this case.

though some have also suggested they might have copied the work of others. For example, DeepSeek V3 sometimes identifies itself as ChatGPT when asked which model it is, leading some to speculate that its training datasets may contain text generated by ChatGPT. There are also censorship concerns. DeepSeek's latest AI model, R1, seems to stick to Chinese government restrictions on sensitive topics like Tiananmen Square, Taiwan, and the treatment of Uyghurs in China. But with DeepSeek apps topping the app stores, the suggestion is that none of this may matter.

The AI community could naturally gravitate toward using models that are far cheaper to operate, quoting VentureBeat. The implications for enterprise AI strategies are profound. With reduced costs and open access, enterprises now have an alternative to costly proprietary models like OpenAI's. DeepSeek's release could democratize access to cutting-edge AI capabilities, enabling smaller organizations to compete effectively in the AI arms race, end quote.

Why is this having such an impact on people's assumptions? Let's use NVIDIA as the prime example of the potential implications here, quoting Jeffrey Emanuel.

Perhaps most devastating is DeepSeek's recent efficiency breakthrough, achieving comparable model performance at approximately 1/45th the compute cost. This suggests the entire industry has been massively over-provisioning compute resources. Combined with the emergence of more efficient inference architectures through chain-of-thought models, the aggregate demand for compute could be significantly lower than current projections assume. The economics are compelling. When DeepSeek can match GPT-4-level performance while charging 95% less for API calls, it suggests either NVIDIA's customers are burning cash unnecessarily or margins must come down dramatically. The fact that TSMC will manufacture competitive chips for any well-funded customer puts a natural ceiling on NVIDIA's architectural advantages. But more fundamentally, history shows that markets eventually find a way around artificial bottlenecks that generate supernormal profits.

Say goodbye to weak erections with Joy Mode's sexual performance booster.

This all-natural supplement is designed to enhance blood flow, giving you firmer, more reliable performance.

Joy Mode features just four simple proven ingredients. Unlike prescription options that involve doctor visits and managing refills, Joy Mode provides a straightforward effective solution. Simply mix a pack with water and feel the effects within 45 minutes. But Joy Mode isn't just about better sex. Daily use supports healthier blood vessels, boosts heart health, and enhances your athletic performance. Join over 200,000 men who trust Joy Mode to boost their performance without the side effects. Start a subscription and save up to 30%.

Keep your performance at its best without any interruptions. Take control with Joy Mode. Get a boost anytime, anywhere, and never miss a beat. If you are looking to take your game to the next level, visit tryjoymode.com and use code RIDE at checkout for 20% off single purchases and 30% off subscription orders. That's T-R-Y-J-O-Y-M-O-D-E dot com and use code RIDE for 20% off single purchases and 30% off subscription orders.

If you're a security or IT professional, you've got a mountain of assets to protect: devices, applications, employee identities, plus the scary stuff outside your security stack like unmanaged devices, shadow IT apps, and non-employee identities. It's a lot. Fortunately, you can conquer those risks with 1Password Extended Access Management. You know I use 1Password personally.

But get this, 1Password's device trust solution blocks unsecured and unknown devices before they access your company's apps.

And don't worry, 1Password still protects against the biggest attack source, compromised credentials. Its industry-leading password manager helps employees create strong, unique logins for every app. Secure devices? Check. Secure credentials? Check. What about employee productivity? 1Password Extended Access Management empowers hybrid employees to join the security team with end-user remediation that teaches them how and why to fix security issues without needing help from IT. Go to 1Password.com slash ride to secure every app, device, and identity.

Even the unmanaged ones. Right now, my listeners get a free two-week trial at 1password.com slash ride. That's 1password.com slash ride. But how exactly did DeepSeek outpace OpenAI and others at a fraction of the cost? First, open source, as we've been saying, but there are other details too, quoting VentureBeat.

With Monday's full release of R1 and the accompanying technical paper, the company revealed a surprising innovation, a deliberate departure from the conventional supervised fine-tuning, or SFT, process widely used in training large language models. SFT, a standard step in AI development, involves training models on curated datasets to teach step-by-step reasoning, often referred to as chain-of-thought, or COT. It is considered essential for improving reasoning capabilities. However, DeepSeek challenged this assumption.

by skipping SFT entirely, opting instead to rely on reinforcement learning, RL, to train the model. This bold move forced DeepSeek R1 to develop independent reasoning abilities, avoiding the brittleness often introduced by prescriptive datasets.

While some flaws emerged, leading the team to reintroduce a limited amount of SFT during the final stages of building the model, the results confirmed a fundamental breakthrough: reinforcement learning alone could drive substantial performance gains. Little is known about the company's exact approach, but it quickly open-sourced its models, and it's extremely likely that the company built upon the open projects produced by Meta, for example the Llama model and the ML library PyTorch.

To train its models, High-Flyer Quant secured over 10,000 NVIDIA GPUs before U.S. export restrictions and reportedly expanded to 50,000 GPUs through alternative supply routes despite trade barriers. This pales compared to leading AI labs like OpenAI, Google, and Anthropic, which operate with more than 500,000 GPUs each. The journey to DeepSeek R1's final iteration began with an intermediate model, DeepSeek R1-Zero, which was trained using pure reinforcement learning.

By relying solely on RL, DeepSeek incentivized this model to think independently, rewarding both correct answers and the logical processes used to arrive at them. This approach led to an unexpected phenomenon: the model began allocating additional processing time to more complex problems, demonstrating an ability to prioritize tasks based on their difficulty. DeepSeek's researchers described this as an aha moment,

where the model itself identified and articulated novel solutions to challenging problems. This milestone underscored the power of reinforcement learning to unlock advanced reasoning capabilities without relying on traditional training methods like SFT, end quote.
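As a rough, hypothetical sketch of the kind of rule-based reward signal such a pure-RL setup might use, rewarding both the correctness of the final answer and the format of the reasoning, here is a toy example. The tag names, weights, and function names are illustrative assumptions, not DeepSeek's published training code.

```python
import re

# Toy rule-based reward in the spirit of the pure-RL setup described above:
# score a sampled completion on (a) whether the final answer is correct and
# (b) whether the reasoning follows an expected <think>...</think><answer>...</answer>
# format. The tags and weights here are illustrative assumptions only.

def format_reward(completion: str) -> float:
    """1.0 if the completion wraps its reasoning and answer in the expected tags."""
    pattern = r"<think>.*?</think>\s*<answer>.*?</answer>"
    return 1.0 if re.search(pattern, completion, flags=re.DOTALL) else 0.0

def accuracy_reward(completion: str, reference_answer: str) -> float:
    """1.0 if the text inside the <answer> tags matches the reference answer."""
    match = re.search(r"<answer>(.*?)</answer>", completion, flags=re.DOTALL)
    if match is None:
        return 0.0
    return 1.0 if match.group(1).strip() == reference_answer.strip() else 0.0

def total_reward(completion: str, reference_answer: str) -> float:
    # Weighted sum; a policy-gradient style RL optimizer would push the model
    # toward completions that maximize this signal.
    return 0.8 * accuracy_reward(completion, reference_answer) + 0.2 * format_reward(completion)

sample = "<think>2 + 2 = 4</think><answer>4</answer>"
print(total_reward(sample, "4"))  # 1.0
```

The point of a reward like this is that nothing tells the model *how* to reason; it only gets credit when the answer checks out, which is what lets the reasoning behavior emerge on its own.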

And more from Jeffrey Emanuel, quote, DeepSeek has made profound advancements not just in model quality, but more importantly, in model training and inference efficiency. By being extremely close to the hardware and by layering together a handful of distinct, very clever optimizations, DeepSeek was able to train these incredible models using GPUs in a dramatically more efficient way. How in the world could this be possible? How could this little Chinese company completely upstage all the smartest minds at our leading AI labs,

which have 100 times more resources, headcount, payroll, capital, GPUs, etc.? Wasn't China supposed to be crippled by Biden's restrictions on GPU exports? Well, the details are fairly technical, but we can at least describe them at a high level. It might have just turned out that the relative GPU processing poverty of DeepSeek was the critical ingredient to make them more creative and clever, necessity being the mother of invention and all.

A major innovation is their sophisticated mixed-precision training framework that lets them use 8-bit floating point numbers, FP8, throughout the entire training process. Most Western labs train using full-precision 32-bit numbers. This basically specifies the number of gradations possible in describing the output of an artificial neuron. 8 bits in FP8 lets you store a much wider range of numbers than you might expect. It's just not limited to 256 different equal-size magnitudes like you'd get with regular integers, but instead uses clever math to store both very small and very large numbers, though naturally with less precision than you'd get with 32 bits. DeepSeek cracked this problem by developing a clever system that breaks numbers into small tiles for activations and blocks for weights, and strategically uses high-precision calculations at key points in the network.
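As a rough illustration of the tile-and-block idea just described, here is a minimal NumPy sketch of block-wise scaling down to an 8-bit range. The block size, and the use of plain int8 as a stand-in for a true hardware FP8 format, are simplifying assumptions for illustration, not DeepSeek's actual implementation.

```python
import numpy as np

# Illustrative block-wise quantization: each block of weights gets its own
# high-precision scale, so small and large values can coexist without one
# global scale swamping the other. Real FP8 training uses hardware FP8
# formats (e.g., E4M3) and per-tile scaling for activations; this sketch
# uses int8 purely as a stand-in to show the idea.

BLOCK = 128  # assumed block size, for illustration only

def quantize_blockwise(weights: np.ndarray):
    """Quantize a 1-D float32 array to int8 with one float32 scale per block."""
    pad = (-len(weights)) % BLOCK
    w = np.pad(weights, (0, pad))
    blocks = w.reshape(-1, BLOCK)
    scales = np.abs(blocks).max(axis=1, keepdims=True) / 127.0
    scales[scales == 0] = 1.0  # avoid division by zero for all-zero blocks
    q = np.round(blocks / scales).astype(np.int8)
    return q, scales.astype(np.float32), len(weights)

def dequantize_blockwise(q, scales, original_len):
    return (q.astype(np.float32) * scales).reshape(-1)[:original_len]

w = np.random.randn(1000).astype(np.float32)
q, s, n = quantize_blockwise(w)
w_hat = dequantize_blockwise(q, s, n)
print("max reconstruction error:", np.abs(w - w_hat).max())
```

Storing one small scale per block is what buys back the dynamic range that a single 8-bit number can't hold on its own, which is why the memory savings don't have to cost much accuracy.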

Unlike other labs that train in high precision and then compress later, losing some quality in the process, DeepSeek's native FP8 approach means they get the massive memory savings without compromising performance. When you're training across thousands of GPUs, this dramatic reduction in memory requirements per GPU translates into needing far fewer GPUs overall.

Another major breakthrough is their multi-token prediction system. Most transformer-based LLM models do inference by predicting the next token, one token at a time.

DeepSeek figured out how to predict multiple tokens while maintaining the quality you'd get from single token prediction. Their approach achieves about 85-90% accuracy on these additional token predictions, which effectively doubles inference speed without sacrificing much quality.
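To make the multi-token idea more concrete, here is a minimal, hypothetical sketch of extra prediction heads that guess several future tokens from the same hidden states. The two-head design, dimensions, and class name are illustrative assumptions, not DeepSeek's published architecture.

```python
import torch
import torch.nn as nn

# Illustrative multi-token prediction head: from each position's hidden state,
# predict the next token AND the token after it with a second head. This is a
# toy sketch of the general idea (extra heads trained end-to-end), not
# DeepSeek's actual module; dimensions are arbitrary.

class MultiTokenHead(nn.Module):
    def __init__(self, hidden_dim: int, vocab_size: int, n_future: int = 2):
        super().__init__()
        # One linear head per future offset (t+1, t+2, ...).
        self.heads = nn.ModuleList(
            [nn.Linear(hidden_dim, vocab_size) for _ in range(n_future)]
        )

    def forward(self, hidden_states: torch.Tensor):
        # hidden_states: (batch, seq_len, hidden_dim)
        # Returns one logits tensor per predicted offset.
        return [head(hidden_states) for head in self.heads]

hidden = torch.randn(2, 16, 512)          # (batch, seq, hidden)
mtp = MultiTokenHead(hidden_dim=512, vocab_size=32000, n_future=2)
logits_t1, logits_t2 = mtp(hidden)
print(logits_t1.shape, logits_t2.shape)   # both (2, 16, 32000)

# During training, logits_t1 would be scored against the next token and
# logits_t2 against the token after that, so the extra predictions stay
# differentiable and train with the standard optimizer.
```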

The clever part is they maintain the complete causal chain of predictions, so the model isn't just guessing, it's making structured, contextual predictions. The brilliant part is that this compression of the attention key-value data is built directly into how the model learns. It's not some separate step they need to do; it's built directly into the end-to-end training pipeline. This means that the entire mechanism is differentiable and able to be trained directly using the standard optimizers. All this stuff works because these models are ultimately finding much lower-dimensional representations of the underlying data than the so-called ambient dimensions, so it's wasteful to store the full KV indices, even though that is basically what everyone else does.

Not only do you avoid wasting tons of space by storing way more numbers than you need, which gives a massive boost to the training memory footprint and efficiency, again slashing the number of GPUs you need to train a world-class model, but it can actually end up improving model quality, because it can act like a regularizer, forcing the model to pay attention to the truly important stuff instead of using the wasted capacity to fit to noise in the training data. So not only do you save a ton of memory, but the model might even perform better. At the very least, you don't get a massive hit to performance in exchange for the huge memory savings, which is generally the kind of trade-off you are faced with in AI training.
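A toy sketch of the low-dimensional intuition described above: project the key/value data for past tokens into a much smaller latent vector, cache only that, and reconstruct the full keys and values when the attention step needs them. The dimensions and module names here are made-up illustrations, not DeepSeek's actual attention implementation.

```python
import torch
import torch.nn as nn

# Toy illustration of compressed key/value caching: instead of storing full
# key and value vectors for every past token, store a small latent vector and
# reconstruct keys/values from it on the fly. In a real model these
# projections would be learned end-to-end; sizes here are arbitrary.

HIDDEN, LATENT = 1024, 128   # cache is 8x smaller per vector in this toy setup

down_proj = nn.Linear(HIDDEN, LATENT, bias=False)   # applied once, result cached
up_proj_k = nn.Linear(LATENT, HIDDEN, bias=False)   # reconstruct keys when needed
up_proj_v = nn.Linear(LATENT, HIDDEN, bias=False)   # reconstruct values when needed

tokens = torch.randn(1, 64, HIDDEN)                 # (batch, seq, hidden)
kv_cache = down_proj(tokens)                        # only this is kept in memory

keys = up_proj_k(kv_cache)                          # recomputed at attention time
values = up_proj_v(kv_cache)

full_cache_floats = 2 * tokens.numel()              # separate K and V caches
compressed_floats = kv_cache.numel()
print(f"cache size reduced by {full_cache_floats / compressed_floats:.0f}x")
```

The trade-off is a bit of extra compute to re-expand the latent, in exchange for a much smaller memory footprint per token of context.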

Another very smart thing they did is to use what is known as a mixture-of-experts, or MoE, transformer architecture, but with key innovations around load balancing. As you might know, the size or capacity of an AI model is often measured in terms of the number of parameters the model contains. A parameter is just a number that stores some attribute of the model, either the weight or importance a particular artificial neuron has relative to another one, or the importance of a particular token, depending on its context and the attention mechanism, etc.
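Here is a minimal, hypothetical sketch of the mixture-of-experts idea with top-k routing, where a router sends each token to only a couple of small expert networks instead of through every parameter. The sizes, the top-2 choice, the single-linear-layer experts, and the absence of any load-balancing logic are simplifications for illustration, not DeepSeek's architecture.

```python
import torch
import torch.nn as nn

# Minimal mixture-of-experts layer with top-2 routing: a router scores each
# token, only the two highest-scoring experts run on it, and their outputs
# are combined using the routing weights. Real MoE layers use MLP experts
# and add load-balancing terms so no expert is overloaded; this is a toy.

class TinyMoE(nn.Module):
    def __init__(self, dim=64, n_experts=8, top_k=2):
        super().__init__()
        self.router = nn.Linear(dim, n_experts)
        # Each "expert" is a single linear layer here, purely for brevity.
        self.experts = nn.ModuleList([nn.Linear(dim, dim) for _ in range(n_experts)])
        self.top_k = top_k

    def forward(self, x):                                    # x: (tokens, dim)
        scores = self.router(x).softmax(dim=-1)              # (tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)       # pick top-2 experts per token
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e                        # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, k : k + 1] * expert(x[mask])
        return out

x = torch.randn(10, 64)
print(TinyMoE()(x).shape)   # (10, 64): each token only touched 2 of the 8 experts
```

The appeal is that the total parameter count can be huge while the compute per token stays modest, since only a few experts are active for any given token.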

Meta's latest Llama 3 model comes in a few sizes: for example, a 1 billion parameter version, the smallest; a 70 billion parameter model, the most commonly deployed one; and even a massive 405 billion parameter model. This largest model is of limited utility for most users, because you would need to have tens of thousands of dollars' worth of GPUs in your computer just to run it at tolerable speeds for inference, at least if you deployed it in its native full-precision version.

Therefore, most of the real-world usage and excitement surrounding these open-source models is at the 8 billion parameter or highly quantized 70 billion parameter level, since that's what can fit in a consumer-grade NVIDIA 4090 GPU, which you can buy now for under $1,000.
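The back-of-envelope arithmetic behind those hardware claims is simple: parameter count times bits per parameter gives the memory needed just to hold the weights. The model sizes below are taken from the quote; treating 16-bit weights as the baseline and ignoring activation and KV-cache memory are simplifying assumptions.

```python
# Back-of-envelope GPU memory needed just to hold the model weights
# (ignores activations and the KV cache, which add more on top).

def weight_memory_gb(params_billions: float, bits_per_param: int) -> float:
    return params_billions * 1e9 * bits_per_param / 8 / 1e9

for name, params, bits in [
    ("Llama 405B, 16-bit", 405, 16),
    ("Llama 70B, 16-bit", 70, 16),
    ("Llama 70B, 4-bit quantized", 70, 4),
    ("Llama 8B, 16-bit", 8, 16),
]:
    print(f"{name}: ~{weight_memory_gb(params, bits):.0f} GB")

# ~810 GB, ~140 GB, ~35 GB, and ~16 GB respectively: only the smallest or
# most aggressively quantized variants get anywhere near what a single
# 24 GB consumer GPU can hold.
```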

So why does any of this matter? Well, in a sense, the parameter count and precision tell you something about how much raw information or data the model has stored internally. Note that I'm not talking about reasoning ability or the model's IQ, if you will. It turns out that models with even surprisingly modest parameter counts can show remarkable cognitive performance when it comes to solving complex logic problems, proving theorems in plane geometry, SAT math problems, etc., end quote.

Okay, look, as I said, this whole day is about DeepSeek, and here's more of why, quoting Axios. This could be an extinction-level event for venture capital firms that went all-in on foundational model companies, particularly if those companies haven't yet productized with wide distribution.

The quantums of capital are just so much more than anything VC has ever before disbursed based on what might be a suddenly stale thesis. If nanotech and Web3 were venture industry grenades, this could be a nuclear bomb. Investors I spoke to over the weekend aren't panicking, but they're clearly concerned, particularly that they could be taken so off guard. Don't be surprised if some deals in process get paused.

There's still a ton we don't know about DeepSeek, including if it really spent as little money as it claims. And obviously, there could be national security impediments for U.S. companies or consumers, given what we've seen with TikTok. But bottom line, the game has changed, end quote. And finally, let's end with Joe Weisenthal taking the contrarian view just a bit, i.e. maybe if AI is a race down to becoming a commodity, that could be a good thing, quote.

Suddenly, everyone is talking about Jevons Paradox. This is usually discussed with respect to energy markets. Basically, when you get more energy efficient, you don't use less of the energy source. You just use your efficiency gains to do new things, and demand keeps booming.

This is certainly the hope if you're NVIDIA or any company that builds underlying AI infrastructure: that everyone will use the DeepSeek breakthroughs and just race even faster, with no effect on total demand for compute. We'll see. As I'm typing this, NVIDIA has opened down about 13%. Certainly, investors aren't taking much comfort in Jevons Paradox right now.

One of my favorite Tracy Alloway lines is that it's only a crisis when you can't throw money at the problem. COVID was a crisis because money alone wasn't enough to address it. The supply chain shocks were a crisis because money alone couldn't fix the problem. There's no guarantee here that just throwing more money at US tech companies will be enough to keep them competitive in AI, let alone chips, if it's perceived that they're falling behind. Human capital, talent, takes years and years to develop. Getting the incentives right is not something where you can snap your fingers overnight and make things happen. These are big, slow-moving things, end quote. Nothing more for you today. Talk to you tomorrow.

This transcript was generated by Metacast using AI and may contain inaccuracies.