
#3: Extropic - Why Thermodynamic Computing is the Future of AI (PUBLIC DEBUT)

Mar 12, 2024 · 1 hr 13 min · Season 1 · Ep. 3
Listen in podcast apps:
Metacast
Spotify
YouTube
RSS

Episode description

Episode 3: Extropic is building a new kind of computer – not classical bits, nor quantum qubits, but a secret, more complex third thing. They call it a Thermodynamic Computer, and it might be many orders of magnitude more powerful than even the most powerful supercomputers today. 


Check out their “litepaper” to learn more: https://www.extropic.ai/future.


======


(00:00) - Intro

(00:41) - Guillaume's Background

(02:40) - Trevor's Background

(04:02) - What is Extropic Building? High-Level Explanation

(07:07) - Frustrations with Quantum Computing and Noise

(10:08) - Scaling Digital Computers and Thermal Noise Challenges

(13:20) - How Digital Computers Run Sampling Algorithms Inefficiently

(17:27) - Limitations of Gaussian Distributions in ML

(20:12) - Why GPUs are Good at Deep Learning but Not Sampling

(23:05) - Extropic's Approach: Harnessing Noise with Thermodynamic Computers

(28:37) - Bounding the Noise: Not Too Noisy, Not Too Pristine

(31:10) - How Thermodynamic Computers Work: Inputs, Parameters, Outputs

(37:14) - No Quantum Coherence in Thermodynamic Computers

(41:37) - Gaining Confidence in the Idea Over Time

(44:49) - Using Superconductors and Scaling to Silicon

(47:53) - Thermodynamic Computing vs Neuromorphic Computing

(50:51) - Disrupting Computing and AI from First Principles

(52:52) - Early Applications in Low Data, Probabilistic Domains

(54:49) - Vast Potential for New Devices and Algorithms in AI's Early Days

(57:22) - Building the Next S-Curve to Extend Moore's Law for AI

(59:34) - The Meaning and Purpose Behind Extropic's Mission

(01:04:54) - Call for Talented Builders to Join Extropic

(01:09:34) - Putting Ideas Out There and Creating Value for the Universe

(01:11:35) - Conclusion and Wrap-Up


======




Production and marketing by The Deep View (https://thedeepview.co). For inquiries about sponsoring the podcast, email [email protected]


======


Check out the video version here → http://tinyurl.com/4fh497n9

🔔 Follow to stay updated with new uploads

Transcript

Intro

You can't escape thermal fluctuations. They just inevitably become significant. So in some sense, Extropic is a little bit inevitable. Why don't we just create physics-based computing systems that harness the noise from environments? To us, from first principles of mathematics, information theory, probability theory, physics, thermodynamics, this is the future. Hopefully this podcast is like the beginning of a new revolution.

Guillaume's Background

Alrighty, everybody. Welcome to an emergency episode of the First Principles podcast. We're coming to you on a Sunday night because we need to understand what the heck Extropic is building. They've just launched their litepaper. Not a white paper, but a litepaper. It is a great introduction to what they're doing, and I've tried my best to dive into it, but I'm actually sort of at this perfect, you know, in-between point. Sometimes I know, like, basically everything about what a

company's building before they come on the podcast. In this case, I have lots of questions still. I actually don't totally understand what the hell you all are building. So I'm excited to learn. I'm excited for everybody to watch me learn. And I'm just going to throw as many dumb questions as I can out there. So welcome to the show, guys. Thanks so much, Christian. Excited to be here. Love your shows. Couldn't be more excited to share more with the world today about what we've been building

sort of in secret. This is just the beginning of us getting people excited about this thermodynamic paradigm of computing. And hopefully this podcast is like the beginning of a new revolution. I'm sure some of your listeners probably have seen one of my identities floating about online during the daytime. I am Guillaume Verdon. I'm formerly a research scientist in quantum computing, co-founded the TensorFlow Quantum

Project with Trevor here. Back when we were in school, I used to be a theoretical physicist at Perimeter, ended up being a pioneer of quantum machine learning, which is a field where you use quantum computers to do a form of physics-based AI to understand quantum mechanical matter around us, which

is, you know, that's my previous life. Now, essentially, I'm founder and CEO of Extropic, and started this new physics-based computing paradigm, and also happen to run a little thing called e/acc to some extent, as much as one can run it or be very involved since the beginning. And that's a techno-optimist movement. And those are sort of the dual identities that many people are familiar with. But I'll let Trevor

Trevor's Background

give more of a bio here. I think people are pretty familiar with my Yeah, I'm definitely not as online as you. Basically, I'm an engineer who got swept into Guillaume's field. No, so I met Guillaume back when I was doing my mechanical engineering degree at Waterloo, which is clearly the best engineering school in North America. We love our Waterloo interns. I was doing a mechanical engineering degree. I mostly did manufacturing kind of stuff before

I met him. I worked at a little company called Formlabs, did some stuff with linear motors. Then I met Guillaume, and he was like, Trevor, do you want to work on quantum machine learning? And I was like, I have no idea what that is, but it sounds cool. So that proposal turned into the Google product that Guillaume was talking about, and then I got sucked into the quantum hardware physics

and engineering lab down in Santa Barbara. And I did a couple of years there working on device engineering, modeling, studying the effect of noise on quantum computers, calibration, control, pulse sequences, that kind of thing. And after that, went on to MIT for a bit and got a call from an old friend and had to come help him out at Extropic. So yeah, I'm happy to be here. It's been... Absolutely. You guys want to give us the 101, just the highest level. We're

What is Extropic Building? High-Level Explanation

going to dive super deep and feel free in this explanation to use a bunch of words that people might not know, because then later we'll dive in and try I mean, essentially, Trevor and I have had this career trying to build ways to program quantum mechanical computers, where you try to embed computational tasks into

quantum mechanical physics, right? Quantum is, we're going to dive into the contrast between quantum and thermo in a sec, but quantum is really, you have things in superposition; it's the physics of the very, very small and the very, very cold, ideally as cold as possible, ideally zero temperature. And there you get to program the physics of matter, usually matter or light, and you learn to embed sort of programs that are parameterized, just

like neural networks. Neural networks have parameters that you train with all sorts of algorithms that usually use gradients. And that's kind of where we came from. We brought differentiable programming thinking to quantum computing. We were very early on that. There was no software doing it at the time. And that was our project. And then in quantum computing hardware, there you have, the reality is that

you can't cool down a quantum computer to zero temperature, right? And so there's a mismatch between the program you want to run and the actual physics of the hardware. The program you want to run runs at zero temperature, theoretically, and the hardware has finite temperature. But what does having finite temperature mean? It just means

that things are jiggling. Things are unpredictable. There's entropy. There's uncertainty that gets injected in your system because your system interacts with the environment, and we call that noise, right? And so fighting noise has been the quest to scale quantum computing so far, and it's been the bane of the existence of many scientists. So Trevor's background was sort of at the very lowest level, how you make the quantum bits dance. Can you filter out noise? Can you deal

with noise? There at the lowest level, I was more involved at the algorithms and architectures level, where, in quantum computing, you try to do a process called quantum error correction, where you detect errors, detect these sort of injections of errors, and undo them, right? And you've got to keep track of how they spread in your computer. The problem is that often, by trying to fix the problem, you make it worse, if

the thing trying to fix the problem adds more noise than was there before. And so, your quantum bits have to be of sufficient quality, they have to be low enough noise so that it's worth doing this error correcting process. And this error correcting process you can view as a form of refrigeration, right? Really, you're pumping entropy out, you're using energy to pump entropy out

of the system. And so we saw sort of the road ahead for quantum computing was very long, you know, reaching the level where you have a very large scale computer where you're below that threshold of noise where it's worth scaling up. There's a long road ahead for that. And

Frustrations with Quantum Computing and Noise

we sort of lost patience there. And we're like, well, if you can't beat them, join them, right? If you can't beat the noise, you should use it, right? And so we were thinking, Well, what if we could use the noise? These general AI algorithms, right? The parent concept is probabilistic machine learning algorithms. All these algorithms want to be probabilistic, right?

And so they want this sort of entropy and uncertainty present. And it turns out that even when we run things on digital computers that are nearly perfect, right? They're deterministic. We end up sprinkling in noise at sort of a very abstract level in our software later on, right? Not at the sort of analog hardware level. And so it seemed like we do all this effort, just like in quantum computing, we have all this effort to keep things pristine, right?

And in digital computing, you use a lot of power and energy so that your system is hard to disrupt. The noise of the environment is trivial compared to the amplitudes of the signals. And so there, things are, relative to the amplitude of the signals, not so noisy. But then at

the algorithmic level, you add the noise again. So we were like, why don't we just simplify that and just create physics-based computing systems that harness the noise from environments, sort of above the sort of temperatures and noise levels of quantum computers, but noisier and lower power than deterministic computers? So it's kind of this in-between, right? So we're trying to build a new paradigm of computing from the middle out in terms of scales.

That's kind of a top-down explanation. There's also a bottom-up version that's pretty compelling. Yeah, go ahead. If you look at what it takes to keep making computational devices smaller, what you find, and it doesn't really matter what the device is, what you find is when you make it sufficiently small, you can't escape thermal fluctuations. They just inevitably become significant,

right? And so if we want to keep scaling computers smaller and smaller, it's actually inevitable that you have to go into this thermal or probabilistic regime, right? And this is becoming, you know, if you look at the data for like digital computer scaling, you can see that the rate of exponential growth in efficiency of computing technology is starting to slow down. And

that's because you're starting to hit some of these effects. There's a ton of reasons why it's hard to make transistors small, but a lot of them come down to the fact that these thermal fluctuations are starting to get really big. So I expect within the next several generations of transistor technology, you're

going to have to start looking at some of the things we are. So in some sense, Extropic is a little bit inevitable, and we're just trying to front-run the danger. Yeah, that's really interesting, because you've hit on two different types of computing. The top-down answer kind of came at it from the quantum angle, and

Scaling Digital Computers and Thermal Noise Challenges

then this bottom-up one that you just answered, those are just normal digital chips or whatever that we're talking about, just normal digital transistors. I would love to take this conversation sort of piece by piece. Maybe let's start with the first paradigm, which is just normal computing. Talk about what those chips are, like how are they, you know, they're getting down to the nanometer, like single low digit, like two or three nanometer size

now. So let's talk about that. And then let's talk about quantum, and then we can kind of use that to bridge over into thermal. But on the quantum, so on the classical note then, do you mind just telling us how these algorithms and, you know, all this, you know, neural network stuff is run today? Like, what are those chips? Like, what do they look like? How do they work? So to start at the very, very low level, right, what

is a transistor? Yeah, exactly. A transistor is actually many things depending on what kind of voltages you put into it. But in the regime that digital computers operate today, transistors are switches, right? And you network these switches together to do digital logic. And so The mathematical abstract thing you're trying to do is Boolean logic, and the way you embed that in physics is by driving transistors really hard. And so that's how computers today work, right?

So you're taking these kind of inherently fuzzy devices, right? They're made out of real matter, so they're fuzzy. And you're applying very large signals to them so that they behave like this mathematically abstract object of Boolean logic that you want. Right? So that's kind of

how digital computers work. If you want to run, let's say, a sampling algorithm on a digital computer, which is what a lot of probabilistic algorithms come down to, that's kind of like one of the main subroutines, because the dynamics of your device are completely deterministic, right? They're operating in this kind of high signal regime where the natural fluctuations

of nature aren't important. You have to generate pseudo-randomness, which is, instead of harvesting the noise of nature, you run a circuit that has really complex and uncomputable dynamics, right? And so you get kind of streams of bits that look random. And that process takes a lot of energy, right? Because a random stream of bits is kind of like heat, in the sense of the connection between thermodynamics and information, and you're using electricity

to produce that heat, right? So it's like you're running an electric heater on your chip, literally, is the analogy to thermodynamics, right? And then, okay, so now you have a random bit stream that's not computationally useful unless you happen to want to do coin flips, right? So then what you have to do is you have to take that random bit stream and essentially filter it to get samples from the distribution that you're actually working with, right?

And that step of filtration also takes a ton of energy because now it's like you're taking this bowl of heat you have and you're putting it inside of a freezer to cool it back down a little bit, right? So the process of sampling on a digital computer is thermodynamically pretty similar to running an electric heater inside of a freezer to achieve some kind of intermediate temperature. So it's a little bit ridiculous, right?
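As a rough sketch of what that workflow looks like in practice (illustrative only, not Extropic code, with a made-up bimodal target density): generate pseudo-random numbers, then throw most of them away to filter them down to samples from the distribution you actually care about.

# Hypothetical illustration (not Extropic code): sampling a 1D target density on a
# digital computer by generating pseudo-random numbers and then throwing most of
# them away (rejection sampling).
import math
import random

def target_density(x):
    # An arbitrary bimodal, non-Gaussian density (unnormalized), just for illustration.
    return math.exp(-(x - 2.0) ** 2) + 0.5 * math.exp(-(x + 2.0) ** 2 / 0.5)

def rejection_sample(n_samples, lo=-6.0, hi=6.0, max_density=1.1):
    samples, proposals = [], 0
    while len(samples) < n_samples:
        x = random.uniform(lo, hi)            # "electric heater": pseudo-random proposal
        u = random.uniform(0.0, max_density)  # more pseudo-randomness for the accept test
        proposals += 1
        if u < target_density(x):             # "freezer": keep only what fits the target
            samples.append(x)
    return samples, proposals

samples, proposals = rejection_sample(10_000)
print(f"kept {len(samples)} of {proposals} proposals "
      f"({100 * len(samples) / proposals:.1f}% acceptance)")

Real samplers are smarter than this, but the energy accounting Trevor describes, generate randomness and then pay again to throw most of it away, is the same.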

When you look at it from that perspective, it's like, this doesn't make sense, but it's

How Digital Computers Run Sampling Algorithms Inefficiently

clear how we got here, right? Because digital computers are really nice and they're very easy to scale. So it's convenient to do things this way, but from first principles, it's not even close to the best way.

Yeah. This approach to sampling is like so inefficient on digital computers that people, unless you're like on a Wall Street where things are super mission critical and you're willing to throw a ton of compute to get the best quality sort of uncertainty for your decision-making, unless you're on Wall Street, you end up trying to avoid sampling entirely, right? Because it's too costly on our deterministic devices. Again, as Trevor

mentioned, it's really unnatural for our deterministic devices to be probabilistic. And so another way, instead of sampling, to represent probability distributions is usually through deep learning. And what deep learning does is it usually starts with very sort of trivial randomness, like a Gaussian blob, a single Gaussian blob, and then it deforms that blob to shape it into the shape of the data. So it's a high-dimensional blob, and high-dimensional data could be like images,

text, whatnot. But it has to use many, many transformations to take that very simple randomness and transform it into the shape of the data. And very often, that sort of fails to capture the tail events, the tail distributions, the low data regime, right? When you're focused on, like, covering everything with one blob, essentially, you're just going to cover sort of the center of mass of probability mass, like the typical data, right? You'd be focused on

that. And you're going to need like, more and more and more dimensions and more and more parameters in a deeper and deeper transformation that's more and more complex. So you need more parameters, more data, more compute in order to reach in that sort of low data regime in those tail events that are very rare, right? And so we've been seeing that with sort of self-driving cars.

In self-driving cars, we've just been throwing metric tons of data at the self-driving problem to reach a level that a human reaches with like 10 hours of driving classes, right? And there's clearly way more than 10 hours of data in all the data sets of all the players. And so that's sort of fundamentally the reason that current day deep learning is not

quite the end game. We think that this sort of probabilistic approach where you can use very little data and you can fill in the blanks with noise, with entropy and uncertainty, Right? If you don't know something, you don't have data, you should fill it out with uncertainty.

But that process of sort of painting everything with a noisy brush is very costly, because you got to sample, you got to like, you got to explore those parts of landscape, you got to kind of hallucinate everything that's not data, or, you know, within your model, within the scope of your model, and sort of penalize the things that are too far from data. And that sort of process of hallucinating all these possibilities and making those corrections, for

the technical folks, it's called contrastive learning. That process usually requires sampling, and that's very costly, so people avoid it. So they stick to sort of taking these Gaussians and deforming them. That's how old-school neural nets like variational autoencoders work. It's somewhat how diffusion models work. Diffusion models kind of mix in the noise as you go to some extent, simple noise. But

that's kind of the common pattern, essentially. So both from a sort of hardware standpoint, it's inevitable that we're gonna have to go stochastic, because matter is jiggly, and so your transistors are technically jiggly, and so will be the electrons hopping across them. And so it's going to get stochastic. And the algorithms want to go probabilistic to be more data efficient. And so that's why we're building the whole

stack. And we think it's going to be disruptive for everyone. And that's why we're really excited to sort of put our thesis out there of the future of AI, which is very contrary and very different, but if it succeeds, it changes everything, right? And so, at least to us, from first principles of mathematics, information theory, probability theory, physics, thermodynamics, this is the future. And

Limitations of Gaussian Distributions in ML

Yeah, basically. I love it. So to take just a tiny step back, can you talk about what is it that makes a GPU so good at doing that sort of estimation task, basically, of making it so that you have this really crazy distribution and GPUs do the deep learning approach, right? Because they suck at the sampling approach, right? So often people use CPUs for Monte Carlo sampling because

it's a very serial task. You gotta like have little walkers that travel. You're simulating a sort of particle in this landscape, whereas we use literal particles to do that job, right? So a GPU really got lucky, right? A GPU was not imagined from first principles to be a processor for AI. It was a graphics processor that did

really well with matrix multiplications. And it turns out that, you know, these transformations that I was talking about to morph a simple distribution into a complicated one, a lot of those transformations, the big computational element, are matmuls, matrix multiplications, right? And so GPUs are accelerators for that. And so most attempts that you've read in the news or over the past several years that have been trying to accelerate software for AI, they've been focusing on accelerating matrix

multiplication, which, first of all, you're competing with NVIDIA. Good luck with that. Jensen will eat your lunch and thank you for it. But Trevor, you have some first principles reasons why you think... And from a back-of-the-envelope principle, you know, any sort of matrix multiplication accelerator has a fundamental bottleneck, and

it's not worth necessarily pushing in that area. It's much more interesting to try to disrupt how you do the entire algorithm rather than just a subroutine, right, Trev? Yeah, so the basic reasoning here is if you go into PyTorch or something and profile a neural network like a transformer, right, what you'll find is that you spend about 25% of your time

moving things in and out of memory, right? So what that means is if you accelerate the other 75% down to zero time using your fancy accelerator, maybe some kind of optical matmul accelerator, right, that literally does the math at the speed of light, you still only have a 4x speedup, because you're still paying the 25% of time to move things in and out of memory, right? So accelerating part of an algorithm only ever gets you kind of a modest speedup.
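As a back-of-the-envelope check of that argument (the 25% memory fraction is Trevor's rough figure, not a measurement), Amdahl's law gives the cap on end-to-end speedup when only part of the workload is accelerated:

# Amdahl's law: end-to-end speedup is limited by the fraction you cannot accelerate.
def max_speedup(accelerated_fraction, acceleration=float("inf")):
    serial_fraction = 1.0 - accelerated_fraction
    return 1.0 / (serial_fraction + accelerated_fraction / acceleration)

print(max_speedup(0.75))        # infinitely fast matmuls, 25% memory-bound -> 4.0x
print(max_speedup(0.75, 10.0))  # a merely 10x matmul accelerator -> ~3.1x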

And so you do a lot better if you look at tasks that are much more

Why GPUs are Good at Deep Learning but Not Sampling

compute bound, like sampling. So that was kind of another reason we thought this Is there a reason that you can't do, so I'm, this is maybe skipping ahead, we haven't really talked about this yet, but you hinted at it when you said that, hey, this Gaussian, whatever, this like normal distribution thing isn't gonna be the answer for the future. Like you wanna do different types of probability distributions with your chips, right? And can you talk a little bit about why that is? Like what is

so wrong about this normal distribution? And then why can't we just do those other distributions with normal, like analog or Yeah, that's a great question, actually. We use Gaussian or normal distributions, right? It's basically what is known as a bell curve. We use those all the time because we can actually keep track, like fundamentally, what is a Gaussian? It's like a blob, there's where is it in space, and then how is it squished along which

axis, right? And by how much, right? So the squishing is a matrix called the covariance matrix, and the position is called the mean, right? And if you have that vector and that matrix, you fully specify the distribution. So essentially, it's a way to cheat and have deterministic computers represent distributions, because they just need to store a matrix and a vector, and they have a proxy for the distribution. And you can sort of analytically, for many simple transformations, keep track of how the Gaussian gets morphed, right?
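A minimal sketch of that "cheat", assuming NumPy and arbitrary example numbers: a Gaussian is fully specified by its mean vector and covariance matrix, and under a linear map y = A x + b it morphs analytically into mean A mu + b and covariance A Sigma A^T, which the brute-force sampling check below confirms.

import numpy as np

rng = np.random.default_rng(0)
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
A = np.array([[0.5, 1.0],
              [-1.0, 0.3]])
b = np.array([0.2, 0.7])

# Analytic bookkeeping: just a vector and a matrix, no samples needed.
mu_y = A @ mu + b
Sigma_y = A @ Sigma @ A.T

# Brute-force check by sampling.
x = rng.multivariate_normal(mu, Sigma, size=200_000)
y = x @ A.T + b
print(np.allclose(mu_y, y.mean(axis=0), atol=0.02))             # True
print(np.allclose(Sigma_y, np.cov(y, rowvar=False), atol=0.05))  # True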

These tricks are actually part of why diffusion models work so well, right? Diffusion models approximate every transformation as, like, a slight transformation of a Gaussian. And so essentially, it's kind of an artifact of them being, I mean, obviously the simplest distributions you can come up with, and being easily representable by a classical computer. If your computer can natively represent much more complicated distributions, we

wouldn't have that sort of bias, right? And the problem is, you know, there's distributions that have much longer tails, right? They're not just so concentrated around one mode. They have all sorts of, you know, blobs and long tails where, you know, a very, you know, very low likelihood event still has, like, a non-trivial probability, whereas Gaussians, as you get far away, you know, they get, like, more than exponentially low

probability as you get away from the mode. And so, you know, many machine learning algorithms are really good at modeling the typical case, right? And we feel this with LLMs, right? They're kind of, like, basic, right? Like they're really good at, like, typical things, but, like, it's like, I

Extropic's Approach: Harnessing Noise with Thermodynamic Computers

need this sort of like edge case. I need this sort of edgy thing. Like they can't, they can't go there with you. Right. But human brains can. Why is that? It's so weird. Right. Like, just like if we, if you're driving and you see something that's never been in a dataset on the road, you don't like glitch out. You like, you generalize. Right. And so.

Fundamentally, it's like the constraints of the hardware, deterministic hardware has constrained our thinking in terms of where the algorithms are going and where they should stay. And that has sort of held back AI. So something, you know, our ultimate goal here by proposing new hardware is to also disrupt how software and AI works and which algorithms tend to dominate and do well when But what is it about those algorithms that make it impossible or

impractical to model them using a classical computer? It seems like, I don't know, when I was a stats minor, I could do a little binomial plot, you know what I mean? That's a non-Gaussian distribution. I mean, if you try to sample directly from a hundred-million-dimensional distribution, right, you know, directly without using... Well, the fundamental reason, right, is, like, if I have, let's not go to a hundred million dimensions. Let's start with one and two.

If I have a one dimensional distribution, right, that's just a function in 1D. So I can slice that function up into n chunks and store those n chunks in memory, and now I have a representation of the distribution. Now I go to two dimensions. Instead of having n chunks, I have a grid of n squared chunks. So now I have to store n squared things in memory to represent the distribution

in generality. What if I go to d dimensions? If I have n slices along each dimension, I have a d-dimensional hypercube of things to store in memory, which grows really fast, right? So the general point here is that the complexity of representing a totally general probability distribution tends to grow exponentially

in the dimensionality of the system, right? And so, obviously there are a lot of caveats to that argument, because the complexity of the distribution doesn't have to be exponential, but it can be. And that's kind of the key thing that makes this difficult on a classical computer. You can't store them in memory, so you have to sample, and sampling has all of the problems I discussed earlier.
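A quick illustration of that blow-up (the 100-bin grid is an arbitrary choice): storing a fully general distribution on a grid with n bins per dimension needs n**d cells, which becomes hopeless long before a hundred million dimensions.

def grid_cells(n_bins_per_dim, n_dims):
    # Number of cells needed to tabulate a general d-dimensional distribution.
    return n_bins_per_dim ** n_dims

for d in (1, 2, 3, 10, 50):
    cells = grid_cells(100, d)
    print(f"d={d:>2}: {cells:.3e} cells (~{cells * 8:.3e} bytes at 8 bytes each)")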

So is this something that, was this thought, like this kind of train of thinking, was this what led people to want quantum computers in the first place? It's like we can represent these super high-dimensional aspects of reality by just remaining high dimensional?

Yes. Yeah. So that was a big... So back in our days in quantum computing, I would just keep hammering home: don't use a quantum computer for probabilistic machine learning, or classical machine learning as we call it, because quantum computers are really good at quantum interference, not necessarily probabilistic inference. Yes, you can. It's kind of like using a rocket, you know, rockets are finicky and less reliable, to ship something across town. It doesn't make sense,

right, intraday, right? It doesn't make sense. It is gonna blow up. You know, there's a chance it's gonna blow up. Like, why would you do that, right? Sure, like, in principle, it could go much faster, but, like, there's a chance it could blow up. So what we've seen is sort of, yes, on paper, a quantum computer can do slightly better for probabilistic inference. I've written a bunch of papers on this, because I

wanted to, like, rule it out properly, right? So I've spent the last eight years, I guess I put out my last paper in this space, a week ago for fun, because it was on my shelf for two years, but I thoroughly studied, can you do classical machine learning on a quantum computer? It seems to me like the main advantage is instead of having sort of jiggly particles that hop above sort

of barriers in the landscape, you can tunnel through. So there's an advantage if your landscape has very thin barriers, because you

have a form of quantum tunneling, right? So sometimes, like in very special cases, you can find an optimum a bit faster, but when you do the whole systems thinking, the full stack thinking of like, okay, I have a quantum computer, I'm gonna have an error correcting system that's like 99% of the computer, 99% plus of the computer is the error correcting system, and I'm gonna have the cooling, and I'm gonna have the control systems, why the

heck am I going through all that trouble for this tiny speedup, right? So basically, it's not worth using a quantum computer for these sort of low order polynomial speedup, these sort of, hey, you know, like, I get a square root speedup, and it's still slow, it's still relatively slow, and I have to prop up this huge computer to do it, when you could do it just, you know, much cheaper on even a classical digital

computer. And so in our case, instead of trying to seek sort of asymptotic, what is called asymptotic speedups, like in quantum computing, like there's, there's literally different complexity classes if you have a quantum mechanical computer versus a classical computer, we're just, we're not trying to violate any sort of laws of complexity theory, we're just doing, you know, classical algorithms, algorithms

Bounding the Noise: Not Too Noisy, Not Too Pristine

that you can simulate theoretically on a classical computer, we're just doing them way faster, by a large, like, sort of constant factor speedup, right? And that constant factor speedup is several orders of magnitude, sometimes more than can fit on one hand, depends on the algorithm. But before we pin down exactly what those speedups are in the public, we want to put out some careful scientific works. So stay

tuned for that. But it's very substantial. It's enough that it's worth going through this exercise of rebuilding the whole stack from first principles, right? That seems like a huge change, right? We're taking a fork in the tech tree. We're forking off the root node. That seems like a huge effort. Is it worth the payout? We think so, at least from first principles. And so that's why we're really excited, you know, and that's why we're kind of, you know, we've been

very secretive. Unfortunately, as we know, I got doxxed in December. The plan was always to reveal more in March. And so here we are. So it's right on schedule. But our goal here is for people to be open-minded about the future of AI. I know right now it just feels like the current labs doing LLMs, that's the endgame, that's the future. They've captured the market. It's over. You either work for one of these companies or you missed out, right? I don't think so. That's the beauty of

disruption. That's the beauty of this sort of techno capital acceleration. A couple of crazy kids, you know, with one or two GPUs can have an idea that can, you Yeah, that drive, like the reason behind that makes a lot of sense. I think that the promise of quantum computers, at least the way that I understood it, was that eventually they're going to be so, they're going to get, you know, n squared number of operations in the same amount of bits or whatever. So

we're going from bits to qubits. And when we have qubits, like pretty soon we're going to have quantum supremacy because you can see like even the biggest classical computer will be so much smaller than this puny, you know, or even this very small quantum computer. But there are problems. There are things that it's not simply captured in the number of bits or qubits. There are other considerations that you have to have when building a quantum computer. And I imagine that you two probably have

very strong opinions about that. So I would love to ask you about that. Maybe Trevor? So for starters, it's funny you mentioned quantum supremacy. The way we achieved quantum supremacy back in the

How Thermodynamic Computers Work: Inputs, Parameters, Outputs

We were there. We were there a thousand years ago when it happened. Only... Yeah. Back in the day. The problem that I have with quantum computing, and the main reason I stopped working on it, is because most of the phenomena that are important to humans do not have long-range quantum coherent effects. So all atoms are governed by quantum mechanics, but things that are macroscopically observable, that involve a large number of atoms, don't

need to be simulated on a quantum computer, right? Like our classical models of them work really well. And so that's one of the fundamental reasons it's been difficult to find a practical advantage in

quantum computing, right? Like we have these kind of, you know, there's like a Shor's factoring algorithm, which is like the most common thing people tout that it's going to like break RSA and whatever, and it might, but we can just use a different crypto protocol that isn't broken by a discrete log and such. So it's very unclear, even if we had a big quantum computer, what you would do with it. And that, to me, kind of made it hard to

dedicate my life to it, right? Because the physics and engineering challenges involved in building quantum computers are extremely formidable. And after you do that for... You could see that Trevor worked close to the metal, where, you know, the challenges are extremely hard. And, you know, I was a theorist and an algorithmist, you know, a bit isolated away from the difficulties. I was aware of it

because I would talk to my neighbors and so on. But, you know, the ideal thing with quantum computers is that they can represent and sample from states of very high quantum complexity, right? So, if you have a state of very high complexity, but it's still, you could still sample from it with Monte Carlo, you could just run, again, a Monte Carlo algorithm, maybe it's a million times slower than doing it by nature, but it's still, you're still going to

get there. You know, you just throw a lot of compute at it, you're going to get there, you know, it scales linearly. The thing with quantum complexity is that it scales up in some cases super exponentially, like in terms of how much classical compute you need to use in

order to replicate that distribution. What was achieved in the Google quantum supremacy experiment and then later surpassed by Chinese simulations and then reiterated by Google quantum, so it's been kind of a little race there, but essentially it was just sampling from any sort of quantum program that you can't sample from with a classical computer, even if you were to throw most compute on earth towards

that end. And that was achieved, I would say, so I don't think there's anything stopping us from achieving that. I know it's still controversial, but essentially the promise there was that, okay, if we can show we can sample

from these complicated distributions, right? The narrative, at least for the quantum AI side, was that, okay, well, if we have these classes of distributions, maybe we could search over that space and represent very highly complex states in nature with these complicated distributions on our computer, right? And then map one to the other, and that unlocks the ability for us to sort of understand matter at a quantum mechanical level.

There, there were a lot of challenges to train such distributions, because when they get really complicated, they get really hard to train. It kind of is a sort of conservation of difficulty. So until the hardware gets much better, it's very hard for you to use a quantum computer, even if you're trying to just generally model nature in sort of native fashion, right? You're trying to model the quantum mechanics of nature with a quantum mechanical computer,

running a quantum mechanical AI representation. It's still difficult, because if the computer's not reliable enough, you can't make it big enough, you can't run the easily trainable representations, and you're kind of screwed. And so from the algorithmic standpoint, it was also sort of doomed in the near term. I'm more of an optimist than Trevor. I think, you know, humans are really smart. I think on a 20-year time

scale, people are going to figure it out. But again, for us, it's like, okay, we're trying to do all these applications where it's not clear that you need quantum complexity, right? Really what you just want is a computer that allows you to do probabilistic machine learning and optimization very

cheaply, very energy efficiently, and very fast. And for us, it's like, from first principles of the thermodynamics of computing, it's not going to be a digital computer, it's deterministic; it's not going to be a quantum computer; it's going to be a thermodynamic computer that achieves that. And so that's what we've got to build. And so that's why we left all the secret labs. You know, I was at Google X working for Sergey, and Trevor was at, like, a different black-ops lab in Santa Barbara. And

then we joined forces. We both kind of left that. And now we're here. And now we have this thesis that we've kind of kept close to the chest. But, you know, now we're telling it to the world and we're asking if people want to join us. And Yeah. And one more point on quantum computing that's interesting in contrast to what we're doing at Extropic. To build a quantum computer, you have to build some really weird system at large

scale. So that might be superconducting circuits where you're making Josephson junctions, which are not new, but at least a relatively new object; you might be doing neutral atoms, where you have to build these big arrays of optical tweezers and tables and tables of lasers. Trapped ions is very much the same thing. My point here is that the kind of manufacturing and supply chain for all of these things is extremely immature. And so there's

going to be decades of challenge just there, achieving scale, right? Versus if you build a thermodynamic computer, what you need to do that

No Quantum Coherence in Thermodynamic Computers

is a noisy circuit. And I can think of lots of ways to make noisy circuits that lean heavily on the way we know how to make circuits today, right? Like the whole semiconductor industry. So that's ultimately what we're chasing here, something that we can do, you know, in this decade, not several decades from now. Yeah, so that's a perfect tie-in. Let's just hop right in and start talking about what you guys basically announced in this litepaper. I

mean, you mentioned Josephson junctions. Those appear prominently in the litepaper. We talked earlier about having to keep quantum computers extremely cold, and I believe that that also is true of this first wave that you've announced here too. I don't know, at first blush, I would imagine some people would think, well, it sounds kind of familiar. What you just said is, like, you know, hard to do. So what's the value there? We're starting within our neighborhood and we're taking a path

to sort of room temperature and large-scale manufacturability, right? We're going from the bottom up. We're going from the very cold, using similar building blocks to what we're used to engineering in quantum computing, and operating them in a thermodynamic regime where there's no more quantum coherence. There's no superpositions of states anymore. It's just fuzz, probabilistic fuzz over states. And that's where we're operating the devices, right? And for us, it was just our native

language. It was the first sort of concept of a programmable and parametric thermodynamic computer we thought of building. And that was basically our first prototype. And for us, there's a lot of learnings there, of like, how do you even program this thing? How do you map all sorts of applications to it? What is programming gonna look like? How fast can it get, right? And showing the world, hey, this is how efficient and how fast you can have neural computing, computing for AI, be:

fast and efficient, speed and efficiency. It's very similar to the Tesla Roadster: super expensive, very exotic, had to import a bunch of parts from all sorts of suppliers, wasn't vertically integrated yet. And then that's a stepping stone towards large-scale mass production, in our case, eventually a room-temperature chip that we're going to build. And we have a roadmap to that. And so for now, we're

just showcasing the world what's possible. Hey, you have this new paradigm that's coming, we have a first instance of it, but we have a roadmap to get to sort of having a thermodynamic Yeah, like in CMOS. So like, you know, the same way you make your digital computer that you're likely watching this on, we know how to make thermodynamic computers using the same manufacturing technology that operates at room temperature, which

So how do they work? What's the, you know, what's the, like, can you explain the 101 of what is... Yeah, I mean, let's talk about, let's focus on the superconducting chip, that's the one we're disclosing; the CMOS stuff, you know, we're keeping at a high level for now. It's the same concept, a lot of the software maps over in thinking, but it's, you know, just like in quantum computers, you can have different substrates, right? There's optical, ion-trap, you know, photonic, superconducting quantum

computers. There's many ways to build a thermodynamic computer. This is a first way. There's gonna be a better way later, but for now, we're talking about this one. So this one, Trevor's going to give you a much better, more technical explanation. But essentially, we're just using jiggles of electrons that happen in superconductors. In superconductors, electrons like to bundle up. They like to pair up. They're called Cooper pairs. And when they pair up, they can pass through each

other. That's why there's sort of no friction, you know, there's no traffic congestion for your electrons in superconductors. That's why they're superconducting, right? They have way less resistance. For us, the superconducting aspect is more to have a sort of non-linearity in the landscape. So that means not it being a simple Gaussian, right? So if you have a simple LC circuit like you do in high school, you know, and you add some

noise to it, you're going to get a Gaussian out of it. But we didn't want that. We wanted a programmable, super general, fully general landscape. Essentially, what we do is something called energy-based models. I'm more on the software side. Trevor's going to give you more of the hardware-side picture. But energy-based models

Gaining Confidence in the Idea Over Time

are models where you try to model data distributions as equilibrium states called Boltzmann distributions of certain parameterized landscapes. So essentially you shape some hills, right? We have little knobs that we could tweak and it changes the shape of some hills and we pour some sort of, you know, just a bucket of bouncy balls and keep shaking it as we go, right? And that's it, right? And then the algorithm is just changing this landscape over

time and the bouncy balls kind of flow. But, you know, on average, the probability mass of where your bouncy balls are kind of changes, and we guide those bouncy balls. For us, the bouncy balls are literally electrons, but theoretically, you can make it out of all sorts of other stuff. But in

our case, that's it. Essentially, we have a programmable probabilistic computer that has parameters that you can train in order to morph this sort of equilibrium distribution of the bouncy balls by morphing the sort of landscape in which they're dancing. And we have algorithms that are physics-based to tune that sort of landscape that correspond to machine learning, you know, like cross-entropy minimization, which is what transformers do and diffusion models do,

amongst others. And so essentially there's a connection between, you know, machine learning really is operationalizing information theory, information theory and entropy, right? The theory of entropy from Claude Shannon appears in thermodynamics as well, right? So we're instantiating information theory as thermodynamic processes. And so that's the bridge between machine learning and thermodynamics.
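For the technically inclined, here is a toy sketch of that energy-based-model picture in ordinary Python (it has nothing to do with Extropic's hardware or software): parameters shape an energy landscape, samples follow the corresponding Boltzmann distribution p(s) proportional to exp(-E(s)), and turning a coupling "knob" visibly reshapes where the samples land. Here the bouncy balls are just Gibbs-sampled spins.

import math
import random

def gibbs_sample(weights, biases, n_steps=20_000, seed=0):
    rng = random.Random(seed)
    n = len(biases)
    s = [rng.choice([-1, 1]) for _ in range(n)]
    samples = []
    for step in range(n_steps):
        i = rng.randrange(n)
        # Local field felt by spin i, set by the programmable "knobs".
        field = biases[i] + sum(weights[i][j] * s[j] for j in range(n) if j != i)
        p_up = 1.0 / (1.0 + math.exp(-2.0 * field))  # p(s_i = +1 | all other spins)
        s[i] = 1 if rng.random() < p_up else -1
        if step > n_steps // 2:  # keep samples only after rough equilibration
            samples.append(tuple(s))
    return samples

# Two spins with a ferromagnetic coupling: turning this knob up concentrates the
# equilibrium distribution on aligned configurations.
w = [[0.0, 1.0],
     [1.0, 0.0]]
b = [0.0, 0.0]
samples = gibbs_sample(w, b)
aligned = sum(1 for s in samples if s[0] == s[1]) / len(samples)
print(f"fraction of aligned samples: {aligned:.2f}")  # well above 0.5

Training in the contrastive style mentioned earlier would then nudge those weights and biases until the sampled configurations match the data, which is exactly the loop that is expensive to run on deterministic hardware.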

Trevor, do you want to go, Trev? Yeah. Sure. I mean, actually, you kind of have absorbed my talking points at this point, so that was pretty close to what I would say. I'll add a layer of generality. In a sense, any circuit you build experiences thermal noise, so that the charges that are moving around your circuit are getting battered around by vibrating atoms. So

every circuit you build works that way. The trick in designing something that's not just kind of noisy, but very noisy, is that you have to make sure that the noise is significant compared to the other energy scales in your device. Right, so that's kind of where the device physics and hardware engineering, more hardcore stuff

kind of comes in. But once you figure out how to get into that regime, basically all you need is some kind of circuit component that's tunable, that lets you kind of change where the electrons prefer to sit. And that gives you a programmable sampling machine, basically, right? So the principles at play here are pretty generic, and you can imagine a ton of different ways to build it. And so we're just kind of thinking, like, well, what's the most scalable thing we have? Semiconductors.

Using Superconductors and Scaling to Silicon

So basically, I think the thing that is still confusing to me is, like, what's the input and what's the output? So the input, as I understand it, is you're giving, like, weights or whatever to each of these things, each of these, what would you... I mean, it's like a neural network, right? You have inputs, like data, and then you might have outputs, and then you have parameters. And those are... So you have the parameters which you input, which are the ways...

To be more concrete, I think that'll be helpful. You could think of like data and parameters as voltages, right? So I apply some voltage to the circuit, which changes how it behaves in some way, right? And that changes the distribution that the charges will follow, right? And when you want to take something out of the circuit, what you're doing is observing it. So the circuit will have a bunch of degrees of freedom that are kind of moving randomly under

the influence of thermal noise. And basically what I can do is I can hook an amplifier up to the circuit and measure one of those signals. And so doing that lets me observe the random dynamics of the circuit. And if I do that over and over again, as long as I leave a long enough time in between observations... In the bouncy ball analogy, right, you have your landscape, you poured a bunch of bouncy balls, it's still shaking a bit in this landscape, right?

Eventually it would equilibrate to some sort of distribution. Sampling is like applying a sort of porous grating on top and letting a bouncy ball sort of hop out. And from that, you can infer where that bouncy ball comes from. That gives you one sample, one bouncy ball from the probability mass

of where they all are, right? So that gives you one snapshot. If you take many snapshots, there's all sorts of algorithms where you can use those sorts of, what are called estimators of where the distribution is, as a sort of signal, either for learning or inferring what values you're predicting, right? That position could be, like, the value of a pixel in an image, right? But you have probability distributions over everything, right?
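A small, purely illustrative sketch of that estimator idea (the observe() function below is a stand-in for one amplified readout, not the real pipeline): each snapshot is one bouncy ball, and averaging many of them estimates a value with an error bar that shrinks like one over the square root of the number of snapshots.

import random
import statistics

def observe(rng):
    # Pretend snapshot of one degree of freedom of the circuit.
    return rng.gauss(mu=0.7, sigma=0.2)

rng = random.Random(42)
snapshots = [observe(rng) for _ in range(1_000)]
estimate = statistics.fmean(snapshots)
std_error = statistics.stdev(snapshots) / len(snapshots) ** 0.5
print(f"estimated value: {estimate:.3f} +/- {std_error:.3f}")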

Yeah. Interesting. So is that, so the translation from thermal land back to normal digital land, I assume still has to happen. Like you still, in order to show something on my screen, which is a digital screen, like I need to get those

values out. But that's what you're talking about right there. You're saying that whatever, we're basically, you're able to sample and pull out these electrons or whatever they are, see where they fell, and then that gives you the value that you need, like for a color or for a letter. You're going from this cloud of bouncy balls to, okay, this one is definite, now I have it, it came from here, right? So, that's a deterministic sample out of a probabilistic sort of distribution, right? And

Thermodynamic Computing vs Neuromorphic Computing

so, there's this old... Yeah, there's this old thought experiment called Maxwell's Demon, that if you observe a probabilistic system, it's going to cost you energy to get that information. So for us, our goal, instead of having to sample a lot from the device and always have to relay things with classical computers, we're trying to do as much as possible natively in probabilistic physics, because that's much lower power. It's going to cost less energy, because observing things costs energy.

And that's sort of like what, so one of the things I understand is like wrong or whatever with quantum computers is that step basically. Like how do you get the thing out of the quantum, like qubit representation and put it into a normal, like classical bit. And so are there similar problems that you run into in this thermal world versus in the quantum world with that, like basically with the, you

know, that pulling out of the other regime into the normal regime? Like I imagine in this world to be more specific that you're, the thing reading the voltage or whatever could be itself noisy. And so you don't know whether you're actually getting the value that you intend to get in the translation step out of the thermal system. So that's where the real work comes in, right? Is how do I design these various circuits to

In quantum computing, it's called the readout problem, right? It's like, I'm at the quantum regime where we're down to literally a few quanta of energy, right? That's where the word quantum comes from. We're a few more energy packets than that; we're a bit higher energy than that. But still, for us, it becomes a problem of amplifying that signal, right? So ideally, you don't want to have to have your observations get off of your thermodynamic computer or your quantum computer

into a classical computer and then back. That's very slow. In fact, that's been a problem with most quantum computers today. If you try to use them for these sort of quantum deep learning algorithms, that iteration loop to optimize your parameters is way too slow. Getting those samples out and then getting that feedback loop update is way too slow. And so our insight is that eventually we want to do that as physics, as

a physical process in the device. And so, basically, whenever you want a signal to travel far, you need to amplify it a lot, right? Because there's more, like, when a signal has to physically travel further through some, like, weird environment, right, more noise hits it. And so what's kind of interesting about our approach is you could imagine putting a lot of this stuff in the same package, right? It's CMOS all the way down. So potentially we'll... Huh. Wow. Okay. This is sort of breaking

my brain, but in like a good way. Like I, it's like, it's coming together.

Disrupting Computing and AI from First Principles

Like, I think, I think I'm picking up what you guys are putting down. Um, so I have a question about, like, basically if there's an analog to, like, coherence here. Like, quantum coherence is obviously a big problem, where, you know, the thing just collapses and

it, like, loses its quantum properties, basically. Do you, does that happen to you guys when things are, um, like, too big, basically? Because there was a part of the litepaper where you said we've got to keep it small, we've got to keep it low power, because then these crazy... So we explicitly don't have quantum coherence, to be clear. Quantum effects are actually important in transistors. That's one of the things that limits how small you can make

them is quantum tunneling. But there's a difference between observing a quantum effect and having a coherent quantum state. Quantum tunneling in CMOS is not coherent because it's at room temperature. So

that's one tangent. I think the closest analogy we have to that is, if you have a device that's too big, in the right sense of big, you end up with technically still probabilistic, but I would say metastable systems, in the sense that if I have two wells with a giant energy barrier between them, it's very unlikely that thermal noise is going to ever kick you over that barrier, right? So that thing is going to look more like a digital bit. And so you have to...

So for us, it's like the opposite, right? So quantum coherence is like, it's like the time until your quantum computer starts to thermalize, until the noise starts seeping in and affecting things. We want the noise to start seeping in as fast as possible and for things to equilibrate as fast as possible. So we have something called the thermalization time. And we want that to be fast. So for us, it's actually the opposite. We want more noise and it helps us go faster in many ways. And so

that's the lesson we learned. Instead of trying to extend coherence times, it's like, hey, nature wants to thermalize. Let it rip, right? And so let's use that. Let's

Early Applications in Low Data, Probabilistic Domains

use that natural tendency as a building block for our algorithms. And so it's kind of dual. It's like the opposite of coherence time. It's like decoherence time. How fast can you decohere? Yeah, exactly. So it's like switching sides, you know, half times.
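A rough toy model of the metastability and thermalization-time ideas above, with made-up parameters: overdamped Langevin dynamics in a double-well landscape U(x) = h * (x^2 - 1)^2. With a barrier comparable to the thermal energy, the state thermalizes quickly and hops between wells constantly; with a barrier many times larger, it goes metastable and behaves more like a digital bit.

import math
import random

def count_hops(barrier_height, n_steps=200_000, dt=0.01, temperature=1.0, seed=1):
    rng = random.Random(seed)
    x, hops = 1.0, 0
    for _ in range(n_steps):
        force = -4.0 * barrier_height * x * (x * x - 1.0)  # force = -dU/dx
        noise = math.sqrt(2.0 * temperature * dt) * rng.gauss(0.0, 1.0)
        x_new = x + force * dt + noise
        if x * x_new < 0:  # crossed the barrier between the two wells
            hops += 1
        x = x_new
    return hops

for h in (1.0, 10.0):
    print(f"barrier {h:>4} kT: {count_hops(h)} well-to-well crossings")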

Yeah. Very cool. So what I imagine that when you both started out on this, and maybe this is a good time to talk sort of about your backgrounds, but or like how you guys got to, we talked about a little bit at the beginning, but I'd be curious, like, I imagine at the very beginning of this project, it was like a glimmer of an idea. And you were like, okay, probably it's not gonna work, but like, maybe it will. And that will, how sweet would that be? But then now, you guys have so much further

down that path of like actually building some chips. Like I saw your little chip at that party, Gil. And like, it's so, you're so much further down the idea maze and like the, you know, you must have more confidence now. So I'm curious to hear, like how much more confident are you that this is actually going to We have a lot of confidence that the local neighborhood of ideas we're exploring here with sort of the intersection of probabilistic machine learning and stochastic electronics is

the future of computing for AI. We have a couple hypotheses of what that looks like, and we've got a couple bets in that sort of local neighborhood. We're not even married to one substrate, as you can see. So even in terms of algorithms, we have a couple bets there. But we're pretty sure that something in this neighborhood is the future, from first principles. And we've built that conviction from doing these investigations and having a larger team to sort of parallelize our learning over the

past year and a half or whatnot since the founding. For me, this idea was a super slow simmer over the past eight years. Well, eight years if you include time at Extropic. And it was an idea that seemed so crazy that

Vast Potential for New Devices and Algorithms in AI's Early Days

I thought I had lost my mind or something. And I wanted to sanity check it by working in quantum computing: hey, we're going to learn a lot about how to do physics-based computing and imbue AI into physical systems with quantum AI, and we'll bring those learnings to this alternative form of computation. And so, you know, that's been a slow exploration in the idea maze, a back-burner idea, but then I think the point of

going all in and burning the boats, right? Like, turning down every big tech job, selling everything, moving back home with the parents. Not that I don't love my parents, but, you know, that was a big move, right? I swallowed a lot of my pride, but then I had a lot of skin in the game, right? It's like, I either make this work... You

only have one life, right? If you have an idea that you think is the greatest idea of your life, that you think is gonna have the most impact on helping civilization, you gotta go all in. And so, at that point, going all in was what we needed. And then, you know, getting Trevor on board was a matter of time. I just had to convince him to drop out of MIT. That took a couple months. And at that point, once he came

in, you know, things really accelerated. Because we've worked together before; we shipped TensorFlow Quantum. Again, that was a similar scenario: there were no adults in the room, no guidance, the field didn't exist. Big tech people were asking us where it was going and asking us to build it. And the best way to predict the future is to invent it and build it. That's what we did back then, and that's what we're doing again now. So Trevor, do

you want to add to that? I'll look for my chip, I think it's somewhere around here. No, I think you pretty much covered that. I mean, personally, you know, I feel that computing has to go this way. I've been thinking about noise and computing, how they might help each other and how they harm each other, for basically my entire academic and adult life. So it's a topic

that's very near and dear to me. And when you combine the theoretical angles with the fact that we want devices to be small, and when things are small, thermal fluctuations are important and devices become noisy, it all seems kind of inevitable to me. I really think what we're doing is inevitable. So in that sense, I have a lot of confidence.

Building the Next S-Curve to Extend Moore's Law for AI

Totally. So you mentioned that this category of ideas seems right. Ooh, there we go, there's some hardware. A little metal chip. There you go. Old chip, yeah. There you go. It's almost as big as you people. That's wild. So this one's made out of aluminum. That's the easiest process to start with. There are other, fancier superconducting metals we can experiment with that can operate at a higher temperature. But for us, we

came from quantum computing. This was kind of our lingua franca, and we know how to build modular physics-based devices out of this substrate. So it was somewhere to start. But obviously we have a long road ahead, because if you're an alien and you don't have earthly supply chains that are already established, you'd

build your thermodynamic computer out of this, right? But of course, for us to grow on a fast timescale, we've got to meet in the middle, where existing supply chains already are, and find some middle ground. That's why we're going to silicon. I like to joke that this is the kind of computer aliens would build from first principles. But of course, if you have the deep pockets to scale up superconducting technology, you might be interested in this, and you can give

us a call and we can work with you. Including the aliens, if they're listening to this. Give us a call. But hey man, I don't judge. So you mentioned that the broad category this idea is a part of seems like the right one, for physical reasons, for the frustrations you had in quantum. Are you guys one of one? Are you the only people thinking about

this? I know that neuromorphic computing is broadly a category, but I don't know if it's applicable to what you guys are doing. Do you consider yourself part of that broader subfield? I think people have been obsessed with biomimicry, right? They're like, well, if we obsess over every detail of how neurons actually work and we mimic that, something's going

The Meaning and Purpose Behind Extropic's Mission

to work, right? We're going to figure out how to program it later. Whereas for us, it's like, no, no, no, that's not how it works. You got to like start with the algorithm and then, you know, both top-down and bottom-up sort of engineer this bridge between what you want to do and the physics of the device. And that's what we're doing. We've established this sort of full stack bridge. And

that's so interdisciplinary. It's so difficult because you need ML people talking to physicists, talking to compiler people, talking to hardware people. It's a very difficult effort, but we've done it before. Give me a second here, I have a comment there. Computing kind of started as this abstract thing where a

computer meant a Boolean logic machine, right? But in the 21st century we're actually starting to see things go a different way, in the sense that computing is becoming more widely understood as embedding math into a physical process, right? This started to become really obvious in quantum computing, because the way that people have been successful in quantum computing to date is you start with the physics of your device and you see what

kind of computations it does relatively naturally, right? Those are going to be the things that are highest performing. And back at Google, the things that we've gotten working on quantum computers to date are all things that very naturally map to the physics of the qubits, right? Pedram Roushan, Vadim Smelyanskiy, that's the kind of game they

play there, and it's been very successful. So I'm kind of taking that approach to computing and bringing it to room-temperature devices that scale, right? And ultimately what that's going to do is hack every last drop of performance out of the physics. All right, yeah, what else do you guys want to talk about? Yeah, no, please. I had an analogy, an analogy we like to use about biomimicry versus what we're doing, right?

If you set out to build a flying machine, you're like, oh, well, the proof of existence is out there, we have birds, right? Birds flap their wings. They use some form of physical principle we don't quite understand. Let's make a plane that flaps its wings, and that's going to be the device I make to achieve flight. On the other hand, you can sort of go up the supply chain

of nature itself, of biology. Ultimately, biology just finds a way to hack some principle of physics to its own advantage; in this case, it was the physics of lift, right, of flight. And so, when you build an airplane, an artificial system, you just try to build the best system that leverages the physical principle biology found a way to leverage, without obsessing over biomimicry. So, neuromorphic devices are obsessed with

biomimicry. We just understand how natural systems leveraged out-of-equilibrium thermodynamics in order to do probabilistic machine learning natively as a physical process, and we're building devices that are better than nature. Like, our neurons, on the superconducting chip, are far more efficient than your brain's. Right? Which is astounding, because your brain is like tens of millions of times more

energy efficient than GPU clouds today, and much denser. And we're going denser and more energy efficient than the brain, which may scare some people. But, you know, to us, in order to be able to understand and predict our world at all scales, there's just so much intelligence needed for us to scale civilization that we need to accelerate as fast as possible and reach the end. We're trying

to reach the end of computing, right? We're trying to reach the ultimate substrate for intelligence in terms of energy efficiency and density, because that's where everyone is going. And so we're like, let's just go there right away from first principles and see how far we can get. And I think we can get pretty far. That's what we're going after. So we're taking inspiration from

nature, but really we're just trying to hack physics directly. What's interesting about this space, neuromorphic or physics-inspired computing, whatever you want to call it, is that every different kind of device has its own natural set of algorithms that it can accelerate, right? Because building a physics-based accelerator means you're embedding some kind of math that you want the answer to

in the dynamics of an analog system, right? And so when you build dramatically different devices: a quantum computer is really good at solving the Schrödinger equation, a memristor array type of thing is really good at simulating memristor arrays, and our computer is really good at sampling from programmable distributions. So the point I'm trying to get at here is there's room for a lot of different plays in this space, because every

accelerator ends up being good at something different. So in that sense, I don't think there's any real competition out there. What do you think the main applications are going to be that fit

Call for Talented Builders to Join Extropic

today's world? So I imagine that you guys will invent some new stuff, some new software algorithms, your own software, but is there going to be a one-to-one analog for people that are doing normal models today, where they'll just plug your thing in instead? Yeah, we definitely want to support current-day deep learning and machine learning practitioners. Of course, for us, applications like large language models are

applications we achieve at scale, right? When our devices scale, because there's "large" in the name of large language models, right? But there are a lot of machine learning models that are more valuable to businesses in some ways, the ones in the low-data regime, where you need probabilistic uncertainty about your predictions, right? Let's say you're doing a trade, you're pricing options, or you're trying to optimize a manufacturing process; every

data point can cost millions, if not billions, of dollars. You don't have that many data points, and it doesn't matter how much compute you want to throw at it: you want the best possible answer, and you want to quantify how uncertain you are about your predictions. That's the sort of algorithm we want to enable in the short term. And so that's a different regime

than the big data, big classical compute regime. We're still in an extreme compute regime in a sense, because we harness a lot of that compute from nature, right? The probabilistic compute comes practically for free, just from heat. But we're going to tackle low-data-regime probabilistic algorithms, right? And we think that these are actually in some ways more useful than large language models for businesses, or at least a nice dual to them, another type of machine learning that is synergistic.
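
As a rough illustration of the low-data, uncertainty-aware regime being described (a textbook Bayesian linear regression with made-up numbers, not Extropic's algorithm), note how the output is a predictive distribution rather than a single number.

```python
import numpy as np

# Toy low-data setting: five observations of some quantity vs. one input.
# The point is the error bar, not the point estimate.
X = np.array([[1.0, 0.2], [1.0, 0.5], [1.0, 0.9], [1.0, 1.3], [1.0, 1.8]])  # bias + feature
y = np.array([1.1, 1.9, 2.8, 4.2, 5.9])

noise_var = 0.25        # assumed observation noise
prior_var = 10.0        # broad Gaussian prior on the weights

# Conjugate Bayesian linear regression: the posterior over weights is Gaussian.
A = X.T @ X / noise_var + np.eye(2) / prior_var
cov = np.linalg.inv(A)
mean = cov @ X.T @ y / noise_var

# Predictive distribution at a new input: mean and standard deviation.
x_new = np.array([1.0, 2.5])
pred_mean = x_new @ mean
pred_std = np.sqrt(x_new @ cov @ x_new + noise_var)
print(f"prediction: {pred_mean:.2f} +/- {pred_std:.2f}")
```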

And so for us, the LLM market is interesting; that's where we want to get to. But in the early days, it's going to be these sorts of other algorithms, let's just say that. Like how there's lots of room in device space for new things, I think there's lots of room in AI space for new things, right? It's easy to get caught up in what's most important today, but you have to take a second and

have some perspective. We're really early in AI. These are like the birthing years, right? I mean, there's never going to be an ultimate technology; there's always going to be a new and better thing. But looking 10 or 20 years down the line, that stuff might have no resemblance to what we're doing today, right? This is the next S-curve. We think that current AI, its scaling, is very impressive, but it's going to hit some

sort of saturation; it might be the data bottleneck, and I think definitely the energy bottleneck, right? You've got to move mountains. 7 trillion. 7 trillion, right? Just throw 7 trillion at it, it's going to fix everything. You know, we think there's a better way. We're already working on the next S-curve, right, after

this one. So it takes time to ramp up to the level where we're at the state of the art, but we know that, again, by 2030 or so, we're probably gonna hit a wall in terms of scaling down our current deterministic transistor technology because they're gonna hit this sort of thermodynamic regime. Their wobbliness

is gonna be a problem. We're building beyond Moore's wall. So that's gonna enable us to extend Moore's law in a sense, just for AI and probabilistic computing, not for general computing, but that's still great, because that's where we want the extreme-scale compute anyway. And so to us, this is the most important thing we could be working on in our lives. And every day we just wake up with insane levels of internal fire.

We're on a mission to save the world here, and we kind of hyperstitioned this, and now we're in a position where it actually is the case, which is kind of surreal, of course, and there are many other things going on in our lives as well. Most people's prior is that if you had a successful movement, you couldn't also be successful in technology, but for me, the cultural movement is great and I want more people to

Putting Ideas Out There and Creating Value for the Universe

join in on the optimism and have life paths similar to ours, if possible, because then everybody would accelerate. But ultimately, for me, this mission is the one I'm most passionate about, and I'm all in on it, right? This is the meaning and purpose of our lives. I truly believe that. And that just gives a deep sense of satisfaction working on this stuff every day, and gives you near-infinite

energy somehow. It just comes out of nowhere, right? If you have this infinite goal and you're making progress towards it, it feels really good. And so any bump in the road, any challenge, is just a temporary setback on the road to an amazing goal. I couldn't be more excited to finally talk more about it today. It feels very cathartic right now to talk

about it on a public podcast. These have been our internal secrets, our thesis, for a while, but the reason we're sharing more with

the world is so that people know, so people know it's coming. And if you want to work on this sort of stuff, if you're a talented builder, if you're ready to run through walls to do this, you should give us a call. So we've got four new job postings, but really it's whoever wants to join us on this journey and believes in the mission, now that we've kind of laid it out, you

know, should talk to us. And so our goal is for everyone to accelerate, and in the ethos of some of our techno-optimistic thinking, we're putting our ideas out there. Hopefully the universe will reward us for creating all this value, but we're going to keep going no matter what, because it's the most important thing. And by talk to us, he means apply to our job postings, because if you DM us on Twitter, it's very likely not going to be seen.

Conclusion and Wrap-Up

Yeah, I get a lot of DMs. Yeah, yeah. So ideally the job postings. Yeah, so that's our goal today. I love it. Thanks so much, Christian. Oh yeah, absolutely. I mean, you guys are the perfect people to have on. I feel like you name-dropped first principles many times throughout the podcast, which I am very, very

happy to hear. Awesome. And people should know, if they didn't already, that there is a paper we were talking about through this, and it'll be linked in the show notes; they've actually published some of it. It's just a few ideas to get you thinking. Not yet. That's right. Well, they did have that conference in San Francisco yesterday that was like how to build a nuclear bomb. So who knows, maybe they're... Right. No,

thank you guys so much for joining. This was awesome. And who knows, maybe when you get your next chips or when you publish a full white paper, we'll have another one of these and go deeper. That sounds good. Sounds good. Thanks so much. Awesome. Thanks, Christian. Thanks, guys. All right. Cheers.
