Pushkin.
If I were going to pick one paper from the past decade that had the biggest impact on the world, I would choose one called Attention Is All You Need, published in twenty seventeen. That paper basically invented transformer models. You've almost certainly used a transformer model if you have used ChatGPT or Gemini or Claude or DeepSeek.
In fact, the T in ChatGPT stands for transformer, and transformer models have turned out to be wildly useful, not just at generating language, but also at everything from generating images to predicting what proteins will look like.
In fact, transformers are so ubiquitous and so powerful that it's easy to forget that some guy just thought them up. But in fact, some guy did just think up transformers, and I'm talking to him today on the show.
I'm Jacob Goldstein, and this is What's Your Problem, the show where I talk to people who are trying to make technological progress. My guest today is Jakob Uszkoreit. And just to be clear, Jakob was one of several co-authors on that transformer paper, and on top of that, lots of other researchers were working on related things at the same time. So a lot of people were working on this, but the key idea did seem to come from Jakob. Today, Jakob is the CEO of Inceptive. That's a company that he co-founded to use AI to develop new kinds of medicine, and the company is particularly focused on RNA. We talked about his work at Inceptive in the second part of our conversation. In the first part, we talked about his work on transformer models. At the time he started working on the idea for transformers, around a decade ago now, there were a couple of big problems with existing language models.
For one thing, they were slow. They were in fact so slow that they could not even keep up with all the new training data that was becoming available. A second problem: they struggled with what are called long-range dependencies. Basically, in language, that's relationships between words that are far apart from each other in a sentence. So to start, I asked Jakob for an example we could use to discuss these problems, and also how he came up with his big idea for how to solve them.
So, pick a sentence that's going to be a good object lesson for us.
Okay, so we could have "The frog didn't cross the road because it was too tired." Okay, so we got our sentence. Yep.
How would the sort of big, powerful, but slow-to-train algorithm in twenty fifteen have processed that sentence?
So basically, it would have walked through that sentence word by word, and so it would walk through the sentence left to right: the frog did not cross the road because it was too tired.
Which is logical, which is how I would think a system would work.
It's more or less how we read. Right, it's how we read, but it's not necessarily how we understand. Uh huh. That is actually one of the integral insights, I would say, for how we then went about trying to speed this all up.
Well, I love that. I want you to say more about it. When you say it's not how we understand, what do you mean?
So, on one hand, right, the linearity of time forces us to almost always feel that we're communicating language in order and just linearly. It actually turns out that that's not really how we read, not even in terms of our saccades, in terms of our eye movements. We actually do jump back and forth quite a bit while reading, and if you look at conversations, you also have highly nonlinear elements where there's repetition, there's reference, there's basically different flavors of interruption.
But sure, by and large, right, we would say we certainly write them left to right. So if you write a proper text, you don't write it as you would read it, and you also don't write it as you would talk about it. You do write it in one linear order. Now, as we read this and as we understand this, we actually form groups of words that then form meaning. Right. So an example of that is, you know, adjective noun, or say, in this case, article noun: it's not a frog, it's the frog. Right. We could have also said it's the green frog or the lazy frog.
Right. Language has a structure, right, and there, things can modify other things, and things can modify the modifiers.
Exactly, exactly. But the interesting thing now is that that structure, as a tree-structured, clean hierarchy, only tells you half the story. There are so many exceptions where statistical dependencies, where modification, actually happens at a distance.
So okay, just to bring this back to your sample sentence, "The frog didn't cross the road because it was too tired": that word "it" is actually quite far from the word "frog." And if you're an AI going from left to right, you may well get confused there, right? You may think "it" refers to "road" instead of to "frog." So this is one of the problems you were trying to solve. And then the other one you were mentioning before, which is that these models were just slow, because after each word the model just recalculates what everything means, and that just takes a long time. They can't go fast enough.
Exactly. It takes a long time, and it doesn't play to the strengths of the computers, of the accelerators that we're using there.
And when you say accelerators, I know Google has their own chips, but basically we mean GPUs now, right?
We mean GPUs. We mean the chips that Nvidia sells.
What is the nature of those particular chips?
Yeah. So the nature of those particular chips is that instead of doing a broad variety of complex computations in sequence, they are incredibly good, they excel at performing many, many, many simple computations in parallel. And so what this hierarchical or semi-hierarchical nature of language enables you to do is, instead of having, so to speak, one place where you read the current word, you could now imagine you actually look at everything at the same time, and you apply many simple operations at the same time to each position in your sentence.
Huh. So this is the big idea, I just want to, because this is it, right? This is the breakthrough happening. Yes. It's basically: what if, instead of reading the sentence one word at a time from left to right, we read the whole thing all at once?
All at once. And now the problem is, clearly something's got to give, right? There's no free lunch in that sense. You have to now simplify what you can do at every position when you do this all in parallel, but you can now afford to do this a bunch of times, one after another, and revise it over time or over these steps. And so instead of walking through the sentence from beginning to end, where an average sentence has like twenty words or so, an average sentence in prose, instead of walking those twenty positions, what you're doing is you're looking at every word at the same time, but in a simpler way. But now you can do that maybe five or six times, revising your understanding, and that, it turns out, is faster, way faster on GPUs, and because of this hierarchical nature of language, it's also better.
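The contrast described here, one word after another versus every position at once for a handful of rounds, can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the function names, the toy update rule (blending each number with the average of all positions), and the choice of six rounds are hypothetical and are not the model from the paper. The point is just the shape of the computation: the first loop has one dependent step per word, while the second has one dependent step per round, and everything inside a round could run in parallel.

```python
def read_left_to_right(words):
    """Sequential reading: each step needs the state left by the previous word."""
    state = []
    dependent_steps = 0
    for word in words:
        state = state + [word]  # toy "memory" update that depends on the previous state
        dependent_steps += 1
    return state, dependent_steps


def read_all_at_once(values, rounds=6):
    """Parallel reading: every position is revised from the same snapshot.

    The per-position update is deliberately simple (hypothetical rule: blend
    each value with the average over all positions). Positions within a round
    do not depend on each other, so an accelerator could compute them at once.
    """
    dependent_steps = 0
    for _ in range(rounds):
        snapshot = list(values)
        mean = sum(snapshot) / len(snapshot)
        values = [0.5 * v + 0.5 * mean for v in snapshot]
        dependent_steps += 1
    return values, dependent_steps


sentence = "the frog did not cross the road because it was too tired".split()
_, sequential = read_left_to_right(sentence)
# Word lengths stand in for real word representations (an assumption for the demo).
_, parallel = read_all_at_once([float(len(w)) for w in sentence], rounds=6)
print(sequential, parallel)
```

With the frog sentence, the sequential reader takes as many dependent steps as there are words, while the parallel reader takes a fixed number of rounds no matter how long the sentence gets.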
So you have this idea, and as I read the little note on the paper, it was in fact your idea. I know you were working with a team, but the paper credits you with the idea. So let's take this idea, this basic idea of look at the whole input sentence all at once, yep, a few times, and apply it to our frog sentence. Give me that frog sentence again.
The frog did not cross the road because it was too tired. Good.
Tired is good because that's unambiguous.
Hot could be either one. It could be the road or the frog, right?
Exactly. In fact, hot could actually be either one, or non-referential.
Non-referential: "because it was too hot outside."
Outside, it could be any of three things: the weather, or the frog, or the road.
Exactly.
I love that tired solves the problem. So your model, this new way of doing things, how does it parse that sentence? What does it do?
So basically, let's look at the word "it," and look at it in every single step of these, you know, say a handful of times repeated operation. Imagine you're looking at this word "it." That's the one that you are now trying to understand better, and you now compare it to every other word in the sentence. Okay, so you compare it to "the," to "frog," to "did," "not," "cross," "the," "road," "because," "was," "too," and "tired." And initially, in the first pass, already a very simple insight the model can fairly easily learn is that "it" could be strongly informed by "frog," by "road," by nothing, but not so much by "too" or by "the," or maybe only to a certain extent by "was." But if you want to know more about what "it" denotes, then it could be, you know, it could be informed by all of these.
And just to be clear, that sort of understanding arises because it has trained in this way on lots of data.
It's encountering a new sentence after reading lots of other sentences with lots of pronouns with different possible antecedents.
Yeah, exactly, exactly. So now the interesting thing is that which of the two "it" actually refers to doesn't depend only on what those other two words are. And this is why you need these subsequent steps. So let's start with the first step. What now happens is that, say, the model identifies that "frog" and "road" could have a lot to do with the word "it." So now you basically copy some information from both "frog" and "road" over to "it," and you don't just copy it, you kind of transform it also on the way, but you refine your understanding of "it." And this is all learned, not given by rules or, you know, in any way pre-specified.
Right, just by training on lots of data. Just by training, this emerges. And so the sort of meaning of "it" after this first step is kind of influenced by both "frog" and "road."
Yes, both "frog" and "road." Okay, so now we repeat this operation again, and we now know that "it" is unsure, or the model basically now has this kind of superposition. Right, it could be "road," it could be "frog." But now in the next step it also looks at "tired," and somehow the model has learned that when "it" means something inanimate, "tired" is not the thing. And so maybe in the context of "tired," "it" is more likely to refer to "frog." And now maybe the model has figured out already, or maybe it needs a few more iterations, that "it" is most likely to refer to "frog" because of the presence of "tired."
So it has solved the problem.
It has solved the problem.
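The two rounds described above, where "it" first looks at everything and leans equally on "frog" and "road," then leans toward "frog" once "tired" is taken into account, can be sketched with scaled dot-product attention weights, the scoring form used in the paper. Everything concrete below is an assumption made for illustration: the three hand-picked features per word, the two query vectors for "it," and the shortened word list. A real model learns these numbers from data, computes the second-round query from the first round's output, and also mixes value vectors, which this sketch leaves out.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(query, keys):
    """Scaled dot-product score of one query against every key, then softmax."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    return softmax(scores)


# Hypothetical features per word: [noun-like, animate-related, function word]
words = ["the", "frog", "road", "too", "tired"]
keys = [
    [0.0, 0.0, 1.0],  # the
    [1.0, 1.0, 0.0],  # frog
    [1.0, 0.0, 0.0],  # road
    [0.0, 0.0, 1.0],  # too
    [0.0, 1.0, 0.0],  # tired
]

# Round 1 (hand-set): "it" looks for something noun-like, not a function word.
query_round_1 = [2.0, 0.0, -2.0]
round_1 = dict(zip(words, attention_weights(query_round_1, keys)))

# Round 2 (hand-set): having picked up on "tired", "it" also prefers something animate.
query_round_2 = [2.0, 2.0, -2.0]
round_2 = dict(zip(words, attention_weights(query_round_2, keys)))

for word in words:
    print(f"{word:>5}  round 1: {round_1[word]:.2f}  round 2: {round_2[word]:.2f}")
```

In round 1 the weights on "frog" and "road" are tied and both beat "the" and "too"; in round 2 "frog" pulls ahead of "road," which mirrors the disambiguation described in the conversation.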
So you have this idea, you try it out. There's a detail that you mentioned that's kind of fun, and we kind of skipped it, but you mentioned that another one of the co-authors, who has also gone on to do very big things, was about to leave Google when you sort of wanted to test this idea, and the fact that he was about to leave Google was actually important to the history of this idea.
Tell me about that.
It was important. So this is Illia Polosukhin. At the time that this started to gain any kind of speed, Illia was managing a good chunk of my organization. And the moment he really made the decision to leave the company, he had to wait, ultimately, for his co-founder, and for them to then actually get going together in earnest. And so he had a few months where he knew, and I also knew, that he was about to leave, and where, you know, the right thing would of course be to transition his team to another manager, which we did immediately, but where he then suddenly was in a position of having nothing to lose and yet quite some time left to play with Google's resources and do cool stuff with interesting people. And so that's one of those moments where suddenly your appetite for risk as a researcher just spikes, right? Because for a few more months you have these resources at your disposal, you've transitioned your responsibilities.
At that stage, you're just like, okay, let's try this crazy shit. And that literally, in so many ways, was one of the integral catalysts, because that also enabled, right, this kind of mindset of we're going for this now. Whatever the reason, it still, you know, affects other people. And so there were others who joined that collaboration really, really early on, who I feel were much more excited as a result, much more likely to really work on this and to really give it their all, because of his, you know, nothing left to lose, I'm going to go for this attitude at this point.
Right. Was there a moment when you realized it worked?
There were actually a few moments. And it's interesting, because on one hand, right, it's a very gradual thing. Initially, it actually took us many months to get to the point where we saw significant first signs of life of this not just being a curiosity but really being something that would end up being competitive. So there certainly was a moment when that started. There was another moment when we, for the first time, had one machine translation challenge, one language pair of the WMT task, as it's called, where our model performed better than any other single model. The point in time when I think all of us realized this is special was when we not only had the best one in one of these tasks, but in multiple, and we didn't just have the best number. We also at that point were able to establish that we'd gotten there with about ten times less energy, or training compute spend.
Wow, So you do one tenth the work and you get a better result.
One tenth the work, and you get a better result, not just across one specific challenge, but across multiple, including the hardest, or one of the harder ones. Right. And then at that stage we were still improving rapidly, and then you realize, okay, this is for real. Because, right, it wasn't that we had to squeeze those last little bits and pieces of gain out of it. It was still improving fairly rapidly, to the point where, by the time we actually published the paper, we had again reduced the compute requirements, not quite by an entire order of magnitude, but almost. Right, so it still was getting faster and better at a pretty rapid rate.
Wow.
So in the paper we had some results that were roughly ten x faster on eight GPUs, and what we demonstrated in terms of quality on those eight GPUs, by the time we actually published the paper properly, we were able to do with one GPU.
One GPU, meaning one chip of the kind that people buy one hundred thousand of now to build a data center.
Exactly.
So the paper actually, at the end, mentions other possible uses beyond language for this technology. It mentions images, audio, and video, I think explicitly. How much were you thinking about that at the time? Was that just like an afterthought, or were you like, hey, wait a minute, it's not just language?
By the time it was actually published at a conference, not just the preprint, by December, we had initial models on other modalities, on generating images. At that time they were not performing that well yet, but, you know, they were rapidly getting better. We had the first prototypes, actually, of models working on genomic data, working on protein structure.
That's good foreshadowing.
Good foreshadowing, exactly. But then, for a variety of reasons, we ended up at first focusing on applications in computer vision.
The paper comes out, you know, you're working on these other applications, you're presenting the paper, it's published in various forms.
What's the response like?
It was interesting, because the response built in deep learning AI circles basically between the preprint, which I think came out in, I want to say, June twenty seventeen, and then the actual publication, to the extent that by the time the poster session happened at the conference, there was quite a crowd at the poster. So we had to be shoved out of the hall in which the poster session happened by security, and had very hoarse voices by the end of the evening.
You guys were like the Beatles of the AI conference.
I wouldn't say that. We weren't the Beatles, because it was really still very specific.
You were more of the cool hipster band. You were the hipster band.
Certainly more the cool hipster band. But it was an interesting experience, because there were some folks, including some greats in the field, who came by and said, wow, this is cool.
What has happened since has been wild, it seems.
Wild, to say the least. Yes.
Is it surprising to you?
Of course, many aspects are surprising, for sure. We definitely saw pretty early on, already back in twenty eighteen, twenty nineteen, that something really exciting was happening here. Now, I'm still surprised that, with the advent of ChatGPT, something that didn't go way beyond those language models that we had already seen a few years before was suddenly the world's fastest growing consumer product.
Ever, right? I think ever.
Ever. Yes.
And by the way, GPT stands for generative pre-trained transformer, right? Transformer is your word.
That's right.
So there's an interesting, I don't know, business side to this, right? Which is, you were working for Google when you came up with this. Google presumably owned the idea, had intellectual property around the idea.
Has filed many a patent.
Was it just a choice Google made to let everybody use it? Like, when you see the fastest growing consumer product in the history of the world not only built on this idea, but using the name, and it's a different company, and that was five years later.
Five years later.
But a patent's good for more than five years? Is that a choice?
Is that a strategic choice? What's going on there?
So the choice to do it in the first place, to publish it in the first place, is really based on and rooted in a deep conviction of Google at the time, and I'm actually pretty sure it still is the case, that these developments are the tide that floats all boats, that lifts all boats.
Like a belief in progress.
A belief in progress, a good old-fashioned... Now, it's also the case that at the time, organizationally, that specific research arm was unusually separated from the product organizations.
And the reason why Brain or in general, the deep learning groups were more separated was in part historical, namely that when they started out there were no applications and the technology was not ready for being applied, and so it's completely understandable and just you know a consequence of organic developments that when this technology suddenly is on the cusp of being incredibly impactful, you're probably still under utilizing it internally and potentially also not yet treating it in
the same way as you would have maybe otherwise treated previous trade secrets, for example.
It feels like this out-there research project, not like what's going to be this consumer product.
Exactly, exactly. And to be fair, it took OpenAI in this case a fair amount of time to then turn this into this product, and for most of that time, from their vantage point also, it wasn't a product. Right. So all the way up through ChatGPT, OpenAI published all of their GPT developments, maybe not all, but, you know, a large fraction of their work on this.
Yeah, their early models, the whole models were open.
Exactly. They were more true to their name, really also believing in the same thing. And it was only really after ChatGPT, and after this, to them also surprising to a certain extent, success, that they started to become more closed as well when it comes to scientific developments in this space.
We'll be back in just a minute.
Let's talk about your company.
When'd you decide to start Inceptive? The decision took a while and was influenced by events that happened over the course of about three months two to three months in late twenty twenty, starting with the birth of my first child.
So when she was born, two things happened. Number one, witnessing a pregnancy and a birth during a pandemic, where there's a pathogen that's rapidly spreading, all of that was a pretty daunting experience. And everything went great, but having this new human in my arms also really made me question if I couldn't more directly affect people's lives positively with my work. I was at the time quite confident that indirectly it would have effects also on things like medicine, biology, et cetera. But I was wondering, couldn't this happen more directly if I focused more on it? The next thing that happened was that the AlphaFold 2 results at CASP14 were published. CASP14 is this biennial challenge for protein structure prediction and some other related problems.
This is the protein folding problem.
This is the protein folding problem, exactly.
Machine learning solving the protein folding problem, which had been a problem for decades: given a chain of amino acids, predict the 3D structure of a protein.
Precisely.
And humans failed and machine learning succeeded. Just amazing.
Yes, it's a great example. Humans failed despite the fact that we actually understand the physics fundamentally, but we still couldn't create models that were good enough using our conceptual understanding of the processes involved.
You would think an algorithm would work on that one, right, You would just think an old school set of rules, like we know what the molecules look like, we know the laws of physics. It's amazing that we couldn't predict it that way. Right. All you want to know is what shape is the protein going to be? You know all of the constituent parts, you know every atom in it, and you still couldn't predict it with a set of rules, but AI machine learning could.
Amazing. Yes, and it is amazing. Actually, when you put it like this, it's important to point out that when we say we understand it, we make massive oversimplifying assumptions, because we ignore all the other players that are present when a protein folds. We ignore a lot of the kinetics of it, because we say we know the structure, but the truth is, we don't know all the wiggling and all the shenanigans that happen on the way there, right? And we don't know about, you know, chaperone proteins that are there to influence the folding. We don't know about all sorts of other...
I'm doing the physics one. I'm doing the assume-a-frictionless-plane version of protein folding.
Precisely.
Precisely, precisely. And the beauty is that deep learning doesn't need to make this assumption. AI doesn't need to make this assumption. AI just looks at data, and it can look at more data than any human, or even humanity, eventually could look at together. It's such a good example problem to demonstrate that these models are ready for prime time in this field and ready for lots of applications, not just one or two, but manifold. And so that happens.
So, solved.
Exactly. And then the third thing was that the COVID mRNA vaccines came out with astonishing ninety-plus percent efficacy out of the gate.
That is still so underrated. At the beginning of the pandemic, people were like, it'll be two or three years, and if they're sixty percent effective, that'll be great.
Exactly, exactly. And so everybody forgets it. And when you look at it, this is a molecule family that, for, you know, most of the time that we've known about it, since the sixties, I suppose, we've treated like a neglected stepchild of molecular biology.
You're talking about RNA in general?
In general.
Everybody loves DNA, right? DNA.
Everybody loves DNA.
Movie star.
Yeah, exactly, exactly. Even though now, looking back, DNA is merely, you know, the place where life takes its notes, maybe the hard drive and the memory.
It's the book, right?
It's the book. But at the end of the day, it was this molecule family that was about to save, you know, depending on the estimate, tens of millions of lives, and in rapid time. So all these things hold, but we have no training data to apply anything like AlphaFold to this specific molecule family, no training data to speak of. We had two hundred thousand known protein structures at the time, I believe, maybe optimistically,
we had maybe twelve hundred known RNA structures. And on top of that, it was also fairly clear that for RNA going directly to function would be much much more important, because it's in a certain sense a less strongly structured molecule, and other aspects of the molecule might play a bigger role.
And then on top of that, the attention that generative AI was receiving overall, also now in the field of pharma or of medicine, was building. And so I ended up finding myself in a conversation where a very, I would say, wise longtime mentor of mine pointed out that, you know, maybe ten years from now or so, somebody could tell my daughter that there was this perfect storm where this molecule with no training data was about to save the world and could do so much more in the direction of positively impacting people's lives. We didn't have training data, and it would be very expensive to create it, but using the technology that I'd been, or technologies that I'd been working on for the last, I don't know, ten-plus years, and because of the attention that people were now giving to AI in this field, there was the ability to raise quite a bit of money. And I, in that position, chose to stay back at my cushy dream job in big tech and not actually take this opportunity to really positively impact people's lives. And that idea was not one I was willing to entertain.
You couldn't just coast it out at Google and let somebody else go figure out RNA.
Yeah, and it's not just RNA. I think RNA is a great starting point at the end of the day. But building models that learn, first of all, from all the publicly available data that we can possibly get our hands on, but also from data that we can reasonably effectively create in our own lab, how to design molecules for specific functions is something that now is within reach, and that will, in the years to come, have such a completely transformational impact on how we even think about what medicines are that any opportunity to speed this up, to make this happen even just a day sooner than it could have otherwise happened, is incredibly valuable in my opinion.
As you're talking about this, the absence of training data kind of seems to be at the center of it, right? It seems to be the core problem, which makes sense, right? Like, the reason language works so well is basically because of the Internet. I know now we're going beyond it, but it just happened to be that there was this incredibly giant set of natural language that became available. We don't have anything like that for RNA. So, I mean, is kind of step one at Inceptive creating the data? Is that kind of what's happening?
So step one at Inceptive is, or was, and I think we've made a lot of progress in that direction, learning to use all the data that is available already and identify what other data we're missing, and then see how far we can get with just the publicly available data, and at the same time scale up generating our own data. And it turns out that actually, because of the nature of evolution, because of how evolution isn't actually incentivized to really explore the entire space of possibilities, it is almost always a given that if you are trying to design exceptional molecules, especially ones that are not, say, you know, natural formats, you are basically guaranteed to need novel training data.
Yeah, basically you're saying you build RNAs that don't exist in the world that have therapeutic uses, and there's kind of definitionally no training data that exists.
Yes. The funny thing is, we have a few of them, and so we have existence proofs of RNA molecules, for example RNA viruses, that actually exhibit incredibly complex different functions in our cells, that do all sorts of things that we don't usually like. But if we could use those, you know, for good, if we could use those in ways that would actually be aimed at fighting disease rather than creating it, those kinds of functions, even just a small subset of them, would really transform medicine already. And so we know it's possible.
What are you dreaming of when you say that? What are you thinking of, specifically?
Okay. So, for example, right, one estimate is that in order for COVID to infect you, you would need potentially as few as five COVID genomes inside your organism. That's already enough.
Five viral particles?
Five viral particles. Yeah. You inhale those. You wouldn't have to inject it, you wouldn't even have to swallow it. You inhale them.
If we could have a medicine that worked as well as a disease is a version of your...
True, exactly, exactly. So at the end of the day, right, this medicine is able to spread in your body only into certain types of organs and tissues and cells. It does certain things there that are really quite complex, right, changing the cell's behavior, again, not usually in this case in favorable ways, but still in ways that wouldn't have to be modified that much in order to potentially be exactly what you would need for complex multifactorial medicine. And if you could make all of that happen by just inhaling five of those molecules, then again, that would completely change how you think about medicine. Right, you have viruses that aren't immediately active, but that are inactive for long periods of time in your organism, and only under certain conditions, say under certain immune conditions, really start being reactivated. Why can't we have medicines that work in a similar way, not only in a vaccination sense, but where you take a medicine for a genetic predisposition for a certain disease, a medicine that you can take and that waits until the disease actually starts to develop, and only then, and only where that disease then starts to develop, becomes active and actually affects it, and potentially also then alerts the doctor through a blood test?
Like for cancer cells or something. So you have some kind of prophylactic medicine in your body, and it is encoded in such a way that it just hangs out there, like herpes, to take a pathological example, and only in certain settings does it do anything. And those settings are: if you see a cancer cell, destroy it; otherwise, just sit there.
Precisely.
And if you can design those also in ways where you can just make them all go away when, you know, you take, say, a completely harmless small molecule. And that's, again, entirely feasible.
Sure. So, I mean, you're dreaming big. These are wonderful, big, you know, science-fictiony dreams, and I hope you figure them out. On a practical level, what's happening at the company right now? How many people work there, what are they doing, and what have they figured out so far?
We're around forty. What we're doing is really exactly what we just talked about. We're basically scaling data generation experiments in our lab that allow us to assess a variety of different functions of different, mostly RNA molecules, actually mostly mRNA molecules at the moment, that are relevant to a pretty broad variety of different diseases. And so this ranges from things like infectious disease vaccines to cell therapies that can be applied in oncology or against autoimmune disease. We have mRNAs that we hope will eventually be effective as enzyme replacement therapies for a large family of rare diseases, and the list goes on. And so we're growing this training data set that eventually, on top of foundation models that we pre-trained on all publicly available data, allows us to tune those foundation models towards designing exceptional molecules for exactly those applications and many more sharing similar properties.
So you basically build new mRNA molecules and test them, and then you give that data to your model, and presumably it tells you what to build next, or it helps you figure out what to build next. It's sort of a loop in that way.
The models are definitely one interesting source of proposals, if you wish, for what to synthesize and test next. They're not the only such source, so we basically also explore in maybe less guided or heuristically guided ways. But exactly, so in some of the cases, it's really
quite iterative. For some of those functions and for some of those modalities and diseases or disease targets, we're actually already at a point where our models can spit out entirely novel molecules that really are unlike anything they've ever seen or we've ever seen in nature, that very consistently perform quite favorably compared to pretty strong baselines by incumbents in the field.
When you say perform quite favorably compared to baselines by incumbents in the field, does that on some level mean better than what experts would think up?
Better than what experts can think up, and also better than more traditional machine learning tools can easily produce.
It's like that famous moment in the Go match when AlphaGo made some move that, like, no human being ever would have thought of.
Yes, so I would say we've long passed move thirty seven, in the sense that our understanding of the underlying biological phenomena is so incomplete that for most of the things that we're able to design for, we don't really understand why they happen.
Huh. When you say we, do you mean at Inceptive, or do you mean just medicine in general?
I would say just medicine in general.
Okay, so Inceptive is doing this very kind of high level work, right, I mean building what will hopefully be the foundation. What's the right amount of time in the future to ask about when we will know if it works? You think five years?
So the general idea of using generative AI and similar techniques to generate therapeutics: there are some things in clinical trials that were largely designed with AI. As far as I know, we maybe now have the first trials just starting for molecules that were truly entirely designed by AI.
As opposed to sort of selected from a library.
Or selected, influenced. Exactly: selected, adjusted, and tweaked, et cetera. Right, so that's really still only happening just now, but we will see, I believe, a first success of such molecules, certainly within the next five years.
What about, more narrowly, the project at Inceptive?
It's a similar timeframe. We should be able to get molecules into the clinic in the next few years, certainly in the next handful of years. Now, these will not be molecules where the objective that we used in their design is, you know, even remotely as complex, or where the different functions that we're designing for are even remotely as diverse, as, say, what you would find, because we used this example earlier, in an RNA virus. These will really be
simpler. Those will be molecules that don't do things that we couldn't possibly have done before, but that do them much better, in ways that are more accessible, in ways that come with fewer side effects.
What biotech largely does is make protein drugs. And so if you could make an mRNA drug where you put the mRNA into the body and the body makes the protein, it wouldn't be some crazy sleeper cell that sits in your body for twenty years or whatever, but it might be a more practical alternative to today's biotech drugs.
Absolutely.
So you've had a kind of crash course in biology in the last few years, yes? And I'm curious, like, what is something that has been particularly compelling or surprising or interesting to you that you have learned about biology?
There are countless things. The biggest one, or the red thread across many of them, is really just how effective life is at finding solutions to problems that on one hand are incredibly robust, surprisingly robust, and on the other hand are so different from how we would design solutions to similar problems.
Uh huh.
Really, this comes back to this idea that we might just not be particularly well equipped, in terms of cognitive capabilities, to understand biology. Basically, you know, we would never think to do it this way, and how we think to do it is oftentimes much more brittle.
Uh huh. Brittle is an interesting word. Less resilient, less able to persist under different conditions.
Exactly, exactly. I mean, you know, we still haven't built machines that can fix themselves, for one.
Which is fundamentally the miracle of being a human being.
Just fundamentally. Exactly, exactly. And of course this is true across the scales, right, from, you know, single cells all the way to complex organisms like ourselves, and really just how many very different kinds of solutions life has found and constantly
is finding. And you see this all over the place, and it's daunting and humbling, but also incredibly inspiring when it comes to applying AI in this area, because, again, I think that, at least so far, it's the best tool, and maybe actually the only tool, we have in the face of this kind of complexity to really design interventions, medicines, that go way beyond what we were able to do, or are able to do, just based on our own conceptual understanding.
We'll be back in a minute with the lightning round. Let's finish with the lightning round. As an inventor of the transformer model, are there particular possible uses of it that worry you or make you sad?
I am quite concerned about the p(doom), doomerism, whatever you want to call it, existential-fear-instilling rhetoric that is in some cases actually also promoted by people, by entities in the space.
So just to be clear, you're not worried about the existential risk. You're worried about people talking.
I'm worried about the existential risk being inflated, or the perception being inflated, to the extent that we actually don't look enough at some of the much more concrete and much more immediate risks. Right. I'm not going to say that the existential risk is zero. That would be silly.
What is a concrete and immediate risk that is, you think, under-discussed?
These large-scale models are such effective tools for manipulating people in large numbers already today, and it's happening everywhere, for many, many different purposes, by in some cases benevolent and in many cases malevolent actors, that I really firmly believe we need to look much more at things like enabling cryptographic certification of human-generated content, because doing that with the machine-generated content is not going to work.
But we definitely can cryptographically certify human-generated content as such.
Basically watermarking or something, some way to say a human made this.
Exactly.
What would you be working on if you were not working in biology on drug development?
Education. Using artificial intelligence to democratize access to education.
What have you seen that has been impressive or compelling to you in that regard?
There are lots of little examples so far, really countless. There's what's happening at Khan Academy. There are many examples of AI applied to education problems in places like China, for example. You have a bunch of very compelling examples in fiction. A book I really like, by Neal Stephenson, The Diamond Age: Or, A Young Lady's Illustrated Primer, that I recommend if you just...
Everybody in AI talks about that.
Yeah, well, now they do.
You liked it before it was cool?
I'm sure. At one point I thought it was really, really important to ensure that Neal Stephenson knows that we are about to be able to build the Primer, and so I ended up having coffee with him to tell him.
Oh, that's great.
So at the end of the day, maybe the biggest inspiration there is my daughter. She's four and a half now, and I think she
could today read... she can read okay, but she could read, you know, at grade school level if she had access to, you know, an AI tutor teaching her how to read.
Does your daughter use AI, you know, AI chatbots?
Not directly without me. But we've actually used ChatGPT to implement an AI reading tutor that works reasonably well. I mean, we basically, you know, as I call it now, vibe coding, vibe coded it. And she wasn't there for all of it. It took some time, but she was there for some of it.
Oh, you vibe coded it with her?
Yeah, well, I mean, she was there. You know, she witnessed a good chunk of it, yes, although she was more interested in the image generation parts. But yeah, we have a sketch of
one that she quite enjoys. So that's kind of the extent of her, at this age, using AI directly.
Jakob Uszkoreit is the CEO and
co-founder of Inceptive and a co-author of the paper Attention Is All You Need. Just a quick note: this is our last episode before a break of a couple of weeks, and then we'll be back with more episodes. Please email us at problem at Pushkin dot fm. We are always looking for new guests for the show. Today's show was produced by Trina Menino and Gabriel Hunter Chang. It was edited by Alexandra Garaton and engineered by Sarah Bruguiere.
