
When AI and Genomics Collide

Oct 03, 2023 · 24 min · Ep. 744

Episode description

Today’s episode continues our coverage from a16z’s recent AI Revolution event. You’ll hear a16z Bio & Health GP Vijay Pande speak with Daphne Koller about the fascinating convergence of machine learning and genomics, two fields that have benefited from decades of investment and progress and are now colliding head on.

Daphne is a prominent innovator at this intersection, as a long-time professor in computer science at Stanford and co-founder of Coursera, who has decided to step back into the arena with her company Insitro. In fact, the name Insitro is a blend of in silico and in vitro!

If you’d like to access all the talks from AI Revolution in full, visit a16z.com/airevolution.

 


Stay Updated: 

Find a16z on Twitter: https://twitter.com/a16z

Find a16z on LinkedIn: https://www.linkedin.com/company/a16z

Subscribe on your favorite podcast app: https://a16z.simplecast.com/

Follow our host: https://twitter.com/stephsmithio

Please note that the content here is for informational purposes only; should NOT be taken as legal, business, tax, or investment advice or be used to evaluate any investment or security; and is not directed at any investors or potential investors in any a16z fund. a16z and its affiliates may maintain investments in the companies discussed. For more details please see a16z.com/disclosures.

Transcript

We built a language model for biology. The big problem in biology is that biology is hard. Biology is really hard. Biology is very hard. So why are you doing it? What is the big win? I mean, all of it is an AI-enabled architecture. Every part of our technology stack is intrinsically AI-enabled. Where do you think this confluence between AI and life sciences goes from here? There are so very few people who actually have the language of both disciplines, who are able to bring them together.

If you are a listener to this podcast, you're probably familiar with Moore's Law, where the number of transistors on an integrated circuit has doubled every two years. But what you might not know is that the cost of sequencing the genome has fallen even faster. Since the Human Genome Project in 2003, the cost has fallen from about $1 billion to less than $1,000 today. And now in 2023, these two trends meet, and one of the people at this fascinating intersection is Daphne Koller.
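As a rough back-of-the-envelope check on that comparison (an editorial sketch, not something said in the episode), one can treat both trends as steady exponentials and compare how long each takes to change by a factor of two. The dollar figures and years are the approximate ones quoted above; the function name `halving_time_years` is made up for illustration, and the real decline in sequencing cost was not smooth.

```python
import math


def halving_time_years(start_cost: float, end_cost: float, years: float) -> float:
    """Years needed for cost to halve, assuming a steady exponential decline."""
    halvings = math.log2(start_cost / end_cost)  # number of times the cost halved
    return years / halvings


# Approximate figures quoted in the episode: ~$1 billion (2003) to ~$1,000 (2023).
sequencing_halving = halving_time_years(1e9, 1e3, 2023 - 2003)

# Moore's Law as stated in the episode: transistor counts double every two years.
MOORE_DOUBLING_YEARS = 2.0

print(f"Sequencing cost halved roughly every {sequencing_halving:.2f} years")
print(f"Moore's Law doubling period: {MOORE_DOUBLING_YEARS:.1f} years")
```

Under these assumptions, a million-fold drop over twenty years works out to a halving about once a year, roughly twice the pace of the two-year doubling in Moore's Law.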

Daphne, longtime professor in computer science at Stanford and co-founder of Coursera, has decided to step back into the arena with her company Insitro, right at this intersection of computation and biology. In fact, the name is even a nod to this intersection, as a blend of in silico and in vitro.

And in today's episode, you'll get to hear a16z Bio and Health General Partner Vijay Pande speak with Daphne about the why now, but also how machine learning is fundamentally changing our ability to understand genetics. Without machine learning, without AI, the space would be so complex and so high-dimensional that you couldn't even make sense of it, far less bridge between those two different worlds. So what can this unlock? We can now finally, for the first time, measure biology at scale.

And this is like a fully engineered approach to discovery. Listen in to get a glimpse into what may be a new era of digital biology, and how this AI wave is really not just a moment of opportunity in bits, but also in atoms. The next frontier of the impact that AI can have is when AI starts to touch the physical world. I think this convergence is a moment in time for us to make a really big difference using tools that exist today that did not exist even five years ago.

This episode also continues our coverage of a16z's exclusive AI Revolution event from just a few weeks ago, which housed some of the most influential builders across the ecosystem, including the founders of OpenAI, Anthropic, Character AI, Roblox, and more. So be sure to check out the full package, including all of the talks in full, at a16z.com/airevolution.

As a reminder, the content here is for informational purposes only, should not be taken as legal, business, tax, or investment advice, or be used to evaluate any investment or security, and is not directed at any investors or potential investors in any a16z fund. Please note that a16z and its affiliates may also maintain investments in the companies discussed in this podcast. For more details, including a link to our investments, please see a16z.com/disclosures.

Daphne is like the OG's OG in AI. She was a pioneer at Stanford in different areas of AI, especially in PGMs. She left Stanford to co-found Coursera with Andrew, and actually is now the founder and CEO of Insitro, a techbio company using AI to develop drugs in life sciences. So Daphne, given all the things you could be doing, why life sciences?

It's one of the really hard and really important problems. And there are very few things that are as

challenging, as exciting as intervening in a safe and effective way in human health. And so it's just a thing that absolutely needs to be done if we are going to use AI for good, which I think is one of the things that I at least really strive to do.

The second part of the answer is: why now? And what brought me back to this field back in 2016, post Coursera, was the realization that we can now finally, for the first time, measure biology at scale, both at the cellular level, sometimes at the subcellular level, and at the organism level via ways of quantitating human biology. And that gives us for the very first time the ability to deploy machine learning in ways where it is truly meaningful to do that, because the data sets are large enough for really interesting machine learning methods to be deployed.

And the third part of the answer is: okay, but why me? And I am a big believer in leverage, that is, places where you can have a disproportionately large impact. And because of the fact that I had spent a large part of my Stanford career working in these two spaces simultaneously, core machine learning on the one hand, and machine learning in service of biomedical data on the other, I actually have the ability to

sort of bridge the chasm between these two very disparate disciplines. And when I was leaving Coursera in 2016, and I looked around me and I saw that machine learning was changing the world, it wasn't having much of an impact in the life sciences. And I believe one of the main reasons for that is that there are so very few people who actually have the language of both disciplines and are able to bring them together. So I felt like I could have impact in AI across many things, but here I could have disproportionate impact.

Well, you know, you spoke about the why now. What's your take on AI for life sciences? What's the why now there? And what's different now than even what we could do, say, just five years ago?

So I think it comes back to this ability to collect, but even more than collect, generate data at scale. So one of the things that we have

at Insitro that is truly unique is we have a data factory. We have put together the tools that have been developed by people who are taking pluripotent stem cells, which are cells from you or me or anyone in this audience, and turning them into this pluripotent status, which can make a Daphne neuron in a dish or a Daphne hepatocyte. And this is going to be different than the Vijay neuron and the Vijay hepatocyte, because we have different genetics, and that's going to manifest in how these cells behave and how these cells look in different measurements. We can engineer those to introduce a disease-causing mutation and ask, what does that disease-causing mutation do to a Daphne neuron versus what does it do to a Vijay neuron, and what does this mutation do versus that mutation?

Just a quick note for the audience: pluripotent stem cells are unique cells that are able to undergo self-renewal. The term pluripotency in particular speaks to these cells being able to give rise to all cell types, including the ectoderm, like the skin or nervous system, the endoderm, like the liver or respiratory tract, and the mesoderm, like bone or muscle. All right, back to Daphne, speaking to just how special this application is.

So we're able to kind of do data generation on spec, and that is a truly unique capability, which frankly is not that easy to do even in other areas where AI is being deployed. You don't get to make your own data in many cases, but here we do, and that creates both really important discovery opportunities for life sciences but also really cool and interesting machine learning problems.

Well, maybe you could dive a little deeper and give an example. So like your paper on the POSH

approach, I think that came out on arXiv. Could you double-click on that to help people understand what you did there, especially like why is AI in life sciences a big deal, what could you hope to get?

So first of all, let me tell you a little bit about that platform, which is called POSH, or pooled optical screening in humans. You take a bunch of cells and you put them with a pool of CRISPR guides that edit them, and each cell gets a different guide. So now you have a bunch of cells, each with a genetically diverse mutation, and now they're all sitting there in a pool. You can measure them with a microscope, you can measure them as they move around and do their thing. You can basically fix them, and you sequence the barcode that came with the guide. So now you can say, oh, this cell that got this guide behaved this way, and this other cell behaved that way.

And I can tell you that one of the really challenging things about cells is, because they're live, if you put different cells in different wells, then they each have a slightly different environment and you get subtle differences, and it's really hard to reconcile. When it's all in the pool, you eliminate all of those artifacts, and all of a sudden you have the ability to measure a genome-wide CRISPR screen, basically, so 20,000 genes in a genome, all modifying the same cellular background in the same dish, each with a different genetic intervention, and you're measuring that at a genome-wide scale in like 10 or 12 plates in two weeks.

Now imagine doing that, rinse, repeat, and doing genome-wide scale on this genetic background or in this cell type, and so you can really start to decipher the genotype-phenotype connection and the effect in which individual genetics makes a difference on cellular phenotypes, which we then translate to what we believe they will have in terms of clinical impact. And that is the beginning of an understanding of what it is that we want to modify in order to have meaningful therapeutic interventions. And this is like a truly engineered approach to discovery.

Another quick reminder for those of you like me where biology class was a little further back than you'd like to admit: genotype refers to the genetic makeup of an organism, while phenotype refers to the observable characteristics like hair color or blood

type that are yielded from a genotype, but also an organism's environment. This is really important because phenotype, or the observable impact, isn't only due to genetics, as sometimes even the same genes can produce different results. One such example of this is in honey bees, in which colonies will have the same genome but vary significantly in phenotype, like size, shape, or behavior of queen bees. And as Daphne said, new technologies are bringing us closer to understanding this very important connection.

Well, the biology part is really critical because now you get the data, and we all know how important that is. But I think one of the things that I've really found intriguing is the creation of a latent space for human biology, and especially being able to tell the difference between disease and non-disease or even different disease phenotypes. So how does that come about, and especially, you know, how does AI drive that?

So actually, I'm going to go back one step further, because you said, of course, one of the things we need to do is get the data, and I should have mentioned that it's impossible to run this instrument without AI being built into it, because you can't even segment the cells, you can't call the barcodes. I mean, all of it is an AI-enabled architecture. Every part of our technology stack is intrinsically AI-enabled. But then to your point, Vijay, I have a whole bunch of cellular images, and what do you do with them? And so the first thing we do is we build the latent space. We build a language model for biology. Now everyone's an expert in language models. I used to have to explain this to people, like, oh, language of biology, and no one knew what I was talking about, but now

it's like, I'm just saying, look, it's just like GPT but for cells. So we have the language of, you know, cells and what cells look like, or the transcriptional or gene expression profiles of cells, and you measure hundreds of millions of cells in different states. And now, with a much more limited amount of data, because we have this latent space, then just like the large language models for natural language, with a small amount of data you can start asking, okay, how does disease move you, like a disease-causing gene, from one place to the other? How does a treatment move you, hopefully, back in the opposite direction, from the disease state back to the healthy state? And that's super powerful. Kind of like other language models, it keeps getting better the more data you feed it.

And let me just say, this is not just for cellular data. The other source of data that we use is clinical data, so we do the same thing with histopathology. There's so much more in histopathology than your pathologist typically looks at. In MRI data, your radiologist doesn't see more than like a small percentage of what's there in your radiology images. But also not just imaging, there's also other modalities where there's an equal amount of information left on the table. And over time, we're learning these languages of different biological modalities and the ability to translate between them.

Tech Week is back and we're coming to New York City. We had over 750 events in San Francisco and

LA this year, and starting on October 16th, there are already over 300 events on the calendar for New York Tech Week. So to celebrate, we are giving away three tickets to a16z's welcome party that kicks the whole week off, and there are several ways to enter: you can retweet the giveaway announcement post, you can also tweet your own attendance using the hashtag NY Tech Week, or you can let us know on YouTube by using the phrase "see you at New York Tech Week." All the details and more can be found at a16z.com slash Tech Week NYC.

I think this concept of a foundation model for biology is particularly exciting because, you know, 10 years ago you could have ML that was predictive, you just needed maybe a hundred actives. Yeah. And the problem's like, if you have a hundred examples of a drug that works, you don't need to design a drug. And so these low-shot, zero-shot approaches that come from a foundation model are really night and day. So how far does this go? I mean, the big problem in biology is that biology is hard. Biology is really hard. Biology is very hard. So why are you doing it? What is the big win? Where does this go, let's say,

by the end of the decade? Like, what could you hope to do that we couldn't do before?

We want to come up with a very, almost like systematic recipe for how do you go from a decision that I want to work on ALS, or want to work on fatty liver disease, through a sequence of steps towards something that results in a meaningful intervention in the right patient population. The hope is that by the end of this decade, we will have built this process, we will have run through it a number of times, we will have delivered some medicines to patients in our first tranche of indications, but then we will have learned enough from that so that we can now say, okay, and here's how we're going to do it here and here and here. And because it's not only machine learning that moves forward over time, it's also the biological tools that we're relying on. I mean, it used to be that there wasn't any CRISPR, there was just siRNA. Actually, there wasn't even that. And then there's like CRISPR base editing, and now there's CRISPR prime that replaces entire regions of the genome. So the tools that we're building on also get better and better over time, which unlocks more and more diseases that we could tackle

in meaningful ways.

Well, let's step back for a second, because I think this may not be clear for everyone, like why biology is so hard. And one of the biggest reasons is that we can do tons of experiments on mice. You know, I have to feel like it's a great time to be a rich mouse, you could be cured of any disease, right? Like all these diseases can be cured in mice, but you know, it's obviously unethical to experiment on people. And that's one of the big reasons why trials fail, right? Because when you go into a clinical trial, you spent all this money to get there, you're spending all this money, hundreds of millions of dollars, in the trial, and it turns out mice are different than people and it fails. Yeah. So like, how can AI help that?

So first of all, this notion of, you know, we can cure lots of mice is something that really drove our discovery strategy at Insitro, which is

all of our work is done in human and human-derived systems and incorporates at least some subset of human cells working together. So that's one piece. And the nice thing about it is that it allows you to intervene in those systems and ask the what-if questions, the counterfactuals, like what if I had this person's biology but in a world where this gene was inactive versus active, or the other way around. So that's great, but obviously you want to cure people, not cells or even organoids. And so the other source of data that we bring in is data from people, from clinical records, and what we end up doing is kind of bridging between the two using machine learning. So using machine learning on the cellular data, using machine learning on the human data, bridging between those in representation space but also in genetic space, because genetics is kind of like this thread that ties the two together. And without machine learning, without AI, the space would be so complex and so high-dimensional that you couldn't even make sense of it, far less bridge between those two different worlds.

Yeah, that makes sense. I'm curious to change gears a bit and talk a bit about

company building. Yeah. One of the interesting things that you've done is that you've brought together people that are biology experts with people that are ML experts. And how do you build that culture? What does that look like, especially since they're from fairly different parts of the universe?

So first of all, it may not have been obvious to everybody, but the company name Insitro is actually the blend of in silico and in vitro, in silico being in the computer and in vitro being in the lab. Those elements of bringing those two strands together are so deeply woven even into our logo. And how do you build that is really hard, because you take your average, you know, machine learning scientist and your average life scientist, even if they're very well-intentioned, you put them into the room together, they might as well be talking Thai and Swahili with each other. The languages

are different, the ways in which you think are totally different. So how do you create a shared language, a shared vision? And so there's a few approaches that we use. For one, we hire some number of people, you can't get enough of them, unfortunately, who are in the middle, who are able to be translators and talk to both sides and kind of bring them together. And then I think the other really important part is that you create a culture, and you hire very rigorously to that culture, of people who are genuinely interested in engaging with, you know, the other side. And we have a list of company values, and the final value, which is one that I hold particularly dear, it's last not because it's least important, it's because they're ordered from what we do to how we do it, which is that we engage with each other openly, constructively, and with respect. Openly means an openness to asking really naive questions when you don't understand, and to accepting really naive suggestions from somebody else, because often the best ideas come from an orthogonal mindset.

It's something that I experienced even as a kid. So my parents are scientists, and my mom warned me, no matter what I do,

even though this is what I was doing as a kid, because I was programming as a kid, I should not get into programming or computer science. No one's ever going to make money by selling software.

So I think maybe then, you know, especially for this audience here that are coming from the AI side, especially as AI gets into areas that are sort of not just the world of bits but in the world of atoms. Yeah. Any advice for how to bridge those gaps?

I mean, first of all, having a deep respect for atoms is really important. Yeah. I mean, some of my closest friends are atoms. Yes, I think we're all atoms. But I think having an appreciation for the complexity of atoms, and the fact that especially when your atoms are part of live systems, they behave in unexpected, unpredictable, idiosyncratic ways that sometimes cause a lot of pain. And I can tell you that when you do biological experiments, one of the strongest signals when you apply machine learning to it is, what was the condition, who actually did the experiments? You could read that very clearly off the cells, because they behave a little bit differently, they pipette a little bit differently, they treat the cells a little bit differently. It's amazing how hard it is to clean that up, which is one of the reasons why we spend so much of our time building robots, because they do the same thing over and over again. So I think having a lot of respect for atoms, but also I would say an appreciation for the fact that the next frontier of the impact that AI can have is when AI starts to touch the physical world. And we've all seen just how much harder that is. We've all seen how hard it is, astonishingly, to build a self-driving car compared to building a chatbot, right? So having an

appreciation for that complexity, but also an appreciation for the magnitude of the impact if you can actually nail it.

So one last topic and then we'll go into closing. You're talking about life sciences in terms of healthcare and drug design, but there's a lot more to biology than just drugs, right? Where do you think this confluence between AI and life sciences goes from here?

So I actually think that there is this incredible opportunity at this point at this intersection between two fields. And I think about it from a little bit of a historical perspective. Think back on the history of science: at certain times in our history there have been eras where a particular scientific discipline has made incredible amounts of progress in a relatively short amount of time, because there was kind of like a click where we started to see the world in a different way, or there was a tool that wasn't available before. So if you think back to the late 1800s, that was chemistry, where we suddenly realized we couldn't really turn lead into gold, and there was a thing called the periodic table, and there were electrons, and it

really shifted chemistry. And then in the early 1900s, obviously that discipline was physics, and the connection between energy and matter and between space and time completely shifted our understanding of the universe. In the 1950s, that discipline was computing, where we got these machines that perform calculations that up until that point only a human was able to perform.

And then in the 1990s, there was this interesting bifurcation. On the one side, there was data science, which drew on computers but also had elements of neuroscience and optimization and statistics, and ultimately gave us modern-day machine learning and AI. And then the other side was what I think of as quantitative biology, which was the first time when we actually started to measure biology at a scale that was more than like, track three genes across an experiment that took five years. And that was the first microarray data and the first human genome and so on and so forth.

And I think this is the time when those last two disciplines are actually going to merge, and they're giving us an era of what I think of as digital biology, which is the ability to measure biology at unprecedented fidelity and scale, interpret the unbelievable masses of data at different biological scales and different systems using the tools of machine learning and data science, and then bring that back to engineer biology using tools like CRISPR genome editing and so on, so that we can make biology do things that it would otherwise not want to do.

Like what?

So I think there's obviously, as we said, applications in human health, but I think there's applications in agriculture. I don't think we need to tell anybody anymore, although there's still some people who might need to hear about the impact of climate change on our world, and the fact that we need to have crops that are much more resistant to drought and to severe weather.

And to feed 10 billion people.

To feed 10 billion people. I think there are opportunities in the environment to maybe do better carbon sequestration using plants or algae or who knows what. I actually wish I knew more about that, because my alternate life would have been to do that.

Well, there's still time right now.

Well, there's still time. So are you funding me for that?

You come up with the deck, we'll talk.

So there's that. I think there is, you know, biomaterials and so on. There's so many opportunities at this intersection that I would encourage any of you in this audience who are looking for something truly aspirational and exciting to do. I think this convergence is a moment in time for us to make a really big difference in the world that we live in, using tools that exist today that did not exist even five years ago.

Yeah, with that, I think that's the opportunity at hand. We will wrap up there. Let's thank Daphne one more time.

If you liked this episode, if you made it this far, help us grow the show. Share with a friend, or if you're feeling really ambitious, you can leave us a review at ratethispodcast.com/a16z. You know, candidly, producing a podcast can sometimes feel like you're just talking into a void. And so if you did like this episode, if you liked any of our episodes, please let us know. We'll see you next time.

This transcript was generated by Metacast using AI and may contain inaccuracies.