NeurIPS 2024: AI for Science with Chris Bishop

Dec 13, 2024 · 22 min

Episode description

In this special edition of the podcast, Technical Fellow and Microsoft Research AI for Science Director Chris Bishop joins guest host Eliza Strickland in the Microsoft Booth at the 38th annual Conference on Neural Information Processing Systems (NeurIPS) in Vancouver, British Columbia, to talk about deep learning’s potential to improve the speed and scale at which scientific advancements can be made.

Transcript

ELIZA STRICKLAND: Welcome to the Microsoft Research Podcast, where Microsoft's leading researchers bring you to the cutting edge. This series of conversations showcases the technical advances being pursued at Microsoft through the insights and experiences of the people driving them.

I'm Eliza Strickland, a senior editor at IEEE Spectrum and your guest host for a special edition of the podcast. [MUSIC FADES]

Joining me today in the Microsoft Booth at the 38th annual Conference on Neural

Information Processing Systems, or NeurIPS, is Chris Bishop. Chris is a Microsoft technical fellow and the director of Microsoft Research AI for Science. Chris is with me for one of our two on-site conversations that we’re having here at the conference.

Chris, welcome to the podcast.

CHRIS BISHOP: Thanks,

Eliza. Really great to join you.

STRICKLAND: How did your long career in machine learning lead you to this focus on AI for Science, and were there any pivotal moments when you started to think that, hey, this deep learning thing, it's going to change the way scientific discovery happens?

BISHOP: Oh, that's such a great question. I think this is like my career coming full circle, really. I started out studying physics at Oxford,

and then I did a PhD in quantum field theory. And  then I moved into the fusion program. I wanted to   do something of practical value, [LAUGHTER] so I  worked on nuclear fusion for about seven or eight   years doing theoretical physics, and then that  was about the time that Geoff Hinton published   his backprop paper. And it really caught  my imagination as an exciting approach to   artificial intelligence that might actually yield  some progress. So that was, kind of, 35 years ago,  

and I moved into the field of machine learning. And, actually, the way I made that transition was by applying neural networks to fusion. I was working at the JET experiment, which was the world's largest fusion experiment. It was sort of big data in its day. And so I had to, first of all, teach myself to program.

STRICKLAND: [LAUGHS] Right.

BISHOP: I was a pencil-and-paper theoretician up to that point. Persuaded my boss to buy me a workstation and then started to play with these neural nets. So right from the get-go, I was applying machine learning 35 years ago to data from science experiments. And that was a great on-ramp for me. And then, eventually, I just got so distracted, I decided I wanted

to build my career in machine learning. Spent a  few years as a research professor and then joined   Microsoft 27 years ago, when Microsoft opened its  first research lab outside the US in Cambridge,   UK, and have been there very happily ever  since. Went on to become lab director. But   about three or four years ago, I realized  that not only was deep learning transforming   so many different things, but I felt it was  especially relevant to scientific discovery.  

And so I had an opportunity to pitch to our chief  technology officer to go start a new team. And he   was very excited by this. So just over two and a  half years ago now, we set up Microsoft Research   AI for Science, and it's a global team, and  it, sort of, does what it says on the tin. 

STRICKLAND: So you’ve said that AI could usher in a fifth paradigm of scientific discovery, which builds upon the ideas of Turing Award–winner Jim Gray, who described four stages in the evolution of science. Can you briefly explain the four prior paradigms and then tell us about what makes this stage different?

BISHOP: Yeah, sure. So it was a nice insight by Jim. He said, well, of course, the first paradigm of scientific discovery was really

the empirical one. I tend to think of some cave dweller picking up a big rock and a small rock and letting go of them at the same time and thinking the big rock will hit the ground first …

STRICKLAND: [LAUGHS] Right …

BISHOP: … discovering they land together. And this is interesting. They've discovered a, sort of, pattern or regularity in nature,

and even today, the first paradigm is in  a sense the prime paradigm. It’s the most   important one because at the end of the day, it's  experimental results that determine the truth,   if you like. So that's the first paradigm. And  it continues to be of critical importance today.  

And then the second paradigm really emerged in the 17th century, when Newton discovered the laws of motion and the law of gravity. And not only did he discover the equations but this, sort of, remarkable fact that nature can even be described by equations, right. It's not obvious that this would be true, but it turns out that, you know, the world around

us can be described by very simple equations  that you can write on a T-shirt. And so in the   19th century, James Clerk Maxwell discovered  some simple equations that describe the whole   of electricity and magnetism, electromagnetic  waves, and so on. And then very importantly,   the beginning of the 20th century, we had this  remarkable breakthrough in quantum physics. So   again down at the molecular—the atomic—level,  the world is described with exquisite precision  

by Schrödinger's equation. And so this was the second paradigm, the theoretical. That the world is described with incredible precision over a huge range of length and time by very simple equations.

But of course, there's a catch, which is those equations are very hard to solve. And so the third paradigm really began, I guess, sort of, in the ’50s and ’60s, the development of digital computers. And, actually, the very first use of digital computers was to simulate physics,

and it's been at the core of digital computing right up to the present day. And so what you're doing there is using a computer together with a numerical algorithm to solve those very simple equations but solve them in a practical setting. And so that's, I’ll refer to that as simulation. That's the third paradigm. And that's proven to be tremendously powerful. If you look up the weather forecast on your phone today, it's done by numerical weather forecasting, solving in this case the Navier–Stokes equations using big numerical simulators.

What Jim Gray observed, though, really emerging at the beginning of the 21st century was what he called the fourth paradigm, or data-intensive scientific discovery. So this is the era of big data. Think of particle physics at the CERN accelerator, for example, generating colossal amounts of data in real time. And that data can then be processed and filtered. We can do statistics on it. But of course, we can do machine

learning on that data. And so machine learning  feeds off large data. And so the fourth paradigm   really is dominated today by machine learning.  And again that remains tremendously important.  What I noticed, though, is that there's  again another framework. We call it the fifth   paradigm. Again, it goes back to those fundamental  equations. But again, it's driven by computation,   and it's the idea that we can train machine  learning systems not using the empirical data  

of the fourth paradigm but instead using the  results of simulation. So the output of the   third paradigm. So think of it this way. You want  to predict the property of some molecule, let's   say. You could in principle solve Schrödinger’s  equation on a digital computer; it’d be very   expensive. And let's say you want to screen  hundreds of millions of molecules. That's going  

to get far too costly. So instead, what you can  do is have a mindset shift. You can think of that   simulator not as a tool to predict the molecule’s  properties directly but instead as a way of   generating synthetic training data. And then you  use that training data to train a deep learning   system to give what I like to call an emulator,  an emulator of the simulator. Once it's trained,  

that emulator is fast. It's usually three to four  orders of magnitude faster than the simulator. So   if you're going to do something over and over  again, that three-to-four-order-of-magnitude   acceleration is tremendously disruptive. And  what's really interesting is we see that fifth  

paradigm occur in many, many different places.  The idea goes back a long way. The, actually,   the last project that I worked on before I left  the fusion program was to do what was the world's   first-ever real-time control of a tokamak fusion  plasma using a neural net and the computers of the   day. But the processors were just far too slow,  long before GPUs, and so on. And so it wasn't  

possible to solve the equations. In that case,  it was called the Grad-Shafranov equation. Again,   a simple differential equation you could write  on a T-shirt, but solving it was expensive on   a computer. We were about a million times too slow  to solve it directly in real time. And so instead,   we generated lots and lots of solutions. We  used those solutions to train a very simple   neural network, not a deep network, just a  simple two-layer network back in the day,  

and then we implemented that in special hardware and did real-time feedback control. So that was an example of the fifth paradigm from, you know, a quarter of a century ago. But of course, deep learning just tremendously expands the range of applicability. So today we're using the fifth paradigm in many, many different scenarios. And time and time again, we see this four-orders-of-magnitude acceleration. So I think it's worthy of thinking of that as a new paradigm

because it's so pervasive and so ubiquitous.

STRICKLAND: So how do you identify fields of science and particular problems that are amenable to this kind of AI assistance? Is it all about availability of data or the need for that kind of speedup?
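
[Editor's sketch: the "emulator of the simulator" idea Bishop describes in his preceding answer can be illustrated with a toy example. Everything below is an assumption made for illustration and is not taken from the episode: the pendulum `simulate` function stands in for an expensive physics code, and the emulator is a two-layer network whose hidden layer is fixed at random and whose output layer is fitted by least squares, which is a simplification of ordinary neural-network training.]

```python
import numpy as np

def simulate(angle0, t_end=1.0, steps=2000):
    """Toy stand-in for an expensive simulator: pendulum angle at t_end
    after release from rest at angle0 (many small integration steps)."""
    theta = np.asarray(angle0, dtype=float).copy()
    omega = np.zeros_like(theta)
    dt = t_end / steps
    for _ in range(steps):
        omega -= np.sin(theta) * dt   # semi-implicit Euler update
        theta += omega * dt
    return theta

rng = np.random.default_rng(0)

# 1. Run the simulator to generate synthetic training data.
x_train = rng.uniform(0.0, 2.0, size=200)
y_train = simulate(x_train)

# 2. Fit the emulator: random tanh hidden layer (fixed), linear output
#    layer solved by least squares. Hypothetical sizes, chosen arbitrarily.
W = rng.normal(0.0, 2.0, size=64)
b = rng.normal(0.0, 2.0, size=64)

def features(x):
    return np.tanh(np.outer(np.atleast_1d(x), W) + b)

coef, *_ = np.linalg.lstsq(features(x_train), y_train, rcond=1e-10)

def emulate(x):
    """Cheap surrogate: one matrix product instead of thousands of steps."""
    return features(x) @ coef

# 3. Compare emulator and simulator on inputs not used for fitting.
x_test = np.linspace(0.1, 1.9, 50)
max_error = float(np.max(np.abs(emulate(x_test) - simulate(x_test))))
print(f"max abs error on held-out inputs: {max_error:.2e}")
```

[The point of the sketch is the workflow rather than the numbers: the simulator is called only to produce training pairs, and every later query goes to the cheap fitted function. Any speedup figure depends on the real simulator and model, so none is claimed here.]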

BISHOP: So there are lots of factors that go into this. And when I think about AI for Science actually, the space of opportunity is colossal because science is, science is really just understanding more about the world around us. And so the range of possibilities is daunting really. So in choosing what to work on, I think there are several factors. Yes, of course, data is important, but very interestingly, we can use experimental data or we can generate

synthetic data by running simulators. So we're a  big fan of the fifth paradigm. But I think another   factor—and this is particularly at Microsoft—is  thinking about, how can we have real-world impact   at scale? Because that's our job, is to make the  world a better place and to do so at a planetary  

scale. And so we've settled on, for the most part,  working at the molecular level. So if you think   about the number of different ways of combining  atoms together to make new stable configurations   of atoms, it’s gargantuan. I mean, the number of  just small molecules, small organic molecules,   that are potential drug candidates is about 10  to the power of 60. It's about the same as the   number of atoms in the solar system. The number  of proteins, maybe the fourth power of the number  

of atoms in the universe, or something crazy. So  you've got this gargantuan space to search, and   within that space, for sure, there'll be all sorts  of interesting molecules, materials, new drugs,   new therapies, new materials for carbon capture,  new kinds of batteries, new photovoltaics. The   list is endless because everything around us  is made of atoms, including our own bodies.   So the potential just in the molecular space is  gargantuan. And so that's why we focus there. 
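
[Editor's sketch: a back-of-the-envelope calculation of what a four-orders-of-magnitude speedup does and does not buy in a space of 10 to the power of 60 molecules. The throughput of one simulator evaluation per second is a hypothetical round number, and 10^8 is used as a round stand-in for the "hundreds of millions" of molecules mentioned earlier; only the 10^60 figure and the 10^4 factor come from the conversation.]

```python
SEARCH_SPACE = 10**60                 # small organic molecules, as quoted by Bishop
SHORTLIST = 10**8                     # round stand-in for "hundreds of millions"
SECONDS_PER_YEAR = 365.25 * 24 * 3600

def years_to_screen(n_molecules, molecules_per_second):
    """Wall-clock years needed to evaluate n_molecules at a fixed rate."""
    return n_molecules / molecules_per_second / SECONDS_PER_YEAR

# Hypothetical rates: 1 evaluation per second for the simulator,
# and an emulator that is 10^4 times faster.
rates = {"simulator": 1.0, "emulator": 1.0e4}

for name, rate in rates.items():
    whole = years_to_screen(SEARCH_SPACE, rate)
    short = years_to_screen(SHORTLIST, rate)
    print(f"{name}: whole space {whole:.2e} years, shortlist {short:.2e} years")
```

[Under these assumptions, exhaustive search stays out of reach either way, but screening a shortlist of 10^8 candidates drops from roughly three years to under three hours, which is the regime where the acceleration matters.]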

STRICKLAND: It's a big focus. [LAUGHTER]

BISHOP: It's a broad focus, still, yes.

STRICKLAND: So let's take one of these case studies then. In a project on drug discovery, you worked with the Global Health Drug Discovery Institute on molecules that would interact with tuberculosis and coronaviruses, I think. And you found, I think, candidate molecules in five months instead of several years. Can you talk about what models you used in this work and how they

helped you get this vastly sped up process?

BISHOP: Sure. Yes. We're very proud of this project. We're working with the Gates Foundation and the Global Health Drug Discovery Institute to look at particularly diseases that affect low-income countries like tuberculosis. And in terms of the models we use, I think we're all familiar with a large language model. We train it on a sequence of words or sequence of word tokens, and it's trained to predict the next

token. We can do a similar thing, but instead of  learning the language of humans, we can learn the   language of nature. So in particular, what we're  looking for here is a small organic molecule   that we could synthesize in a laboratory that will  bind with a particular target protein. It's called   ClpP. And by interfering with that protein, we can  arrest the process of tuberculosis. So the goal is  

to search that space of 10 to the 60 molecules and  find a new one that has the right properties. Now,   the way we do this is to train something that's  essentially a transformer. So it looks like a   language model, but the language it's trained  on is a thing called SMILES strings. It's an   idea that's been around in chemistry for  a long time. It's just a way of taking a   three-dimensional molecule and representing it as  a one-dimensional sequence of characters. So this  

is perfect for feeding into a language model. So  we take a transformer and we train it on a large   database of small organic molecules that are, sort  of, typical of the kinds of things you might see   in the space of drug molecules. Once that's been  trained, we can now run it generatively. And it   will output new molecules. Now, we don't just  want to generate molecules at random because   that doesn't help. We want to generate molecules  that bind to this particular binding site on this  

particular protein. So the next step is we have to tell the model about the protein and the protein binding site. And we do that by giving it information about not actually—well, we do tell it about the whole protein, but we especially give it information about the three-dimensional geometry of the binding site. So we tell it about the locations of the atoms that are in the binding site. And we do this in a way that satisfies certain physics constraints, sort of, equivariance

properties, it's called. So if you think about a  molecule, if I rotate the molecule in space, the   positions of all the atoms change in a complicated  way. But it's the same molecule; it has the same   energy and other properties and so on. So we need  the right kind of representation. That's then fed   into this transformer using a technique called  cross-attention. So internally, the transformer   uses self-attention to look at the history  of tokens, but it can now use cross-attention  

to look at another model that understands the  proteins. But even that's not enough. Because   in discovering drugs and exploring this gargantuan  space and looking for these needles in a haystack,   what typically happens [is] you find a hit,  a molecule that binds, but now you want to   optimize it. You want to make lots of small  variations of that molecule in order to make   it better and better at binding. So the third  piece of the architecture is another module, a  

thing called a variational autoencoder, that again  uses deep learning. But this time, it can take as   input an organic molecule that is already known,  a hit that's already known to bind to the site,   and that again is fed in through cross-attention.  And now the SMILES autoregressive model can now   generate a molecule that's an improvement  on the starting molecule and knows about the  

protein binding. And so what we do is, we start  off with the state-of-the-art molecule. And the   best example we found is one that's more than  two orders of magnitude stronger binding affinity   to the binding pocket, which is a tremendous  advance; it’s the state of the art in addressing   tuberculosis. And of course, the exciting thing  is that this is tested in the laboratory. So this   is not just a computer experiment in some sort  of benchmark or whatever. We sent a description  

of the molecule to the laboratories at GHDDI.  They synthesized a molecule, characterized it,   measured its binding property, and said, well,  hey, this is a new state of the art for this   target protein. So we're continuing to work with  them to further refine this. There are obviously   quite a few more steps. If you know about the  drug discovery process, there’s a lot of hurdles   you have to get through, including, of course,  very important clinical trials, before you have  

something that can actually be used in humans. But we're already hugely excited about the fact that we were able to make such a big advance so quickly, in such a short amount of time, compared to the usual drug discovery process.

STRICKLAND: And while you were looking for that molecule that had the proper characteristics, were you also determining whether it could be manufactured easily, like trying to think about practical realities of bringing this thing out

of the computer and into the lab?

BISHOP: Great question. I mean, you're hinting there at the fact that the discovery process, of course, is a long pipeline. You start with the protein. You have to find a molecule that binds. You then refine the molecule. Now you have to look at ADMET, you know, the absorption, metabolism, and excretion and so on of the molecule. Also make sure that it's not toxic. But then you need to be able to synthesize it.

It's no good if nobody can make this molecule.  So you have to look at that. So, actually,   in the AI for Science team, we look at all of  these aspects of that drug discovery process.   And we find particular areas, especially where  there’s, sort of, low-hanging fruit where we can   see that deep learning can make a big impact. It  doesn't necessarily help much to take a very easy,   fast piece of the pipeline and go work on that.  You want to understand, what are the bottlenecks,  

and can we really unlock those with deep learning? So we're very interested in that whole process. It’s a fascinating problem. You've got a gargantuan search space, and yet you have so many different constraints that need to be met. And deep learning just feels like the perfect tool to go after this problem.

STRICKLAND: When you talk to the scientists that you collaborate with, is AI changing the kinds of questions

that they are able to ask? That they want to ask?

BISHOP: Oh, for sure. And it's really empowering. It's enabling those working in the drug discovery space to, I think, to think in a much more

expansive way. If you think about just the kind of acceleration that I talked about from the fifth paradigm, if you go to four-order-of-magnitude acceleration, OK, it may not sound like much of a dent in the 10 to the power 60 space, but now when you're exploring variants of molecules and so on, the ability to explore that space orders of magnitude faster allows you to think much more creatively, allows you to think in a more expansive way about how much of that space you can

explore and how efficiently you can explore it.  So I think it really is opening up new horizons,   and certainly, we have an exciting partnership  with Novartis. We've been working with them for   the last five years, and they've been deploying  some of our techniques and models in practice   for their drug discovery pipeline. We get a lot  of great feedback from them about how exciting   they're finding these techniques to use in  practice because it is changing the way they  

go about doing the drug discovery process.

STRICKLAND: To jump to one other case study, we don't have to go into great detail on it, but I'm very curious about your Project Aurora, this foundation model for state-of-the-art weather forecasting that, I believe, is 5,000 times faster than traditional physics-based methods. Can you talk a little bit about how that project is evolving, how you imagine these AI forecasting models working with traditional

forecasting models, perhaps, or replacing them?

BISHOP: Yes. So I said most of what we do is down at the molecular level. So this is one of the exceptions. So this is really at the global level, the planetary level. Again, it's a beautiful example of the fifth paradigm because the way forecasting has been done for a number of decades now and the way most forecasting is done at the moment is through what's called numerical weather prediction. So again,

you have these simple equations. It's no longer  Schrödinger’s equation of atomic physics. It's   now Navier–Stokes equations of fluid flows and  a whole bunch of other equations that describe   moisture in the atmosphere and the weather  and so on. And those equations are solved   on a supercomputer. And again, we can think  of that numerical simulator now not just as   the way you're going to do the forecasting but  actually as the way to generate training data  

for a deep learning emulator. So several  groups have been exploring this over the   last couple of years. And again, we see this  very robust three-to-four-order-of-magnitude  

acceleration. But what's really interesting about  Aurora, it's the world's first foundation model,   so instead of just building an emulator of  a particular numerical weather simulator,   which is already very interesting, we trained  Aurora on a much more diverse set of data and   really trying to force it not just to emulate  a particular simulator but really, as it were,  

understand or model the fundamental equations of  fluid flows in the Earth's atmosphere. And then   the reason we want to do this is because we now  want to take that foundation model and fine-tune   it to other downstream applications where there’s  much less data. So one example would be pollution   flow. So obviously the flow of pollution around  the atmosphere is extremely important. But the   data is far more sparse. There are far fewer  sensors for pollution than there are for, sort of,  

wind and rain and temperature and so on. And so we were able to achieve state-of-the-art performance in modeling the flow of pollution by leveraging huge data and building this foundation model and then using relatively little data, on pollution monitoring, to build that downstream fine-tuned model. So beautiful example of a foundation model.

STRICKLAND: That is a cool example. And finally, just to wrap up, what have you seen or heard at NeurIPS that’s gotten you excited? What

kind of trends are in the air? What’s the buzz?

BISHOP: Oh, that’s a great question. I mean, it's such a huge conference. There's something like 17,000 people or so here this year, I've heard. I think, you know, one of the things that's happened so far that's actually given me an enormous amount of energy wasn’t just a technical talk. It was actually an event we had on the first

day called Women in Machine Learning. And I  was a mentor on one of the mentorship tables,   and I found it very energizing just to meet  so many people, early-career-stage people,   who were very excited about AI for Science and  realizing that, you know, it's not just that I   think AI for Science is important. A lot of  people are moving into this field now. It is   a big frontier for AI. I'm a little biased,  perhaps. I think that it's the most important  

application area. Intellectually, it's very  exciting because we get to deal with science   as well as machine learning. But also if you think  about [it], science is really about learning more   about the world. And once we learn more about  the world, we can then develop aquaculture;   we can develop the steam engine; we can develop  silicon chips; we can change the world. We can  

save lives and make the world a better place. And so I think it's the most fundamental undertaking we have in AI for Science. And the thing I loved about the Women in Machine Learning event is that the AI for Science table was just completely swamped with all of these people at early stages of their career, either already working in this field and doing PhDs or wanting to get into it. That was very exciting.

STRICKLAND: That is really exciting and inspiring,

and it gives me a lot of hope. Well, Chris Bishop, thank you so much for joining us today and thanks for a great conversation.

BISHOP: Thank you. I really appreciate it.

[MUSIC]

STRICKLAND: And to our listeners, thanks for tuning in. If you want to learn more about research at Microsoft, you can check out the Microsoft Research website at microsoft.com/research. Until next time.

[MUSIC FADES]

Transcript source: Provided by creator in RSS feed