Abstracts: Zero-shot models in single-cell biology with Alex Lu

May 22, 2025 · 14 min

Episode description

The emergence of foundation models has sparked interest in applications to single-cell biology, but when tested in zero-shot settings, they underperform compared to simpler methods. Alex Lu shares insights on why more research on AI models is needed in biological applications.

Transcript

spotlight on world-class research in brief. I'm Gretchen Huizinga. In this series, members of the research community at Microsoft give us a quick snapshot, or a podcast abstract, of their new and noteworthy papers. On today's episode, I'm talking to Alex Lu, a senior researcher at Microsoft Research and co-author of a paper called "Assessing the Limits of Zero-Shot Foundation Models in Single-Cell Biology." Alex Lu, wonderful to have you on the podcast. Welcome to Abstracts!

Alex Lu

Yeah, I'm really excited to be joining you today.

Huizinga

So let's start with a little background of your work. In just a few sentences, tell us about your study and, more importantly, why it matters.

Lu

Absolutely. And before I dive in, I want to give a shout-out to the MSR research intern who actually did this work. This was led by Kasia Kedzierska, who interned with us two summers ago in 2023, and she's the lead author on the study. But basically, in this research, we study single-cell foundation models, which have really recently rocked the world of biology, because they basically claim to be able to use AI to unlock understanding about single-cell biology.

Biologists, for a myriad of applications, everything from understanding how single cells differentiate into different kinds of cells to discovering new drugs for cancer, will conduct experiments where they measure how much of every gene is expressed inside of just one single cell. So these experiments give us a powerful view into the cell's internal state. But measurements from these experiments are incredibly complex. There are about 20,000 different human genes. So you get this really long chain of numbers that measure how much there is of 20,000 different genes. So deriving meaning from this really long chain of numbers is really difficult. And single-cell foundation models claim to be capable of unraveling deeper insights than ever before. So that's the claim that these works have made.

And in our recent paper, we showed that these models may actually not live up to these claims. Basically, we showed that single-cell foundation models perform worse in settings that are fundamental to biological discovery than much simpler machine learning and statistical methods that were used in the field before single-cell foundation models emerged and are the go-to standard for unpacking meaning from these complicated experiments. So in a nutshell, we should care about these results because they have implications for the toolkits that biologists use to understand their experiments. Our work suggests that single-cell foundation models may not be appropriate for practical use just yet, at least in the discovery applications that we cover.

Huizinga

Well, let's go a little deeper there. Generative pre-trained transformer models, GPTs, are relatively new on the research scene in terms of how they're being used in novel applications, which is what you're interested in, like single-cell biology. So I'm curious, just sort of as a foundation, what other research has already been done in this area, and how does this study illuminate or build on it?

Lu

Absolutely. Okay, so we were the first to notice and document this issue in single-cell foundation models, specifically. And this is because we proposed evaluation methods that, while common in other areas of AI, have yet to be commonly used to evaluate single-cell foundation models. We performed something called zero-shot evaluation on these models. Prior to our work, most works evaluated single-cell foundation models with fine-tuning. And the way to understand this is that single-cell foundation models are trained in a way that tries to expose these models to millions of single cells. But because you're exposing them to a large amount of data, you can't really rely upon this data being annotated or labeled in any particular fashion. So in order for them to actually do the specialized tasks that are useful for biologists, you typically have to add on a second training phase. We call this the fine-tuning phase, where you have a smaller number of single cells, but now they are actually labeled with the specialized tasks that you want the model to perform. So most people typically evaluate the performance of single-cell models after they fine-tune these models. However, what we noticed is that evaluating these fine-tuned models has several problems.

First, it might not actually align with how these models are actually going to be used by biologists. A critical distinction in biology is that we're not just trying to interact with an agent that has access to knowledge through its pre-training; we're trying to extend these models to discover new biology beyond the sphere of influence. And so in many cases, the point of using these models, the point of analysis, is to explore the data with the goal of potentially discovering something new about the single cells that the biologists worked with that they weren't aware of before. So in these kinds of cases, it is really tough to fine-tune a model. There's a bit of a chicken-and-egg problem going on. If you don't know, for example, that there's a new kind of cell in the data, you can't really instruct the model to help you identify these kinds of new cells. So in other words, fine-tuning these models for those tasks essentially becomes impossible.

So the second issue is that evaluations on fine-tuned models can sometimes mislead us in our ability to understand how these models are working. So for example, the claim behind single-cell foundation model papers is that these models learn a foundation of biological knowledge by being exposed to millions of single cells in their first training phase, right? But it's possible, when you fine-tune a model, that any performance increases you see are simply because you're using a massive model that is really sophisticated, really large. And even without any exposure to any cells at all, that model is going to do perfectly fine.

So going back to our paper, what's really different about this paper is that we propose zero-shot evaluation for these models. What that means is that we do not fine-tune the model at all, and instead we keep the model frozen during the analysis step. So how we specialize it to a downstream task instead is that we extract the model's internal embedding of single-cell data, which is essentially a numerical vector that contains information that the model is extracting and organizing from input data. So it's essentially how the model perceives single-cell data and how it's organizing it in its own internal state. So basically, this is the better way for us to test the claim that single-cell foundation models are learning foundational biological insights. Because if they actually are learning these insights, they should be present in the model's embedding space even before we fine-tune the model.
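
The zero-shot setup described in this answer (a frozen model, with embeddings read out and no second training phase) can be illustrated with a minimal Python sketch. Everything named here is an assumption for illustration: `FrozenEncoder` is a hypothetical stand-in built from a fixed random projection, not any real single-cell foundation model, and the count matrix is synthetic.

```python
import numpy as np


class FrozenEncoder:
    """Hypothetical stand-in for a pretrained model (illustration only)."""

    def __init__(self, n_genes, dim, seed=0):
        rng = np.random.default_rng(seed)
        # Pretend these weights came from pretraining on unlabeled cells.
        self.weights = rng.normal(size=(n_genes, dim)) / np.sqrt(n_genes)
        # Freeze the weights: zero-shot evaluation never updates the model.
        self.weights.setflags(write=False)

    def embed(self, counts):
        # Map each cell's gene-expression vector to one embedding vector.
        return np.log1p(counts) @ self.weights


# Synthetic expression matrix: 6 cells by 50 genes (made-up data).
rng = np.random.default_rng(1)
counts = rng.poisson(2.0, size=(6, 50)).astype(float)

encoder = FrozenEncoder(n_genes=50, dim=8)
before = encoder.weights.copy()

# Zero-shot: no labels and no fine-tuning, only a forward pass.
embeddings = encoder.embed(counts)
```

Any downstream analysis then works only on `embeddings`, so whatever structure it finds has to come from what the frozen model already encodes.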

Huizinga

Well, let's talk about methodology on this particular study. You focused on assessing existing models in zero-shot learning for single-cell biology. How did you go about evaluating these models?

Lu

Yes, so let's dive deeper into how zero-shot evaluations are conducted, okay? So the premise here is that we're relying upon the fact that if these models are fully learning foundational biological insights, if we take the model's internal representation of cells, then cells that are biologically similar should be close in that internal representation, whereas cells that are biologically distinct should be further apart. And that is exactly what we tested in our study. We compared two popular single-cell foundation models and, importantly, we compared these models against older and reliable tools that biologists have used for exploratory analyses. So these include simpler machine learning methods like scVI, statistical algorithms like Harmony, and even basic data pre-processing steps, just like filtering your data down to a more robust subset of genes. So basically, we tested embeddings from our two single-cell foundation models against these baselines in a variety of settings. And we tested the hypothesis that biologically similar cells should be similar across these distinct methods across these datasets.
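
One way to put a number on the premise above, that biologically similar cells sit close together and distinct cells sit further apart, is a silhouette-style score computed on the embeddings using known cell-type labels. The sketch below is a plain NumPy version under stated assumptions: the choice of average silhouette width is an illustration rather than a metric named in this conversation, and both embeddings and the labels are synthetic.

```python
import numpy as np


def average_silhouette_width(embedding, labels):
    """Mean silhouette score of cells grouped by a known label."""
    labels = np.asarray(labels)
    # Pairwise Euclidean distances between all cells.
    diffs = embedding[:, None, :] - embedding[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    scores = []
    for i in range(len(labels)):
        same = labels == labels[i]
        same[i] = False  # exclude the cell itself
        if not same.any():
            scores.append(0.0)  # convention for single-cell groups
            continue
        a = dists[i, same].mean()  # mean distance within own group
        b = min(  # mean distance to the nearest other group
            dists[i, labels == other].mean()
            for other in np.unique(labels)
            if other != labels[i]
        )
        denom = max(a, b)
        scores.append(0.0 if denom == 0 else (b - a) / denom)
    return float(np.mean(scores))


# Two synthetic "cell types" with 20 cells each (made-up data).
rng = np.random.default_rng(0)
labels = np.array([0] * 20 + [1] * 20)

# Embedding A separates the two types; embedding B does not.
separated = rng.normal(size=(40, 5)) + labels[:, None] * 10.0
mixed = rng.normal(size=(40, 5))

asw_separated = average_silhouette_width(separated, labels)
asw_mixed = average_silhouette_width(mixed, labels)
```

Under this kind of comparison, the embedding with the higher score keeps same-type cells closer together, which is the property the zero-shot evaluation is probing for each method on each dataset.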

Huizinga

Well, and as you did the testing, you obviously were aiming towards research findings, which is my favorite part of a research paper, so tell us what you did find and what you feel the most important takeaways of this paper are.

Lu

Absolutely. So in a nutshell, we found that these two newly proposed single-cell foundation models substantially underperformed compared to older methods. So to contextualize why that is such a surprising result, there is a lot of hype around these methods. So basically, I think that, yeah, it's a very surprising result, given how hyped these models are and how people were already adopting them. But our results basically caution that these shouldn't really be adopted for these purposes.

Huizinga

Yeah, so this is serious real-world impact here in terms of if models are being adopted and adapted in these applications, how reliable are they, et cetera? So given that, who would you say benefits most from what you've discovered in this paper and why?

Lu

Okay, so two ways, right? So I think this has at least immediate implications on the way that we do discovery in biology. And as I've discussed, these experiments are used for cases that have practical impact: drug discovery applications, investigations into basic biology. But let's also talk about the impact for methodologists, people who are trying to improve these single-cell foundation models, right? I think at the base, they're really exciting proposals. Because if you look at what some of the prior and less sophisticated methods couldn't do, they tended to be more bespoke. So the excitement of single-cell foundation models is that you have this general-purpose model that can be used for everything, and while they're not living up to that purpose just now, just currently, I think that it's important that we continue to bank on that vision, right?

So if you look at our contributions in that area, single-cell foundation models are a really new proposal, so it makes sense that we may not know how to fully evaluate them just yet. So you can view our work as basically being a step towards more rigorous evaluation of these models. Now that we did this experiment, I think the methodologists know to use this as a signal on how to improve the models and whether they're going in the right direction. And in fact, you are seeing more and more papers adopt zero-shot evaluations since we put out our paper. And so this essentially helps future computer scientists who are working on single-cell foundation models know how to train better models.

Huizinga

That said, Alex, finally, what are the outstanding challenges that you identified for zero-shot learning research in biology, and what foundation might this paper lay for future research agendas in the field?

Lu

Yeah, absolutely. So now that we've shown single-cell foundation models don't necessarily perform well, I think the natural question on everyone's mind is how do we actually train single-cell foundation models that live up to that vision, that can perform in helping us discover new biology? So I think in the short term, yeah, we're actively investigating many hypotheses in this area. So for example, my colleagues, Lorin Crawford and Ava Amini, who were co-authors on the paper, recently put out a pre-print on understanding how training data composition impacts model performance. And so one of the surprising findings that they had was that many of the training datasets that people used to train single-cell foundation models are highly redundant, to the point that you can even sample just a tiny fraction of the data and get basically the same performance. But you can also look forward to many other explorations in this area as we continue to develop this research.

But also zooming out into the bigger picture, I think one major takeaway from this paper is that developing AI methods for biology requires thought about the context of use, right? I mean, this is obvious for any AI method, but I think people have gotten just too used to taking methods that work out there for natural vision or natural language, maybe in the consumer domain, and then extrapolating these methods to biology and expecting that they will work in the same way, right? So for example, one reason why zero-shot evaluation was not routine practice for single-cell foundation models prior to our work (I mean, we were the first to fully establish that as a practice for the field) was because I think people who have been working in AI for biology have been looking to these more mainstream AI domains to shape their work. And so with single-cell foundation models, many of these models are adapted from large language models in natural language processing, recycling the exact same architecture, the exact same code, basically just recycling practices in that field.

So when you look at practices in more mainstream domains, zero-shot evaluation is definitely explored in those domains, but it's more of a niche instead of being considered central to model understanding. So again, because biology is different from mainstream language processing, it's a scientific discipline, zero-shot evaluation becomes much more important, and you have no choice but to use these models zero-shot. So in other words, I think that we need to be thinking carefully about what it is that makes training a model for biology different from training a model, for example, for consumer purposes.

Huizinga

Alex Lu, thanks for joining us today, and to our listeners, thanks for tuning in. If you want to read this paper, you can find a link at aka.ms/Abstracts, or you can read it on the Genome Biology website. See you next time on Abstracts!
