
Empower Genomics with Proteomics

Oct 12, 2022 · 40 min · Ep. 2

Episode description

Show Notes for PiP Ep 02 “Empower Genomics with Proteomics”

For more information about the UK Biobank, an Olink to Science blog post called “Genetic Regulation of the Human Plasma Proteome in the UK Biobank” is available here. The preprint publication itself is available here on bioRxiv.

If you’d like to see a great 15 minute presentation on what the goals are for the UK Biobank Pharma Proteomics Project, Dr. Chris Whelan (Biogen) presented this YouTube video at one of the UK Biobank’s scientific meetings that is worth watching.

A paper discussed in the episode, Folkersen et al., “Genomic and drug target evaluation of 90 cardiovascular proteins in 30,931 individuals”, was published in Nature Metabolism in late 2020, and is available here.

If you would like to contact Dale, Cindy or Sarantis, feel free to email us at info@olink.com, and if you would like to learn more about our backgrounds, Cindy’s LinkedIn is here, Sarantis’ is here, and Dale’s is here.

In case you were wondering, Proteomics in Proximity refers to the principle underlying Olink Proteomics assay technology called the Proximity Extension Assay (PEA), and more information about the assay and how it works can be found here.

Transcript

Welcome to the Proteomics in Proximity podcast, where your co-hosts Dale Yuzuki, Cindy Lawley and Sarantis Chlamydas from Olink Proteomics talk about the intersection of proteomics with genomics for drug target discovery, the application of proteomics to reveal disease biomarkers and current trends in using proteomics to unlock biological mechanisms. Here we have your hosts, Dale, Cindy and Sarantis. Thank you for joining us on the Proteomics in Proximity podcast.

I'm your host Dale Yuzuki with my co-hosts Cindy Lawley and Sarantis Chlamydas. Great. This morning we are talking about Empower Genomics with Proteomics. And Cindy, I'd like to ask you the question: if I'm involved in genomics, why should I add proteomics? Well, I'm so happy that you asked me about that, Dale. So, as you know, I tell this story quite a bit, and so I'm delighted to do it in the context with both of you, because you add so much to this.

But you know, we've got a big project going with the UK Biobank, the UK Biobank, of course, being one of the largest nationally associated population biobanks in the world, with clinical, genetic and now, on a subset of over 54,000 samples, proteomic data. It's an exciting time, I think, to demonstrate in large populations the value of layering proteomics onto genomics. And of course we've, across many cohorts, invested a lot in genomics as the costs have gone down.

Now to back up a little bit, the UK Biobank: tell me a little bit about it. Sure. So, yeah, as the name implies, it's based in the UK. It's affiliated with, you know, the UK has one of the largest single-payer health care systems in the world. And it's a population-based biobank, primarily of northern European descent or ancestry, but certainly representing Asian as well as African and African diaspora descent, as well as Pakistani descent.

There's quite a nice subset of diversity in the Biobank, but primarily it's northern European. You know, it was started, what, I think 20 years ago. I actually should know that off the top of my head. But it was started with the promise of being able to characterize the value of longitudinal information to health care. And I think Eric Topol says that it takes about 20 years on average to move something from discovery to the clinic.

He uses the example of the stethoscope and moving the stethoscope into the clinic took 17 years. That seems like a pretty simple mechanism, right? Listening to your heart. But yet it took a long time for it to be demonstrated and approved and to get into the clinic and routine use. So by longitudinal, then, you mean, I think they recruited, what, half a million individuals, and longitudinal means they follow them over time? It means that they're able to call them back.

So they have access to their, you know, clinical data over time. They understand over time what, you know, the outcomes are within this population. And I think that's incredibly valuable. Consenting has changed over time, so the ability to actually call back wasn't initially in many of these biobanks. Right. And so I think of FinnGen as one of those, another biobank that's based in Finland, a population health biobank as well.

They really did sort of lead the way with some of the ability to share data and protect it at the same time, you know, primarily focused on genetics and I would say UK Biobank as well has led the way in these abilities to work with both private and public partnerships. So being able to work with pharma and in this case with the proteomic data, this initial set of proteomic data that was initially started with ten pharma partners.

The model was interestingly based upon the exome sequencing consortium. So a group of pharma partners came together in order to get the participants within the UK Biobank exome sequenced. There's of course also, I think it's 150,000 individuals in the UK Biobank that are also whole genome sequenced, which is pretty phenomenal, right? That's a huge number. So, massive management.

And yeah, of all of those dollars and time and the potential for building analysis tools with such large datasets. Right. That, you know, goes without saying. I remember, I think it was 2018, I was at ASHG, which is the American Society of Human Genetics, and I was blown away by all of the talks that were referencing the UK Biobank data, building tools, having discoveries. Right.

I'm excited to see that same evolution in the discussion with crowdsourcing these data with proteomics. So to get back to your original question, Dale, what's the value? I think Karsten Suhre would say that when you have genetic data and you have disease, that there's a certain power you have to detect the relationship between the two. And in some cases, we have smoking guns like BRCA. Right.

So we're able to see that there's a lot of penetrance for a variant that shows up and has a lot of influence on predisposition to disease. But for most diseases, it's death by a thousand cuts, right? Small amounts of influence. Cindy, this UK Biobank sounds so interesting. What more can you tell me about it? Yeah. So the UK Biobank itself is a longitudinal collection of data. I think it started in the mid-2000s.

So around 2006, they targeted an age group, sort of a middle age group, and they followed them over time, and so it's over half a million individuals within the UK. Yeah, quite an undertaking to enroll all those participants.

And what I will never forget is when I attended ASHG, the American Society of Human Genetics, in 2018: I remember searching on UKBB, just as a short acronym, and there were a lot of talks about using the UKBB data, particularly genetic data, for validating clinical findings.

So it's always been high on Illumina's radar, and it's very high on the radars of all of those sequencing technology innovators, like Thermo Fisher, of course, and all of those library prep companies that have different methods of library prep. Have they already sequenced everybody? So they've got whole genome sequence on, I think it's about 150,000 individuals, and that was primarily led by deCODE genetics. As I understand it, the publication is pretty recent actually.

So there's a lot to dig into. It's a pretty phenomenal dataset. The bulk of the sequencing for most samples, I believe, is exome sequencing. That's my understanding. So I think it's over 450,000 individuals. So I think for some they've got both. And since it's a single-payer system with the electronic records of the NHS, that means they can drill down into exactly, right, their whole exome sequence and whatever condition they may have? And this is an ongoing thing.

Is that right? Cancer, diabetes? That's right. And the ability to actually return results to those patients, I think, has evolved over time. Right. Because that costs money, to set expectations, make sure that we're communicating, you know, in a way that's best practices. So I think that the UK Biobank has spearheaded a lot of our understanding about best practices there as well. And as far as.... Go ahead.

Oh, I was just going to say so back to your original question about what's the value of layering proteomics onto genomics. This is a great example where an enormous amount of investment has gone into collecting genetic information for this very valuable population, with advances in diagnostics, in guiding cancer treatment. So a lot of these advances have made it to the clinic already, which is pretty phenomenal.

And that's been driven, you know, globally. It's exciting where proteomics fits in, or the way that I think of it. I had a conversation with Karsten Suhre, who's at Weill Cornell in Qatar and in New York, and I really had an aha moment with him. He essentially would say that an intermediate phenotype like proteomics acts to magnify the effect between genetics and disease.

So, of course, we've been looking for these associations between genetics and disease since we've been collecting genetic information. And some of those links are hard to see because we need so many samples to be able to see them. And so as we've increased the numbers of samples, like in the UK Biobank, we're able to make these associations more clearly. And I'll say, you know, we wished early on, we hoped for smoking guns for a lot of diseases, and we did see a few of them.

Right. So there's certainly PCSK9 for familial hypercholesterolemia. There are some standard ones, BRCA for breast cancer. There's some examples where we have a lot of penetrance or, you know, a lot of effect on someone's likelihood of getting a disease from single variants or single loci or single genes. But for most diseases, like Type 2 diabetes or cardiovascular disease, this is a death by a thousand cuts, meaning lots of variants give a little tiny effect in changing our risk.

And so that's where having the ability to amplify, or to put a magnifying glass on, those relationships between genetics and disease is incredibly useful. So I think, you know, a ton of work has been done in proteomics and cardiovascular disease. And I think many advances have happened there. Now getting back to the UK Biobank. You mentioned before that recently they were working with Olink then to look at the proteome of tens of thousands of individuals.

Yeah. Yeah. I would say it's not just the UK Biobank, but 13 pharma partners. Right. So it certainly required consent and partnership with the UK Biobank, just like the exome sequencing consortium was done in collaboration with the UK Biobank.

But the access to the technology, just like with the exome sequencing consortium, was spearheaded by pharma partners that were very keen to build a structure for a more, I like to say, systematic approach to therapeutic target discovery: not only biomarker discovery, which is sort of traditional proteomics, but therapeutic target discovery, which I think is enabled by genetics, proteomics as well as clinical data.

So we're getting back to this idea of empowering genomics with proteomics, right? What can you tell me about that? Yeah. So I think there's this, you know, I immediately think of that Karsten Suhre magnifying glass, right. But the UK Biobank's initial findings are in a preprint that came out in the middle of June, that's on bioRxiv. Their initial paper really was just scratching the surface of what's possible with this enormous dataset. So their first paper was about 1500 proteins.

So that's our first product, Olink's first product on the Explore platform that has the NGS readout. So they used that first tranche of proteins across 54,000 samples, and really the bulk of those data are to look at correlations between gene regions, you know, the genotypes in those gene regions, and protein levels. So really just looking: what are the correlations?

What's the list of all the possible relationships between genetic regions and protein levels that might be elucidated and examined further in this beautiful dataset?
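To make the "correlation between genotype and protein level" idea concrete, here is a minimal sketch in Python. Everything in it is an illustrative assumption: the data are simulated (not UK Biobank data), genotypes are coded as 0, 1 or 2 copies of a variant allele, and the effect size of 0.3 per allele is invented. It is not the analysis used in the preprint.

```python
import random


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


# Simulated cohort (assumption): 2000 people, one SNP, one protein.
rng = random.Random(0)
n_samples = 2000
genotypes = [rng.choice([0, 1, 2]) for _ in range(n_samples)]

# Assumed model: each copy of the variant allele shifts the protein
# level by 0.3 units, on top of random person-to-person noise.
protein_levels = [0.3 * g + rng.gauss(0.0, 1.0) for g in genotypes]

r = pearson_r(genotypes, protein_levels)
print(f"genotype vs protein level: r = {r:.3f}")
```

In a real study this comparison would be repeated for every SNP and every protein, which is where the multiple-testing question comes from.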

And I'll speculate on what I think they're going to be doing next, and my guess is that they're very deep into doing this within these companies: to then do Mendelian randomization, which is a statistical approach to kind of determine which of these relationships, which of these correlations between gene regions and protein levels, when you bring in the clinical data on disease, hold up as being unlikely to be happening by chance alone.

So now you sort of have the ones that are likely just, you know, coincidence. I mean, they might still be important, but I think pharma would rather have ten great targets than 400 unsure targets, because that's a lot of rabbit holes to go down.
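As a rough illustration of what Mendelian randomization computes, the sketch below uses the Wald ratio, one standard single-variant estimator: the SNP's effect on disease divided by the SNP's effect on the protein. The episode does not say which estimator any of the groups use, so treat this as an assumption, and note that the numbers are made up and the standard error is a first-order approximation that ignores uncertainty in the SNP-to-protein effect.

```python
def wald_ratio(beta_snp_protein, beta_snp_disease, se_snp_disease):
    """Single-variant Mendelian randomization estimate (Wald ratio).

    beta_snp_protein: effect of the SNP on the protein level (the pQTL effect)
    beta_snp_disease: effect of the same SNP on the disease outcome
    se_snp_disease:   standard error of beta_snp_disease

    Returns (estimate, approximate standard error) for the effect of the
    protein on the disease.
    """
    if beta_snp_protein == 0:
        raise ValueError("the SNP must have a non-zero effect on the protein")
    estimate = beta_snp_disease / beta_snp_protein
    # First-order approximation: treats the SNP-protein effect as exact.
    std_error = abs(se_snp_disease / beta_snp_protein)
    return estimate, std_error


# Hypothetical summary statistics, invented for illustration only.
estimate, std_error = wald_ratio(
    beta_snp_protein=0.30,
    beta_snp_disease=0.06,
    se_snp_disease=0.015,
)
print(f"estimated protein-to-disease effect: {estimate:.2f} (SE {std_error:.2f})")
```

The intuition is that a genotype is fixed at conception, so if a variant that shifts a protein level also shifts disease risk proportionally, that is harder to explain by coincidence or reverse causation.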

So if they can narrow it down, then I think there's a lot of excitement around being able to have some quick wins with proteomics, genomics and clinical data. Well, to back up just a little bit, you're talking about Mendelian randomization, you're talking about genomic data in terms of a whole exome of 10,000 people or 50,000 people, and now you're talking about 1500 proteins. Can you walk me through that a little bit? Yeah, sure.

So when you're looking at the genetic data, and here we've got, you know, some whole genome sequencing data as well as exome sequencing data, you can imagine you have a list of places in the genome, almost like geographic locations, almost GPS coordinates on the chromosomes, where we know they vary across the samples.

So those variable regions, we'll call them SNPs, you know, that's the term that we use for the simplest kind of variation, just single base pair variation, but we'll just call them SNPs because there can be other kinds of variation that are captured there too. But if you look at the variant, it's just a single variant within the genome.

You can look at the representation of what people's genotype is at that location, and you can look at every single protein in that 1500 protein list and see, do we have a significant correlation between the genotype and the protein level? So that's sort of the first step. That's a lot of tests, right? Comparisons. Cindy, so I understand these SNPs can also be outside of the gene. Right? It would also make it...

So they could be regulatory regions. Absolutely. It's not necessarily falling into the gene. Is there any threshold to what they are checking, what region they check in the gene? Yeah. So there's both a statistical threshold that they accept as a standard. But also when you're doing so many tests, you have to correct for multiple tests, right? Because the more tests you do, the more you're increasing your chances of seeing a false positive. So adjusting for that is something that, you know, we go through peer review to make sure we have best practices and agreement on.

I mean, these statistical associations are massive. I mean, in a given single individual's whole genome, you're looking at maybe four million SNPs? Yeah, right. You have 4 million SNPs and then you've got 1500 proteins you're associating those with, if I understand you correctly.

Yeah. And then you multiply this times what they did, 54,000 individuals. They did 54,000. So I mean. And they discovered, you know, about 10,000, I think it was around 10,200 relationships between gene regions and protein levels. Right. That's a massive number. So many of those could just be coincidence, right? Just correlations, not causation. Right. We're all familiar with that phrase. So 85% of those relationships were novel. Now, the relationship.
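The scale of the multiple-testing problem mentioned here can be worked through with the round numbers from the conversation (about 4 million SNPs and 1500 proteins). The sketch below uses a Bonferroni correction, which is one common and conservative approach; the episode does not say which correction or threshold the preprint actually used, and the starting significance level of 0.05 is an assumption.

```python
# Round numbers quoted in the conversation.
n_snps = 4_000_000
n_proteins = 1_500

# Assumed conventional per-test significance level.
alpha = 0.05

# One association test per SNP-protein pair.
n_tests = n_snps * n_proteins

# Bonferroni: divide the significance level by the number of tests.
bonferroni_threshold = alpha / n_tests

# Without any correction, this many tests would be expected to pass
# p < alpha by chance alone even if no real association existed.
expected_false_positives = alpha * n_tests

print(f"tests performed:            {n_tests:,}")
print(f"Bonferroni p-value cutoff:  {bonferroni_threshold:.2e}")
print(f"uncorrected chance 'hits':  {expected_false_positives:,.0f}")
```

Because neighboring SNPs are correlated with each other, treating all of them as independent tests is stricter than necessary, which is one reason the choice of threshold is settled through community practice and peer review.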

Cindy, could they be both in cis and trans, both of them? Correlation or. What's that, Sarantis? Cis and trans, these correlations could be both. So yeah. So these correlations can be what we call cis or they can be in trans. So cis is just getting back to, actually it was your question, Sarantis, about whether these variants, these SNPs, are inside genes or outside genes.

And if they're in genes, or in close proximity to the genes that code for the protein itself. Right. So you've got a variant that's in a gene coding for a protein. If you see a correlation that's significant between those two, we call that a “cis-pQTL”, and that's a feel-good measure that says, oh, we must be measuring the right protein. If this is real, then, and there's ways to press on it, to check it and validate it, of course, with orthogonal data, but that's it.

So people often talk about cis-pQTL discoveries being verification then of having measured the right protein, because of course our assay is not a mass spec method. We're using antibodies as hooks. We're using two antibodies as a hook to hook a protein out of solution. And we have little single-stranded oligos attached to them. So those oligos can then hybridize, we can extend and amplify that up just like any old library prep for sequencing.

And then we count those oligos as a proxy for the original level of the proteins in the sample. And so when you're doing an affinity method, right, a hooking method to pull it out, it's great for low-abundance proteins. That's one of the things we add value to with mass spec folks. They like us because they can look at areas of the proteome they couldn't see easily with mass spec without tons of sample and a lot of control of variability.

So it's a nice method from that perspective, but it's a little bit indirect because we're pulling out the protein and converting it to DNA signal. So making sure we have a way to normalize those data and, you know, just like with mass spec in any proteomics experiment, to manage variability from batch to batch, these are important aspects that proteomics scientists are much better prepared to describe or explain than I am.

I have come to appreciate it. Now, Cindy, something that you touched upon, right, was the sort of drug discovery dimension of this. But even before we go in that direction, you also mentioned something in terms of SNPs and genes. The majority of GWAS, thousands, right, 3500 GWAS studies or many, many, many. Oftentimes these SNPs that are associated with risk are in gene deserts. Right. There's no function. That's right. Can you comment on that? Right. So these pQTLs, right, are SNPs, but aren't they just, so to speak, random places in the genome?

Yeah, so good question. So I'm going to reference a paper by Lasse Folkersen and Anders Mälarstig. Now the two of them, along with collaborators, there's a long list of authors that I won't list, brilliant, obviously, across multiple cohorts, they have their milestone paper within a study called the SCALLOP study, which is really a cohort of cohorts. They were doing what the UK Biobank wants to do.

This is me putting words in the UK Biobank's mouth. But I think that the Folkersen et al. milestone publication is a powerful precursor to what the UK Biobank has the possibility to do. In Folkersen et al. they looked at just 90 proteins. I say just, although at the time in 2020 that was a lot of multiplexed proteins. Of course they looked at primarily cardiovascular, what we broadly categorize as cardiovascular proteins, and they did the same kind of study.

So on 30,000 samples, they looked at 90 proteins with genetic, clinical and proteomic data. They did the correlations just like the UK Biobank has done in their preprint. The 90 proteins resulted in a little over 450 pQTLs. Some of those are cis-pQTLs, as Sarantis hinted at. Right.

So 88% of the proteins had cis-pQTLs identified there. That's, like I said, a feel-good measure, something we can kind of point to, to say this looks like it's, you know, increasing our confidence that we're measuring the right protein, although there are good biological reasons why you might not see a cis-pQTL. But the remaining trans-pQTLs were essential. Discovery of trans-pQTLs is incredibly important to understand protein-protein interactions.

So I may have taken a bit of a meandering way to get back to your question, Dale, about these relationships, but trans-pQTLs and figuring out where those are coding, you know, what proteins are they coding for, what gene regions are they associated with? It's not a trivial matter. And so I've had discussions with Lasse as well as Anders around this challenge.

And so just to define trans-pQTLs… so as a reminder, cis-pQTLs are where the gene variant is either in or near the gene that codes for the protein that you're measuring. So the correlation is between the gene region and the protein itself; that's a cis-pQTL. So say you have a particular protein you're measuring, we'll say TNF-alpha, and there's a SNP that is on the same chromosome as the gene that codes for TNF-alpha, within, I don't know.

A couple hundred base pairs, or a million base pairs. So yeah. When you're in the general region, and so there could be, in those million base pairs, a lot of other genes, but nonetheless. Right. You're saying that that particular SNP was controlling TNF-alpha. It suggests that. I think they might not... Yeah, they may not say it quite so strongly, simply because, you know, it's association. Yeah. Yeah, exactly. It's just a statistical calculation. Got it.
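The cis definition above can be sketched as a simple coordinate check: a pQTL counts as cis if the SNP sits on the same chromosome as the gene coding for the measured protein and within some window around it, and anything else is treated as trans. The one-million-base-pair window below follows the figure mentioned loosely in the conversation and is an assumption, as are the `Gene` class, the function name and the example coordinates, which are invented for illustration.

```python
from dataclasses import dataclass

# Assumed window size: the conversation mentions "a million base pairs".
CIS_WINDOW_BP = 1_000_000


@dataclass(frozen=True)
class Gene:
    """Genomic location of the gene that codes for the measured protein."""
    name: str
    chrom: str
    start: int
    end: int


def classify_pqtl(snp_chrom, snp_pos, gene, window_bp=CIS_WINDOW_BP):
    """Label a SNP-protein association as 'cis' or 'trans'."""
    if snp_chrom != gene.chrom:
        return "trans"  # different chromosome altogether
    if gene.start - window_bp <= snp_pos <= gene.end + window_bp:
        return "cis"    # inside or near the coding gene
    return "trans"      # same chromosome but far away


# Hypothetical gene and SNP positions (not real coordinates).
example_gene = Gene(name="EXAMPLE_GENE", chrom="9", start=5_000_000, end=5_020_000)

print(classify_pqtl("9", 5_400_000, example_gene))   # nearby on the same chromosome
print(classify_pqtl("9", 50_000_000, example_gene))  # same chromosome, far away
print(classify_pqtl("19", 5_010_000, example_gene))  # different chromosome
```

As the hosts note, a cis label is still only a statistical association between a region and a protein level, and the window may contain many other genes.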

And so with a trans-pQTL, what that is, is you've got a variant, you know, you might have a gene coding for a protein and that gene might be on chromosome nine. But you might have the pQTL on chromosome 19. You know, you might have it on a completely different chromosome, a correlation with that same protein. So the sort of Occam's Razor, you know, the most straightforward possibility is that there's a relationship between those two proteins, right?

That there's protein-protein interaction going on there. And in fact, the STRING database is a publicly available database that records and collects and is curated around protein-protein interactions. And so, you know, in asking the team how they dig into each of these relationships, what they would do is report the closest gene to the location that's in trans with this protein. They report the closest gene geographically.

And then they also report, because they do kind of a deep dive into surrounding genes, as you say, Dale. There could be, you know, surrounding genes that might be implicated. They look at those other surrounding genes and they say, you know, what's the shortest pathway back to that protein?
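The "shortest pathway back to that protein" question can be pictured as a shortest-path search over a protein interaction network. The sketch below runs a breadth-first search on a tiny invented network; the node names are placeholders, the edges are made up, and nothing here uses the STRING database or its API. How the authors actually traced pathways is not described in the episode.

```python
from collections import deque


def shortest_path(network, start, goal):
    """Breadth-first search for the fewest-step path from start to goal.

    network maps each node to a list of its interaction partners.
    Returns the path as a list of nodes, or None if no path exists.
    """
    if start == goal:
        return [start]
    visited = {start}
    queue = deque([[start]])
    while queue:
        path = queue.popleft()
        for neighbor in network.get(path[-1], ()):
            if neighbor in visited:
                continue
            if neighbor == goal:
                return path + [neighbor]
            visited.add(neighbor)
            queue.append(path + [neighbor])
    return None


# Toy interaction network (entirely hypothetical).
toy_network = {
    "GENE_NEAR_SNP": ["PROTEIN_B"],
    "PROTEIN_B": ["GENE_NEAR_SNP", "PROTEIN_C", "MEASURED_PROTEIN"],
    "PROTEIN_C": ["PROTEIN_B", "PROTEIN_D"],
    "PROTEIN_D": ["PROTEIN_C", "MEASURED_PROTEIN"],
    "MEASURED_PROTEIN": ["PROTEIN_B", "PROTEIN_D"],
    "NEIGHBOR_GENE": [],  # a surrounding gene with no known interactions
}

print(shortest_path(toy_network, "GENE_NEAR_SNP", "MEASURED_PROTEIN"))
print(shortest_path(toy_network, "NEIGHBOR_GENE", "MEASURED_PROTEIN"))
```

Running the same search from each gene surrounding the trans signal gives a way to rank which neighbor is the most plausible link to the measured protein.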

And that is a fascinating conversation because once you put together a pathway analysis like that and we talk about different diseases, now you've got some pathways in, say, Alzheimer's disease and you've got some pathways in, say, schizophrenia. I'm just picking two neurological diseases.

And now if you can imagine a Venn diagram of the pathways those two have in common, that is an opportunity for us to understand the mechanistic biology that's in common between those two neurological diseases, if any. You know, I'm just picking those out of the air. If we can return back to that Folkersen landmark paper. Mm hmm. So, if I understand correctly, there were 90 proteins. How many tens of thousands of samples? 30,000 samples. Just over. Okay, so 30,000 samples times 90 proteins.

And they also had like whole genome data on those 30,000 individuals. Is that right? They had genetic data that you could. So I don't know that. Remember, this is a cohort of cohorts. So I think they had GWAS data or genotyping data, you know, array data on some of those and sequencing data on others. I wouldn't want to represent that, but my guess is that they had variation, genetic data that they had in common. Right.

Because you can convert a whole genome sequencing dataset to a list of variants. Understood. And yeah. Right. So they had all the genetic data of 30,000 individuals. They looked at these 90 proteins and then you mentioned that they're able to connect it then to disease. Yeah. So you do the same thing. This looked at relationships between genetic, you know, state and protein levels. So you look for all those correlations.

In this paper, they found 450 pQTLs that exceeded their significance threshold. And, you know, as you touched on before, that's why we have peer review, to make sure that we're held accountable for the number of tests that we're doing, that we're, you know, really trying to be as transparent as possible in publishing these data. And by the way, it was published in Nature Metabolism in 2020.

So once you see all the correlations, imagine you have this list of correlations. You can layer those clinical data in then. So now you know the disease information and you can look at these different sets of data. So genetics, proteomics and disease, and you can sample from these and determine, with the relationships between these three, how often would that happen by chance alone? If it would happen by chance alone quite often, then we let that fall away.

If it seems quite unusual to see these relationships, then those are the ones that we elevate to potential causality. And so in this paper, from the 450 correlations, they elevated 25 that they suggest appear causal. And some of those, I think, are known, validated clinical targets for existing therapies, which is super exciting because then it's like, oh, looks like we're on the right track, right? And then of course some novel findings.
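The "how often would that happen by chance alone" question can be illustrated with a permutation test: shuffle one variable many times, recompute the relationship each time, and count how often the shuffled data look as strong as the real data. This is a simplified stand-in for the idea, not the method used in Folkersen et al.; it looks at only two variables, and the data, effect size and number of shuffles are all assumptions.

```python
import random


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def permutation_p_value(xs, ys, n_permutations=500, seed=0):
    """Fraction of shuffles whose correlation is at least as strong as observed."""
    rng = random.Random(seed)
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # break any real link between xs and ys
        if abs(pearson_r(xs, shuffled)) >= observed:
            hits += 1
    # Add one to numerator and denominator so the estimate is never exactly zero.
    return (hits + 1) / (n_permutations + 1)


# Simulated data (assumption): protein level and a disease risk score.
rng = random.Random(1)
protein = [rng.gauss(0.0, 1.0) for _ in range(200)]
risk_linked = [0.5 * p + rng.gauss(0.0, 1.0) for p in protein]  # truly related
risk_unlinked = [rng.gauss(0.0, 1.0) for _ in protein]          # unrelated

p_linked = permutation_p_value(protein, risk_linked)
p_unlinked = permutation_p_value(protein, risk_unlinked)
print(f"linked pair:   p = {p_linked:.3f}")
print(f"unlinked pair: p = {p_unlinked:.3f}")
```

Relationships that are rarely matched by shuffled data are the ones kept for follow-up, while those that shuffling reproduces easily are allowed to fall away.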

So they report 14 validated clinical targets, known clinical targets; CASP-8 in breast cancer was one of them that I can think of. So CASP-8 is something known already before to be involved in breast cancer? That's right. And then they rediscovered it? Yeah, CASP-8 is a known therapeutic target in breast cancer. I see. And then 11 of those were novel, so they were not able to find any prior evidence for 11 of their findings that were elevated, again, to potential causality.

And those are the exciting ones for new programs, potentially. And then they reported 18 potential repurposing opportunities. So that's super exciting to me because if you've got an existing drug for one indication, say tocilizumab for rheumatoid arthritis, and you have the possibility of then using that in a different indication, that would be a repurposing opportunity. So for example in eczema.

I guess it doesn't make sense to think about using an anti-rheumatoid arthritis drug. Right. That's on market to treat eczema. That's just I mean. There's one in clinical trials. I mean, coming back to the cohorts, Cindy, I think also the fact that these cohorts are from different geographical places increases the possibility to illuminate, for example, biases on SNPs. Right. Did you have any discussion with the authors about that?

Do they ever consider that the bias, geographical bias, may influence their data? Can you comment on this? Yeah, it's a great question. So they primarily represent northern European populations. There was some representation of Asian populations in there, but not a lot. And I'm trying to remember, I don't think there was any African diaspora in the subset of samples that they had in this milestone paper.

So that's, you know, a blessing and a curse. Right. For them, it eases the analysis. To your point, for the opportunity to make discoveries because of diversity within the ancestry of our genomes, it's a “miss”, right? And an enormous potential future opportunity, which I think is very exciting and very important for equity in health care. I mean, essential. So we have to start somewhere, though, right? So we start with the populations that we have.

It's fascinating thinking about the 90 proteins, all the different things that were discovered. Right, these 25 drug targets. That explains the pharma interest in the UK Biobank, by doing the extrapolation. Have you done the extrapolation? How many drug targets do they expect? Yeah. So with this, you know, it's around five and a half percent of the pQTLs discovered in Folkersen et al. that converted to, you know, potentially causal. Interesting.

So if we applied that same percentage, which is lofty, right, that's a lot of proteins, and these 90 in Folkersen et al. were well studied, you know, considering across 30,000 samples. So, you know, I would expect maybe four, four and a half percent to maybe 5% converting.

In this initial set of proteins, I think to be a little conservative, you know, not trying to be too bullish, but even with that, we're talking about potentially listing off potentially causal markers to investigate of, you know, around 500. Five hundred potential drug targets. Potential therapeutic targets. That's right.

And to be fair, some of these might show up as potential therapeutic targets that would never be considered if they're in signaling pathways, for example. So it's up to pharma, and certainly people that are more versed in clinical trials and potential, you know, pathways for these and implications of side effects, to then up score and down score these.

But the exciting aspect of this is to have a systematic approach by which to do that, to actually make that list of 500 and then up score some and start programs. Because we like to say that clinical trials are twice as likely to be successful if you go into that trial with genetic information. That's certainly, you know, been published. And with adding proteomic data, I'd really love to see what that means for improving our ability to be successful in clinical trials.

And these 13, I guess if you take those 500 targets (or potential drug targets) divided by 13 different pharma partners, that's like, what, 35 apiece. Yeah, that's right. That's a lot of programs. That's a lot of programs.
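The back-of-the-envelope extrapolation in this exchange can be written out directly from the figures quoted in the episode: 25 potentially causal findings out of about 450 pQTLs in Folkersen et al., roughly 10,200 relationships in the UK Biobank preprint, and 13 pharma partners. The function name is hypothetical, and applying one study's conversion rate to another dataset is the speakers' speculation, not an established result.

```python
def extrapolate_targets(n_pqtls, conversion_rate, n_partners):
    """Return (expected potential targets, targets per partner)."""
    targets = n_pqtls * conversion_rate
    return targets, targets / n_partners


# Figures quoted in the conversation.
folkersen_pqtls = 450
folkersen_causal = 25
ukb_pqtls = 10_200
pharma_partners = 13

# Observed conversion rate in Folkersen et al. (about five and a half percent).
observed_rate = folkersen_causal / folkersen_pqtls
print(f"Folkersen et al. conversion rate: {observed_rate:.1%}")

# The more conservative range suggested in the episode, plus the observed rate.
for rate in (0.04, 0.045, 0.05, observed_rate):
    targets, per_partner = extrapolate_targets(ukb_pqtls, rate, pharma_partners)
    print(f"rate {rate:.1%}: about {targets:.0f} targets, {per_partner:.0f} per partner")
```

At the conservative 4 to 5 percent range this gives on the order of 400 to 500 potential targets, or roughly 30 to 40 per partner, in line with the round numbers the hosts arrive at.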

I mean, that's going to be a wealth of data for them. Now I understand why they would invest in such a project. What is the next step then in the UK Biobank project, and how can people find out more about it? Good question. Yeah. So what I fully expect, and I know of at least eight abstracts that have been submitted for ASHG this year. Now ASHG, the American Society of Human Genetics, as I mentioned earlier, will be in Los Angeles in October.

And so I know that those pharma partners and the researchers within those pharma partners are submitting abstracts to present there, and I'm sure some of them will get oral presentations. Many of them will get poster presentations. But I will be keeping a close eye on that and I will absolutely be there. And I think we should do a podcast episode. There you go. You will have a post-ASHG “this is what I got out of it.” That would be great. And maybe drag a few guests on if we can.

That's great. Yeah, that'd be great. Yeah. And as far as what I think is next, I think they're going to be digging into these correlations, 85% of them novel. So roughly 8000 novel relationships between genetic regions and protein levels. They're going to be looking into which of those appear causal within certain diseases. Do you know when that will be available? Publicly available data? How can scientists have access to these?

Is it an easy process or a difficult process to have access to that? Yeah. So as you probably know, Sarantis, but our listeners may not know, the UK Biobank data, through a data use agreement, is broadly available. So this is one of the reasons there's so much use of those data as validation data and for discoveries, with very clever informatics scientists and biologists thinking of creative ways to use such a large dataset.

The proteomics data, the first set of proteomics data, so the first 1500 proteins, the subject of the June bioRxiv paper: it has been stated that those data will be publicly available by the end of the year. So I expect, you know, by October at ASHG we'll know better the timing for that. Yeah, those pharma partners, of course, have had access to those data, as they should, which is why they were able to publish that paper so quickly. And so the next tranche of data for the full 3000 proteins.

And can I just say, you know, you see what's possible with 90 proteins in Folkersen et al. Imagine what's possible, you know, with 3000 proteins and 54,000 individuals. That's a lot of power to detect relationships between proteins, and many proteins that we really just haven't had assays for examining. So just such an opportunity for discovery.

We touched upon, yeah, we touched upon the enormous investment made to date to collect these 500,000 samples. That's right. And to follow up, and all those, like, genetics. Yeah. The whole genome, whole exome data on all these individuals, and then now overlaying, empowering the genomics with the proteomics. It's as if we're a part of something: the next big thing in genetics is proteomics. I think it's, you know, when you think about the central dogma of biology, right? You've got DNA, RNA.

We've done a great job of looking at DNA. RNA has been our proxy for biology for a long time because it was available to look at with sequencing technologies. In fact, you and I, Dale, I think have talked about how RNA-Seq and the ability to do what we call “digital gene expression” sold many of those initial instruments that were, you know, next generation sequencing instruments. But now we have this ability to measure proteins directly in a very scalable way.

And I am excited, as you know, about this capability, but it's really the researchers and what they can do with it that will tell us the true potential of this. Super. Well. Thank you, Cindy, for sharing your thoughts on empowering genomics with proteomics. And we'll see you soon. That was great. Thank you very much. Thank you for listening to the Proteomics in Proximity podcast brought to you by Olink Proteomics. To contact the hosts or for further information, simply email info@olink.com.

Transcript source: Provided by creator in RSS feed