Welcome to Berry's In the Interim podcast, where we explore the cutting edge of innovative clinical trial design for the pharmaceutical and medical industries, and so much more. Let's dive in.
All right, welcome. We are back on In the Interim. This is Berry Consultants' podcast of all things statistical, all things scientific with clinical trials and medical decision making. We have an interesting podcast today in that I don't know the topic. My co-host on In the Interim, Kurt Veley, is here, and he has a surprise topic to talk about today. Kurt.
Hey Scott. All right, so I wanna start with a story. I went to a conference not that long ago, a standard statistical conference. People are talking about the data they had, all the methods that they've been developing to understand it, and what they've learned from the data. And essentially all of these talks had a really interesting structure.
They get through all their methods and everything, and every single talk ended with: if only I had gotten a chance to design the experiment, design the data, do this in advance, everything would've been better. I would've avoided all these problems, and so on. So I'm listening to all these talks, and when I get to my talk, of course, after listening to all this, I started it with: I live in utopia.
And then I talked about experimental design. So I get back in the car on the way home, and I'm sitting there thinking: am I really in the good place, or am I in the bad place? Because one thing that's always impressed me about the last 20 years in experimental design is that all of these methods we're developing to understand data, causal inference, all of these aspects of how we make inferences from the data in front of us, we don't use those in experimental design.
We often have an idea that when we design a trial, we're to ignore every bit of data that's ever existed on Earth. If I go to my doctor and I ask, why are you giving me a drug? They're gonna report lots and lots of studies that say this drug is good. If I go into an experiment and say, I want to use this data, I'm immediately told it's bad. And that depends on if I'm borrowing information. I've got a drug that's targeted for a specific mutation.
I have data on thyroid cancer. Can I use that for lung cancer? Generally, the answer is no. If I wanna borrow historical data from old clinical trials, take Alzheimer's: we have hundreds of thousands of patients that have been treated on placebos. Can I use it? Well, there are problems with that. And so on with real-world evidence. So I would actually define the standard debate that we've been having for the last 15 years as essentially: is data good or is it bad in experimental design?
So does it lead us to a good place or a bad place? So I'd like to talk about where we are, how we got here, and where we think we're going. So I'm gonna leave it up to you. I've surprised you with a topic and see what your reactions are.
Well, let's sort of figure out the topic and the interesting part of this. So when you say, is data good or bad, what do you mean? When I design an experiment, I'm saying I want to collect the following data, and I can analyze just the data in that experiment, which we've been doing for a long time. Frequentist, rarely Bayesian, but we do that. Are you saying that by data you mean data outside the experiment? Is there any room in that experiment for external...
Yes. So what do we do with essentially the totality of human knowledge prior to the experiment? Should that enter into the experiment in any way? And I wanna avoid this: I mean, there's lots of rhetoric we could use here. If you want to say bad things about this, you talk about biases and confounding and everything else. If you wanna say good things, you talk about the totality of the evidence. But really let's get at, you know, what is gonna lead us in good and bad directions?
How do we decide when to do this and when to not?
Yeah. So let's lay out why not. I mean, what are the reasons people give that we shouldn't use other data in our experiment?
The obvious answer would be that it could lead us in the wrong direction. The old experiment might have been done with a different set of patients at a different time. There are key differences between that old data and what my current experiment is. If I use the old data in my new experiment, I can get answers that are biased in certain ways. I can draw wrong conclusions.
I can say drugs work that don't, simply because of the information I'm bringing in rather than the experiment itself.
So is this a frequentist issue? Is it that we calculate the operating characteristics of the new experiment, and type one error is about only the new experiment? If I use any data outside of my experiment, you can inflate type one error, you can get bias, all these bad terms. Is it a frequentist problem that, with data outside the experiment, all of a sudden type one error means something very different,
bias means something very different, and that I can't use that data and be a frequentist?
So I don't know. I think there is a frequentist/Bayesian divide. I think Bayesians are trained with the idea of collect some data, update your beliefs, collect more data, update your beliefs, and it's a natural thing to do, to bring things in. But I wouldn't say that the problem is being a frequentist, in the sense of: do you care about long-term error rates? I would lay the problem at the feet of: you have to have this 2.5% type one error
in the worst-case scenario. So in effect, we're playing minimax against nature. We have to assume that nature has deceived us for the last 25 years, and then in that case, we shouldn't use the data. So on the one hand, that delivers an immense amount of robustness. On the other hand, we're reinventing the wheel.
Yeah. So let's talk about a case where we would use external data. We've designed trials where we've used external data, so what does that look like, and what could be the potential problem? Suppose I've got results from another trial, a phase two trial, about the relative efficacy of a treatment, and I bring it into my next experiment and use that as prior knowledge. When I'm done with my new experiment, I combine them together.
Or I bring in previous data on the control arm. I'm comparing to an active comparator, something that's approved. It's standard of care, and I wanna run a new experiment of my new drug and compare it to standard of care. There are lots of trials and lots of information about standard of care. Why do I have to run a one-to-one randomized trial of standard of care against my therapy when I know so much about it?
So I might bring that information specifically into the trial. Both of those notions can cause issues in my new experiment, essentially if it's wrong, if that data is different from my new experiment. Statisticians are great at saying, well, boy, if that data was a little bit different, now your new experiment has type one error inflation or bias or other issues. If I bring in anything from outside the trial, it can cause issues. So those might be ways in which I bring in external data.
There are certainly cases where you are gonna run into those problems. I mean, we've talked about examples in antibiotics: with the development of resistance over time, we expect therapies to change. There are situations where we think that problem exists and we need to address it or not use the data. There are also cases where diseases are very stable over time, and those issues may not be as big a concern.
Yeah, so I'm very much on the side that we should be using external data. I think we go about this in a way that's just so strikingly conservative that it slows us down, just because, you know, something may happen.
I think it falls under the realm that science is hard. Incorporating this other information, being explicit about it, the process of this is hard, but the notion that we don't use any of that information just seems strikingly wrong.
So it's an interesting question, 'cause it certainly can be wrong on occasion. You ask about frequentists and whether the error rates are driving this. I think you can be a perfectly good and wonderful frequentist and use historical data, but what you're gonna have to accept is that most of the time prior data is leading you in the right direction. If it's not, we should just cease science altogether, because we have worse problems.
But in any case, if it's generally leading us in the right direction, we're gonna be running a few experiments that have, say, 3 or 4% type one error, and we're gonna be running a lot more experiments that have, say, 1 or 2%. And in the grand scheme of things, we're still putting a better mix of drugs on the shelf: more drugs that work, fewer drugs that don't. So I think we have to change our mindset to be frequentist in this kind of world.
Yeah, I'm struck by the idea of using that data. In Bayesian analyses we do borrowing of that information. We do things that are dynamic. I'm struck that there are tons of trials where we use objective performance criteria: it's a single-arm trial and the drug has to jump a particular hurdle. That hurdle is a number. That number's always based on prior data, and we do that all the time, and presumably that's okay. Or you run a trial where your data comes only from the new experiment, and that's your control.
But the idea that we do something halfway between those, that we use it partially, it just strikes people the wrong way: just do one or the other. And I'll give an example of this happening to me. By the way, if you listen to In the Interim a lot, you'll find that we very much respect the FDA. They're, they're the, um, uh ... in the world. They do tremendous good, and they're wonderful. It doesn't mean we don't run into hurdles.
So we presented a
So just to interrupt you there, I mean, we go to the FDA a hundred times a year and have three major disagreements. That's much better than with our wives, so.
Yeah. So, to provide an example: we wanted to go in with a new therapy. It was oncology, and in the new trial we didn't want to enroll one-to-one to a standard of care. It's easier to get patients in if they're more likely to get the experimental treatment. We wanted to enroll three-to-one, three to experimental, one to control. There's a good bit of information about the behavior of the control.
So we were borrowing from previous trials, and let's assume, to make it simple, that the prior data showed a 15% objective response rate. We're gonna enroll three-to-one, and we're gonna borrow on the control using that prior data that's centered on 15%.
In the new experiment, because we're borrowing data that's 15% on the control, if the true control rate in that new experiment is actually higher than that, we're pulling the control down, making it easier for the treatment to look better, and we presented those operating characteristics. That was deemed to be unacceptable because you inflate the type one error rate. The response from the agency was: use 15% as an objective control, don't enroll any new controls, and you have to beat 15%.
Of course, in that setting we generally don't calculate the type one error if in fact the real rate in that experiment was 25 or 30%, because we're just jumping 15%. It's almost the opposite Bayesian view: we know the answer is 15% for what you're trying to beat, and if you can beat that, that's good. It's using a prior that's all focused on a single value rather than modeling it with some borrowing and recognizing the uncertainty. Or enroll a whole control arm. So do one or the other, where it's entirely this experiment.
But the idea of using borrowing or modeling seems to be harder to accept.
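A rough numerical sketch of the comparison described in this example. Everything specific here is an assumption made for illustration and is not from the conversation: 60 treated and 20 concurrent controls, a Beta(6, 34) prior standing in for historical control data centered on 15%, a flat prior on the treatment arm, success declared when the posterior probability that treatment beats control exceeds 0.975, and an 80-patient single-arm alternative tested against a fixed 15% hurdle.

```python
import math

def beta_pdf_grid(a, b, grid):
    """Beta(a, b) density on the grid points (assumes a >= 1 and b >= 1)."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return [math.exp(log_norm + (a - 1) * math.log(p) + (b - 1) * math.log(1 - p))
            for p in grid]

def binom_pmf(k, n, p):
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def borrowing_type_one_error(true_rate, prior_a, prior_b,
                             n_ctrl=20, n_trt=60, cut=0.975, grid_size=400):
    """Exact chance of declaring success when both arms share true_rate."""
    dx = 1.0 / grid_size
    grid = [(i + 0.5) * dx for i in range(grid_size)]
    # Posterior density of the control rate for every possible control count.
    ctrl_pdfs = [beta_pdf_grid(prior_a + x, prior_b + n_ctrl - x, grid)
                 for x in range(n_ctrl + 1)]
    # P(treatment rate > p) on the grid for every possible treatment count,
    # using a flat Beta(1, 1) prior on the treatment arm.
    trt_surv = []
    for y in range(n_trt + 1):
        cdf, surv = 0.0, []
        for f in beta_pdf_grid(1 + y, 1 + n_trt - y, grid):
            surv.append(1.0 - (cdf + 0.5 * f * dx))
            cdf += f * dx
        trt_surv.append(surv)
    error = 0.0
    for x, c_pdf in enumerate(ctrl_pdfs):
        p_x = binom_pmf(x, n_ctrl, true_rate)
        for y, surv in enumerate(trt_surv):
            # Posterior probability that the treatment rate exceeds control.
            post_prob = sum(f * s for f, s in zip(c_pdf, surv)) * dx
            if post_prob > cut:
                error += p_x * binom_pmf(y, n_trt, true_rate)
    return error

def opc_type_one_error(true_rate, hurdle=0.15, n=80, alpha=0.025):
    """Single-arm trial: exact binomial test against a fixed hurdle."""
    def upper_tail(c, p):
        return sum(binom_pmf(k, n, p) for k in range(c, n + 1))
    # Smallest response count whose tail probability at the hurdle is <= alpha.
    critical = next(c for c in range(n + 2) if upper_tail(c, hurdle) <= alpha)
    return upper_tail(critical, true_rate)

flat_25 = borrowing_type_one_error(0.25, 1, 1)     # no borrowing, truth 25%
borrow_15 = borrowing_type_one_error(0.15, 6, 34)  # borrowing, prior is right
borrow_25 = borrowing_type_one_error(0.25, 6, 34)  # borrowing, prior too low
opc_25 = opc_type_one_error(0.25)                  # fixed hurdle, truth 25%
print(f"no borrowing, true rate 25%:  {flat_25:.3f}")
print(f"borrowing, true rate 15%:     {borrow_15:.3f}")
print(f"borrowing, true rate 25%:     {borrow_25:.3f}")
print(f"fixed 15% hurdle, truth 25%:  {opc_25:.3f}")
```

Under these assumed settings, borrowing inflates the false positive rate when the true control rate sits above the prior, and the fixed-hurdle design inflates it further in the same scenario, even though that number is rarely computed for a single-arm trial.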
It's interesting. It's almost as if you have to cross the Rubicon: are you willing to completely make this assumption, or not make this assumption at all? And you can't test the assumption.
Right. Right. And the other extreme of this, and I know, I know, I know this one, uh, gets at you a little
Don't do it. Don't do it, Scott.
Yeah. We talk about using real-world evidence all the time. Digital twins or real-world evidence. So for control, we bring in external data, and people seem to be almost comfortable when your control is entirely external data. That's okay. But if you were to do something like use non-concurrent controls from your own experiment, that seems to be unacceptable, despite the fact that these are unbelievably phenomenal historical controls.
And they're better than any real-world evidence you're gonna get: the same protocol, the exact same data, the same quality. The only difference with them is time. People seem really uncomfortable with that. But yet, at the conference you went to, there were probably 10 talks about using real-world evidence.
Yep. I mean, there are different legal reasons that real-world evidence is being handled in certain ways right now compared to other data. But, you know, FDA is not a homogeneous organization. There are lots of people. The academic community is many people, industry is many people. But I think, if we're talking about where we're going, one of the key aspects of this is a hierarchy of what's more or less dangerous.
And I'd actually like to see us spending more time on it. Take Alzheimer's trials. We've got dozens, hundreds of these. So how much do the control arms differ from trial to trial to trial? If we have hundreds of trials, we should be able to estimate this. We're statisticians. We can estimate a variance parameter. Let's get an idea about what it is, and that variance parameter translates into how much we should borrow.
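One way to make the "estimate a variance parameter" idea concrete is a method-of-moments (DerSimonian-Laird) estimate of the between-trial variance of control-arm means, followed by a rough translation into a prior effective sample size. This is a sketch under assumptions, not the speakers' method: the normal hierarchical model, the patient-level standard deviation, and every number in the example are made up for illustration.

```python
def dersimonian_laird_tau2(means, ses):
    """Method-of-moments between-trial variance (needs at least two trials)."""
    k = len(means)
    w = [1.0 / s ** 2 for s in ses]
    sw = sum(w)
    pooled = sum(wi * y for wi, y in zip(w, means)) / sw
    # Cochran's Q: weighted squared deviations from the pooled mean.
    q = sum(wi * (y - pooled) ** 2 for wi, y in zip(w, means))
    c = sw - sum(wi ** 2 for wi in w) / sw
    return max(0.0, (q - (k - 1)) / c)

def prior_effective_sample_size(means, ses, patient_sd):
    """Approximate number of control patients the historical trials are worth.

    The predictive variance for a new trial's control mean is taken as
    tau^2 plus the variance of the estimated overall mean; dividing the
    patient-level variance by it gives a rough effective sample size.
    """
    tau2 = dersimonian_laird_tau2(means, ses)
    w_star = [1.0 / (s ** 2 + tau2) for s in ses]
    prior_var = tau2 + 1.0 / sum(w_star)
    return patient_sd ** 2 / prior_var

# Hypothetical control-arm mean changes and standard errors from three trials.
# With a patient-level SD of 6, a standard error of 0.5 means 144 patients each.
consistent = prior_effective_sample_size([1.0, 1.0, 1.0], [0.5, 0.5, 0.5], 6.0)
drifting = prior_effective_sample_size([0.0, 2.0, 4.0], [0.5, 0.5, 0.5], 6.0)
print(f"consistent control arms: worth about {consistent:.0f} patients")
print(f"drifting control arms:   worth about {drifting:.0f} patients")
```

When the historical control arms agree, the estimated variance is zero and the historical data count at close to full weight. When they drift from trial to trial, the same number of historical patients is worth only a handful of new ones.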
Yeah, and you mention a key point there: if you're enrolling some new controls, you actually get a comparison. You get the controls in your experiment against the previous controls, and you can judge some level of similarity between those, and what's become okay.
Why don't you talk about that a little more, just the details? 'Cause I think there's often a sense of, the fact that there is a hierarchy between a single-arm trial, dynamic borrowing, and static borrowing. Give an idea about what those methods are and how they mitigate those risks.
Yeah. So, the extremes on this. I can run an experiment where I only include data from my new experiment, one-to-one randomized, control to treatment, and I'm only gonna compare those two arms. Or I could go to the opposite extreme from that, where I enroll only my treatment arm, and I want to know: is the treatment benefiting these patients?
I would have to figure out a way to create a control arm. I could use entirely historical data. In that far extreme, I have no ability to know whether the control rate I'm using is reasonable. Suppose I do something halfway, where I enroll two-to-one or three-to-one. I use the controls in that experiment, but I also use external controls to reinforce the control. Now I'm getting new controls, so I can judge their similarity to the external controls
and statistically model that similarity, using the external data to strengthen the control if they're similar, and not if they're not. But I do get evidence about their similarity. And it's different from the extreme case where I only use external controls and I have no idea whether they're good or not. So that's a middle ground.
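A minimal sketch of one form of dynamic borrowing, a two-component mixture prior on the control response rate. This particular model is chosen only for illustration and is not necessarily what the speakers use; the Beta(6, 34) historical component, the flat Beta(1, 1) component, the 50/50 prior weight, and the trial sizes are all assumptions.

```python
import math

def log_beta(a, b):
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def log_marginal(x, n, a, b):
    """Log beta-binomial probability of x responders in n under a Beta(a, b) prior."""
    return math.log(math.comb(n, x)) + log_beta(a + x, b + n - x) - log_beta(a, b)

def mixture_posterior(x, n, a_hist=6.0, b_hist=34.0, prior_weight=0.5):
    """Posterior weight on the historical component and posterior mean rate."""
    m_hist = math.exp(log_marginal(x, n, a_hist, b_hist))
    m_vague = math.exp(log_marginal(x, n, 1.0, 1.0))
    # Bayes' rule on the component label: the concurrent controls decide
    # how much the historical information is trusted.
    w_post = prior_weight * m_hist / (prior_weight * m_hist
                                      + (1 - prior_weight) * m_vague)
    mean_hist = (a_hist + x) / (a_hist + b_hist + n)
    mean_vague = (1.0 + x) / (2.0 + n)
    return w_post, w_post * mean_hist + (1 - w_post) * mean_vague

# 20 concurrent controls: 3 responders agrees with the historical 15%,
# 10 responders (50%) clearly does not.
for responders in (3, 10):
    weight, mean = mixture_posterior(responders, 20)
    print(f"{responders}/20 responders: historical weight {weight:.2f}, "
          f"posterior mean {mean:.3f}")
```

The concurrent controls are what make this testable: when they look like the historical data the model keeps borrowing, and when they do not, the weight on the historical component drops and the estimate falls back toward the new controls alone.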
And I think, you know, that's really what's going on in the academic literature right now in terms of trying to figure out the best ways to make that assessment, minimize the risks, amplify the benefits. We're talking now more about covariate matching between studies, so not just looking at all the historical data, but at which patients best match our current enrollment.
None of that, of course, is going to be perfect, but again, it's the question of: do we generally add information, or are we taking a risk? And we should be looking at it as a society: if we do these kinds of trials over and over and over again, do we get a better set of drugs on the shelf?
Yeah, and part of it is that statisticians are really smart and they can figure out how something can go wrong. If the historical data's not the same as the new controls, we've got problems with the analysis there. We've got potential type one error inflation. We've got potential bias in the estimate of the treatment effect. And statisticians can figure this out and say that, you know, this is bad.
But if we have to do experiments where we have to enroll one-to-one, that experiment is bigger. It's more patients. It's more patients on control. We take fewer shots on goal. We learn slower. The whole industry is slowed if, every time we have a question, the answer has to come only from that experiment, as opposed to using what we know scientifically to reinforce it. By making a more efficient trial, we can look at more treatments, we can treat fewer patients, we can do this cheaper.
So there is a huge societal, I'll call it societal, whole-medical-field issue. If every time we have a question the only thing we can do is look at the new experiment, it seems like there's a huge waste.
Well, basically, it's like this with data: we design an experiment, data comes into existence. There is this bright shining moment of about five minutes where we can do something with it. And then we're meant to put it aside, at least with regard to future experiments. Obviously people are gonna reference it in deciding treatments and so forth in the medium, but it's interesting, it worries me. As a statistician, I feel deep-seated guilt actually putting data aside.
Yeah. And again, it's hard, but there are huge ramifications to not doing it. And I think there are ways of doing it very well. You can be prospective in your new experiment. The analysis plan is completely written down. It's explicit. When you present the results, you show the results just in the experiment, and you show what happens using the modeling of external data. I think we can do this really well.
I think, by the way, this is changing a little bit. We're seeing a somewhat similar issue where you run basket trials. There's a lot of reticence when, you know, we have four kinds of patients and we run an experiment of control versus treatment, about using the data in one of the subsets of patients to help estimate another subset of patients. That gets people really bothered, and that's been done more and more.
I think it's a very similar type of issue, and it's sort of striking, and we're in trouble if we don't do that. But it has similar statistical ramifications.
Well, it also has tremendous implications for design, because if you can't borrow, suddenly sponsors have an incredible incentive to claim that their patient population is homogeneous so that they can pool everything together and maximize their power.
Yeah.
You know, if as soon as you say there are two different groups of people you have to run twice as many patients, no one's gonna wanna explore those kinds of questions.
Yeah, so it's the same bizarre thing that happens when we run single-arm trials where we use a historical control a hundred percent, or we don't use it at all. And you're describing exactly what happens in trials. We run a trial and do a common estimate across the entire population, a single estimate, which is full pooling. It's mathematically exactly the same as pooling the populations all together to get a single answer.
Or we run separate trials and we call them population A and population B, and we only estimate those individually in A and B. Now we want to do something that's in the middle between those, and by the way, we actually get to observe the similarity of effect in the experiment. We shrink estimates statistically, in a way that isn't full pooling and isn't separate. And people struggle with that.
It's either you have to do all pooling or completely separate populations, and it strikes me as just bizarre.
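The middle ground between full pooling and separate analyses can be sketched with a simple normal shrinkage estimator. The between-group standard deviation `tau` is treated as known here purely for illustration (in practice it would be estimated or given a prior), and the two subgroup effects and their standard errors are hypothetical.

```python
def partial_pooling(estimates, ses, tau):
    """Shrink each subgroup estimate toward the precision-weighted mean.

    tau = 0 reproduces full pooling; a very large tau reproduces the
    separate, subgroup-by-subgroup analysis.
    """
    weights = [1.0 / (s ** 2 + tau ** 2) for s in ses]
    pooled = sum(w * y for w, y in zip(weights, estimates)) / sum(weights)
    shrunk = []
    for y, s in zip(estimates, ses):
        # Fraction of the way each estimate moves toward the pooled mean.
        shrink = s ** 2 / (s ** 2 + tau ** 2)
        shrunk.append(shrink * pooled + (1 - shrink) * y)
    return pooled, shrunk

# Hypothetical treatment effects in populations A and B, equal precision.
effects, std_errors = [0.10, 0.30], [0.08, 0.08]
for tau in (0.0, 0.08, 1000.0):
    pooled, shrunk = partial_pooling(effects, std_errors, tau)
    print(f"tau={tau}: " + ", ".join(f"{e:.3f}" for e in shrunk))
```

Full pooling and separate trials are the two endpoints of the same model. The shrinkage analysis sits between them, with the observed similarity of the subgroups determining where.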
So let me ask you about one other thing, which I think is a real oddball in this conversation. Um, there is a situation where we routinely combine data across different studies, and it's often viewed as the pinnacle of evidence for clinical trials, which is a
I, uh, I have no idea. I have no, oh, meta-analysis. Okay. Yep.
Where do you think meta-analysis fits in, in terms of the assumptions and how it fits into this debate?
Yeah, I think meta-analyses are incredibly valuable. The modeling makes a ton of sense and is incredibly important. I think people overweight single trials relative to the totality of evidence. Now, not every meta-analysis is every meta-analysis. And I think people sort of think meta-analyses are bad because there are some that are clearly biased. There's publication bias, there's availability-of-data bias.
So not every meta-analysis is bringing quality data that's free of these issues, but there are many circumstances where we have data on essentially every patient that's been experimented on, and combining this together I think is hugely valuable. By the way, I think the idea that we run two separate independent phase three trials and we analyze them separately is stupid. I mean, it's stupid that we would analyze them completely separately, and we need both to be below 0.05.
Combining them together into a single inference is better in every way.
Are two 0.049s as good as a 0.051 and a 0.001 on the p-values? Clearly you'd want the latter.
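The arithmetic behind that comparison can be sketched with Stouffer's combined z-score, one standard way of turning two independent p-values into a single inference. It is used here only as an illustration, not as the method the speakers have in mind, and it assumes one-sided p-values in the same direction, equal weight for each trial, and 0.049 as a stand-in for a result that just clears 0.05.

```python
from math import sqrt
from statistics import NormalDist

def stouffer_combined_p(p_values):
    """Combine independent one-sided p-values with equal weights."""
    std_normal = NormalDist()
    # Convert each p-value to a z-score, average on the z scale, convert back.
    z_scores = [std_normal.inv_cdf(1.0 - p) for p in p_values]
    z_combined = sum(z_scores) / sqrt(len(z_scores))
    return 1.0 - std_normal.cdf(z_combined)

both_barely_pass = stouffer_combined_p([0.049, 0.049])
one_barely_fails = stouffer_combined_p([0.051, 0.001])
print(f"0.049 and 0.049 combined: {both_barely_pass:.5f}")
print(f"0.051 and 0.001 combined: {one_barely_fails:.5f}")
```

The pair that fails the "both below 0.05" rule carries the stronger combined evidence, which is the point of the question.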
Yeah. And so I think of a meta-analysis the same way. I would do a meta-analysis of my two phase three trials for what it tells me about the answer. So, you know, those techniques, they tend to be Bayesian. But it's largely about the availability of data. There are some where we know there's missing data, we know there are issues with it, and we would not take those at, you know, a high scientific degree.
And that's where the science part comes into it. And by the way, these are abused, where people present data, it's biased data, they're ignoring huge amounts, and they call it a meta-analysis. And we think that's bad science. So we've got to, at some level, be able to judge. There's good science, quality data, meta-analyses, which I think are more valuable than individual trials. And then there's bad science put together, where we recognize we wouldn't use that information.
We would certainly say that on any kind of borrowing, bringing together data that doesn't belong is bad. You can borrow in a way that does generate horrible biases and will generate bad conclusions. Uh, but we also are talking about what can be done well with experience. Um, we're running outta time. Why don't you close us out?
So I think this, by the way, is the future. I think as we learn more and more about diseases, we get more and more subsets of them. We are getting more and more quality data available from other clinical trials. With the sharing of data from clinical trials and the sharing of other types of resources, we're gonna have an explosion of medical data. If we don't use that in our new experiments, if we don't use that in valuable ways, we're running worse experiments.
So a wonderful topic, and I think it's certainly the future. And so here we are, In the Interim. So, till next time, appreciate it. Thank you, Kurt.
