Welcome to Berry's In the Interim podcast, where we explore the cutting edge of innovative clinical trial design for the pharmaceutical and medical industries, and so much more. Let's dive in.
All right. Welcome back, everybody, to In the Interim, a podcast where we explore clinical trial science and the innovations in it, usually with a statistics bent to it. I'm Scott Berry, your host, and I'm joined by, of course, none other than Dr. Don Berry. He's waving, for those of you on the podcast; some of you get the visual. And by the way, I'd love to hear requests from the audience, things you'd like to hear about.
We had requests to go back to some trials and talk about issues that came up, barriers, how you overcame them. A little bit of the sausage making, if you will. So Don and I are going to revisit an older trial: talk about the design, the building of it, the different aspects of the trial design, and a number of barriers in it. This trial is a fascinating story.
The name of the trial is the AWARD-5 trial, and the trial was for a drug from Eli Lilly. All of this is public. Dulaglutide was the name of the drug while we were working on it. It's a GLP-1 inhibitor
No, it's a GLP-1 agonist.
GLP-1 agonist. And they were interested in running a phase two trial. Initially they came to us interested in a four-dose phase two trial for the treatment of diabetes, and the efficacy endpoint for the drug was going to be HbA1c. I'll jump to the end of this and we'll come back and talk about the different aspects of the trial. We ended up helping them, with the team, build a seamless 2-3 trial that started with seven doses of dulaglutide.
It had an active comparator sitagliptin and it had placebo. So it was a nine arm trial. The trial had a phase two component, with adaptive randomization favoring the doses of Dulaglutide that were doing better, and we'll say a lot more about what better means.
It's Dulaglutide
Dulaglutide. Okay,
time.
We can jump to its trade name perhaps, which is now Trulicity. That's the trade name of the drug, which gives a little bit away about whether the trial was successful. It had a component where it did interim analyses every two weeks during the first part of the trial. This was for response adaptive randomization over the doses of Trulicity.
Once it had enrolled somewhere between 200 and 400 patients, it could shift seamlessly to phase three. If it met certain criteria on the efficacy of HbA1c, but also on additional parameters we'll talk about, it could move to the phase three component, which was fixed randomization, with one or two doses along with sitagliptin, the active comparator, and placebo. The size of the phase three component,
the second stage of this trial, was adaptively selected, and that represented a phase three trial. The patients from both parts of the trial would go into the primary analysis at the end of phase three. We controlled type one error as part of that; that was part of the trial design. The trial was run entirely by Bayesian algorithms. The final analysis at the end of the trial would be a frequentist analysis, an ANCOVA analysis of HbA1c. So what happened?
At the first time the trial could possibly go to phase three, it did. It selected two doses; the 0.75 milligram and the 1.5 milligram doses moved to phase three, and it enrolled phase three. At the same time, it spawned additional phase three trials with the doses that were selected. The trial and the additional phase three trials read out. It was a non-inferiority trial against the active comparator, and it ended up showing superiority.
It was very effective in weight loss, not surprisingly. The drug ended up getting approved by the FDA in 2014, and in 2024 it was a $6 billion selling drug. It's been a multi-billion dollar selling drug since the trial. Eli Lilly talked about saving 12 to 18 months of development time with the seamless phase 2-3 trial. So I'm gonna pause and throw it to Don to add any comments. So that's the trial design.
It's a really quite elegant, reasonably complex phase 2-3 trial with interim analyses every two weeks, resulting in a successful trial. And we thought we would go back and revisit the different aspects of the building of this trial and what happened. So Don, anything to add to the overall structure of the trial?
Yeah, I'm going to say one thing and then pass it to you for the other. The one thing that I want to pass to you is to say something about the CUI, the clinical utility index, and how that came about. Or maybe you're reserving that for a later time point.
I was reserving that, so we'll come to it.
And are you also reserving the DSMB? Because, you know, with the picking of the doses, there are lots of nice, interesting stories associated with that.
yep. We'll come to that as well.
okay. So I have nothing more to add.
Okay. Alright. So, several aspects. What were the barriers of this? By the way, I went back and looked: we were building this in 2007, 2008-ish. So this predated even the original FDA draft guidance on adaptive designs. This was the early stages of,
you know, complex innovative designs. The original trial design was meant to be a short-term phase two trial: four months, four doses, 50 patients on each dose. Let that trial read out and then make development decisions.
Endpoint A1C.
The endpoint is HbA1c, that's right. So we started to look at the potential for a seamless 2-3 trial design, where that trial could immediately move to phase three. The initial parts that were really interesting about this were that by doing that, and including those patients in phase three, they could perhaps enroll more patients in phase two, but also more doses. So the issue became: could we explore more doses as part of the phase two trial?
But in order to do that, we couldn't necessarily do a fixed sample size on all the doses, because then it grows linearly, and, you know, seven fourths is big if we're gonna go to more doses. So the discussion of the number of doses tied into response adaptive randomization, and we started to explore using it. The eventual trial did fixed randomization for only the first 50 patients.
But then it did response adaptive randomization from that point. One of the issues with that is you don't want to do just one of those analyses, because you want to continually learn as the data get more and more rich. So now we had to do multiple analyses, which eventually grew to doing analyses every two weeks during the course of the trial,
And the definition of response became much more complicated. Right.
Right. So we started to simulate this trial with HbA1c, and one of their concerns was that the higher doses could end up having really good HbA1c effects, but at the same time could have potential safety issues. So the algorithm couldn't be driven only by HbA1c. At that point it had to incorporate other endpoints into the trial, and it became clear that blood pressure and heart rate, and regulators wanted those,
had to be not elevated too high, for risk of cardiovascular events. So this became a big question in the design: how do you do response adaptive randomization, how do you do dose selection, when you're worried about those endpoints? And later on down the story, weight loss became a really important part of that. Maybe we can come to that. So that became sort of barrier number one: how to incorporate these multiple endpoints in the design parameters.
I don't know if you remember how this started. But one of the really interesting things that happened is we came into this and they were very against a clinical utility index initially. They had tried to do this in some other settings, and they were against it. So we started creating a number of rules: we'll take HbA1c, but if the posterior probability of a heart rate increase was above some level, we would stop randomization to that dose.
But if this was true, we would do this. And if this was true, we would do that. And we ended up creating, um, something. And I, I, I learned the terminology at the time, a Rube Goldberg machine, and I think you referred to the design at one point as that. Do you remember that?
I do.
What, first of all, what's a Rube Goldberg machine?
Uh, it's a very complicated machine.
Yeah.
It's, um, you mentioned Post-it notes before. To me, it's a very complicated machine that essentially does nothing. It just, uh, machinates.
So he created these huge things where a ball would drop, it would hit something else, a lever would fly off, 47 things would happen, and at the end it squeezed out toothpaste or something. And our design sort of became that. And I remember you and I talking: we would show simulations, and the issue was
they would bring a new scenario and it wouldn't necessarily do what they wanted it to do, because of the complexity of all these rules that were kind of custom to every scenario we got. We'd create a new rule, but every time we got a new scenario, it wasn't doing the right thing. So you and I were reviewing this. We were talking with the statisticians. By the way, Brenda Gaydos at Eli Lilly was a huge hero in this, in her work on the design.
So you and I may talk about our, our discussions, but, uh, the, the team at Lilly was fantastic in, in all of this.
Just a shout out to Mary Jane Geiger, who was an MD. One of the barriers that we're gonna talk about is MDs, working with MDs and getting them, you know, they know how to run clinical trials 'cause they've done a gazillion, to consider, well, have you thought about this or this? So Mary Jane was also critical, along with Brenda.
Yeah.
And the FDA was critical. You know, we couldn't have done this without the FDA
Yeah.
They had some input into the clinical utility index that we're gonna talk about, and we should mention that.
Yeah. And they had given feedback that they were concerned. We talked to the FDA about a potentially seamless phase 2-3, where we wouldn't have an end-of-phase-two meeting where we showed the dose, but we would show them the algorithm that would pick the dose. They weighed in that heart rate and blood pressure were really important, and gave parameters: they didn't want to see heart rate and blood pressure go above certain values.
So we were looking at simulations of a number of these designs, and we saw this as a Rube Goldberg machine and we were really concerned. You and I thought the only way forward was gonna be to create a clinical utility index: a weight function of these different values, so that if heart rate went above a particular value, that dose had to be, you know, pushed down; the value of that dose was lower.
So we took the scenarios that they had created for us and what dose they wanted to pick, and we, at this point, we were probably simulating 30 different scenarios over the different endpoints, and we knew what they wanted to pick. And so we started to create mathematical functions that represented their decisions. And we thought this was the only way forward.
So we went and presented them simulations of a new approach to pick the dose, and we showed them how it behaved and how often it picked different doses, before we kind of jammed the clinical utility index and the function on them. And they loved it. They loved the choices it made, and they became something of advocates for the utility function.
And so, for example, the utility was zero if the heart rate increase was more than X. And the FDA, the only thing they contributed to this, apart from the concern that Scott talked about, in terms of the nitty gritty of what the clinical utility index was, is they wanted a decrease in X. They wanted a more conservative choice. And that turned out to be germane to what doses were eventually decided.
So the utility function ended up being on four endpoints. The first was the mean change in HbA1c relative to sitagliptin. So for example, if it was within the non-inferiority boundary, that had a value of, say, one for therapeutic benefit. If it had better values, it had more and more value. So it was an increasing function as you decreased HbA1c, up to a particular limit, and then it actually leveled off, because they worried about too much of that.
And I am sure there's some people, uh, out there that have tried to do this and they're interested, well, how you put these things together? You've got four dimensional space. Um, and are you assuming independence? Are you just multiplying, uh, the individual utilities? Uh, and the answer is, you know, nearly.
Uh, we had a multiplicative, uh, CUI, um, with the exception of one joint distribution or one joint function of two of the end points that were, uh, sort of the, if then, uh, that were related. So, except for that, it was multiplicative.
Yep. There was a parameter in the model which was the mean change in blood pressure relative to placebo, and one for the mean change in heart rate relative to placebo. There was a region where, if it was within a couple of X, you got the full value of the HbA1c, but as those values increased, it took away from the value of HbA1c in a multiplicative way. For both of those parameters, if it got above the value where FDA said, we don't want to see it there,
it had zero value, so it wiped out the value of that dose. The fourth endpoint that came into it wasn't really safety and it wasn't the primary efficacy endpoint; it was weight loss, which of course we understand now plays a critically important role for GLP-1 agonists. But they knew, and it was actually one of the early things they worried about, that if the drug increased weight, nobody would take it.
And if it actually decreased weight, which they had hopes for, though in many ways they were wanting to prevent weight gain, that had additional value as a therapeutically good dose. So we showed them the utility function, and we simulated a bunch of trials and showed them the selection, and they ended up
looking at these and largely agreeing. Then they tinkered with it a little bit: when they disagreed with the outcome of a trial, they changed the utility function, molding it around what they thought was the risk-benefit of these endpoints. And it got to the point where we were simulating these trials, and when we showed them results and said, this is what happens in this example trial, they said, oh, I don't know if I'd do that.
I think I'd pick the other dose. They actually got to the point where they thought they were wrong and the algorithm was right, to the point that they believed this utility function was absolutely the right way to go forward. And they instructed the DSMB eventually on that. So the utility function was then used to do response adaptive randomization toward the one that had the best profile. It drove whether or not a dose would graduate and go to the second stage,
whether it would bring a second dose along into the trial. And the algorithm drove all of this: a Bayesian model on those four endpoints that calculated the posterior distribution of the utilities. It drove all of those decisions.
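For readers who want to see the shape of such a utility, here is a minimal sketch, not the AWARD-5 implementation. It assumes a multiplicative index with a joint heart-rate and blood-pressure factor, independent normal approximations to each dose's posterior, and randomization weights proportional to the probability that a dose has the highest utility. Every threshold, function name, and posterior value below is a hypothetical placeholder.

```python
import random

# Every threshold and shape below is a hypothetical placeholder,
# not a value from the AWARD-5 design.

def ramp_down(x, full, limit):
    """1.0 up to `full`, falling linearly to 0.0 at `limit` and beyond."""
    if x <= full:
        return 1.0
    if x >= limit:
        return 0.0
    return (limit - x) / (limit - full)

def hba1c_value(diff_vs_sitagliptin):
    """1.0 at the non-inferiority margin, rising as HbA1c falls further,
    then level beyond a plateau. Negative differences are better."""
    margin, plateau = 0.25, -1.0  # assumed values
    if diff_vs_sitagliptin > margin:
        return 0.0  # assumed: no value outside the margin
    d = max(diff_vs_sitagliptin, plateau)
    return 1.0 + (margin - d) / (margin - plateau)

def safety_value(hr_increase, bp_increase):
    """Joint heart-rate and blood-pressure factor, both relative to placebo.
    Crossing either assumed limit zeroes out the dose."""
    return ramp_down(hr_increase, 2.0, 5.0) * ramp_down(bp_increase, 1.0, 3.0)

def weight_value(weight_change_kg):
    """Weight loss adds value, weight gain removes it (clamped at 2.5 kg)."""
    c = max(-2.5, min(2.5, weight_change_kg))
    return 1.0 - 0.2 * c

def utility(hba1c, hr, bp, weight):
    """Multiplicative clinical utility index over the four endpoints."""
    return hba1c_value(hba1c) * safety_value(hr, bp) * weight_value(weight)

def prob_best(posteriors, n_draws=4000, seed=1):
    """Monte Carlo estimate of P(dose has the highest utility).
    `posteriors` maps dose -> {endpoint: (mean, sd)}; ties go to the
    dose listed first."""
    rng = random.Random(seed)
    wins = {dose: 0 for dose in posteriors}
    for _ in range(n_draws):
        best_dose, best_u = None, -1.0
        for dose, params in posteriors.items():
            draw = {name: rng.gauss(m, s) for name, (m, s) in params.items()}
            u = utility(**draw)
            if u > best_u:
                best_dose, best_u = dose, u
        wins[best_dose] += 1
    return {dose: w / n_draws for dose, w in wins.items()}

# Made-up posterior summaries for three doses.
EXAMPLE = {
    "low":  {"hba1c": (-0.1, 0.1), "hr": (0.5, 0.5), "bp": (0.2, 0.5), "weight": (-0.3, 0.3)},
    "mid":  {"hba1c": (-0.6, 0.1), "hr": (2.0, 0.5), "bp": (0.5, 0.5), "weight": (-1.5, 0.3)},
    "high": {"hba1c": (-0.9, 0.1), "hr": (6.0, 0.5), "bp": (2.0, 0.5), "weight": (-2.5, 0.3)},
}
probs = prob_best(EXAMPLE)  # usable as randomization weights
print(probs)
```

In this made-up example the high dose has the best HbA1c and weight results, but its heart-rate increase sits above the assumed limit, so its utility is almost always zero and the middle dose collects nearly all of the allocation.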
So another really interesting part of this was the longitudinal modeling. As we described, the original trial was maybe four months that they were gonna look at, but now the primary endpoint in phase three was gonna be at 12 months.
And so we wanted to make our inferences based on the 12-month results for HbA1c. It was six months for heart rate, blood pressure, and weight loss, because that was a comparison to placebo, which they could only give for six months. And so we built longitudinal models of the outcomes over time
that predicted the six-month values for three of them and the 12-month value for HbA1c, to drive the decision making in the trial. That became a huge aspect of being able to make these decisions continually during the course of the trial.
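The idea of forecasting a late endpoint from early visits can be sketched with something far simpler than the trial's Bayesian longitudinal model, whose form is not described here. The sketch below swaps in an ordinary least-squares line fit on patients who have both an early and a final value, then applies it to patients who so far have only the early value. The function names and the synthetic data are assumptions for illustration.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

def predict_final(early_values, completers):
    """Predict final values for patients with only an early value.
    `completers` is a list of (early, final) pairs from patients
    who already have both measurements."""
    intercept, slope = fit_line([e for e, _ in completers],
                                [f for _, f in completers])
    return [intercept + slope * e for e in early_values]

# Synthetic example: final change is about 1.5x the early change plus noise.
rng = random.Random(0)
completers = []
for _ in range(200):
    early = rng.gauss(-0.4, 0.3)
    completers.append((early, 1.5 * early + rng.gauss(0.0, 0.2)))

print(predict_final([-0.2, -0.6], completers))
```

A fuller model would use every interim visit and carry the prediction uncertainty into the posterior rather than plugging in a point prediction.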
So tell them about the pharmacologists and our discussion with them, and what they said we should be looking at, and we said, prove it.
Yeah, so originally we were trying to forecast HbA1c, which is a blood sugar value. They actually expected that it would be better to predict that using fasting blood glucose than using HbA1c at, say, one month or two months; that you would look at fasting blood glucose. The fabulous thing was they actually had data on previous drugs.
They had done quite a bit of work in diabetes, and so they were able to provide us data on fasting blood glucose and HbA1c. We fit the longitudinal models to the previous data, and early HbA1c greatly outperformed fasting blood glucose as a predictor of final HbA1c. And so we were able to abandon fasting blood glucose and use only the HbA1c values during the course of the trial. So now, we were simulating control of type one error with the regulators.
In the scenario where we're carrying forward the doses selected from the first part to the second part, we ended up sending them over 300 null scenarios that were simulated to go to regulators, and working with them to approve the seamless phase 2-3 part of it. Now, we made some concessions to the regulators on this, where at least 70% of patients had to come from the second part. They felt comfortable with that.
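To see why carrying a selected dose forward needs a type one error check at all, here is a toy simulation, not the AWARD-5 analysis. It assumes a single endpoint with known unit variance, picks the best-looking of several null doses after stage one, pools both stages, and runs an unadjusted one-sided z-test against control. The function name, sample sizes, and number of simulations are hypothetical.

```python
import math
import random

def simulated_type1_error(n_doses, n1, n2, n_sims=20000, z_crit=1.96, seed=7):
    """Rejection rate under the null (no dose differs from control) when the
    stage-one winner is carried forward and both stages are naively pooled.
    n1 and n2 are per-arm sample sizes; higher outcomes count as better."""
    rng = random.Random(seed)
    n = n1 + n2
    rejections = 0
    for _ in range(n_sims):
        # Stage one: observed arm means; keep the best-looking dose.
        best1 = max(rng.gauss(0.0, 1.0 / math.sqrt(n1)) for _ in range(n_doses))
        ctrl1 = rng.gauss(0.0, 1.0 / math.sqrt(n1))
        # Stage two: fresh data on the selected dose and on control.
        dose2 = rng.gauss(0.0, 1.0 / math.sqrt(n2))
        ctrl2 = rng.gauss(0.0, 1.0 / math.sqrt(n2))
        # Pool the stages and apply an unadjusted z-test.
        dose = (n1 * best1 + n2 * dose2) / n
        ctrl = (n1 * ctrl1 + n2 * ctrl2) / n
        z = (dose - ctrl) / math.sqrt(2.0 / n)
        if z > z_crit:
            rejections += 1
    return rejections / n_sims

one_dose = simulated_type1_error(n_doses=1, n1=30, n2=70)      # no selection
seven_doses = simulated_type1_error(n_doses=7, n1=30, n2=70)   # 70% from stage two
more_stage2 = simulated_type1_error(n_doses=7, n1=30, n2=270)  # 90% from stage two
print(one_dose, seven_doses, more_stage2)
```

In this sketch the selection step pushes the rejection rate above the nominal 2.5%, and the inflation shrinks as a larger share of the data comes from the second stage, which is the intuition behind a floor on the stage-two fraction. A real design would set its decision thresholds so that the simulated rate stays at or below the nominal level across the null scenarios.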
Do you remember what Rob Hemmings said when we came back to them with the updated simulations? He was at EMA at the time.
Yeah, of course. Uh, but I'll, I'll let you say it.
So we had gone to them with simulations. We had provided updates and gone back and
And they kept saying, you know, well, what about this, and what about this, and what about this? They kept saying no, sort of.
And we came back to them, and eventually Rob Hemmings said, we've run out of reasons to say no. And they accepted the simulation control of type one error. And this is roundabout 2008.
It wasn't easy with the FDA on this side of the Atlantic either. In their 2010 guidance on adaptive designs, they said that they were not comfortable with simulations; they said that showing control of type one error by simulation is not well understood. That was in the guidance. And eventually, I mean, that
has changed. The reason you sent, you know, the 300 null scenarios is because they, of course, were anal about controlling type one error. But type one error depends on, for example, what the underlying control rate is. It depends on accrual rate. And so, as Scott indicated, if the accrual rate is fast, you may not be able to take too much advantage, and the type one error is affected by that, or if it's too slow. So we had to do lots of accrual rate simulations.
By the next version of the adaptive guidance, which was nine years later, they had learned to be more comfortable with simulations, and they had learned how to handle this sort of scenario. So, for example, we've designed many phase three trials since then, and we don't have to run nearly that number of simulations. We have to make them comfortable and promise them that we'll do post analyses
to make them even more comfortable, should that be necessary. And so that's actually written in the updated adaptive guidance. Just while we're on the FDA, this was at the time of the Critical Path Initiative of CDER, which dates back to 2004. This was Janet Woodcock's baby, and in 2006 it got somewhat revised, but this was regarded as part of that Critical Path Initiative.
Even though, as Scott indicated, it predated the CID, the complex innovative design initiative. And I don't think there's been a CID that is more complicated than the AWARD-5 trial.
Yeah. Yep. So one of the things that Lilly did in this that was tremendous, and I've never seen another case of this: one of the concerns we had was that we have this algorithm doing response adaptive randomization. It was using longitudinal models, so it needed somewhat mature data to do a good job at dose selection. Our concern was that
you send this out to operations and they enroll gangbusters, which you can do in diabetes. So we simulated a range of scenarios, and we looked at the likelihood of making good decisions, picking the right dose, before it went to phase three. And Lilly created boundaries and said, we have to stay within these boundaries. They did not want to enroll more than eight per month, for example, because they knew that it was going to make worse decisions.
And so they told operations, you can't go above this number, which I think was a huge part of the success of the trial.
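The cost of fast enrollment can be shown with simple arithmetic. This sketch assumes perfectly uniform accrual and a fixed follow-up time to the endpoint; the function name, rates, and follow-up length are hypothetical, not the AWARD-5 values.

```python
import math

def mature_at_decision(n_enrolled, rate_per_week, followup_weeks):
    """Patients with complete follow-up at the moment the n-th patient
    enrolls, assuming uniform accrual at `rate_per_week`.
    Everyone enrolled in the last `followup_weeks` is still immature."""
    still_immature = rate_per_week * followup_weeks
    return max(0, math.floor(n_enrolled - still_immature))

# 200 patients enrolled, 26 weeks of follow-up needed for the endpoint.
for rate in (2, 4, 8):
    print(rate, "per week ->", mature_at_decision(200, rate, 26), "mature")
```

Under these made-up numbers, doubling accrual from four to eight patients per week takes the count of fully followed patients at the 200-patient mark from 96 to zero, so the dose decision would rest entirely on model-based predictions from early visits.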
Yeah. And so that made the trial somewhat longer, uh, but nowhere near the savings that you indicated earlier on in terms of, but it, you know, this was taken into account in their calculation about the savings.
Right. So as you described, this is a reasonably complicated design. It has a clinical utility index, it's got interims every two weeks, the algorithm updates the randomization, and it's got prospective go-to-stage-two rules. So now we start working on the implementation of this, the operations of this, and
Along with that, you know, it also picked the sample size for stage two, and the sample size it picked when it eventually went was the smallest possible sample size, and it was at the minimum number in stage one. It made that decision, in the actual trial, as soon as it was allowed to make it. And at some point I want to tell you about the DSMB and working with the DSMB.
Yeah, so let's shift to that. They created a DSMB, and we spent a lot of time showing them simulations of the trial designs, explaining the utility index and the decision rules. And Lilly told the DSMB they weren't allowed to change the decision of the algorithm around doses. That was how much they believed in that utility function to pick the right dose. They could stop a dose for safety during the course of the trial.
They could stop the whole trial for safety, so that was still a huge part of it. But they knew simulation control of type one error was a big part of the design, and the utility index was a big part of it. And so the DSMB was fully on board with this. You and I were advisors to the DSMB and we were unblinded, and if you remember, these reports would come out every fortnight on Wednesdays.
So, uh, a number of really interesting aspects to the DSMB, which have been made public. Uh uh, so yes, jump into that.
So you said they were on board. They were more than on board. They said, you know, you've told us that we can't change the design. We can't drop an arm for lack of efficacy. We have to follow the design. But if we were making all of the decisions, we would make exactly the decisions that the algorithm has made. And the algorithm, you know, didn't choose the highest doses, even though weight loss was incredible on the highest doses.
And toward the end of that period when they could drop a dose for safety, they were worried about the highest dose. They were actually tracking things, how things were going and how close the algorithm was to picking this dose over that dose. And so they were worried about the highest dose, and I said, you can drop it now.
If you drop it now, and next week, when we get the permission to pick the two doses, you've dropped a dose that would've been picked, we still go with two doses according to protocol. If you wait until next week and the highest dose is one of the ones that's picked and you decide to drop it, then we continue on only one dose. They understood that, and so they hastily dropped the highest dose for safety. And it was, you know, the combination of blood pressure and heart rate
considerations. By the way, the marketed doses were the ones selected in this trial, and the doses in the other phase three trials were the same ones. They were happy with that and, you know, we got the right dose. But they learned over time that heart rate, even though it was increased, was not that important and didn't have clinical ramifications.
And so the marketed doses now include one dose that's higher than even the ones that we considered in the AWARD-5 trial.
One of the interesting things is, when they dropped the high dose, it had already been zeroed out in the randomization by the RAR, so it
Well, it was zeroed out when they were allowed to zero it out,
right?
for the final analysis. But it was predicted based on what the utilities were. Uh, you could see that, you know, almost certainly next week it's gonna be the same and they're gonna drop it. It's gonna be dropped.
Yep. Yep. And we could see, and the results were amazing, that when it reached that 200 minimum, it was gonna jump at that point. So it was one of those designs where the design ran exactly as planned. The algorithm ran, the RAR was updated, the decision was made. The DSMB agreed with each of the decisions in the course of it. And the trial picked 1.5 milligrams and 0.75. And one of the interesting parts is
neither one of those doses was part of the original four-dose phase two trial. So the 1.5 milligram and the 0.75 milligram were, you know, really good doses, and they may never actually have been used with the other phase two trial. So it had a number of aspects that sped up development. And by the way, then it jumped to phase three, and those doses moved on into the other trials.
And hence, going back to the start of the story, it ended up showing superiority to sitagliptin, won all of its phase three trials, showed benefit in its cardiovascular risk trial, and is now Trulicity.
And Trulicity is the initial GLP-1 agonist; you read today about the other diabetes drugs that are GLP-1 agonists. But the big deal is the weight loss. And the weight loss in the AWARD-5 trial, by the way, in terms of the doses, was completely predictive of the weight loss in a later trial that they ran with Trulicity. Dose by dose, it was almost, you know, matching
what we would've predicted from the AWARD-5 trial. But Trulicity is not approved for weight loss. Lilly hasn't sought approval for weight loss because they have other GLP-1 agonists that are, you know, next generation, that are even better.
Yep. So if you'd like to read more about this, you can read about the AWARD-5 trial. There are paired publications about it in the Journal of Diabetes Science and Technology in 2012, papers published before the results of this came out. You can read those. I want to acknowledge Zach Skrivanek, Jenny Chen,
Mary Jane Geiger, who Don mentioned, Andy Anderson, and Brenda Gaydos, who are on those papers as well, doing huge work; we worked with them on the team. And it was about nine months that we spent building this trial and getting it ready to execute.
Just one additional aspect of the nine months. It took us nine months, but I started to consult with Eli Lilly in the 1970s, and Scott and I built this over time. Scott was pretty young in the 1970s, but when he became old enough to go to Indianapolis, he went to Indy, and we developed a really great relationship, and we couldn't have done this without the support of the statisticians,
all the way back to Charlie Sampson, who set up biostatistics at Eli Lilly and also was a co-founder of the Muncie meetings, for those of you that might have been around back then. Muncie, Indiana, where Ball State University is. And I presented there on adaptive designs because of Charlie and others at Lilly. And this was Prozac days; they designed adaptive trials in Prozac.
So, you know, shout out to all of them. If you pick a random pharmaceutical company and drop us by parachute onto their campus and see if we can persuade them to do this kind of thing, I'm not sure of the results, but it's not a certainty that we would be able to do it, because we don't have the trust built up and the great working conditions.
So it was all a great story for us, and it was a great story for Eli Lilly who decided that we gotta simulate every trial and every trial that we run, we're gonna simulate. And I don't know whether they're still doing that, but it, it's not a bad thing. I'll tell you.
Yeah, no, Karen Price is there now, and many other statisticians, and they're still simulating. And by the way, they're the largest pharmaceutical company in the world at this point, I think. Alright, well, thank you, Don.
I, I don't think that's true, but they, they are certainly the biggest on Wall Street.
Yeah. Their market cap or something. I,
Yeah. Yeah.
yeah. Yep. That little Indiana company. Alright, well, so thank you Don. We're, we're gonna jump into some, some interesting trials as we go forward. And again, if you have, uh, any ideas for the show, holler at us. Uh, in the meantime, we are in the interim.
Yeah, and if you have questions about what we've said or what we did, uh, please, uh, you know, send an email.
Yep, in the interim. Thank you everybody.
Thank you. Thanks Scott.
Bye.
