The Promise of Bayesian Critical Care Trials: Hype or Hope?
Video Transcription
Hello, and welcome to today's webcast, The Promise of Bayesian Critical Care Trials: Hype or Hope? My name is Siddharth Dugar. I'm an Associate Staff in the Respiratory Institute at Cleveland Clinic, and I will be moderating today's webcast. A recording of this webcast will be available within five to seven business days in your MyLearning. Again, thank you so much for joining. A few housekeeping items before we get started: there will be a Q&A at the end of the presentation. To submit questions throughout the presentation, type into the question box located on your control panel. Please note the disclaimer stating that the content to follow is for educational purposes only. And now I would like to introduce our speaker for today, Michael O. Harhay. He's an Assistant Professor of Epidemiology and Medicine at the University of Pennsylvania in Philadelphia. And now I will turn things over to Dr. Harhay. Hi. Good afternoon, everyone. Thanks so much for joining us. This is the first of a new series that our methodology section from SCCM will be putting on over the next couple of months, trying to make the new methodologies and new study designs that you are going to be reading in the literature, or may have already, more accessible. So I'll jump in in a moment. I want to note a few disclosures that I don't believe are directly relevant today, but I do consult and receive personal fees for trial consulting, some related to Bayesian trials, but nothing specific to what I'll be speaking about today. So the rationale behind this webcast is that if you haven't seen it already, you probably will very soon: there has been a surge recently of Bayesian analyses, and they come in two different flavors. There are many papers that have been done, and are increasingly being done, that reanalyze a trial that doesn't hit that p-value of 0.05 in a Bayesian framework, asking whether we are missing efficacy signals. And then there's a growing number of national and international consortiums, the most familiar of which may be REMAP-CAP, that are running fully Bayesian trials. So sooner or later, the way you digest and are exposed to evidence in the critical care community, but also the medical community more broadly, is going to be Bayesian. And I'm actually very excited about this. So when I say promise or hype, I think there's a lot of promise. And I think it's particularly promising in the critical care community, because we just haven't had enormous success with our critical care trials for several decades. That said, it's very difficult to figure out how to interpret a Bayesian analysis. And one of the leading reasons is that it involves a handful of new words that people are not really familiar with, and they don't even have a really straightforward interpretation, because they were derived by scholars in England a couple hundred years ago. And that makes picking up an article and processing it challenging. So my hope with today's webcast, as much as one can do in 45 minutes, is to review, without equations, the conceptual and empirical ideas behind Bayesian statistics. So no math. What I want you to do is get comfortable with Bayesian thinking. What is the why behind Bayesian thinking?
So there are slight differences in how we generate data, but really where Bayesian thinking differs from what you've heard of as frequentist thinking is just how we think about the data we have in front of us and how we make decisions from that data. It's just using a different heuristic than p < 0.05 and confidence intervals, but not as different as you may think. So in doing that, I will compare and contrast what I think are the advantages and disadvantages, really the benefits and challenges, of the two. And it's not really meant to pit one against the other, but as you'll see, I'm a Bayesian, and it's something I'm going to try to convince you that you should be too. So before I jump into technicalities, I want to give you a little bit of history, kind of how we got here. So Bayesian statistics has been around for quite some time. And for whatever reason, during the industrial revolution in England, when, as you may have heard, Student's t-test was derived at the Guinness factory, frequentist statistics just kind of won out and has been really the mainstay of how we make evaluations and assessments of the evidence in front of us. That was normalized outside of medicine for a very long time and then increasingly in medicine for maybe the last 100 years. And then it hit critical care very heavily at the ATS meeting in 2018. You may recall that's when the ECMO trial was released. And what you see here are two very divergent survival curves with a 10% mortality difference over the first 60 days, even though many people were transferred from the control group to the ECMO group for rescue therapy. But it had a p-value of 0.09. And it comes to this conclusion that you see very often: that 60-day mortality was not significantly lower with ECMO than with a strategy of conventional mechanical ventilation. And it caused an uproar among the community and at the conference because of two real straightforward things. When you look at that survival curve and you look at that delta, that 10% absolute change in mortality, which we see so rarely in a large critical care trial, that effect is large and hard to refute, p-value or no p-value. We also know that ECMO works, so it aligns with our prior beliefs and our knowledge about the intervention we're testing. And if you're kind of with me there, you're already kind of a Bayesian, and I'll help you think about how you can become more of one over the rest of the talk. This was followed not too long after by another trial, ANDROMEDA-SHOCK, looking at resuscitation therapy in septic shock, which also had a pretty dramatic mortality benefit, an 8.5% absolute decline in 28-day mortality, again with a p-value of 0.06. And the community started talking: are we going to keep doing all these negative trials, or are we actually just missing signals? That was followed not long after by a reanalysis of the ECMO trial in JAMA, which came to the conclusion that no matter how we cut the data, there's really compelling evidence, if we look at it in a Bayesian framework, that there's some benefit there. And this was followed by a series of editorials, and one was written by Roger Lewis and Derek Angus, which is on the right side of the screen. And it comes with this argument that clinicians and researchers should no longer be asking the question, does ECMO work?
Because we've seen that, and that's what the Bayesian reanalysis has shown us, and we kind of knew that, and that's why we advanced it to a phase three trial. And ideally, what we should be moving to is, how much does ECMO work, and in whom, and at what cost? And this was followed by some of these papers you may have seen, which included another Bayesian reanalysis, this time of ANDROMEDA-SHOCK. And there have been several Bayesian reanalyses of COVID trials, an AKI trial, and others; this is just a little sprinkle of them. There are probably two or three dozen at this point that are out there in the literature. And that's what you're going to be seeing fairly consistently, I think, for the next year or two, as people morph over to this way of thinking. But more specifically, there are actually fully Bayesian trials. So you may have heard of the I-SPY trials, and up here on the screen I have the results from REMAP-CAP, which has tested multiple different domains, antivirals, immunomodulation. Surely you've seen them in one of the major medical journals. And they're a fully adaptive Bayesian trial, and they're making their decisions about efficacy based on Bayesian thinking and statistics and methodology. So what I'd like to do today is introduce you to all that and how it all works. And I'm going to do that in a series of five, I guess, plays, or five acts. So I'm going to start off and try to set the stage for why it's worth at least considering that we should be more Bayesian. And then I'm going to do a series of three passes through Bayesian thinking. The first one is going to be hypothetical and then an applied example where I just try to introduce you to the terminology and the high-level thinking. Then I'll take a second pass where I walk through a real Bayesian reanalysis of a frequentist trial. And as you start to see the distributions and start to hear the words, I will introduce you to how a fully Bayesian trial works and how it gets us to everything we want to know. And then I'll give you some closing thoughts and I'll field questions. And if anybody has additional questions and would like to reach out, please feel free to email me. So, the motivation. If you're in this talk, you've probably heard that RCTs are everything we want to do in medicine. They're the gold standard. And that's because, if they're well designed and executed, the randomized experiment gives us the best evidence that we can get about an intervention's effect on an outcome or outcomes. And we talk about this in causal inference as the concept of causal identification. If we randomize really well, meaning that we take a group of people that on average have a bunch of similarities, and they're not going to be as similar as these eight pictograms that are identical, but you can imagine you have a bunch of mechanically ventilated patients or a bunch of sepsis patients and you put them into two different groups. At baseline, their expected mortality, their expected length of stay, their expected outcomes are the same, except one of the groups is going to have something changed for them, and that's the intervention, and the other one is not. So the idea is that if you do that and you have a well-designed study, you can say, okay, X causes Y. In addition, though, trials are expensive. They're hard to do. And they take a long time. And unfortunately, we just don't have a great history of them in critical care. So this is an older review from 2014.
I've not seen one updated, but it comes to the conclusion that there are really only two interventions, and those are prone positioning and low tidal volumes. And if you look at the multi-society guidelines from ATS and ERJ and other places, they tend to only promote these two interventions. And I know that there have been some more recent trials in COVID, but fundamentally, this is kind of where we're at with ARDS still. With sepsis, there's really no definitive trial. I know there's been some more recent evidence with steroids that's promising, but if you look at the evidence base and the history, for about 30 to 40 years we've been testing therapies that just haven't emerged as persistently effective across multiple clinical trials. So that begs the question, are we doing our trials right or wrong? Which begs the question, what is the real goal of a trial? So this is a wonderful paper by David Sackett. If you're interested in this topic, I like to use it as a teaching tool. This nice paper talks about what clinicians want when they bump into a new treatment. And he writes that they ask themselves two questions. First, is this intervention or therapy superior to what I'm using now? And second, if it's not superior, is it as good, what we would call non-inferior, to what we're using now? Or is it preferable for some other reason, fewer side effects, better affordability? And ideally, the trial that we design and execute, especially when we use the big money and time and effort and human resources, would answer all these questions. Unfortunately, the frequentist framework doesn't really allow us to answer all these questions simultaneously. And what I'll show you at the end is that the Bayesian framework does. And that's because frequentist thinking relies on this concept of null hypothesis significance testing. And you've all seen this, probably since high school chemistry. So you create a trial and you derive a null hypothesis, and you say hypothermia has no effect on out-of-hospital cardiac arrest outcomes. So this is one of the examples I'll introduce you to later in the talk. And then you derive the alternative, and you say, okay, hypothermia has an effect of a certain magnitude, and I'm going to say a 15% absolute decline in mortality or some improvement in ICU-free days. Then you run your trial, you compare the two groups, and you just say whether or not the result is statistically significant. The problem with all this thinking, at least in my interpretation, in my view, is that null hypotheses are kind of rigid. And if you think about them, they're a little nonsensical, because it's really impossible philosophically for two interventions to have exactly the same effect. And we want to know what the difference between two interventions is, and p-values don't tell us that either. P-values tell us how compatible the results we observed in this trial are with our null hypothesis, given a fixed sample size. And when that probability is really small, p < 0.05 is what you're used to, we reject the null hypothesis and accept the alternative. But this test really makes no distinction about the new treatment being better or worse, or how much better or worse, than the comparator. And I think, and a lot of people who like Bayesian thinking believe, that the consequence of this is that trials are not answering what we want to know. They're leading to binary interpretations of trials. Is there an effect or not?
Not how much of an effect, and with what probability. And the risk of this is that we're potentially discarding effective therapies and, less likely but possibly, continuing to use some harmful therapies. Okay. So with that setup, let's talk about Bayesian basics. So there are really two things you need to buy into to be a Bayesian, or at least to understand what Bayesians are doing. The first one is that we're not going to really talk about yes or no. We're going to talk about a probabilistic interpretation of an intervention's effects. And I'll introduce you to what that means over the next several slides. The other thing is that you're operating from a meta-analytic perspective. Now the goal, and this is where the words are going to get a little unfortunate, but bear with me, I'll make them simple. So the ultimate goal of a Bayesian analysis is to create a posterior probability distribution. That distribution combines two things. It incorporates a prior, and I'll talk about priors over the next several slides, so you'll get more and more familiar with them. And then it has new knowledge, which is called the likelihood. And the likelihood is just new trial data, the trial you just ran. And what you do mathematically is you push them together and you say, okay, now that I have combined the old information I believe I have with my new information, this is the most up-to-date information that we have about an intervention. That's called a posterior probability distribution. And if you don't have good prior data, which is the case for many questions in the critical care literature and elsewhere, you can just make a probabilistic interpretation of the trial. So you're just going to look at the likelihood. And we're going to see this a handful more times, so bear with me, you'll get more familiar with it. So, nomenclature, just to reiterate. You're used to seeing the frequentist literature, which is just looking at the new trial, the new experiment, whatever's just been published in JAMA or Critical Care Medicine. That's the likelihood. Bayesian analysis takes the prior knowledge and attaches it to the likelihood, creating a posterior. Once that information is created, that posterior now becomes a new prior, and it's this kind of evolution of constantly learning, which I think is one of the nice attractions. That's really what we do as human beings. Let's walk through the steps one by one. So the ultimate goal is to get to step three, to create that posterior probability distribution. So you take step one, which is having a prior, put it with step two, which is just the new trial data, nothing special, and now you have that distribution. So let's take a little bit of a deeper dive into priors. So if you've heard some criticism about Bayesian analysis, it's probably that it can be gamed by the priors you use. And that is not necessarily an incorrect accusation, but you can also be very careful. And one of the things that I promote, and a lot of other people promote, is a very careful and thoughtful and empirically justified construction of your priors. So what is a prior? A prior is a best guess about the effect estimate before your trial or before your study. And ideally, especially in a reanalysis of a trial, there should be several different priors, and they can come from multiple sources. So you can use what's called a non-informative prior, which is that green line, and I'm going to walk through each one of them in a minute, which is just saying, actually, I don't have any information.
Every possible effect on a continuum of possible effects is allowed, so you're not really adding information, and that just sets you up for a probabilistic interpretation. You could have a meta-analysis that just came out and said, we just did an assessment of 16 trials, and that suggests that there's a small benefit in mortality, which is prior two. But what I'd like to see, what I'm going to advocate today, and what I would advocate you look for as peer reviewers and readers, is the use of hypothetical priors, which are priors that are meant to represent things that may not be empirically available, but cover a range of possibilities. Let's walk through each one of them, and I'll give you a better flavor of what that means. So, priors. The first one is the non-informative prior, sometimes called a flat prior or a vague prior, and this simply means, as I just mentioned, that you're adding no information to the trial. You're assuming that every possible intervention effect size is possible, and this allows you just to move into the probabilistic interpretation of the trial. The next one that you usually see is one that suggests benefit. So in this example, there's about a 3.5% benefit in mortality, and that usually comes from a meta-analysis or some type of prior trial, if there are only one or two available, and these are called informative priors. So informative priors tend to be based on empirical data, and they are used to represent what we already know. If that evidence suggests a benefit, that's an optimistic prior. Then you can create additional priors if the priors you have empirically do not actually cover the whole continuum. So the next one I like to see is what I call a skeptical, neutral, or equipoise prior. And what you see a lot in Bayesian analyses, if you're starting to see these figures, are normal distributions. So the idea is that you can normalize a distribution that represents effect sizes, and then it just comes down to z-scores. You say how much is to the right, which suggests benefit, and how much is to the left, which suggests harm. And what you see here with the orange prior, which is a skeptical prior, is that 50% of the distribution is to the right and 50% to the left. So it's not really hedging its bets on either of them. It's saying, I'm going to call it a coin flip: there's a chance of some benefit, and there's a chance of some potential harm. And then you add what is usually a hypothetical harm prior. And this is a prior that usually doesn't align with reality if we're doing a reanalysis, because usually we're doing a reanalysis because we believe that something's effective. But the idea here is that you're going to scrutinize the posterior probability distribution that you're going to arrive at, and say, okay, how robust are my results if I assume that these trial results are antithetical to what we've already seen in the literature? To say, I'm going to assume that everything else in the literature suggests harm, and then ask, okay, what is still my probability of benefit and of harm? So once you get all those priors together, you can put them together with the likelihood, and you just reanalyze the trial multiple times using each different prior. And I'll show you what that looks like formally in a couple of slides. And that's all you need to do. The likelihood is just the new trial data. This is why people like Bayesian reanalyses of trials so much.
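To make the mechanics described above a little more concrete for readers, here is a minimal sketch in Python of how a prior and a likelihood are combined into a posterior on the log odds ratio scale, using a simple normal-normal update rather than the full machinery a published reanalysis would use. The trial estimate, standard error, and prior means below are illustrative assumptions only, not values from any trial discussed in this webcast.

```python
import numpy as np
from scipy.stats import norm

# Hypothetical trial result (the "likelihood"): log odds ratio and its standard error.
# These numbers are illustrative only, not from any trial discussed in the talk.
trial_log_or = np.log(0.80)   # observed OR of 0.80 (apparent benefit)
trial_se = 0.15

# Priors on the log OR scale: (mean, standard deviation). Also illustrative.
priors = {
    "non-informative": (0.0, 10.0),           # essentially flat: adds almost no information
    "optimistic":      (np.log(0.90), 0.10),  # centered on a small benefit
    "skeptical":       (0.0, 0.10),           # centered on no effect, tight around the null
    "pessimistic":     (np.log(1.10), 0.10),  # centered on modest harm
}

for name, (prior_mean, prior_sd) in priors.items():
    # Normal-normal conjugate update: a precision-weighted average of prior and data.
    prior_prec = 1 / prior_sd**2
    data_prec = 1 / trial_se**2
    post_var = 1 / (prior_prec + data_prec)
    post_mean = post_var * (prior_prec * prior_mean + data_prec * trial_log_or)
    post_sd = np.sqrt(post_var)

    # Probability of any benefit, P(OR < 1), is the posterior mass below log(1) = 0.
    p_benefit = norm.cdf(0.0, loc=post_mean, scale=post_sd)
    print(f"{name:16s} posterior OR {np.exp(post_mean):.2f}, "
          f"95% CrI {np.exp(post_mean - 1.96*post_sd):.2f}-"
          f"{np.exp(post_mean + 1.96*post_sd):.2f}, P(benefit) {p_benefit:.1%}")
```

The point of the sketch is simply that each prior is reanalyzed against the same likelihood, and you then compare how much, or how little, the posterior moves across priors.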
And then you're back to where we started and wanted to get to, the posterior probability distribution. So let's start to walk this out into some real examples, looking at real trials. So this is a paper that I wrote a couple of years ago with a couple of colleagues where we outline a way that we believe is beneficial for reanalyzing clinical trials that were designed in a frequentist framework but are being reanalyzed in a Bayesian framework. And if you feel like some of these concepts are just easier to absorb by reading, I'm really pleased with this paper, so maybe give it a look. We spent a lot of time trying to make it a tool for a clinical audience that doesn't like statistics. So I hope you find it accessible if you pick it up. So we outlined the way that we suggest we should peer review, and what we should expect a Bayesian reanalysis to look like. So here's the framework, and then I'll walk you through an actual application. The first thing we argue is that if you're going to reanalyze a trial, you need to present both analyses side by side. Let's look at the frequentist results and the Bayesian analysis simultaneously. The easiest way to do that is to start with the prior I mentioned a couple of slides ago. So you start by providing the Bayesian estimate with a non-informative prior. So we're not going to change the distribution at all. And then there's a bunch of new information you can get, and I'll show these on the next slide in a formal distribution, but you can start to talk about things like the region of practical equivalence, which is how much the results suggest basic equivalence. Are the two arms quite similar, or are they actually quite different, such that we should really say there's some effect there, even if it doesn't reach p < 0.05 in the other framework? We also talk about potentially isolating this concept of severe harm or outstanding benefit, which is how much of the distribution suggests that this intervention is remarkable or really, really dangerous. And then we suggest going through the priors as I just showed you. So looking at it with one skeptical prior, one pessimistic or harm prior, and usually an optimistic prior, which is easy to get because, as I mentioned, that's usually why people are motivated to do this. So if that's available, we recommend that the optimistic prior be empirically data-informed when possible. And then use a Bayesian meta-analysis to make sense of all this. So what's a little challenging when people pick up a Bayesian paper is that they see multiple different effect estimates and they ask, okay, well, which one do I interpret? So I'll show you how we like to recommend that people interpret them. Okay. So the first example is a trial that was done in Brazil. It was looking at a lung recruitment protocol, and it was just on the margins. So the p-value was 0.041, suggesting harm, but if you didn't use a hazard ratio and instead used an odds ratio or relative risk, you would have a p-value right around 0.05. So it's kind of one of those trials that we believe could have just snuck past that traditional journal conclusion of not significantly different. So this is what you get when you estimate in a frequentist framework. You've all seen a table like this before. You get a point estimate and a p-value, and that's how the conclusions are derived.
And I think that's really unfortunate, because there's actually so much more information that you can derive when you start to think about things from a Bayesian perspective. So when you do a Bayesian analysis, as I mentioned, you're always operating with this kind of normal distribution. So we take the log of things, which gives the log distribution of potential treatment effects. So what you're looking at here is the estimated odds ratio on the log scale from that trial. By putting it on the log scale, we get a normal distribution. As I mentioned, once you have a normal distribution, you just have to look at how much of the distribution is to the right or left of certain landmarks. And in doing so, you can start to say a lot more about what the effect distribution is actually telling you. So in this example from that paper, this is a posterior probability distribution from a non-informative prior. So it's just a pure probabilistic interpretation of this lung recruitment trial. Let me orient you to some of the things that I find so attractive about thinking about effect estimates and clinical information from this distribution. Let me orient you to the red box. That's the null. So that's an odds ratio of one, which on the log scale is zero. So the first thing you can ask is, okay, how much of that distribution is congruent with an interpretation that there's some benefit from this therapy? And the conclusion is that only 3% of that distribution is congruent with any benefit. So even if it's just a tiny little marginal benefit, just really little is there. Alternatively, to the right in orange, we have 97% of that bell curve. And that suggests that nearly all the evidence from this trial is suggestive of harm. But we can do more. We can think about this region of practical equivalence and say, okay, let's look at how much of the volume of effect sizes in our distribution sits around that null level. And what you see is that some of it does, but quite a bit does not; really the evidence is aligned with some interpretation of harm. And then we can say even more and ask, okay, what about an odds ratio above 1.25? How much of that distribution is really suggestive of a sizable negative treatment effect? And we see that 54% of it is there. So this is a little bit of a teaching example, so these are somewhat extreme distributions and interpretations, but you can see already how much more you get than a p-value. So then you can walk through your analysis, and here what we have are those point estimates. So those are the point estimates and the credible intervals, which I'll introduce in the next example. I just want to get you familiar with the first set of terminology and then we'll go a little deeper. So here we have a meta-analysis of all of our priors. So we have our skeptical prior, our pessimistic prior, and our optimistic prior. And what you can see visually is that the effect estimates really don't change, even if you assume an optimistic, hypothetical, different state of the world. This trial doesn't really align with anything we know; the estimates are all kind of the same. And the conclusion is there: you can compute an I-squared and say that 11% of the variation between all the different estimates is attributable to the different priors. So in conclusion, we feel like there's really compelling, indisputable evidence of harm. It doesn't really matter how you cut it.
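For readers who want to see what this carving-up of a posterior looks like in practice, here is a minimal sketch in Python. The posterior mean and spread are made-up numbers chosen only so that the printed probabilities roughly echo the ones quoted in the talk (about 3% benefit, 97% harm, 54% above an odds ratio of 1.25); the 0.9-1.1 bounds for the region of practical equivalence are a common illustrative choice, not necessarily the ones used in the published reanalysis.

```python
import numpy as np
from scipy.stats import norm

# Illustrative posterior for a harm signal on the log odds ratio scale (not published values).
post_mean = np.log(1.27)
post_sd = 0.125

def prob_above(threshold_or):
    """Posterior probability that the odds ratio exceeds a threshold."""
    return 1 - norm.cdf(np.log(threshold_or), loc=post_mean, scale=post_sd)

p_any_benefit = 1 - prob_above(1.0)          # P(OR < 1)
p_any_harm = prob_above(1.0)                 # P(OR > 1)
p_rope = prob_above(0.9) - prob_above(1.1)   # P(0.9 < OR < 1.1), a region of practical equivalence
p_harm_gt_125 = prob_above(1.25)             # P(OR > 1.25), a more severe harm threshold

print(f"P(any benefit)      = {p_any_benefit:.1%}")
print(f"P(any harm)         = {p_any_harm:.1%}")
print(f"P(practical equiv.) = {p_rope:.1%}")
print(f"P(OR > 1.25)        = {p_harm_gt_125:.1%}")
```

The same posterior supports all of these statements at once, which is the speaker's point about getting more than a single p-value out of the trial.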
This is what we see. And if you're starting to review papers, look for things like this. Don't be complacent and just say, okay, well, they did one analysis and one prior. Try to look at their priors and try to think about how much they're varying. And that will help you process in your own head how much trust you have and how much you believe that they're actually making a compelling argument. Okay. So let me walk you through a formal analysis. This is one that I did more recently with a group of pediatric intensivists at Penn, or at CHOP, I mean. So I'm introducing you to THAPCA, therapeutic hypothermia after out-of-hospital cardiac arrest in children. So the original trial was powered for a 20% absolute improvement in good neurobehavioral outcome at one year. And what you see in the black boxes is, to me, a visually quite compelling effect estimate. So for the composite outcome, alive with a good neurobehavioral outcome score measured by the VABS-II, we have a risk difference of 7.3 percentage points. The 7.3 is positive, meaning a higher percentage of individuals alive with a good neurobehavioral score. And then you see a relative likelihood of 1.54, again suggesting a much higher rate of the outcome, with a p-value of 0.14. Similarly for survival, you see a 9.1% increase and a p-value of 0.13. But these come with the note that this trial was powered for a 20% absolute improvement in good neurobehavioral outcome. And this goes back to one of my original slides: is this a negative trial, or did we miss a signal? So I think this is actually a very relevant case. One of the criticisms against Bayesian reanalysis is, should it actually be done? Are we just gaming the system? And I do think that's a fair criticism from time to time. But particularly in pediatric out-of-hospital cardiac arrest science, I think this trial is really relevant. The condition is relatively rare, but it's also relatively frequent, depending on how you think about it, and you don't want it to happen to your child. About 7,000 pediatric out-of-hospital cardiac arrests occur in the US each year by the best epidemiologic estimates. This trial is the only pediatric trial to date that's been published. Adult trials, if you've been watching them, are a bit all over the place, and also, because of different factors such as shockable rhythm and just demographics, are hard to translate to pediatrics. So there's not a good ability to translate evidence. A more recent UK feasibility study concluded that they cannot conduct trials because they just aren't feasible; not enough people can be recruited reasonably. And this is echoed in the fact that a new US-based trial was just started, but it's not projected to end until 2029. So when it comes to randomized evidence for pediatric out-of-hospital cardiac arrest, this is it. And this is where we are at with the International Liaison Committee, where they write that, based on THAPCA and other evidence, mostly observational, there is inconclusive evidence to support or refute the use of therapeutic hypothermia. And I think reading that is kind of unfortunate, because that last line, to me, is really the consequence when we start to only think about things in p-values. So we asked, what would a Bayesian interpretation of this trial conclude? Our reanalysis was published recently, in February, in New England Journal of Medicine Evidence.
So it's available online if you are interested in how some of this looks in application. So our conclusion was that there's actually quite a bit in this trial, as I hinted at in previous slides, that is suggestive of a benefit. Here again is that bell curve that we create when we do a Bayesian analysis, and you can see three different shades of blue. The light blue, which is the majority of the distribution, is suggestive of some benefit. We'll talk about how much benefit in a minute. The shaded middle blue is suggestive of some harm. So 94% of the distribution is suggestive of some benefit, 6% could potentially be harm, and that interpretation of severe harm, that kind of high odds ratio, is less than 0.01, or 1%, of the distribution. So this is us looking at THAPCA with a non-informative prior. We haven't added any information at all, and our conclusion from that is that there's very compelling evidence that this is a therapy that at least has some distribution of benefit. So the question is, how robust are those results? What you see here are a ton of priors. On the left side, we tried to create empirical priors. So there were a couple of trials: the TTM trials, and there was a meta-analysis, so that's one distribution. There's HYPERION, and then there's Granfeldt. So those are the adult trials, which, as I mentioned, we don't think are perfect clinical translations to pediatric care, but we included them just to say this is the evidence we have. And then what you see on the right side are the three kinds of distributions that I mentioned. It starts off at the top with the neutral ones, and you can see that we're tinkering around with different sizes of the distribution, that is, how much of the prior is concentrated around a null effect. So all we're doing is narrowing and widening them, and then we do that for the optimistic priors and then for the pessimistic ones. So we just did a ton of different priors, because we really wanted to scrutinize this trial and say, no matter how we cut it, how much is our conclusion going to vary from what I showed you on the prior slide based on different assumptions? And I will note that the pessimistic priors here really don't align with anything. Most of the evidence, at least in the adult literature, is at least that it's null or has very little effect. So the pessimistic priors are really extreme hypotheticals; they really are extreme tests to see how robust our results are. So here are our results. I showed you the distributions a couple of slides ago, but you can cut up those distributions and say different things about them, and these are some of the things you can say, in addition to what I've shown you before. So the first thing you get is what's called a median estimate of benefit. If a distribution is more or less normal, the mean and median are about the same. So you can think about that as your best guess. That's the peak of the distribution. The peak of the distribution is where the largest mass of likely effect sizes is. So that's our conclusion, that's our best guess. And then you have this credible interval. So the credible interval you can think of conceptually a little bit like a confidence interval, but it is different. Let me walk you through how it's different. So, as I mentioned, they're like 95% confidence intervals conceptually, but not formally.
So in our trial, the non-informative median estimate was 6.8, with a credible interval spanning from a potential signal of harm, a 1.9 percentage point decrease in good neurobehavioral outcome, up to an increase of 15.4. And if you look at the frequentist confidence interval from the original trial, you see a very similar confidence interval and point estimate, slightly different, but similar enough that you can see they're comparable. But our interpretation of this is that 6.8% is our best guess of the impact of therapeutic hypothermia. And then, when we talk about credible intervals, we say that our data are aligned with a plausible range of effects that runs from a harmful effect of 1.9, or about 2%, to an increase of 15.4%. Going back to the blue slides, this aligns with a 6% chance of any harm and a 94% chance of any benefit. So what's really specific here, and what's really different, and where people coming from frequentist statistics don't always feel that comfortable, is that zero is in that interval. It really doesn't matter to a Bayesian, because we're not doing a test. What we're trying to understand is, what's our best guess, what's the plausible range of likely effect estimates, and how much of the distribution aligns with different likelihoods of an effect size of 2%, 3%, and so on. So our conclusion, in Bayesian speak, is that a null or harmful effect is possible, but it's very improbable. So again, this goes back to accepting a probabilistic interpretation of the effect estimates from a trial. After that, you can do more. You can scrutinize that bell curve as I showed you before, and you can think about it on the absolute scale and say, okay, how much is associated with any benefit, which is what I've been showing you. And you see in the fourth column, where I have greater than or equal to zero, that's the probability of any benefit, just that the effect we've observed is greater than zero. And we see that's 94%. We can keep walking that out. So we can ask, what's the probability that the effect estimate is higher than 2% on the absolute scale, or 5%, or 10%? So what you can see with our non-informative prior is that there's fairly compelling evidence that there's at least a modest benefit. There's at least some benefit, and it's likely greater than 2%. But as we get up to 5%, you see that 66% of the distribution is aligned with that interpretation. You can also look down that column of greater than zero. So what I've added in the first column here are all those different prior distributions that I showed you on the slide before. I just named all the different priors in the figure. And if you go down, you see, okay, with the Granfeldt and TTM priors, we see a 66% chance. So those aren't really aligned, and they suggest that our trial may be an outlier. But with HYPERION, with the optimistic priors, and even the neutral priors that I have, all those hypothetical priors, there's always above an 80% to 85%, if not higher, likelihood that there's some benefit. And obviously that gets smaller as you move away. The only time that our priors really start to suggest that there's less than a 50% chance of a benefit is when we use our extreme pessimistic priors. And as I mentioned, those are priors that are meant to be a kind of stress test of how robust our results are.
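As an illustration of how these quantities relate to one another, here is a small Python sketch that treats the posterior for the risk difference as approximately normal, using only the median and 95% credible interval quoted in the talk (6.8, from -1.9 to 15.4 percentage points). This is a back-of-the-envelope reconstruction, not the actual model in the published reanalysis, which is more involved.

```python
import numpy as np
from scipy.stats import norm

# Approximate the posterior for the risk difference (percentage points) as normal,
# using the median and 95% credible interval quoted in the talk: 6.8 (-1.9 to 15.4).
post_median = 6.8
post_sd = (15.4 - (-1.9)) / (2 * 1.96)   # roughly 4.4 percentage points

lo, hi = norm.ppf([0.025, 0.975], loc=post_median, scale=post_sd)
print(f"Median {post_median:.1f}, 95% CrI {lo:.1f} to {hi:.1f}")

# Probability that the benefit exceeds various absolute thresholds.
for cutoff in [0, 2, 5, 10]:
    p = 1 - norm.cdf(cutoff, loc=post_median, scale=post_sd)
    print(f"P(risk difference > {cutoff:>2} percentage points) = {p:.0%}")
```

Under this normal approximation, the probability of any benefit comes out near 94% and the probability of a benefit above 5 percentage points near 66%, which is consistent with the figures the speaker reads off the table.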
So our conclusion here is that, in the vast majority of ways that we look at it, there are pretty compelling results that there's some benefit there. If you like to think about things on the relative scale, which you could also do, so a relative risk of benefit, because here we want a higher number, you can do the same thing. You can cut up that distribution and say how much of the distribution is associated with a relative benefit of greater than one, 1.1, 1.25, and so on. And just to walk that out again, you see very similar results. All of our interpretations really align on there being some small or modest benefit. And our best guess is that it's about 6% to 7% on the absolute scale, and greater than one, but not greater than 1.1, on the relative scale. So, a tale of two statistical philosophies: in that New England Journal of Medicine trial, we see this conclusion where they say therapeutic hypothermia compared to therapeutic normothermia did not confer a significant benefit on survival. And that's not really telling us anything clinically. But by doing all those prior distributions, our interpretation of the trial is that the probability of benefit was about 94% for both outcomes, both the composite and then survival alone, with the non-informative prior. So that's what our data tell us. If we tinker around and ask, okay, how much does this align with all these different hypothetical and empirical distributions that are out there, we say the probability of benefit is greater than 75% for nearly all of the interpretations. So our conclusion is that there's a high probability that hypothermia improves neurobehavioral outcome and survival at one year, even if modestly. So, just a different lens, a different way of seeing the same data. So now, taking everything that we've seen today and all those distributions, let's talk about what a truly, fully Bayesian trial looks like. And this is an example from REMAP-CAP, which is an international consortium that has done a ton of trials, as I showed you on one of the opening slides. So if you can follow this, you can interpret a lot of their trials that you're going to see coming out in JAMA, and they're doing a lot of work in pneumonia and other areas. All their trials are built relatively similarly, if not the same, so this will help you walk through any of them. So here are the results and conclusion. And I think you'll start to see where everything I showed you on the prior slides comes together to interpret a trial in a fully Bayesian frame. So the results of this trial were relatively null, and it was stopped early. They say the median number of organ support-free days was seven in both groups, with an adjusted odds ratio whose credible interval ranged from 0.86 to 1.23. And that posterior probability distribution gave a 95% chance of futility. I'm going to show you what that means on the next slide. So they created a trial to try to test for superiority. When you operate in the Bayesian framework, you're simply looking for different thresholds being met across that bell curve that I showed you. And they come to the conclusion that, compared with control, antiplatelet therapy is not effective. And what's beautiful about this is that their trial was designed to answer all the things we want to know, going back to that ideal trial I talked about. So in a Bayesian trial, you don't necessarily focus on just superiority or futility or equivalence.
What you do is you start to enroll individuals. And every certain number of individuals, you look at the posterior distribution that you're deriving. And most of these trials are designed with a vague prior, a flat, non-informative prior. So all they're doing is creating a posterior distribution, and they're looking for that posterior distribution to become stable. As you know, or if you recall, the more people you get into a sample, the more stable your bell curve becomes. So after 50 people, they'll look. And after 100 people, they look. And as they start to see that the distribution becomes stable, they can start to make decisions and say, okay, we have 200 people in the trial. Can we stop the trial and say there's evidence of efficacy? So efficacy would be where 99% of that bell curve is to the right, suggesting benefit. Futility, and I'll show you what this means visually on the next slide, would be that 95% of the distribution is below an odds ratio of 1.2 compared with control. You can also test for equivalence and say, are we still in a state of equipoise? Are the distributions just not separating enough that we're still in that region of practical equivalence that I mentioned before? So what does this look like visually? Here are three more bell curves. Again, this is what I've been showing you this whole talk. You get the sampling distribution. The peak of that distribution is your best guess. And then you have your 95% credible intervals. And what you see here are the three panels. So the outcome here is organ support-free days, so again, you want more of them. So they would conclude, as you see visually in the top panel, that there is some benefit, with more than 99% of the distribution on the beneficial side of the log odds ratio scale. So if you go back to the distributions I showed you, their conclusion is that there is efficacy if 99% of the distribution is aligned with some benefit, any benefit; they're not really focusing on what benefit, whether greater than 2% or greater than 3%, they're just focusing on any benefit. If the distribution in the second panel is kind of to the left and sitting below that threshold, it's not suggesting that there's going to be a real separation, so they're going to conclude futility and say, we could continue this trial, but the distribution is relatively stable, we're not seeing a lot of change, so continuing is futile. And then at the bottom, you see that perfectly skeptical distribution I showed you a couple of slides ago. You've got 50% of the results to the left, suggesting harm, and 50% of them to the right, suggesting benefit. So you're still in a state of equipoise, still in a state of equivalence. And this is how Bayesian trials work. They progress, they keep observing as the distribution starts to stabilize, based on prespecified examination numbers, like 200 or 300 individuals. This is why Bayesian trials are often much smaller than people think: once they get a stable distribution, they just have to come to one of their decision nodes, and you can get all the information about a therapy from one trial, which to me is kind of the beauty. You're enrolling fewer people, so you're not unnecessarily exposing an increased number of individuals to risk because you made some frequentist prediction and said, I need X number of people in my trial. As soon as you get to a decision node, you can say, okay, it's time to stop, we have enough evidence.
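To illustrate the interim decision logic the speaker describes, here is a minimal Python sketch. The 99% efficacy and 95% futility thresholds echo the ones quoted in the talk, but the example posteriors, the choice of a normal approximation, and the specific numbers are all illustrative assumptions, not the actual REMAP-CAP statistical analysis plan.

```python
import numpy as np
from scipy.stats import norm

def interim_decision(post_mean, post_sd):
    """Apply stopping rules like those described in the talk to a posterior on the
    log odds ratio scale, where OR > 1 means more organ support-free days (benefit).
    An equivalence rule would work the same way, checking how much of the posterior
    falls inside a region of practical equivalence around the null."""
    p_superior = 1 - norm.cdf(0.0, loc=post_mean, scale=post_sd)      # P(OR > 1)
    p_futile = norm.cdf(np.log(1.2), loc=post_mean, scale=post_sd)    # P(OR < 1.2)
    if p_superior >= 0.99:
        return f"stop for efficacy (P(OR > 1) = {p_superior:.1%})"
    if p_futile >= 0.95:
        return f"stop for futility (P(OR < 1.2) = {p_futile:.1%})"
    return "continue enrolling and look again at the next interim analysis"

# Three illustrative posteriors, loosely mirroring the three panels described in the talk.
examples = {
    "clearly beneficial":   (np.log(1.40), 0.10),
    "stable and null-ish":  (np.log(1.02), 0.09),
    "still wide equipoise": (np.log(1.00), 0.25),
}
for label, (mean, sd) in examples.items():
    print(f"{label:22s} -> {interim_decision(mean, sd)}")
```

The first example trips the efficacy rule, the second trips the futility rule, and the wide equipoise posterior tells the trial to keep enrolling, which is the behavior the three panels are meant to convey.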
And in a fully adaptive Bayesian trial, if you're starting to see them, you can move on and test the next intervention. So it allows a trial to progress more efficiently and quickly. All right, some closing thoughts, and then I'll field questions. So as I mentioned, my view is that trials are very difficult and hard to do, and I believe that when we conduct one, and especially when we enroll a human participant, we should get the most out of it. Historically, and it's changing a little bit, and I hope that this talk helps it change a little bit more, we have lived in a very p-value-centric world, and I think that's a little wasteful, because it really, if nothing else, ignores what we've already learned from past trials. And this goes back to my concept of the meta-analytic perspective. It also seems wasteful because we make binary decisions, and that's not really what we do as humans. When you're a clinician, you're always evaluating multiple different things. You're saying, okay, what's the cost-benefit here? These are the different factors I need to consider. And yes, you eventually arrive at a decision node, but I don't think the empirical evidence we create should necessarily operate so harshly on that approach. To be fair, I should say that these criticisms are not unique to p-values. There are alternative ways to interpret p-values. There are things called p-value functions, and they're not necessarily that different from Bayesian approaches. And as I showed you with the Bayesian trial, they are setting stopping rules where 99% of the distribution has to be aligned with benefit, and so on. So there is always going to be some threshold for acceptance, for how we move on and say, okay, this therapy is efficacious or not, or equivalent or futile. So that's not necessarily a fair criticism of p-values in isolation. But what I think is worth thinking about is that trials generate evidence, and this evidence is consumed by clinicians and practitioners and different individuals who operate and deliver care. And eventually, the findings make their way into practice guidelines and guidance, and then into the things we teach people about practicing medicine, and this is essentially how we accumulate knowledge and learn. And what I really like about Bayesian thinking, and I hope I've convinced you today too, is that Bayesian thinking and Bayesian interpretations of the data we generate directly address the primary questions raised by the individuals using that evidence. That's because these Bayesian posterior probabilities that I've introduced you to really maximize information by telling us what the real probabilities of benefit or harm are, and by letting us look at a bunch of different ways of cutting up the pie. And that allows us to think more about the treatment based on probabilities. And of course, we have to incorporate this with additional considerations, such as other risks, costs, prognosis, patient or family preferences, community views. But this is really what we do as a community, like ATS guidelines, SCCM guidelines, when we get together as multi-societies. And this, to me, is the evidence we need to push those guidelines to be as evidence-based as possible. And this is where I think focusing, and continuing to focus, on p-values and p < 0.05 to interpret the robustness or lack of an effect just starts to become nonsensical. So that's it for me.
I just want to acknowledge the funding that supports this work and a lot of what I do and have said today, and, without any attribution if you didn't like any of it or I said it wrong, a handful of colleagues who have also jumped on this bandwagon with me. You'll probably see them in the literature publishing their own Bayesian analyses. But thank you so much for listening. And thanks so much for coming to a talk with Bayesian in the title. And please feel free to email me if anybody has questions. Yeah, thank you so much, Michael, for that excellent lecture. We have a couple of questions in the question box, but if anybody has any other questions, please do put them in the question box and we will try to go over them. So one of the first questions: how are flat priors and vague priors different? It's really just a nomenclature thing; they're formally the same, and I'm sorry, because I use the terms a little interchangeably. There are slight variations, just to be technically correct, but essentially flat, non-informative, and vague priors all mean that you're not hedging your bet on any one assumed effect size. You're saying that basically the whole continuum of possible effect sizes is possible. Sometimes they're a little bit heavier right around the null area, so people derive them a little bit differently. But conceptually, the goal is to not influence the trial data. And one of the things that I advocate is that that's the best way to first look at your trial data, because the first attack on any prior is that you've gamed it. So those priors are really there to not allow the interpretation that you're gaming your prior. The next question is, can Bayesian analysis be applied to observational or retrospective data? Yeah, absolutely. More than anything, I think it's just a philosophy. It's a way of interpreting the data you have in front of you. In a fully Bayesian trial you're generating evidence with Bayesian stopping rules, and you can't really have a fully Bayesian observational study in that sense, so it's a little bit harder to go in that direction, but really there's no functional difference in the data that exists in front of you. It's just how you're going to interpret it and talk about the likelihood of effects, and, more than anything, how you decide what constitutes compelling evidence or not. Next one. So coming back to a question that we discussed prior to this meeting: how do you calculate sample size or power in a Bayesian analysis? It's much more difficult than your standard Stata or R plug-in. What you are trying to do when you design a Bayesian trial is figure out how many individuals you need for that posterior to become stable. And what I mean by stable is how many individuals do you need such that when you add another 15 or 20, it's unlikely to start to influence your interpretation of the trial. And this is relevant for a couple of reasons. So imagine you have that stopping rule of 99% or 95%, and you get there, but you still have a bunch of people in follow-up. Once those people finish follow-up, you've decided to stop your trial, but then that shifts the posterior back down to 98% or 97%. So there's a concern that you could potentially stop early based on hitting a decision node because of some lag in the data.
So what people do, which is really time-intensive, is run a ton of what are called statistical simulations, where you create all these different scenarios and hypotheticals. And you try to figure out what the balance is between follow-up and number of individuals that makes that bell distribution stable. And it is a little bit of an imprecise science, and I think anybody who does it will acknowledge the challenges. But usually, you unfortunately need a lot of statisticians doing a lot of different statistical simulations. And that is what makes some of the movement to adaptive Bayesian trials specifically challenging: you really need a good statistical team that's willing to go through that. I've been involved at different levels, on support committees and steering committees, with a bunch of the international consortiums that are launching, and they have spent months just going through simulations, trying to figure out how they're going to write their statistical analysis plan. So on this front, the trade-off is considerable. But if you believe, and this is what I honestly believe, that you're going to enroll fewer patients and get more out of your trial, then there's a benefit. There's a cost-benefit trade-off there. What you have mostly been exposed to, if you have thought about power, are what people call closed-form equations, where you can just put in a delta and some assumption about the standard deviation and get a power estimate. Unfortunately, there's not really that type of approach to statistical power in a Bayesian trial. If my understanding is right, it's basically that you start a trial and then keep looking at the bell curve distribution, and as you start approaching one of the pre-specified criteria, you run some more simulations on the patients you are still following up, to see if their information, whatever it may be, will influence the interpretation of the results. Exactly. And you make a best guess about your first stopping time. So if you read a Bayesian trial, you'll usually see something like, we looked to potentially implement our stopping rules at sample sizes of 200, 300, and 500. What is underlying that is that they've done simulations suggesting that they should have enough information to make a conclusion there. And if they don't, they're going to keep continuing the trial until they feel like they've gotten to a point of futility, equivalence, or superiority. Excellent. Are there unique considerations when using Bayesian analysis results to inform cost-effectiveness studies? Great question. That's an interesting question. I don't think I know enough to feel comfortable shooting from the hip. I've seen papers in the stats literature on Bayesian cost-effectiveness, but I don't know enough about how they work. So I'm sorry not to be able to answer that one. I just don't want to make stuff up. I don't believe I know enough about cost-effectiveness analysis, but if you want to email me, I can share some papers and maybe try to do a little more homework. Sorry to pass on that one. So the question is, can this type of analysis be applied to mechanistic studies, for example in animals? Can you use this approach to ask about the probability that a specific biochemical reaction contributes to an observed effect? Yeah, I do think so.
Actually, what's unique historically, and I skimmed over this a little bit, is where you do actually see a lot of Bayesian trials, and have for some time. I don't know a ton about animal trials; it's just not really a field of research I'm exposed to. But a lot of phase one and phase two trials, especially in humans, operate on Bayesian principles. So that's when you're trying to be really careful about efficacy, and really I mean safety and dosing. And knowing that a lot of those studies use pharmacokinetic endpoints and different biomarker endpoints, I'm pretty confident in saying that there is a translation there. I just have not seen it applied directly in an animal study, but if you have pharmacokinetic endpoints and you want to progress, there's another design called a combined phase two/phase three trial, and the same applies to phase one/phase two trials. Essentially, it's an adaptive trial like I just showed you. You use those distributions as decision nodes, and once you hit one of them, you'll drop your control arm and say, okay, the intervention works, and then you'll add a new intervention arm, and you can keep testing progressively. So there's definitely methodology out there that's at least translatable to that setting. And I think we have one more question, so we are perfect on time. The stopping rules for efficacy or futility seem arbitrary. Wouldn't this be similar to the p-value argument for significance? And I will add something from my side: one of the things we see in critical care research is that the difference we assume between placebo and the intervention gets us to a sample size, and then when we study real patients, we find that the difference was not that big, and that's where most of the trials fail. That's where we saw the failure of prior frequentist critical care research. Are we going to suffer from that in Bayesian studies as well? So, my favorite topic is why critical care trials fail. I'll try not to give a loaded answer to a loaded question. I believe there are numerous reasons why we have small and often diluted, complex treatment effects. I don't blame it all on p-value-based thinking. I do blame it a bit on p-value-based design considerations. So, unrealistic delta estimates and incorrect baseline rates, all those things that go into a trial. I don't want to sell Bayesian thinking as a panacea; I'm cautious about doing that. And I will acknowledge that there are far more similarities than differences. When you get down to it, it's still the same data. It really comes down to how we're going to act on it. And from that perspective, I would argue that the real benefit of Bayesian trials, given that we don't change a lot of the other issues that plague the outcomes of critical care trials, is that they make our trials more efficient, meaning that we learn more from the patients who are enrolled in them and we enroll fewer patients. And that, to me, is at least enough to start to try it, or to try to do more of it. So I hear you and I don't disagree with you. It would be a good conversation over a beer. Hopefully we will meet at SCCM and have that discussion. I think we have gone over all the questions, and thank you so much to the audience for attending.
We had a really good lecture by Michael, and I learned a lot, right up to the end of the lecture. This webcast is being recorded. The recording will be available to registered attendees within five to seven business days. Just log on to MySCCM.org and navigate to the My Learning tab to access the recording. I think that concludes our presentation for today. Again, thank you so much to everyone for joining. And don't forget, we have upcoming Discovery events coming up July 12th at 2 p.m. Central Time. We will have a roundtable talking about different approaches to model building in critical care research, particularly retrospective and prospective studies. And there will be another webcast about a very, very cool topic, which is propensity score matching: should we be doing it, and how should we be doing it? So we look forward to everyone joining us for that roundtable and webcast. Thank you so much. Thank you. Take care.
Video Summary
The webcast discussed the use of Bayesian analysis in critical care trials. The speaker explained that Bayesian thinking offers a probabilistic interpretation of intervention effects, allowing for a more nuanced understanding of the data. The speaker introduced the concept of a prior distribution, which represents a best guess about the effect estimate before the trial and is combined with the likelihood from the new trial data to create a posterior distribution. This posterior distribution is a more up-to-date and comprehensive representation of the evidence. The speaker emphasized the advantages of Bayesian thinking, such as the ability to answer multiple questions simultaneously, including the assessment of equivalence and the evaluation of different effect sizes. The speaker also discussed the limitations and challenges of Bayesian analysis, such as the need for careful selection of priors and the complexity of sample size calculations. Overall, the webcast highlighted the potential benefits of Bayesian analysis in critical care trials, particularly in the context of historically unsuccessful trials and the need for more effective therapies in critical care.
Asset Subtitle
Research, 2023
Asset Caption
Bayesian critical care trials are becoming increasingly popular because of their adaptive and flexible qualities. Michael O. Harhay, MPH, MS, PhD, cochair of the Society of Critical Care Medicine’s Discovery Research Methodology Workgroup, reviews the latest information on Bayesian trials and how best to implement this trial method during this webcast.
Meta Tag
Content Type
Webcast
Knowledge Area
Research
Membership Level
Select
Professional
Tag
Clinical Research Design
Year
2023
Keywords
Bayesian analysis
critical care trials
probabilistic interpretation
prior distribution
posterior distribution
equivalence assessment
effect sizes evaluation
limitations of Bayesian analysis
sample size calculations
Society of Critical Care Medicine