Conducting Post Hoc Analyses Using Clinical Trial Data
Video Transcription
Hello, and welcome to today's webcast, Benefits and Restrictions of Conducting Post-Hoc Analyses Using Clinical Trial Data. My name is Laura Ebbott, and I'm a surgical critical care pharmacist at the University of Kentucky in Lexington, Kentucky, and I'll be moderating today's webcast. A recording of this webcast will be available within five to seven business days. I'd like to say thank you for joining us, and a few housekeeping items before we get started. There will be a Q&A at the end of the presentation. To submit questions throughout the presentation, type into the question box located on your control panel. Please note the disclaimer that the content to follow is for educational purposes only. And now I'd like to introduce your speaker for today, Dr. Bram Rochwerg, intensivist and researcher at McMaster University in Hamilton, Canada. And now I'll turn things over to your presenter, Dr. Rochwerg. Thank you very much, Laura, for the kind introduction. I'm so appreciative of the invitation to participate in this Discovery webinar series and the invitation to give the talk here today. So very, very excited to be here and speaking with you all. To start, in terms of disclosures, I should mention that I do work as a clinical practice guideline methodologist for SCCM, but also for a number of other national and international critical care societies. I work as an intensivist as well as a trialist based in Canada. I have no financial conflicts of interest to declare. In terms of outline and objectives for today, we'll talk about post hoc analysis, what types of post hoc analysis of clinical trial data are common, and some of the limitations around post hoc analyses, with a specific focus on subgroup analyses. I find, and maybe you do as well, that the most common application of clinical trial data in the post hoc setting is looking at specific subgroups of patients and how they respond to the intervention versus others. So there's going to be an especially keen focus on subgroups and assessing the credibility of subgroups. We'll start with a quote attributed to Buddha: "But after observation and analysis, when you find that anything agrees with reason and is conducive to the good and benefit of one and all, then accept it and live up to it." Post hoc analysis: I think we use this term a lot, maybe without understanding the underlying meaning in terms of its Latin derivation. Post hoc translates to "after this." It comes from the statement post hoc ergo propter hoc, which directly translates to "after this, therefore because of this." This is in contrast to a priori, which translates directly to "from the earlier." I think we're all seeing an increasing prevalence of post hoc analysis in the reports of research that we read. I think this is a reflection of increasingly complex clinical trial registries and databases, and again, we're seeing more and more of this type of research being published. I think the big risk is putting too much reliance on post hoc analysis in terms of influencing our care. And given that it is often unplanned, there's at least some suggestion that it should be considered hypothesis generating at best. We'll talk a little bit more about this. Here's an example of a post hoc analysis, one that is rife with confounding, looking at the stock market crash in the 1920s and seeing that stock prices seemed to drop soon after a witnessed drop in solar radiation. And one who looked back at this data might surmise that stock prices are closely linked to the level of solar radiation.
And obviously here, there's a risk in this post hoc evaluation, likely due to confounding factors. And so we talked about confounding as a risk in post hoc analysis. Multiple testing is a monumental risk, and this especially comes up in biobank studies and genome-based studies, where you have these large registries and it's so easy to go back retrospectively mining for data points that show a signal of statistical significance. And a lot of this comes from the fact that it's not possible, obviously, to power a post hoc analysis of registry data. Prospectively, we can power based on the number of patients or observations that are needed to meet a pre-specified threshold, but in the post hoc setting, you're limited by the data that's there and the data that's available. And obviously, in the post hoc setting as well, this lack of pre-specification has important implications, again, that we will talk about more.
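The multiple-testing risk described here is easy to quantify. Below is a minimal sketch in Python, assuming independent tests at a significance threshold of 0.05 (a simplification, since registry variables are rarely independent), showing how quickly chance "signals" accumulate when mining many hypotheses:

```python
# Probability of at least one false-positive "signal" when mining k
# independent hypotheses at alpha = 0.05, with no true effects present.
# Simplifying assumption: tests are independent, which real registry
# variables rarely are, but the qualitative lesson holds.

alpha = 0.05
for k in (1, 5, 10, 30):
    p_any_false_positive = 1 - (1 - alpha) ** k
    print(f"{k:>2} tests -> P(>=1 false positive) = {p_any_false_positive:.2f}")

# 1 test   -> 0.05
# 30 tests -> 0.79: mining a registry for 30 associations will
# "find" something significant most of the time by chance alone.
```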
It's also safe to say that definitions are evolving, and it's often not dichotomous, a priori versus post hoc. Traditionally, you know, one would do a randomized control trial and then might delve into the trial registry afterwards to look for certain associations. However, with increasingly complex statistical analysis plans, and folks publishing protocols well ahead of time, there's often room to describe analyses that you plan down the road. And this further blends a priori versus post hoc, and I think it's important to keep in mind. Common types of post hoc analyses that we see published, especially from the critical care arena: one might be exploring new associations, using a clinical trial database to look for associations between certain factors. And this at times can even be completely unrelated to the intervention that was studied as part of the RCT, and I'll show an example of that. We already alluded to subgroup analysis: looking back on a randomized control trial to see whether perhaps elderly patients behave similarly to younger patients, or whether those receiving a higher dose of an intervention respond similarly to those receiving a lower dose. There's a litany of potential subgroup analyses that can be conducted in the post hoc setting. Meta-analysis, including individual patient data meta-analysis, is a post hoc application of RCT data in answering research questions. Biobanking is a possibility, especially when there are biologic samples that have been collected in the context of a clinical trial. And then one area where we've seen a drastic increase recently is re-analysis of clinical trial data using novel statistical techniques. We've all seen this revolution of Bayesian statistics being applied to previously done frequentist clinical trials, and we'll talk about that again a little bit later on. So starting with exploring new associations. So, you know, clinical trialists do trials. They develop these large clinical trial databases, which entail all the baseline characteristics and the progression of clinical outcomes in the patients studied within the RCT. This can provide a very rich opportunity for exploring associations, sometimes ones that are completely unrelated to the intervention itself. And the example I show here, from Intensive Care Medicine, comes from the SAILS study, which, for those that are familiar with it, was a randomized control trial that looked at the role of statins in patients with ARDS. These investigators actually used this study database to explore for sub-phenotypes of ARDS using latent class analysis, completely independent of the intervention that was studied here, the statin; they used the clinical database to explore other associations. And so we commonly see this done. It's important to mention that when this is done, it's again independent of the randomized variable, and it's akin to a prospective observational study, or at least prospectively collected data. But all those issues with confounding that one might see in a non-randomized study would still be present. Before we jump into the subgroup issue, and I told you already that we would perseverate on the subgroup issue for a little while, I'm going to take a couple of jaunts, side trips, into a couple of, I think, crucial methodologic teaching points around subgroups. And so I hope that's all right. We'll do those two as side trips, and then we'll jump into how we assess the credibility of subgroups. This first jaunt into a methodologic consideration in subgroup analysis is considering subgroup analysis based on a pre-randomization variable versus a post-randomization variable. A pre-randomization variable is a variable that is present at the time of randomization. These are baseline characteristics like age, gender, severity of illness, versus a post-randomization variable, which only comes up after the time of randomization. To hammer this home, I'll use an example from the critical care literature, the BASICS randomized control trial that was recently published in JAMA. Many are probably familiar with this study by the Brazilian researchers of BRICNet. It was actually a factorial design that compared different fluid types for resuscitation in critically ill patients, comparing Plasma-Lyte versus saline, but also comparing infusion rates, fast infusion versus slow infusion. We're going to focus on the different types of fluid, Plasma-Lyte versus saline. And from the BASICS trial, investigators showed no difference between using Plasma-Lyte versus using saline for fluid resuscitation in critically ill patients. And I know, at least on social media, many looked at these trial results, similar to other fluid trials, and said, well, you know, maybe it doesn't matter for those that receive low volumes of fluid, but what about those that require high volumes of fluid resuscitation, that receive, you know, three, four, five, six liters of intravenous fluid? Might it matter what fluid type we use in those that require higher volumes of fluid? Well, higher volume of fluid received following randomization is one of these variables that I described as a post-randomization variable. It's not a variable that was present at the time of randomization. And we all remember from our statistics and epidemiology classes that randomization has the benefit of balancing known and unknown prognostic factors, at least at the time that randomization is done. And so using volume of fluid administration as a potential subgroup variable introduces the possibility of confounding. Maybe there's something about saline that also leads to requiring larger volumes of fluid administration. And we know that large volumes of fluid administration can be harmful for patients. So maybe we'll artificially see, when we look at the results, that saline is worse in those that receive larger volumes of fluid. But it's not because of the saline; it's because of a potentially confounding variable, like here, the larger volume of fluid resuscitation.
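To make the fluid-volume hypothetical concrete, here is a small simulation sketch with entirely made-up parameters, not anything from BASICS: the treatment has no true effect on mortality, but because it influences the post-randomization variable (fluid volume), subgrouping on that variable produces an apparent treatment effect through confounding by baseline severity. The direction of the spurious effect depends on the invented parameters; the point is only that one appears at all.

```python
import math
import random

random.seed(1)

# Purely illustrative simulation (hypothetical parameters): the treatment
# has NO effect on mortality, but it nudges a post-randomization variable
# (total fluid volume). Subgrouping on that variable then manufactures an
# apparent treatment effect through confounding by baseline severity.
stats = {}  # (arm, subgroup) -> [deaths, patients]
for _ in range(200_000):
    severity = random.gauss(0, 1)             # baseline illness severity
    arm = random.choice(["saline", "balanced"])
    # Hypothetical mechanism: sicker patients receive more fluid, and
    # saline itself drives some extra fluid administration.
    volume = severity + (0.5 if arm == "saline" else 0.0) + random.gauss(0, 1)
    subgroup = "high-volume" if volume > 1.0 else "low-volume"
    # Mortality depends ONLY on severity -- the true treatment effect is null.
    died = random.random() < 1 / (1 + math.exp(-severity))
    rec = stats.setdefault((arm, subgroup), [0, 0])
    rec[0] += died
    rec[1] += 1

for key in sorted(stats):
    deaths, patients = stats[key]
    print(key, f"mortality = {deaths / patients:.3f}")
# Within each volume subgroup the two arms now differ in average severity,
# so a spurious "subgroup effect" appears despite the null treatment.
```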
Another classic example of perhaps misinterpreting results based on a post-randomization variable comes from the original Van den Berghe studies looking at intensive insulin therapy versus less intensive insulin therapy in the medical ICU, published in the New England Journal in 2006. Many are probably familiar with this study. It looked at 1,200 critically ill patients, compared again intensive versus conventional therapy, and showed no difference between the two. However, a subgroup analysis that was presented in this manuscript compared those that were admitted to the ICU for greater than three days to those that were admitted to the ICU for less than three days, and actually showed a decrease in mortality with intensive insulin in those that were admitted for greater than three days and an increase in mortality in those that were admitted to the ICU for less than three days. Now, do we honestly believe that the duration of ICU admission impacts whether this intervention works or not? This is a post-randomization variable: how long the patient is going to end up staying in the ICU. Maybe it's that those that stayed in the ICU for longer than three days were predominantly medical patients, and maybe intensive insulin works differently in medical patients than in surgical patients that have a shorter duration of stay. Regardless of whatever the potential alternative explanation for this finding is, one should introduce an element of caution when considering post-randomization variables in subgroup analysis. And that brings us to takeaway point number one: regardless of whether a subgroup analysis is considered predefined or post hoc, you need to be aware of subgroup variables that were not present at the time of randomization. This is our first side field trip from a methodologic perspective. Again, this one is probably relevant for post hoc subgroup analysis, but also relevant for a priori subgroup analysis. Here's an example of a pre-randomization variable considered in a post hoc subgroup analysis. Here, from the HIGH-WEAN study, they looked at patients that were at high risk of extubation failure in the ICU and compared the intervention, which was high-flow nasal cannula plus prophylactic bilevel ventilation, versus high-flow nasal cannula alone on the primary outcome of reintubation at seven days. The initial trial, I think, was published in the New England Journal or JAMA, I can't remember, and didn't show any difference between the two groups. However, these investigators here in the Blue Journal published a post hoc subgroup analysis looking at the effect of the intervention in patients that had a high BMI versus those with a low BMI. And you can see, looking at this post hoc subgroup analysis, they were able to show that non-invasive ventilation combined with high flow was more beneficial in those with high BMI versus those with low BMI. Perhaps that makes sense. Maybe patients with higher BMI have lower chest wall compliance and require higher levels of support. However, this was a post hoc subgroup analysis, and the question many would ask themselves is, should we be applying these results to our practice given that this is post hoc? What are the considerations when trying to decide on the credibility of subgroup findings?
And this conundrum is not unique. You know, this is so common in the setting of randomized control trials. Maybe you'll see a negative trial, but folks are like, well, yeah, you know, the big trial is negative, but what about that subset of patients that we're sure would benefit, you know, maybe the sicker ones or the older ones or the ones born in the first half of the year? I'm sure those ones would do better with treatment if we had a subgroup analysis. Or similarly for trials that are positive: yeah, maybe the whole trial was positive, but there's probably this subset of patients that, you know, this subgroup analysis shows don't benefit quite as much as the total population. So again, a lot of this is trying to decide whether subgroup analyses are credible or not. There are applications both for pre-specified subgroup analysis and for post hoc subgroup analysis. But certainly in the context of this talk and the talk that I was invited for, we'll really focus on the implications for post hoc subgroup analysis. Now, I promised two methodologic side trips, field trips, jaunts before getting into assessing credibility of subgroups. We talked about the first one already, which was pre-randomization versus post-randomization subgroup variables. And this is the second one: differentiating between true subgroup effects, i.e., effect modification, and differences in baseline risk. And this is a concept that's commonly misunderstood, so I figured there was utility in perseverating on it and discussing it a little bit more. What I show you here is a hypothetical example of risk difference, not subgroup effect. Imagine here three different populations of patients: a high-risk population, where you can see here in the control arm the risk of this outcome, let's call it mortality, is approximately 30 percent; population two, an intermediate-risk group, where the risk of this mortality outcome is 10 percent; and population number three, which we'll call a low-risk group, where the risk of mortality is in the range of 2 percent. And so here you have an intervention with a relative risk of 0.67, so it reduces mortality by about one third. However, this reduction in mortality is consistent from a relative perspective across these risk groups. And so there's the same risk ratio in population one, the high-risk group, the same risk ratio in the intermediate-risk group, and the same risk ratio in the low-risk group. However, the absolute difference, the risk difference, is going to differ based on the baseline risk. When you apply that risk ratio to the baseline risk, you're going to see a larger risk reduction in the high-risk group and a smaller risk reduction in the low-risk group. Folks often misinterpret that as effect modification or a subgroup effect. It's not the case. We expect to see differences in effect from an absolute perspective amongst risk groups: higher-risk groups, even with the same relative risk, are going to show a larger absolute impact; low-risk groups, even with the same relative risk, are going to show a smaller absolute impact. If we're talking about true effect modification, subgroup effects, we're looking at a different risk ratio, a different relative effect, amongst the populations. So perhaps in the high-risk group this intervention would work with a risk ratio of 0.67; however, in the low-risk group, maybe now you see a risk ratio of greater than one, on the other side of the line of no effect, showing harm.
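A quick worked calculation with the speaker's hypothetical numbers shows the distinction: the relative risk is held constant at 0.67, yet the absolute risk reduction (and number needed to treat) varies several-fold across baseline risks. A minimal sketch, not trial data:

```python
# Same relative risk, different absolute benefit: the hypothetical
# numbers from the talk. A constant RR of 0.67 is NOT effect
# modification, even though the absolute risk reduction varies.
rr = 0.67
for label, baseline_risk in [("high risk", 0.30),
                             ("intermediate risk", 0.10),
                             ("low risk", 0.02)]:
    treated_risk = baseline_risk * rr
    arr = baseline_risk - treated_risk        # absolute risk reduction
    nnt = 1 / arr                             # number needed to treat
    print(f"{label}: ARR = {arr:.3f}, NNT = {nnt:.0f}")

# high risk:         ARR ~ 0.099, NNT ~ 10
# intermediate risk: ARR ~ 0.033, NNT ~ 30
# low risk:          ARR ~ 0.007, NNT ~ 152
```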
Again, this is an example of risk difference, not effect modification. Another example, a real-world example showing a difference in baseline risk, but not effect modification or a subgroup effect, comes from a meta-analysis that I led, published a couple of years ago, looking at the role of corticosteroids in patients with sepsis. As part of this analysis, we found a consistent reduction in mortality with corticosteroids in sepsis, a small but consistent reduction with a risk ratio of 0.93, suggesting a 7% relative reduction in mortality across risk groups. And we did meta-regression to confirm that this was consistent across risk groups. But you can see that depending on the baseline risk of death, whether you're a patient with sepsis with a 10% risk of death, septic shock with a 30% risk of death, or septic shock with multi-organ failure and a 50% risk of death, the absolute effect of steroids is going to vary, with a larger absolute effect at higher baseline risk despite the consistent relative effect. It's important to keep in mind. Here is an example of true subgroup effect modification: not just differences in baseline risk, but a true difference in relative effect amongst different populations. And this one comes from the TRICS III trial. Again, some might be familiar with this trial, published in the New England Journal in 2017. It looked at patients undergoing cardiac surgery with a high EuroSCORE and compared a liberal versus restrictive transfusion strategy in these patients, looking at a primary outcome of death, MI, stroke, and need for renal replacement therapy at 90 days. And investigators looked at comparing the effect of liberal versus restrictive based on age as a key subgroup variable. And you can see they provided the subgroup analysis in their results section. Those that were under the age of 75 did better with the liberal transfusion strategy, and those that were greater than the age of 75 did better with the restrictive transfusion strategy. So here, one might expect that baseline risk would be higher in the elderly and lower in the less elderly, but that's not what we're seeing here; it's not just a difference in baseline risk. We're actually seeing a whole different effect: those that are older seem to be doing better with one intervention, whereas those that are younger are doing better with the other intervention. Here is a classic example of true effect modification, a true subgroup effect that was seen. And this brings us to takeaway point number two: differences in baseline risk are often mistaken for subgroup effects; however, effect modification means that there's actually a difference in the relative effect between the groups of interest. And I think it's also important to note that true effect modification, true subgroup differences across any sort of patient characteristics, is rare. It's rare to see what we saw in the TRICS III trial, a true subgroup effect. OK, we've taken the two field trips, the two side jaunts, into the methodologic considerations. One was around pre-randomization versus post-randomization subgroup variables; the second got at effect modification versus differences in baseline risk. Now, as promised, we're going to talk about how one assesses credibility in subgroup effects, with specific implications for post hoc versus pre-specified analyses. And we'll start this discussion by showing you subgroup analysis gone wrong.
The ISIS-2 trial was a trial led, actually, by folks at my institution at McMaster, published in the Lancet in the 1980s, and looked at aspirin versus placebo for patients having an acute MI. The primary outcome was death at 35 days. The trial, published in the Lancet, showed, big surprise, a benefit of aspirin for patients that are having an acute myocardial infarction. The investigators that published ISIS-2, as a demonstration of the dangers of subgroup analysis, went on to publish this subgroup analysis in the New England Journal of Medicine. Check it out. I swear to God, published in the New England Journal. They did a subgroup analysis comparing zodiac signs amongst patients that were eligible and enrolled in ISIS-2. And they found, I don't know my zodiac signs, maybe those on this webinar do, that those with these 10 zodiac signs had a benefit with aspirin in the setting of MI. However, these two zodiac signs, whatever they might be, did not show a benefit of aspirin in the setting of MI. Hard to believe that your underlying zodiac sign, maybe someone can come up with an explanation as to why, but hard to believe that zodiac sign is a factor that would influence whether aspirin works in the setting of MI or not. And I think this brings to the forefront the fact that not all statistically significant subgroup effects are credible. They're not all believable. And so the next question becomes, well, how do we assess subgroup credibility if not all subgroups are credible? And we've sort of had a couple of guidelines or ways that one might assess it. However, more recently, a group again at McMaster, this is where I work and I collaborate with these folks, led by one of my supervisors, or mentors, should I say, Gordon Guyatt, came up with a tool to assess subgroup credibility. This tool is called ICEMAN. It was published in the Canadian Medical Association Journal a couple of years ago, and it recognized that credibility in subgroup analysis tends not to be dichotomous, like many things in life, and rather falls on a scale of credibility going from high credibility to moderate, to low, to very low. And this tool provided a toolbox in terms of assessing credibility in subgroup effects. It was a very rigorous process in how they developed this ICEMAN tool: they did a systematic survey of methods articles, contacted world experts, held multiple rounds of virtual meetings during the pandemic, and ultimately ended up with the tool that I'm about to show you. There are actually two different versions of this tool for assessing subgroup credibility, one that's applicable to randomized control trials, which is the one I'm going to focus on. There's also one that's more applicable to meta-analyses, because the same question of credibility of subgroups comes up in interpreting meta-analyses as well as randomized control trials. Here's the tool itself. There are six questions, I believe. The first question asks, was the direction of the effect modification correctly hypothesized a priori? This really gets at our talk in terms of a priori versus post hoc analysis. And obviously, if you're looking at a post hoc subgroup analysis, well, it's impossible that the direction of the effect would have been correctly hypothesized a priori. But you can imagine, let's go back to that ISIS example with the zodiac signs: if Dr.
Yusuf, who published the report, and his investigators had correctly predicted that those 10 zodiac signs were going to be the ones to benefit and the other two not, and had rationale and reason as to why that was the case, well, that increases the credibility of the finding. Same as for that BiPAP high-flow study: if investigators had pre-specified this analysis looking at BMI and ahead of time said, when we look at this study, we think those with a higher BMI are going to benefit more from NIV given chest wall compliance and the need for higher PEEP, et cetera, and they, in fact, saw the same effect that they had hypothesized in advance, again, that increases the credibility of a subgroup finding. You find that, despite the fact that it makes sense, rarely do folks pre-specify the subgroup analyses they want to do this eloquently. And even more rarely do folks pre-specify the direction of effect that they expect. Was the effect modification supported by prior evidence? That is question number two. This ties in quite closely with number one. And so, was there previous observational data, small randomized control trials, et cetera, to support this subgroup finding, and, again, the direction of the subgroup finding as well? Three, does the test for interaction, the subgroup test of interaction, suggest that chance is an unlikely explanation for effect modification? This gets at how statistically significant the subgroup effect was. A more highly statistically significant subgroup effect is going to be more believable, more credible; a less statistically significant subgroup effect, less believable. Did the authors only test a small number of effect modifiers? You know, this comes up especially in the post-hoc setting: we'll go back and look at an RCT and say, well, let's look at these 30 variables and see if any of these potential subgroup variables might explain the heterogeneity in findings that we see. Well, if you're looking at 30 variables, the odds that you're going to find one that happens to be positive are pretty high, or at least a lot higher than if you're only looking at a couple of variables. So, in general, the fewer subgroup analyses that we do, whether a priori or post hoc, the more credible the results are if you do find evidence of a subgroup effect. If the effect modifier is a continuous variable, were arbitrary cutoff points avoided? Here, age is a great example, and I showed you the TRICS III trial that showed that at the age of 75 there was a discrepant effect, evidence of true effect modification. But using 75 to differentiate between young and old is relatively arbitrary. Would that same effect have been seen if we used 70 or 65 as a cutoff? Or, maybe better yet, one could treat age as a continuous variable when assessing the impact of age on the effect and whether there was effect modification. So if arbitrary cutoff points were used, that would decrease the credibility of the subgroup findings. And ultimately, you're going to assess those four, five, six questions and land somewhere on this spectrum of credibility. This, again, is applicable to either post-hoc or a priori subgroup analysis, recognizing that credibility in subgroup findings is not dichotomous and likely lies somewhere on this spectrum of high to very low credibility.
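For the test-for-interaction question above, a common approach is a z-test comparing the subgroup effect estimates on the log scale. Here is a minimal sketch assuming each subgroup reports a relative risk with a 95% confidence interval; the numbers in the usage example are invented, not drawn from any trial discussed here:

```python
import math

def interaction_p(rr1, ci1, rr2, ci2):
    """Two-sided p-value for the difference between two subgroup
    relative risks, using the standard z-test on the log scale.
    Standard errors are recovered from the 95% CIs."""
    se1 = (math.log(ci1[1]) - math.log(ci1[0])) / (2 * 1.96)
    se2 = (math.log(ci2[1]) - math.log(ci2[0])) / (2 * 1.96)
    z = (math.log(rr1) - math.log(rr2)) / math.sqrt(se1**2 + se2**2)
    # Normal-tail p-value via the error function (no SciPy needed).
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

# Hypothetical subgroup results, purely for illustration:
print(interaction_p(0.70, (0.55, 0.89), 1.05, (0.85, 1.30)))  # ~0.01
```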
I'll show you a practical example of applying the ICEMAN tool. It's a little bit different here because this is applying it to a meta-analysis instead of a randomized control trial, but I think the principles are still the same and still important. And the example I'm going to use is the living WHO guidelines, which I've been fortunate enough to act as one of the methodologists for. This is now going on two years, this clinical practice guideline. I think when we started down this road, at the beginning of the pandemic, none of us anticipated that we'd still be working on this. I think we just published the 15th iteration of these guidelines a couple of weeks ago, but it's truly been a monumental effort. And subgroup considerations came up uniquely around one of the recent deliberations we had, looking at the role of remdesivir in COVID patients with severe and critical disease. We actually have a living network meta-analysis that's informing this clinical practice guideline. And what I show you at the top here is the pooled effect of all the randomized control trials that have examined the role of remdesivir in severe and critical patients. And you can see an odds ratio of 0.95 with a 95% confidence interval from 0.84 to 1.07. The plain-language takeaway from this was that remdesivir probably has little or no impact on mortality, and we would probably walk away not recommending remdesivir in severe and critical disease. However, we had planned to do a subgroup analysis separating severe and critical disease, treating them as subgroups: severe being hospitalized patients requiring supplemental O2; critical being those admitted to the ICU requiring high-flow nasal cannula, non-invasive ventilation, or invasive mechanical ventilation. And you can see down here in these forest plots, when we separated severe from critical disease, it certainly looked like there was a potential beneficial effect of remdesivir in severe patients and perhaps a harmful effect of remdesivir in critical patients. But was this subgroup finding credible? Was it believable? Because if it was believable, we would make separate recommendations for these two populations. If it wasn't believable, we would make one recommendation, and likely a recommendation against. And so we applied the ICEMAN tool, the same tool that I just went through with you. There were a couple of differences in the ICEMAN tool that we used, because we used the one that was designed for meta-analyses, as opposed to the one that was designed for RCTs. We looked at whether there was a consistent effect amongst the included studies; this is a component unique to the meta-analytic ICEMAN tool. There was a lot of debate about whether this was an a priori hypothesis or not, whether remdesivir might have a beneficial effect earlier in the phase of COVID when viral replication is higher. It's uncertain whether investigators thought that or not. The p-value, which I think I showed before, was sort of borderline. They did only look at a limited number of subgroups. And, again, a component that's specific to meta-analysis: they used a random effects model.
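For readers curious what a random effects model involves, here is a compact sketch of inverse-variance random-effects pooling using the DerSimonian-Laird estimator, one common implementation; the study inputs are invented placeholders, not the remdesivir data:

```python
import math

def dersimonian_laird(log_ors, ses):
    """Random-effects pooled estimate via DerSimonian-Laird.
    log_ors: per-study log odds ratios; ses: their standard errors."""
    w = [1 / s**2 for s in ses]                      # fixed-effect weights
    fixed = sum(wi * y for wi, y in zip(w, log_ors)) / sum(w)
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, log_ors))
    df = len(log_ors) - 1
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)                    # between-study variance
    w_re = [1 / (s**2 + tau2) for s in ses]
    pooled = sum(wi * y for wi, y in zip(w_re, log_ors)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    return math.exp(pooled), (math.exp(pooled - 1.96 * se),
                              math.exp(pooled + 1.96 * se))

# Invented example data (log ORs and SEs), purely illustrative:
or_, ci = dersimonian_laird([-0.22, 0.05, -0.10], [0.12, 0.20, 0.15])
print(f"pooled OR = {or_:.2f}, 95% CI = ({ci[0]:.2f}, {ci[1]:.2f})")
```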
Ultimately, our guideline panel decided that the credibility was somewhere in the range of low to moderate, but we sort of steered towards moderate over low. And believing this subgroup analysis, you'll see, if you read our WHO clinical practice guideline, that we actually made separate recommendations for these two groups: a conditional recommendation for remdesivir in patients with severe disease, and a conditional recommendation against remdesivir in those with critical disease. And this was contingent on our ICEMAN evaluation of subgroup credibility. I told you that despite these being crucial aspects of randomized control trials and their reporting, investigators don't do a great job of pre-specifying subgroups in their protocols. And even when they do pre-specify their subgroup analyses, they don't do a great job of specifying their a priori hypotheses either. And there's even a couple of examples where investigators did pre-specify the subgroup analysis but got the direction of effect wrong. This one was from the VASST randomized control trial. Again, many in this group might be familiar with this trial comparing vasopressin plus norepinephrine versus norepinephrine alone in patients with septic shock, looking at the primary outcome of 28-day mortality. The investigators did the right thing. They included the subgroup analysis in their pre-specification. They even hypothesized the direction of effect, rarely done, but they did it, kudos to them. And they said that they thought vasopressin was going to have an even larger beneficial effect in those with severe septic shock. However, when they did the analysis and reported the results, it was actually the opposite: vasopressin had a greater effect in those with less severe shock. And hence, despite the fact that they pre-specified this subgroup analysis, the fact that they had the direction wrong lowers the credibility of this subgroup finding. This brings us to takeaway point number three. I told you we would spend a lot of time focusing on subgroup analysis and its application. And that is: it's important to be careful with subgroup analysis, whether it's post-hoc or a priori. It's important to evaluate the credibility before deciding how much to trust the findings of a subgroup analysis. And I would say that this is especially relevant for post-hoc subgroup analyses in standalone publications. Meta-analyses: I publish a lot of meta-analyses; they're great resident projects, great fellow projects, and I supervise a number of folks on these types of studies. I guess the application of clinical trial data in meta-analysis is technically post-hoc. You know, you're using trial data that was published previously, and you're pooling it together to gain precision. However, there's not really a lot of limitation in doing this, and it's a rigorous procedure. I mean, the biggest knock against meta-analysis comes down to clinical heterogeneity between the included studies. And, you know, those that are detractors of meta-analysis will continually bring this up. But there are ways that we can assess whether this clinical heterogeneity impacts our ability to pool, when we assess for statistical heterogeneity. And I think many that are familiar with the evidence pyramid know that systematic reviews and meta-analyses of randomized control trials fall at the top of the evidence pyramid. And, you know, when we do clinical practice guidelines, these are the sort of evidence summaries that we rely on for informing clinical practice.
I would say that the same considerations around pre-specification, of subgroups, of outcomes, of data sources, also apply to meta-analysis. So it's always better to have a clear protocol ahead of time rather than do the meta-analysis and then cherry-pick the results that you're hoping to see. There are unique applications of meta-analysis that are becoming more popular: network meta-analysis, individual patient data meta-analysis. I like this example. It'll take me a minute or two to explain, but I like it because it shows the power of meta-analysis to address clinical questions. The clinical question here was the role of thrombolytic therapies in MI. And this is what's called a cumulative meta-analysis. It moves historically from the 1960s at the top of the slide down to the 1990s at the bottom of the slide. Interestingly, we actually didn't have traditional meta-analytic techniques until well into the 1980s. So prior to 1980, we didn't have the statistical knowledge, as far as I'm aware, to do the same type of meta-analyses that we do today. When you look at this cumulative meta-analysis, again moving north to south, accumulating randomized control trials and patients over the years, we were only really able to do these meta-analyses in the '80s, but we are able to look back historically and ask: what if we had done a meta-analysis in the 1970s looking at thrombolytic therapy in the setting of MI? And you can see from the table on the right-hand side that thrombolytics in the setting of MI were only routinely recommended or mentioned in textbooks and guidelines into the 1980s. They weren't routinely used in clinical practice prior to the mid-1980s, as we accumulated an increasing number of studies and an increasing number of patients that had examined this therapy. However, if we had been able to do meta-analyses, if we had had the statistical know-how, there was evidence of a statistically significant benefit on mortality for thrombolytics as early as the early 1970s, after 10 randomized control trials were done examining 2,500 patients. But you can see that at that time this was a completely experimental therapy, despite the fact that there was a statistically significant benefit on mortality. And it took another 15 years, another 45 randomized control trials, another 20,000 patients being enrolled in these trials, before this became standard of care. And so, you think about all the lives that could have been saved if we had routinely used thrombolytics in MI as early as the '70s, all the research money that could have been saved. It's quite monumental, the impact that meta-analysis could have had if one had been able to do it in the early '70s and dictate care 15 years earlier than it took with doing subsequently larger and larger randomized control trials assessing this intervention.
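The mechanics of a cumulative meta-analysis can be sketched in a few lines: re-pool the evidence after each new trial, in chronological order, and watch when the confidence interval first excludes no effect. This sketch uses simple fixed-effect inverse-variance pooling and invented trial inputs, not the actual thrombolytic trials:

```python
import math

def cumulative_meta(trials):
    """Fixed-effect (inverse-variance) cumulative meta-analysis:
    re-pool after each new trial, in chronological order.
    trials: list of (year, log_rr, se) tuples."""
    acc = []
    for year, log_rr, se in trials:
        acc.append((log_rr, se))
        w = [1 / s**2 for _, s in acc]
        pooled = sum(wi * y for wi, (y, _) in zip(w, acc)) / sum(w)
        pooled_se = math.sqrt(1 / sum(w))
        lo, hi = pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se
        sig = "significant" if hi < 0 else "not significant"
        print(f"{year}: RR = {math.exp(pooled):.2f} "
              f"({math.exp(lo):.2f}-{math.exp(hi):.2f}) {sig}")

# Invented illustrative trials (year, log RR, SE) -- just the shape of
# the exercise, not the historical thrombolytic data:
cumulative_meta([(1965, -0.30, 0.40), (1970, -0.25, 0.25),
                 (1973, -0.20, 0.15), (1979, -0.22, 0.10)])
```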
Moving beyond meta-analyses, we've seen unique post-hoc analyses come up in the last couple of years, increasingly popular. Here's an example: the STARRT-AKI trial, looking at early versus delayed initiation of renal replacement therapy in critically ill patients with acute kidney injury, published in the New England Journal in 2020 using a frequentist approach. I think at that time we probably just called it a normal analysis, because this is how most trialists analyze their trial results. However, the investigators, and I was fortunate enough to be one of these investigators, reanalyzed the trial results recently using Bayesian statistics, considering a prior knowledge base and the application of the trial results to these priors to develop posterior effect estimates, and published this as a post-hoc Bayesian reanalysis of the STARRT-AKI trial. Thankfully, in this situation, the two different analytic techniques agreed with one another, suggesting no benefit of an accelerated regimen. However, there are increasing concerns about what to do in the setting where these post-hoc reanalyses disagree with the initial analyses, and how one might apply that to clinical practice. And I mentioned already the fact that we're blurring the lines between a priori analysis and post-hoc analysis. Here's an example of a Bayesian reanalysis of the COVID STEROID 2 trial, looking at high-dose versus low-dose steroids. And although it was published post-hoc, it was actually a pre-specified post-hoc analysis. So the investigators had a statistical analysis plan. They said, we're going to publish the frequentist results in a major journal, and then subsequently, when we have time, we're going to go back and reanalyze the trial results using a Bayesian approach and publish these Bayesian analyses as a secondary, post-hoc, pre-specified analysis. And so I think, again, this is just an example where, you know, traditionally we might be able to say a priori versus post-hoc, and even that is now becoming a spectrum. It's not so easy to decipher.
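For a sense of what a Bayesian reanalysis involves at its simplest, here is a normal-normal conjugate sketch on the log odds ratio scale: combine a prior with the trial's frequentist estimate to get a posterior, and read off a probability of benefit. The prior and trial numbers are invented for illustration, not taken from STARRT-AKI or COVID STEROID 2:

```python
import math

def bayesian_reanalysis(prior_mean, prior_sd, trial_log_or, trial_se):
    """Minimal normal-normal conjugate sketch of a Bayesian reanalysis:
    combine a prior on the log odds ratio with the trial's frequentist
    estimate to get a posterior. All numbers below are invented."""
    w_prior, w_data = 1 / prior_sd**2, 1 / trial_se**2
    post_mean = (w_prior * prior_mean + w_data * trial_log_or) / (w_prior + w_data)
    post_sd = math.sqrt(1 / (w_prior + w_data))
    # Posterior probability that the OR is below 1 (any benefit):
    z = (0 - post_mean) / post_sd
    p_benefit = 0.5 * (1 + math.erf(z / math.sqrt(2)))
    return math.exp(post_mean), p_benefit

# Skeptical prior centered on no effect; hypothetical trial result:
or_post, p = bayesian_reanalysis(prior_mean=0.0, prior_sd=0.35,
                                 trial_log_or=math.log(1.00), trial_se=0.12)
print(f"posterior OR ~ {or_post:.2f}, P(OR < 1) ~ {p:.2f}")
```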
And this brings us to my final takeaway point: nothing is as simple as it seems. Trustworthiness of post-hoc analysis is not black and white, but a spectrum of credibility. Hopefully, I've given you some of the tools to assess the credibility of these post-hoc analyses. However, you know, each one is likely unique. Each application is unique and requires careful analysis, a thoughtful approach, in trying to decide how one might apply these to clinical practice. And with that, I appreciate your attention. Happy to pass things back to Laura. And I think we should have some time for questions. If you'd like to use the question box in the panel, you can type your questions there and I can ask Dr. Rochwerg those questions. Dr. Rochwerg, when you use the ICEMAN as a tool, are you using it when you're, like, in the trial design, or are you also using it when you're reviewing manuscripts? How are you utilizing that tool? It's a great question. I mean, I think it's probably one of these things that, if you're a trialist, you should be aware of when you're designing your trial, because, I guess, you know, some of those aspects of the ICEMAN tool do require pre-specification in your protocol and your statistical analysis plan: pre-specification, only a small number of variables, hypothesizing the direction of effect. So, I think it's a good thing for trialists to be aware of. But I think the best application is when you are then presented with a statistically significant subgroup effect and want to know how you're going to apply that to your patients. Maybe you're just a reader of the trial trying to decide, should I treat this subset of patients differently than another subset? Or maybe, what's relevant for me, Laura, as a clinical practice guideline developer, is, you know, trying to make recommendations. It comes up all the time in guideline panels: should we offer a separate recommendation for this subgroup versus that subgroup? And then it comes down to, well, how much do we believe this subgroup analysis? Because if we don't believe it, then we would offer just one recommendation for the total population. So, I think it's something for trialists to keep in mind. But I think it's something for us as evidence stakeholders, evidence readers, those that are interpreting the evidence, to keep in mind, especially when we're presented with something that the trial investigators are selling as an important subgroup finding. It's not always the case. And so, be a little bit suspicious when it comes up, and, you know, sort of go through the process on your own. Yeah, I think that's a great recommendation. As far as post-hoc analyses and when you would decide as a researcher to do one: you talked a lot about, you know, the possibility of saying what subgroups you're going to look at ahead of time, but when do you go back and decide that you're going to do a post-hoc analysis? Yeah, I think it's a great question. I mean, despite the fact that I do think it's easier and easier these days, with complicated and long protocols and the fact that we're all trying to be more thoughtful with this sort of stuff, to pre-specify a lot of this, there's always going to be things that come up as you go along, you know. And I think the pandemic's a great example. Like, when we were originally designing RCTs in early 2020, I can't imagine that folks were trying to think of subgroup analyses based on vaccination status. And forget vaccination status, when vaccines seemed so far away; even different types of vaccines, bivalent vaccines. And so, you know, now, when we're interpreting some of these studies, it's a big question. Like, what about a subgroup analysis: would remdesivir work differently in those that are vaccinated versus those that aren't? And so, you know, it's okay to do post-hoc analysis, subgroup analysis, sometimes, or any sort of post-hoc analysis. But I think that we just have to be careful in the application of these things, recognizing the risks. And I think that zodiac sign one was a great example. Like, that's a statistically significant subgroup effect. And it's a ridiculous example, but there's no reason to think that it couldn't also apply to other, less ridiculous examples where, unfortunately, we misrepresent the literature and what's being shown, and we believe a subgroup finding that's not actually trustworthy or actually the case. So, I think, on that spectrum of credibility, if something is post-hoc and without pre-specification, it's healthy to have that little bit of skepticism and be a little bit more careful, maybe not making strong recommendations, maybe, you know, couching things a little bit. But that's not to say that we shouldn't be doing post-hoc analyses, because we should. And then the other thing, you know, as trialists: we do these trials, we spend years and years of our lives, we get millions and millions of dollars to do the trial, and you're left with this very robust database. We should delve into that database and look for associations and look for subgroups and get as much out of that database as we possibly can, because blood, sweat, and tears went into establishing that RCT database.
Again, we just have to be careful with applying that, and not overstate things just because this is an RCT database. If we're looking for associations, there are still risks of confounding, and we need to make sure that we do careful, adjusted analyses and this sort of thing. So, I certainly think that there's still a huge role for post-hoc analysis. As with everything, just make sure you take a cautious approach. Thank you. One of the questions from our attendees is: one thing I have noticed over the past 10 years is that the number of re-analyses and post-hoc analyses and meta-analyses seems to have skyrocketed. As someone who loves the RCT and is trying to focus time on doing RCTs, I find myself wondering how to handle the meta-analyses and other analyses with regard to how much to engage with them versus focus on new trials. It's a great question, and I'm someone who publishes, as I said, a number of meta-analyses, but I'm sensitive to the fact that sometimes there are fields and questions with more meta-analyses than there are randomized control trials. And so, I think the answer is probably somewhere in the middle. I think that, you know, there really is no substitute for a well-done randomized control trial addressing a research question. And any time you start pooling randomized control trials together, you get benefits, and there are sacrifices you have to make as well. The benefit is, as long as the trials were done similarly, that you gain precision. I mean, if there's anything better than one randomized control trial addressing a question, it's combining two or three or four; as long as the trials were relatively similar and examined a similar patient population, you're only going to benefit from the gains in statistical power by putting them together. The concern is that when there are important differences between those randomized control trials, and now you're putting together trials that are different, you're weighing the gains in precision against the clinical and methodologic heterogeneity. So, I'll tell you, when we develop clinical practice guidelines, it's often those systematic reviews and meta-analyses in context that are the ones that primarily drive recommendations. But there have certainly been cases where, you know, we have one large RCT or two large RCTs that feel like they're different enough from the rest of the evidence base, for whatever factor, whatever reason, and they're the ones that primarily drive recommendations. So, unfortunately, I don't think I can offer a wide-ranging answer, but as somebody who has a massive appreciation for RCTs and meta-analyses, I would caution against relying too heavily on one or the other, and try to have a healthy appreciation for both and incorporate both into your interpretation of the literature and its application to patients. We have a clarifying question: is post hoc analysis the same as secondary analysis? Well, I guess the problem around terminology is that when you use vague terminology, it can be interpreted in so many different ways. And so, when I think about secondary analysis, a couple of things come to mind. You have the initial randomized control trial that was published in the New England Journal or JAMA, if you're fortunate. Not everything can make it into that initial publication.
And so, maybe there are other outcomes, maybe some secondary outcomes, maybe physiologic variables, maybe biological samples, you know, IL-6 levels, inflammatory markers, that you just couldn't fit into that primary publication, that get published in another journal. And so, one might call that a secondary analysis, but it might have been pre-specified, part of the original RCT, et cetera, and it might not be post hoc. A secondary analysis at the same time could refer to: okay, we published the trial, but now we're going to go back and, you know, look at some outcomes that we didn't initially plan to look at, or we're going to look at some subgroup analysis that we didn't initially plan to look at, and publish that as a standalone manuscript. So, I think it's all about being as specific in the language that we use as possible. And so, secondary analysis, in my mind, is relatively ambiguous, and I think it depends on exactly what we're talking about. And I think, even as we discussed in today's talk, post hoc analysis is not as straightforward as it used to be, or as we think it is, but it generally refers to whether there was pre-specification in a published protocol or a statistical analysis plan versus not. I think this next question builds off that complexity part that you've highlighted. Is there a way you suggest for someone who does want to do trials to try and learn some of these more advanced techniques to incorporate them into planning new trials? That's a great question. And probably, short of, you know, doing a PhD in Bayesian analysis of randomized control trials, there are no massive shortcuts to applying these things. But, you know, there's an increasing number of folks that are developing expertise in some of these novel applications, and I think reaching out and finding collaborators that are comfortable with them and involving them early on is key. Biostatisticians, right? I think long gone is the era where those of us, perhaps without specific biostatistical training, can analyze our own trial data. And I know that, at least for my own trials, having an experienced and knowledgeable biostatistician that's comfortable with these new applications is crucial. And then reading through, I guess, you can learn how to do a lot of this stuff with YouTube videos. And there are lots of publications around the nuts and bolts of Bayesian reanalysis and some of these more complicated genomic or biobank analyses. So I think there are ways to gain the expertise, but there's no substitute for knowledgeable collaborators that have been properly trained in how to do them. And I think the lesson is to involve them early on, even when you're designing the trial. Because, you know, there's so much to be said for including this right off the hop when you are developing the protocol. And there could be collateral benefits to that, Laura, in so much that even funders might be more open-minded to funding your trial if you're very careful with your pre-specification and include some of these novel approaches to analyzing your trial results. I think that's a really excellent point for people to incorporate. How do you tailor your conclusion based on your ICEMAN assessment? Is it reasonable to draw a strong conclusion if, for example, you had 20 pre-specified analyses and didn't describe the direction of the anticipated results? It's a great question. And those are two factors you've listed that might lead a subgroup finding to be of low credibility.
And I can definitely share publications I've done of meta-analyses, and this is my most common application of ICEMAN, where I say, you know, we found evidence of a statistically significant subgroup effect based on age, but this was found to be of low credibility after application of ICEMAN, and therefore of questionable clinical significance, something like that. I've definitely worded conclusions like that because, again, without applying ICEMAN, you would look at this subgroup analysis based on age or based on zodiac sign, and, you know, then you would write in the conclusions, well, aspirin works, but aspirin might not work in these two zodiac signs, or it might not work in older patients. But without an assessment of credibility, you really don't know, the same as applying GRADE and your certainty of evidence around outcomes. So I actually use the credibility assessment directly in the conclusions and might say, you know, this subgroup finding is of low credibility, or moderate credibility, or high. And what I tend to do in my clinical practice guidelines, in my meta-analyses, in my interpretation of a randomized control trial, I've even done it in editorials before, is include my filled-in ICEMAN tool in the supplement or an appendix. So folks can very transparently see the decisions and assessments that I've made, because maybe there's subjectivity to all these things. And so, if your subjective decisions around the number of variables that were assessed or the pre-specification differ, you can see how that would influence the credibility assessment, because it might be different than mine. Wonderful. Well, thank you so much for that today. That concludes our Q&A session. Thank you, Dr. Rochwerg. And thank you to the audience for attending. Again, this webcast is being recorded. The recording will be available to registered attendees within five to seven business days. Log into MySCCM.org and navigate to the My Learning tab, and that's how you'll access the recording. That concludes our presentation today. Thank you so much, everyone. Thank you.
Video Summary
In this webcast, Dr. Bram Rochwerg discusses the benefits and restrictions of conducting post-hoc analyses using clinical trial data. He introduces the ICEMAN tool, which is used to assess the credibility of subgroup effects, and highlights the importance of pre-specification and prior evidence in subgroup analyses. Dr. Rochwerg also discusses the potential risks and limitations of post-hoc analyses, such as confounding and multiple testing. He emphasizes the need for caution in interpreting and applying subgroup findings, especially in post-hoc analyses, and recommends considering the credibility of the findings based on the ICEMAN tool. Dr. Rochwerg mentions the increasing popularity of post-hoc analyses, including re-analyses and meta-analyses, and suggests involving knowledgeable collaborators and biostatisticians early on when designing new trials. He concludes by highlighting the complexity and variability of post-hoc analyses and the need for careful interpretation and application of the findings.
Asset Subtitle
Professional Development and Education, Research, 2022
Asset Caption
Clinical trials are resource-intensive exercises that provide valuable information to guide the approach to clinical care. Although such trials are uncommon in critical care, the data and knowledge they generate may extend far beyond the initial aims. This webcast provides an overview of post hoc analyses using clinical trial data. Topics include:
Prespecified versus newly generated analyses
Using stored specimens for additional testing
Limitations of using clinical trial data in the post hoc setting
Meta Tag
Content Type: Webcast
Knowledge Area: Professional Development and Education; Research
Knowledge Level: Intermediate; Advanced
Membership Level: Professional; Select
Tag: Professional Development; Clinical Research Design
Year: 2022
Keywords
post-hoc analyses
clinical trial data
ICEMAN tool
subgroup effects
pre-specification
confounding
multiple testing
interpretation
application