SCCM Resource Library
Causal Inference From Observational Data
Video Transcription
Hello, and welcome to today's webcast, Causal Inference from Observational Data. Today's webcast is brought to you by Discovery, the Critical Care Research Network at the Society of Critical Care Medicine, in collaboration with the CPP Clinical Pharmacy and Pharmacology Section. My name is Mohamed Effendi. I'm a Clinical Assistant Professor at the Ernest Mario School of Pharmacy, Rutgers in Piscataway, New Jersey, and Neurocritical Care Clinical Pharmacist at Capital Health Regional Medical Center in Trenton, New Jersey. I have no disclosures. We thank you for joining us. A few housekeeping items before we get started. This webcast is being recorded. The recording will be available to view on the MySCCM website within five business days. There is no CME associated with this educational program. However, there will be an evaluation sent at the conclusion of this program. The link to that evaluation is also listed in the chat box for your convenience. You only need to complete it once at the end of this webcast. Your opinions and feedback are important to us as we plan and develop future educational offerings. Please take five to 10 minutes to complete the evaluation. There will be a Q&A at the end of this presentation today. To submit questions throughout the presentation, type into the question box located on your control panel. And now I'd like to introduce your speaker for today. Dr. Michael Harhaey is an Assistant Professor of Epidemiology and Medicine at the University of Pennsylvania. He is an accomplished clinical epidemiologist and methodologist who serves on several international working groups and data and safety monitoring boards in the critical care community, as the Associate Director of Outcomes Research for the International Society for Heart and Lung Transplantation Transplant Registry, a statistical editor of the Annals of the American Thoracic Society, and as an associate editor of the International Journal of Epidemiology, among other activities. In 2020, he was named to the editorial board of the American Journal of Respiratory and Critical Care Medicine. Michael is the author or co-author of more than 130 peer-reviewed publications, and his research, which has been funded by an F31 and now a K99/R00 from the National Heart, Lung, and Blood Institute, has received several awards. And now I'll turn things over to Dr. Harhaey. Hi, and good afternoon, everyone. So I just wanted to start and say thank you to Discovery for hosting this webinar. And whether you're watching this sometime in the future or you're here today, thanks so much for signing in. So when I was first asked to talk about this, I felt a little daunted because there's so much one can cover. And what I would like to do over the next 45 or 50 minutes is try to expose you to a wide range of perspectives, ideas, and insights from both the conceptual and philosophical side of causal inference, but also the statistical. And if there are any specific topics that you wish you knew more about, or heard more about today, or where you would like additional reading, please feel free to email me. I'm happy to share them. I kind of took a buffet style and wanted to introduce you to a little bit of everything, but I'm always happy to provide more information. And before I jump into the outline for today and specific content, I thought a nice way to set the stage would be with the question of what is causal inference. So a lot of times people contact me, or I see a paper, and they say, we want to study causation.
So we did a propensity score or an instrumental variable. And that's a point where I always say we need to step back, because I think there's this belief that causal inference is kind of a general idea or a method. But to me, and I think to the people who study it, it's really a way to do science and a very broad approach to answering questions. And it's been around for a long time. It's not new. It's a very hot topic and everybody talks about it today, but it's been popular for decades, maybe even centuries, with roots in philosophy and, more recently, in epidemiology and statistics, which is kind of where I'll be coming from. But a lot of the contributors also come from computer science and economics, and increasingly zoology and ecology have started to study how evolution occurs using causal inference methods. And the topics one would find if they were doing causal inference reading range very widely, from theoretical to empirical. So just so you understand some of where I'm trying to pull from today. We'll see if I can advance. OK, so my view is that causal inference really starts as a philosophical, conceptual exercise, and it's much more of that than a statistical pursuit. And the reason I believe that is that when you really know what you're trying to ask, it makes it very easy for you to start to think about how to refine and make the details of your study more precise so that you can get a more informative and optimal answer. And when you talk to other statisticians or editors or reviewers or your friends or colleagues, it gives us a better idea of what you're trying to do so we can help you choose the appropriate statistical method. And as I find on the editorial side, the more we understand the conceptual clarity behind your idea, the easier it is to say this is a good idea, there is a lot of thought here, the statistical issues may be suboptimal, but we can fix them. It's very hard to fix a conceptually flawed study. So with that in mind, the outline that I would like to propose today really has two large segments. In the first part, we'll kind of talk about the conceptual and philosophical foundation behind causal inference. And I'll talk about how to formulate a causal question, how to take that question and think about how to answer it in causal theory, and then how that can inform your observational study design. I'll focus on two specific topics, study inclusion and exclusion, and how to think about what confounding you need to be aware of. And confounding is a topic that we'll hear a lot today, and it's probably the biggest bias that one should be concerned about in causal inference. And that's differences that exist among individuals in your study. And we want to remove them so that we can have this really good causal estimate that we can interpret as the true effect that would exist if there were no differences between people. And then that will lead us directly into the second part of the topic, and that will be the statistical methods that we can use to deal with confounding. I'll talk about traditional covariate adjustment, propensity scores, and then very briefly, quasi-experimental methods. And the goal here is to kind of introduce you to what the goals of these methods are and some pros and cons and things that you should be aware of if you're trying to implement them, but also if you're consuming studies in the literature, what are things you should be aware of and where do biases creep in?
I will then talk about sensitivity analysis and introduce you to a concept called the E-value and give you some websites. And then I have a few slides at the end. I just want to mention a couple of biases that I want to always be on your radar if you take a few things away from this talk. And then I'll conclude about just the overall talk, a review of best practices and some messages. So if I start to bore you and you're listening to the recording, you can jump ahead and see those there. OK, so causal inference needs to start with a causal question, and I say this because not all of the questions that we want to ask need causal inference. So if you're trying to describe who's in your ICU and what they look like, or the description of COVID patients, you don't necessarily need to think about confounding. That's just a descriptive study. Whereas if you're trying to create a score that can tell you the probability of this individual being readmitted in 30 days, that's a prediction question. And yes, it can be informed by causal theory. But the first thing to do, to make sure you're doing causal inference, is to make sure you have a causal question. And I say this because very often people muddle the methods for these different goals of research, and that can complicate the quality of your research and also other people's understanding of it. So I say you should have a causal question, but you've probably been told your whole life that you can never talk about the results of your study from a causal perspective. And what I'm saying here is not to say that your study should be interpreted causally, but do you have a very clear, motivating question that has a causal background? So to think about a causal question, you need at least two components, and they need to happen in this very specific temporal order. And I always think about this framework as "this causes that." So the "this" is the exposure. Did you take a drug? Were you admitted to an ICU? Were you given antibiotics? And we want to know whether that exposure caused something after it, and be very specific about this. Are you interested in mortality? Because everybody eventually dies. Are you interested in 30-day mortality or 90-day mortality or ICU mortality? So you really want to have a very clean exposure and a very clear outcome definition. And it doesn't have to be, but I think causal inference is much more effective when the exposure is something that's modifiable. So sure, age is associated with a lot of different things, but it's not something where we can really think about the absence of age, so I don't think it's a great setup. So I like to tell people that I think exposures that are modifiable are more aligned with this way of thinking. So I recommend, and if you're nervous you don't have to, but at least start conceptually like this. I like to see manuscripts that say: we're interested in a question of causation. Ideally, we could study this in a perfect randomized trial. But we can't do that. And we're cautious in our interpretation, but we have a motivating causal question. And here are some examples that, when I sit with students or colleagues, I try to really iron out. So does receiving a bilateral lung transplant result in improved survival over a single lung transplant? Does exposure to air pollution cause asthma? Does being in an ICU that has a high census lead to higher or lower ICU survival than being in one that has lower capacity? Does taking Stimulant X cause pulmonary hypertension?
Does vaping cause asthma or COPD? So these questions all have something in common. And the specific thing is that there are at least two states for the individuals who are under study. They can receive a single or a bilateral lung transplant. They can be exposed to air pollution or no air pollution, or a lot of different levels of air pollution. You can take a stimulant or you cannot. And in theory, if we didn't have to deal with ethics and logistics and we were just interested in pure medicine, all these questions could be answered in a randomized trial. But of course, that's not reality. We can't randomize patients to take a stimulant that we know is dangerous. We can't make clinical decisions that we know will impact lung transplant survival just because we want to test the theory that maybe different exposures will lead to different outcomes. You have to be cautious here. So a lot of questions that are really important to inform our clinical practice and our policy need a kind of causal inference framework, because we can't answer them in a randomized trial. So I'm going to do this a couple of times, mainly so that if I forgot to say something, I will say it, but also just to make sure we're checking in about where we're at, and you'll see how this all starts to build and leads directly into the statistical analysis. So we have a causal question. We know our causal question cannot be easily, ethically, logistically answered for whatever reason in a randomized setting. So how do we get information about this causal question of ours? So causal inference is based on this theory of the ideal randomized experiment. And the ideal randomized experiment is this concept of the perfect counterfactual. And a non-academic example of this is the movie Sliding Doors. So if you haven't seen it, I don't generally recommend it, but I think it's a nice way to think about what we ideally could be doing to answer our research question. So in this movie, Gwyneth Paltrow gets a promotion and she runs down the stairwell into the London Tube and she misses her train. And then it restarts and she runs down the subway stairs and then she gets on her train. And the movie continues on and it looks at the two different sequences of events and outcomes in her life that were basically dictated by getting on that train or missing that train at that single moment. And this is what causal inference is really trying to do in an ideal world. Ideally, we could examine the effect of different exposures on the same person and everything would be identical and we could examine them in two different states. So this is what's called the causal effect estimate, or the perfect counterfactual. And the concept here is that everything is identical at time zero. So there's a point where she gets on that train, or doesn't get on that train, where the only difference that exists is the fact that she got on the train or missed the train. And the concept here is that the initial conditions for an individual are perfectly identical. There are no differences. Therefore, there's no confounding. So, of course, that's not how research works. We can't look at people in that reality. So this is where we get into this concept of what I call the non-ideal randomized experiment, but it's really just a traditional randomized experiment. So there are two specific processes that randomized experiments try to leverage. The first one is eligibility criteria.
What we're doing when we decide who's in our study, who's not in our study, is we're trying to make inclusion very specific. So do we have more similar than dissimilar people in our study? So we have this concept of initial conditions. They're not exactly the same, but at least they have the same comorbidities, or they have the same age group. And what we're trying to do is get a kind of specific pool of individuals that have similar risk, similar risk for outcome of interest, such as 30 day mortality or 90 day readmission. And then we use randomization. And we always talk about randomization as the gold standard. And the reason that people like randomization so much, and it's such a powerful tool for inference, is what it does is it takes that baseline risk that exists in the people we decided who are eligible, and it builds on this concept that people refer to as exchangeability. And the idea of exchangeability is that the risk of your outcome of interest is more or less balanced or identical between the groups that have been randomized, just as it would be in the overall population. So it's kind of going back to that ideal counterfactual. We have similar initial conditions and we just randomly put people in two groups. And what happens there is that any differences between the two groups is no longer because they were exposed to the drug. It's not because they were given a diabetes medication or not. It's all based on the fact that they were randomized to receive drug A versus drug B. So you often will hear people say that you should not put p-values in table one of your RCT. And the idea there is that that p-value doesn't really tell you anything. And if you think about what a p-value is, it means that five out of 100 times you may have a chance difference. So if you have 25 variables in your table one, you would expect one of them to be different. But why does it concern us and why we're not so concerned about confounding in RCT is that that difference is attributed to chance, not the exposure to someone's, not someone's exposure to some certain intervention. The goal of causal inference is to start to kind of leverage these and other design processes. But I really think these two are the most powerful to move in the design analysis and results of an observational study to be as close to a randomized trial as possible. So check-in, we have a causal question. We know we can't randomize, but we have principles of randomization that we can use to inform how we do causal inference in an observational study. So the first thing I want to introduce you to is a concept that's called target trial emulation. And the idea behind target trial emulation is, in a sense, very straightforward. It is to take the same principles that we use to design an RCT and use them to design our observational study. And there's only so much I could cover today. I mentioned a little bit of a buffet of topics. But I really, really recommend this paper. And I've selected five papers that I'm introducing specifically in this talk that I think are just kind of great examples that if you wanted to educate yourself, it's probably worth taking the time to read or maybe recommending for a journal club. So in this study, they were looking to replicate the prevent trial. And in table one, they have a really nice example of what a target trial emulation is. 
So you see in row one how the eligibility criteria are very clearly explicated for the target trial, the prevent trial, which was a trial that was published, I believe, in JAMA. And they did it right. They show how they're trying to replicate that as closely as possible in an observational study. So what this really is, is a kind of conceptual and philosophical exercise. What they did is they said, how can we design this emulated clinical trial using our EHR? And the idea is we'll start with the ideal trial that you would actually run and then try to mimic it. They do the same thing with treatment strategies. So they have the intervention and control, and they talk about how each is defined and how that exposure and control would exist in an observational study. And they continue on, and this is the recommended presentation when you do a target trial emulation. How are people assigned? Something you have to do very clearly in an observational study, where you're not doing randomization, is think very precisely about who's in your exposed and unexposed group and how you determine exposure. Then you talk about your primary outcome. So we may care about several outcomes, but we specify one as primary, just like you would in a trial. And you specify the causal analyses that you're going to try to do and how you're going to do them. So it's really just going through the conceptual effort of trying to design the perfect observational study. So we have that. And now we have to think about how we can use causal inference theory to think about confounding. So, confounding. You can never say that you have adjusted for all your confounding; that is simply unknowable. So you should always be concerned that whatever observational study you're reading, whether it's your own, or you've conducted it, or you've seen it in the literature, confounding exists. And I think it's a flaw when certain people talk about using causal inference methods to remove confounding. Now, they can do a lot better and give you very informative estimates, but to say that they've truly removed all confounding is a misconception about what causal inference can do. So there are a couple of different ways to think about confounding, and I've been taught several of them over time. So I'll try to introduce them in three different approaches. You'll see the similarities, but I hope one of them clicks as a way that you can process what it means. So I think the simplest way to think about what confounding is, is simply the presence of differences between individuals in a study. And you're particularly concerned about differences between those in the exposure group and those who are not exposed. And we say an effect estimate, or a causal effect estimate, is confounded. Remember, we're thinking about this ideal estimate in this ideal counterfactual world, but we're really dealing with real world data. So we say that the estimate that we're estimating is not truly the true causal estimate, because we have an imperfect comparison for each person. So going back to this diagram that I showed a couple of slides ago, we talked about having this ideal counterfactual, and we can't do that, but we can think about a randomized experiment, which gets us really close, or as close as we possibly can get in the real world. In an observational setting, we don't have that.
What we have is an imperfect substitute, and that means that their initial conditions, those risks for the outcome, such as comorbidities and age, may vary in ways that are really important and that can affect how we compare the two groups. To say this the way it would be formally presented in a statistics textbook or an epidemiology textbook, we say that the causal effect that we generate from a process of randomization is comparing a sample of subjects under different actions, based on that concept of exchangeability, meaning we equally distributed risk between the two groups and then we just follow people prospectively. In contrast, the association effect estimate from an observational study is different subjects under different conditions, and we're concerned that there are certain factors, some that we can measure, some that we cannot, that are associated with whether or not they were given an exposure or a drug or whatever intervention you're interested in, and that can also affect the outcome that you're interested in. So one of the big processes, obviously, if you're familiar with it, is identifying confounders. And this, in a way, is a very complex and sad exercise, because you can never truly identify and account for every confounder. And just to do a quick thought experiment, let's think of the simple question, does exercise cause weight loss? So if you estimate an unadjusted effect and you just had an indicator for whether or not someone said they exercised last week or not, and then whether they had weight loss six months or three months later, it would not be that great of an estimate, because there are a lot of different things that would make that estimate imprecise or vary among individuals. There may be differences in age, their diet, comorbidities; their occupation could dictate how much physical exertion they do during the day that was not technically considered exercise. And if you keep on progressing down that path, it's hard to think of everything that can be a confounder. And then even if you think of everything, you can never be sure that you can measure and account for all of that in your statistical model. So one of the popular tools in causal inference is called a directed acyclic graph, and the shorthand term is a DAG. And this is a really powerful causal inference tool that's meant to take you through the causal process and to make you really think through what these confounders are. And I like it as a tool. I think it's a really helpful way to diagrammatically present both measured and unmeasured confounders. And one of the terms you'll hear when people present this is they'll talk about unmeasured confounding or residual confounding. And this is what I meant by the fact that we can never really account for all the confounders. And that term indicates that there are known factors that we can identify in our DAG. But we also put in what's known as an error term, or E; it goes by a couple of different names. And that indicates that there are unknown factors that we know are out there that we just can't include in our statistical model. So we don't necessarily know what they are, but there's probably something we missed. And it's just kind of building that into our thought process and our statistical model. So I didn't go through the process of drawing DAGs because there's a lot of literature on this. But if you Google DAGs, there's a lot of guidance.
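To make the exercise and weight loss thought experiment concrete, here is a minimal simulated sketch of what confounding does to an effect estimate. Everything in it is hypothetical (the variable names, the effect sizes, the sample size); the point is only that when age influences both who exercises and how weight changes, the unadjusted estimate drifts away from the true effect, and including the confounder pulls it back.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 5_000

# Hypothetical data: age confounds the exercise -> weight-loss relationship.
age = rng.uniform(20, 80, n)
# Younger people are more likely to exercise (confounder affects the exposure).
p_exercise = 1 / (1 + np.exp(-(2.0 - 0.05 * age)))
exercise = rng.binomial(1, p_exercise)
# True effect of exercise on weight change is -2 kg; age also affects the outcome.
weight_change = -2.0 * exercise + 0.05 * age + rng.normal(0, 2, n)

# Unadjusted (confounded) estimate: exercise only.
crude = sm.OLS(weight_change, sm.add_constant(exercise)).fit()
# Adjusted estimate: include the measured confounder in the model.
adjusted = sm.OLS(weight_change, sm.add_constant(np.column_stack([exercise, age]))).fit()

print("crude exercise effect:   ", round(crude.params[1], 2))     # biased away from -2
print("adjusted exercise effect:", round(adjusted.params[1], 2))  # close to -2
```

The unmeasured pieces of the DAG (exercise intensity, household dynamics, genetics) are exactly what this kind of adjustment cannot fix, which is the residual confounding discussed above.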
And if you're going to try to do this in your own research and you can't find formal guidance, please send me an email. But there are a lot of specifics about circles and squares and arrows and how you build that. I didn't want to spend too much time on that because I wanted to prioritize other things. But at its simplest level, what I'm showing you here is a DAG. And the DAG indicates our exposure of interest, our outcome of interest, and the fact that there are other things out there that could be impacting it. So let's go through the mental exercise we just did. So we have exercise and weight loss. And we can think about what the measurable confounders are. We have age, diet, comorbidities. But then we can also start to list that there are other things that we can't get in our data set or that we know we just can't measure. So we can't measure the frequency or intensity of exercise. We don't know about household dynamics. Perhaps having child care is a very important factor. Being married means that someone can maybe help you. A lot of other things about your living situation. Then there's also genetics. And the short of it is that you can keep on going down this list. And I just wanted to mention it as a helpful way of articulating both what you can account for, but also what you can't account for. And the reason I like DAGs, and the reason the journal where I'm on the editorial board asks for them frequently when we see a good question that needs to be sharpened, is that they really help us see your logic and help us think through what you're trying to ask. And it suggests you went through the conceptual work, and it also helps us improve upon it. So it's a nice way of conveying what you're trying to ask and what you've thought through and what you're missing. But my one caveat about DAGs, even though I like them, is that they are limited by what we know and our beliefs about how things work. So just because you can build one and just because you can identify some confounders doesn't mean you did it perfectly. So, having introduced that: at this point, we did all of our conceptual exercise, we have our causal question, we thought about it in a clinical trial framework, we thought about our confounders. Now we're at the statistical analysis part. So the goal of statistical modeling is to mathematically accomplish what you were trying to do on the conceptual side of causal inference. And that is, at this point, we now want to obtain, ideally, an unbiased effect estimate of our exposure. And the statistical model helps remove confounding, and there are a lot of different terms for this. Sometimes people say we adjusted our model, sometimes they say we controlled for something, or we accounted for it. All of those are referring to the same mathematical process of removing the bias due to confounding. Remember, we have imperfect comparisons. So the idea is that we can include variables that are associated with these imperfect comparisons, and then we get more precise effect estimates. Confounders are also called a lot of different things in the literature. Sometimes you hear them called covariates, sometimes they're independent variables or risk factors. When you hear those variations in terms, just know that everybody's referring to the same general concepts. OK, so there are three broad approaches that I think are kind of manifest throughout the literature, and I numbered them roughly in the frequency with which I think you will see them.
And the first one is multivariable regression. And the one thing I just wanted to throw out there is that a lot of times people use multivariate and multivariable to mean the same thing. So multivariate is the incorrect word, and it's probably the most frequent thing that I comment on when I review a paper. Multivariate implies you have multiple outcome measures on the same person over time. You can think about quality of life or pulmonary function tests over time for the same individual over several months. Whereas multivariable means that you are including covariates in your model to adjust for them. The next most popular is propensity score methods, and there are a lot of different things that you can do with them. We'll talk about them a little bit. Then I want to introduce two quasi-experimental methods. One of them is instrumental variables and one of them is difference in differences. And I'll just talk very briefly about them because I think they're less common and they probably have more nuances. But I think they're important tools that exist in causal inference and are very frequently used, and perhaps overstated in their conclusions, in my opinion. OK, so multivariable regression. So I tried as hard as I possibly could not to put an equation in here, so I apologize for those who feel kind of turned off by it, but I think it's helpful to think about what we're doing. So what we're doing with our equation, with our DAG, in our statistical modeling is transforming that DAG that we have into the equation for a line. So our outcome, which used to be on the right side of the DAG, is now on the left side of the equation. And our exposure now goes on the right side. And multivariable regression is an umbrella term, and you may have heard of it under a lot of different names. So analysis of variance or covariance (ANOVA, ANCOVA), linear regression, generalized linear regression, logistic regression, Poisson regression. These are all variations of the same thing, and they're all just different ways that we can model an outcome. So generalized linear model is used very broadly. It refers to the first three, linear, logistic, and Poisson, as well as many other regression models. And the only real difference between those terms is whether you have a continuous outcome or a binary outcome or an outcome that's skewed, such as how many cigarettes you smoke a day, where we imagine a lot of people are at 0, 1, 2, or 3, but then there are some people that smoke a lot, or how many days you're in the hospital, where you have some people with a shorter length of stay, but a few people who skew out to the right. Then the other one is a Cox proportional hazards model, and that's used for time-to-event outcomes. So multivariable regression refers to all of these, as long as you put confounders in your model. And that term, covariate adjustment, just means that there is variation among those variables that we're going to put in our model and we're trying to remove it. We're adjusting it away. So we take the equation for the line, and the idea mathematically is we're just going to put a whole bunch more variables into our equation. And as we keep on adding those confounders, in theory, we refine that effect estimate, that beta that we're looking at. And it comes in a lot of terms: odds ratio, risk difference, mean difference, relative risk, hazard ratio.
They all look the same in your statistical code and in the presentation, and they're all achieved using just different variations of this multivariable regression. And the idea is that as we keep on including these confounders, we'll get closer and closer to the true causal estimate, this kind of unconfounded effect estimate that we're truly interested in. So I really like multivariable regression, and I really recommend that if you're going to do causal inference, or you're reviewing papers, you always keep an eye out for it. I think it's the right place to start. It certainly has some limitations. So confounding, people tend to think about as something that gets into your model because of not including things. But another way that you can induce residual confounding or unmeasured confounding is based on how a regression model works. So that confounder one there, let's assume that's age. If we just put age in there, it assumes that each one-year increase in age has a similar effect on your exposure and your outcome. So you can imagine that certain things, such as perhaps BMI or some type of vascular response, are more or less flat until 20 or 30 or 40, but then have an exponential curve. So it's a nonlinear relationship. A lot of pharmacodynamics are nonlinear. But when you put that linear term in there and assume that there's linearity, you basically have unmeasured confounding. That means that some of your estimates at each age may be over or under the true relationship. And that's a type of measurement error that can also lead to confounding. So that's not unique to multivariable regression. I just was trying to sprinkle in a lot of different things to keep your eye out for as you read. The biggest kind of concern of multivariable regression is what's known as degrees of freedom. And it's this idea that you can't have more confounders in your model than you have individuals. And that's probably the simplest way I can talk about that concept. But it's a good rule of thumb to be cognizant of. There are loose rules of thumb, such as, for a logistic regression, one variable for every five events. So let's assume that you have 100 people and 40 of them die. That means you have 40 events. With one variable for every five events, there would be eight groups of five, or 40 divided by five. So you shouldn't have more than eight variables in your model. So one of the limitations of multivariable regression is that you can only put so many variables in your model. And as you try to push that theoretical boundary and get closer and closer to the number of subjects, you have concerns that start to grow. So first off, it's harder to think about what's going on in your model, with just too many different things going on. You have other concerns such as overfitting, which means you can get very biased estimates, or overly tight confidence intervals that aren't really informative. You can also get collinearity, where certain variables are just so similar to each other that, when you adjust for all these different correlations among the variables in the model, all the effect estimates become hard to interpret. So this is one of the reasons why you may decide that multivariable regression isn't the right way for you to do your observational research. My recommendation is that you should try to think about doing your research in a multivariable framework first, instead of jumping right to propensity scores.
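As a rough sketch of what that multivariable regression workflow can look like in practice, here is one way to fit it in Python. The data file and column names are hypothetical, and the quadratic age term is just one simple way to relax the linearity assumption discussed above (splines are another); the events-per-variable check reflects the rule of thumb mentioned in the talk, not a hard requirement.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical analysis file: one row per patient.
df = pd.read_csv("cohort.csv")  # columns: died_30d, exposed, age, bmi, sofa, diabetes

# Rough events-per-variable check (the "one variable per five events" rule of thumb).
events = int(df["died_30d"].sum())
print(f"{events} events -> roughly {events // 5} covariates at most")

# Multivariable logistic regression: the coefficient on `exposed` is the
# adjusted log-odds effect estimate; the other terms are covariate adjustment.
linear_fit = smf.logit(
    "died_30d ~ exposed + age + bmi + sofa + diabetes", data=df
).fit()

# Relaxing the linearity assumption on age, one source of residual confounding,
# by modeling age with a quadratic term instead of a single linear slope.
flexible_fit = smf.logit(
    "died_30d ~ exposed + age + I(age**2) + bmi + sofa + diabetes", data=df
).fit()

print(linear_fit.summary())
print(flexible_fit.summary())
```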
And this is a little bit of, I guess, me on a soapbox, but I very frequently see it skipped in a paper, and people just jump right into propensity score matching, which I'll start to talk about now. But there's really no evidence out there that multivariable regression gives you misleading results if you account for some of the statistical properties that I just mentioned. There's really ample evidence that there's usually not that much difference between a propensity score analysis and multivariable regression, other than that regression doesn't sound as fancy. And this is particularly true as you get into very large samples. Okay, propensity scores. So I imagine that most of you have heard of propensity scores, and the idea of a propensity score is that it takes that same equation that we were just looking at and breaks it up. So now we're gonna break that into two different equations. The first one is the likelihood of being exposed, or the likelihood of having your exposure. So I have a box around, I'm not sure if you can see my mouse cursor, but I have a box around this equation. And this is basically our propensity score. What we effectively are doing is we're making our exposure, whether or not you got drug A or B, or received an intervention or not, our outcome variable. And then we're keeping all the same confounders in our model, and then we estimate that model. And then what we get when we estimate that model is a bunch of coefficients that we can now predict from. And then when we predict, it gives everybody in our sample a value that ranges from zero to one. And that value is interpreted as the probability, based on all the covariates or confounders in our model, of being exposed. So if it's very close to zero, that means they had a very low chance of having the exposure. And if it's very high, close to one, that means they very likely did have the exposure. I'm not sure what different statistical packages you use, but sometimes I like to just show this. I think that there's this perceived complexity to what people are doing, and it's really not that complex. I think the magic and art of the propensity score is how you use it. So up top is the multivariable regression, where we have logistic, which indicates we're gonna do a logistic regression. The sequence is that you have your outcome first, and then your exposure of interest, and then variables one, two, three, and four, which just indicates that I'm gonna put those confounders in my model. Below, I'm showing how we would calculate a propensity score. The first thing that we did is we shifted around our regression. So now, in our first regression, our outcome is our exposure. And then you just do predict propensity score, and then you can use your propensity score in a variation of your multivariable regression model as above. So let me give you some visuals about what a propensity score looks like and how you can think about them when you see them. So this is a wonderful paper that, if you're gonna do propensity scores or you like to do a journal club, I recommend that you check out. I've learned a lot from it, and I use it in my class, and I very frequently suggest it to authors when I send back reviews. So if you ever get one of these suggestions, I probably reviewed your paper. So these show four kind of, I think, idiosyncratic distributions that one would get in a propensity score analysis. So I'll start with panel A up at the top.
So what you see in panel A is that the median, or the kind of mean, of that distribution is just around 0.5, and it's pretty similar for the blue group, which indicates the control arm, and the red group, which is beta blockers. So what this suggests to me is that the covariates between who's exposed and unexposed are pretty similar. You see what's called overlap, or areas of common support. The distributions are pretty similar, which means that the individuals in both groups are pretty good comparators, at least based on what we're starting to see. On the right, you see that there is a little bit more of a shift, and what you're starting to see is that fewer and fewer people have similar propensity scores. And just to orient you, across the x-axis is the propensity score, which, as I mentioned, always ranges from zero to one. At the bottom, you see C and D, and you see that there are a lot of individuals who have very different propensity scores and very little overlap between the two groups. And I think this is really important to always show in your study and always look at when you're calculating a propensity score or evaluating a paper, because a lot of what you see here is helpful for thinking about how you can leverage the propensity score. So, the benefits of propensity scores. As I mentioned before, one of the challenges of multivariable regression is that it can be difficult to put all the variables that you wanna adjust for in a model. So if you are in that situation where you're doing, say, a single-center study and you have all these different factors you feel like you have to include and you can't, then I think a propensity score is a very rational tool that one can use. When you get a propensity score, when you predict that probability, one thing you can simply do is put it right back into your model with all your other confounders if you have a big sample. This is a technique that's sometimes termed doubly robust, and the idea there is that the propensity score model may be right or the outcome model with the confounders may be right, and as long as at least one of them is right, you start to get a better estimate. You can go back to the slide that I was just on and you can see where people don't overlap. So where you see the red and the blue not overlapping in panels C and D, you can ignore those people. Ignoring is probably the wrong term, but you can remove them from your study. And this is the concept of propensity score matching. It's this concept that there are a lot of people who do not perfectly align, but we can find people who are pretty similar based on a propensity score and we can match them. And then you do your comparison. And the last one is what's known as inverse probability weighting. And the idea is that we want to down-weight the people who have a very high probability of being exposed and up-weight the people who have a low probability, because they may be better counterfactuals. A limitation of propensity scores is that they are not magic. And very frequently I see people say that we did a propensity score and you can now interpret our results as if they're from an RCT. And that's just a false perception about what they can do. They can control for the variables that we observed, just like in our multivariable regression, but all of the unobserved confounding, all of the measurement error, all of the residual confounding still exists. And they start to work pretty well in larger samples. Smaller samples sometimes may be imperfect just because you don't have enough variation going on in your groups.
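Here is a minimal sketch of the two-step workflow just described: model the exposure on the confounders, predict a probability for each person, look at overlap between groups, and then use the score, for example as an inverse probability weight or as an extra covariate. The file and column names are hypothetical and carried over from the earlier regression sketch; in a real analysis the weighted standard errors would also need a robust (sandwich) estimator, which is omitted here for brevity.

```python
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

df = pd.read_csv("cohort.csv")  # hypothetical: died_30d, exposed, age, bmi, sofa, diabetes

# Step 1: the propensity score model. The exposure becomes the outcome.
ps_model = smf.logit("exposed ~ age + bmi + sofa + diabetes", data=df).fit()
df["ps"] = ps_model.predict(df)  # probability of exposure, between 0 and 1

# Always inspect overlap / common support before using the score.
print(df.groupby("exposed")["ps"].describe())

# Option A: put the score back into the outcome model with the confounders
# (the "doubly robust"-style adjustment mentioned in the talk).
outcome_fit = smf.logit(
    "died_30d ~ exposed + ps + age + bmi + sofa + diabetes", data=df
).fit()

# Option B: inverse probability of treatment weighting.
df["iptw"] = df["exposed"] / df["ps"] + (1 - df["exposed"]) / (1 - df["ps"])
weighted_fit = smf.glm(
    "died_30d ~ exposed", data=df,
    family=sm.families.Binomial(), freq_weights=df["iptw"]
).fit()

print(outcome_fit.summary())
print(weighted_fit.summary())
```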
And the one thing I wanted to bring up is that, of all the propensity score methods, I think the one used incorrectly most often is propensity score matching. So some considerations when you see a propensity score match are how many people were excluded. So I saw one the other day where they matched and ended up removing 45% of their sample. One thing I don't like about that is that you're losing a lot of information on individuals. And the other thing is you're just reducing the power and precision in your study by removing all those people. And this is a little challenging, so I'm a sympathetic researcher, because if you try to get the perfect match, to find someone who has a propensity score of 0.43 who was exposed and someone at 0.43 who wasn't exposed, you may get to a point where you don't have a lot of good overlap. If you try to maximize and get more and more matches, what can happen is you match someone who has a propensity score of 0.43 with someone who has a propensity score of 0.49 or 0.37. If you think about that, what you've done is similar to using that kind of linear term in the regression model. What you're doing is you're making imperfect matches, which again can potentially introduce new confounding in another way. So you just have to be cautious about how you match and the process of matching. And these are the kinds of concepts I meant when I mentioned the article, the things where I see people glide over a lot, so I wanted to bring them to your attention today. And these are all discussed in that paper and one other paper I'll note in a second. But just to be clear, what I'm talking about is: here are the distributions, and here is a crummy attempt by me to draw lines over where the distributions overlap. When you do matching, everybody who is yellow now would no longer be included in your study because they don't have a good match. So just be cognizant of what you're doing in propensity scores, or what authors are doing, and what the implications of that are when you interpret the results. So, the take-home for propensity scores: they have the same issues as traditional regression, and they introduce new issues that, if you're not paying attention or you don't have the right collaborator to help guide you, could end up making you look bad. And there are a couple editorials out there that I would never want to be on the receiving end of. And the other challenge of propensity scores is that there are a lot of different views about how they should be implemented and what the best approach is. And there really isn't clear-cut guidance. Each one of those approaches to using it has slightly different interpretations, and it really matters what you want to do. And the other thing, and I'll mention this again at the end with my omnipresent biases, is that what I see a lot is these multi-center studies where they use propensity scores. And which center you're treated in introduces what's known as clustering, which is another source of confounding. For propensity scores, a lot of evidence has shown that you have to somehow account for within-center clustering or use a model that removes the confounding from center. And if you don't do that, it's a big source of confounding. So always be wary in your multi-center propensity score studies if they don't mention accounting for center. This is one other paper by Rishi Desai, who does a lot of pharmaco-epidemiology work. And this is in BMJ, I think just a few months ago, last fall. If you Google BMJ and propensity scores, this is another wonderful paper that I find really accessible to clinical colleagues.
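Before moving on, here is a minimal sketch of the greedy one-to-one matching just described, with a caliper so that a 0.43 is not paired with a 0.49 or a 0.37. It assumes the data frame with an `exposed` flag and a `ps` column from the previous sketch; real analyses typically use a dedicated matching package, and the caliper value here is arbitrary.

```python
import pandas as pd

def caliper_match(df: pd.DataFrame, caliper: float = 0.05, seed: int = 0):
    """Greedy 1:1 nearest-neighbour matching on the propensity score.

    Pairs each exposed patient with the closest unexposed patient whose score
    is within `caliper`; exposed patients with no acceptable match are dropped,
    which is exactly the loss of subjects discussed above."""
    treated = df[df["exposed"] == 1].sample(frac=1, random_state=seed)  # shuffle order
    controls = df[df["exposed"] == 0].copy()
    pairs = []
    for idx, row in treated.iterrows():
        if controls.empty:
            break
        dist = (controls["ps"] - row["ps"]).abs()
        if dist.min() <= caliper:
            j = dist.idxmin()
            pairs.append((idx, j))
            controls = controls.drop(j)  # match without replacement
    return pairs

# matched = caliper_match(df)  # df from the propensity score sketch above
# print(f"kept {2 * len(matched)} of {len(df)} patients after matching")
```

Reporting how many patients survive this step, and how the caliper was chosen, addresses the 45 percent exclusion problem described above.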
Okay, so just a quick introduction to instrumental variables and difference in differences. These are increasingly popular methods and they have two real key attractions. The first one is they allow us to adjust for confounders just as we were doing in our regression framework, whether multivariable or propensity score. And you can extend multivariable regression and propensity score methods into these methods. The added benefit of these is that what they're trying to do is leverage a quasi-random process. So we know that we can't randomize, but perhaps we can find something out in the real world that helps us look at variation and that way gets us a little closer to this idea of randomization. So an instrumental variable is defined as something that causes variation in exposure, but does not have a direct effect on the outcome. As I mentioned, it's trying to leverage some external random process. So I'm just gonna mention a few of them just to give you a flavor. So perhaps you're interested in the effect of home birth versus clinic birth in rural Africa. You could use something like severe weather as a way to look at variation in who goes to a clinic versus having a home birth. What I see a lot in the pharmaco-epi literature is provider variation. So this is the idea that different providers may prefer one hypertension drug over another one, or may prefer this protocol versus another protocol, and you could perhaps leverage that. In the economics literature, this is used very frequently. And one of the big questions is, what are the long-term effects of schooling on income? So one way to look at variation in schooling could be the month of birth. So if you're born late in the year, perhaps you start early or perhaps you're held back. And this can add pseudo-random variation. There was a study that came out in JAMA that was looking at month of birth and whether or not you were given a diagnosis of autism as a way to look for inherent variation in the population. And the Vietnam draft was a very popular instrument in long-term studies, because people were randomly taken out of school, or were given more years of school, because of the draft. So it's trying to leverage something that's not really perfectly random, but something that can introduce a random process. My caveats, just for you as consumers of the literature to be conscious of: there are a lot of assumptions that go into instrumental variables, and it can be very hard to verify and really show that they're met. And the actual process of estimating instrumental variables is quite complex. It uses what's called a two-stage regression model. And there are a couple of different ways you need to check things. And even in that kind of sophisticated analytic approach, there's not a bulletproof way to test that you have a perfect instrumental variable. So the defense of your instrumental variable being informative and truly doing what you want it to do is really conceptual. And you'll see people who leverage them trying to make a conceptual defense.
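To give a feel for the two-stage idea, here is a small simulated sketch using the provider-preference example. Everything is invented (the instrument, the effect sizes, the unmeasured frailty variable), and the two stages are run by hand only to show the logic; doing it this way gives the right point estimate but not the right standard errors, so real analyses use a dedicated instrumental variable routine.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 10_000

# Hypothetical example: provider preference as an instrument for drug A.
preference = rng.binomial(1, 0.5, n)                 # instrument: shifts exposure only
frailty = rng.normal(0, 1, n)                        # unmeasured confounder
drug_a = rng.binomial(1, 1 / (1 + np.exp(-(1.5 * preference + 0.8 * frailty - 0.4))))
death = 0.5 * drug_a + 1.0 * frailty + rng.normal(0, 1, n)  # true drug A effect = 0.5

# Naive regression is confounded by frailty, which we pretend is unmeasured.
naive = sm.OLS(death, sm.add_constant(drug_a)).fit()

# Two-stage least squares: stage 1 predicts the exposure from the instrument,
# stage 2 regresses the outcome on the predicted exposure.
stage1 = sm.OLS(drug_a, sm.add_constant(preference)).fit()
stage2 = sm.OLS(death, sm.add_constant(stage1.fittedvalues)).fit()

print("naive estimate:", round(naive.params[1], 2))   # pulled away from 0.5 by frailty
print("2SLS estimate: ", round(stage2.params[1], 2))  # should land close to 0.5
```

The conceptual defense discussed above is what this sketch cannot supply: nothing in the code verifies that preference has no direct path to death.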
So the last method I wanna introduce is difference in differences. This is a very popular approach that I've been seeing in the literature more and more, and I see it a lot in the QI literature, where people look at the effect of changing something within a healthcare system. So the idea here is that you have this group, indicated by the green line, and you can think of that as your control group. And then you have this other group that's gonna get the intervention. There's gonna be a policy change, or something fundamentally is gonna change. And the idea is that time is passing and there's a difference between the two groups, which is your first difference. So this is a difference in differences, two differences. We have a first difference, and we can use that first difference to think about what we would expect to continue to see. So you might expect mortality or readmission to keep on going up or down, but you would expect it to keep changing at the same rate over time after the policy change. But then a policy happens, such as the Affordable Care Act being implemented, and then you see a new effect. And the difference from that hypothetical or counterfactual trend is the one that can, in theory, be attributed causally to the policy or the change that just occurred. My view on difference in differences is that I've used them, I like them, and I think people have been very clever with them, but I think it's also important to always think about other ways confounding can sneak in. It can sneak into every study design, but difference in differences has some unique ones. So the first question that you have to ask yourself, if you're gonna do it or you're gonna use it, is whether the exposure and control groups are really perfect counterfactuals. And the reality is they probably are imperfect. And the ACA, I think, is a good example. Very frequently you see the ACA being assessed for its impact on readmissions and all these different outcomes. But the question is, if you think about who implemented and took up the ACA, it was mostly blue states, and who delayed, mostly red states, are the populations and the politics and the hospital systems truly perfect counterfactuals? And if they're not, can you really adjust away those differences? You probably could adjust for a lot of them, but these are just things to think about as you extend these methods to your own research. And my view is that most policies that are implemented really aren't that pseudo-random. They're pretty much targeted at a group that's expected to be responsive or have good uptake. So, a few little caveats.
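Here is a small simulated sketch of the two-group, before-and-after logic just described. The setting is hypothetical (invented hospital readmission rates, an invented policy at month 12, a built-in parallel trend), and the point is only that the difference-in-differences estimate is the coefficient on the group-by-period interaction.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)

# Hypothetical hospital-month data: `group` 1 adopts a policy at month 12.
rows = []
for group in (0, 1):                      # 0 = comparison hospitals, 1 = policy hospitals
    for month in range(24):
        post = int(month >= 12)
        # Shared downward trend, a fixed level difference between groups,
        # and a -2 point effect of the policy after month 12.
        rate = 20 - 0.1 * month + 3 * group - 2 * group * post + rng.normal(0, 0.5)
        rows.append({"group": group, "post": post, "readmit_rate": rate})
df = pd.DataFrame(rows)

# The difference-in-differences estimate is the interaction coefficient.
did = smf.ols("readmit_rate ~ group + post + group:post", data=df).fit()
print(round(did.params["group:post"], 2))  # near -2 when the parallel-trends assumption holds
```

The blue-state versus red-state concern above is exactly a violation of the parallel-trends assumption baked into this simulation, and no amount of model fitting will detect it on its own.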
Okay, the last kind of statistical approach I wanna introduce you to is sensitivity analysis. So we talked a lot about how to handle confounding, and I mentioned many times this fear of residual or unmeasured confounding. And that's the idea behind the E-value. The E-value is a way of calculating how strong of an effect an unmeasured confounder would need to have to change the empirical results of your study such that you have a different interpretation. And this is a paper that's in Annals of Internal Medicine, so if you wanna check it out, it's a nice paper and has some good examples. There's also a website now from the paper. So if you just wanna use the method, I'll show you on the next slide. But here's how you would use it in your own study. And as a reviewer, I almost always ask for this. And just to be clear, there's a lot of discussion in the literature and there are a lot of views about it and it's imperfect, but I think it's a really helpful thought exercise. So we take this study, and we adjust for all these different confounders. And we find that infants fed a formula were 3.9 times more likely to die of respiratory infections than those who were exclusively breastfed. And then you can use this concept and say, okay, we adjusted for all these variables; how big of an effect would we need? You can calculate the E-value. And the E-value would say that, for our observed risk ratio of 3.9, there would have to be some confounder, above and beyond what's already in our model, with an effect of about 7.2, which is a really large effect size, to change our interpretation of that 3.9 and actually make it equivalent to one, which is the null, or no effect. So it helps you think, in this hypothetical world, if something existed, how strong of an effect would it have to be? And if you wanted to try to calculate this, you don't need raw data. You can simply go to a website and you can enter your odds ratio, or your risk ratio, or proportions in two groups, or hazard ratio, or relative risk. You can use it for any outcome measure or effect estimate measure. It's a really valuable website. And I would recommend, as you're thinking about your own work, thinking about a way to incorporate it. I think it's just a really powerful technique.
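For reference, the calculation behind the website is short enough to write out. This sketch uses the formula from the VanderWeele and Ding E-value paper referenced above for a risk ratio greater than one; for a protective risk ratio below one, the convention is to take the reciprocal first.

```python
import math

def e_value(rr: float) -> float:
    """E-value for an observed risk ratio > 1: rr + sqrt(rr * (rr - 1))."""
    return rr + math.sqrt(rr * (rr - 1))

# The breastfeeding example from the slide: observed risk ratio of 3.9.
print(round(e_value(3.9), 1))  # about 7.3, in line with the ~7.2 quoted in the talk
```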
Okay, real quickly, I want to mention a couple of omnipresent biases that I just want to leave in the back of your head. I mentioned these a little bit throughout. So a lot of times, especially in the pharmacoepidemiologic literature, you hear about this concept of confounding by indication. And it's not necessarily a perfect example of residual or unmeasured confounding; it's a little bit more of a selection bias. And what it means is that the people who are taking the medication, or who decide they're going to take a medication, versus those who don't, have this inherent difference that you can never truly account for. And it's simply because they must be sick enough that a physician or a clinician decided they had to take a medicine, or they must have behaviors that make them willing to try something, or something else is there that you just can't really adjust for as a confounder. And it's a little bit more of this concept of selection bias, meaning there are these fundamental differences that are not really about confounding, just about who people are. I mentioned measurement error. And measurement error is something that I think a lot of people gloss over. And I think measurement error is everywhere. It's in our lab values. I mean, if you measure blood pressure over and over, you're gonna get slightly different measurements. So which one do you choose? So that means that you can adjust for blood pressure, but note that that's not the exact blood pressure of that person; it's variable. We know a lot of labs are variable. We can also insert our own measurement error. So when you adjust for BMI and you decide how to model it, if you assume that it's linear, or you adjust for age and assume it's linear, you're assuming that there's no variation in the effect across its range. That's a type of measurement error. If you start to categorize BMI or age into five-year brackets, or, for BMI, say anybody above 30 is obese, what you're assuming is that the risk is identical for everybody above 30, which is probably not realistic at all. And the last thing I wanted to mention was center effects and variation. Different centers have different protocols, different staffing patterns, different education. Every clinician, every pharmacist, everybody does things slightly differently, and that can impact outcomes and the likelihood of exposures in all different types of ways. So especially as we move into this world where we have these huge multicenter data sets, always look, in the articles you read, for people to account for this variation, and in your own research, always be cognizant of it, because a lot of stuff can sneak in and be explained away by variation across centers. And that takes me to my take-home messages. So these are my five recommendations to you as researchers and consumers of the literature. Number one, in your own work, play defense. Start off and create your observational studies as if you were designing a trial. And even if you think it's kind of silly or you don't like it, it conveys this kind of clarity in what you're trying to do and in this thought process, and that will help you get your papers published, because in a way, that's one of our biggest goals. Make your study inclusion very precise. Try to develop a very clear and unambiguous analysis plan and study protocol. And just because you develop a clear study protocol doesn't mean you can't deviate from it in the future. It just helps you convey conceptual clarity, because the reality is you're gonna get two or three reviewers, and they're gonna ask you to do something different, and this lets you have a basis and say, well, I wanted to do this, I'll do it, but here's why. Draw a DAG, same logic, but it also really helps you think about what confounders are out there and how to identify the ones you can adjust for, and also to think of other sources of bias. My recommendation is report all your effect estimates. I report unadjusted, multivariable. If you don't have a clear rationale for propensity score matching or IPW, then just report them all. And I like to see this when people use propensity scores, especially when I come back as an editor or a reviewer, and I say, okay, fine, you don't wanna justify why you chose inverse probability weighting over propensity score matching. Put them all on the table, because if you put them all on the table and you just wanna do everything and they all go in the same direction or give you the same story, that's really compelling and really conclusive. If you can think of other methods, if you can find a kind of pseudo-random process such as an instrumental variable or difference in differences, try to apply it. I mean, the more that you can create an evidence base that shows that you're seeing the same effect no matter what assumptions or techniques you use, the more compelling it is. And then the last two: I would say always try to introduce an E-value into your study, and state clearly what you cannot account for, because you will always have some unmeasured confounding, and just say how big of an effect it would need to be. And I think that you'll find in many studies that you have a pretty decent effect size. One thing that they recommend is that you look at your overall regression, you look at all the different odds ratios, and you look to see if the E-value is smaller or bigger than your biggest odds ratio. And if it's smaller, then that suggests that there could be something out there, but if it's bigger, it's very compelling. And the last thing I just wanted to mention is, even though you've done everything as best you can, always consider other sources of bias and measurement error and note them in the discussion.
As a consumer of the literature, I always respect authors who acknowledge what they could not do perfectly more than those who gloss over it. So with that, I hope this gave you some new perspectives. Thank you, good luck in your own research, and I will stick around to answer questions.
Michael, thank you for the presentation. I thought this was a great review of key elements to consider when assessing, analyzing, and developing observational research, definitely very useful content. Let's dive into some of the questions we have here. The first one asks: what is your recommended approach to confounder control in an observational study in which the patients act as their own controls? The example they give is a study comparing patients' tolerance for pain after ICU admission versus before ICU admission.
Interesting. That is essentially what people usually call a case-crossover design. Give me a moment to collect my thoughts. I think the real challenge in a case-crossover design is that you have the benefit of the same person, but you observe them in two different states. If you think about the ideal counterfactual we discussed at the beginning, ideally you would have the same person both getting on and not getting on the train, which you can't; the closest you can get in the real world is a case-crossover design, which is not a design I introduced today. In that setting, I think you need time-varying adjustment for individuals: whatever confounders matter in your specific study, you need information about them both before and after. I know that is tricky, because a lot of pre-ICU factors are really difficult to capture, but to the extent that you want to adjust for anything that could influence your outcome, you want to get as close to that as possible. That framework is probably as close to true causal inference as one can get, and case-crossover designs are really sought after for exactly that reason, particularly in the pharmacoepidemiologic world. So that is an imperfect answer, and if whoever asked would like to send me an email, I am happy to do a bit of a literature search; that one was a little out of left field, but I hope it helps.
Great, thank you. We have another question about E-values. The listener asks: how do you put the E-value in context? How do I decide what is big enough or not big enough?
Yes. If you are familiar with running a regression on a computer, you usually get a table with the odds ratio, or the risk ratio, for your exposure, but you also get one for age, one for each comorbidity, and so on. One loose recommendation is to look at all of those and ask, what is the biggest odds ratio? Say your biggest odds ratio is 2.1; that means the strongest thing you measured increases the odds of your outcome by 2.1-fold. If your E-value is 2.4, then an unmeasured confounder would have to have a stronger effect than anything already in your model, and it becomes hard to think of what that could be.
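To make that arithmetic concrete, here is a minimal sketch of the E-value calculation itself, using the published formula for a risk ratio greater than one, E = RR + sqrt(RR × (RR − 1)). This is an illustration rather than code from the webcast, and the covariate odds ratios at the end are made up purely to show the comparison against the largest measured effect.

```python
# Minimal sketch of the E-value for a risk ratio (VanderWeele & Ding, 2017).
# Illustration only; the numbers below echo the examples discussed in the talk.
import math

def e_value(rr: float) -> float:
    """Minimum strength, on the risk-ratio scale, that an unmeasured confounder
    would need with both exposure and outcome to fully explain away the estimate."""
    if rr < 1:
        rr = 1 / rr          # for protective estimates, take the reciprocal first
    return rr + math.sqrt(rr * (rr - 1))

# Breastfeeding example: observed RR of 3.9 gives an E-value of roughly 7.2 to 7.3,
# in line with the figure quoted in the webcast.
print(round(e_value(3.9), 2))

# Putting the E-value in context: compare it with the largest measured effect in
# the model (these covariate odds ratios are hypothetical, for illustration only).
covariate_ors = {"age": 1.6, "comorbidity_index": 2.1, "male_sex": 1.2}
ev = 2.4   # an E-value like the one in the example above
print("E-value exceeds every measured effect:", ev > max(covariate_ors.values()))
```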
And when the E-value exceeds everything in your model like that, it is fairly compelling, because it becomes hard to imagine what that confounder could be; in a good regression framework, a good causal inference framework, you have presumably already adjusted for a lot of things. That is one approach. Now, the reverse, say your biggest odds ratio is 2.1 and your E-value is 1.8, does not mean you are necessarily in trouble. It just means that variables with effects that large demonstrably exist, so in theory another one could be out there, and you need to caution that there is a potential for residual confounding and articulate what some of those confounders might be. It becomes a bit of an art at that point. You might say: we think using this drug could plausibly increase risk, but we do not believe it would change the risk of mortality by 100 percent, so even though we can imagine this hypothetical confounder, it may not be that strong. From a practical standpoint, that is usually how I see people handle it: whether the E-value sits above or below the largest measured risk estimate is usually where people come down. But the big thing to remember is that the E-value is just a helpful thought experiment about how big something unmeasured would have to be. It does not mean that such a confounder exists; it is a way of thinking about what its influence on your results could be.
Great, that was very helpful. Another question here: what role do you see for mediation analysis in the critical care literature? I don't see it used often, but it seems it could be very valuable.
Yes, mediation is really interesting. One other way of thinking about mediation is the idea of surrogate outcomes. A mediator sits on the pathway between exposure and outcome; if you go back to the DAG I drew, you have exercise and then weight loss, and I am trying to think of a good mediator for weight loss, but I can't immediately come up with one. The idea is that it is something that, if you know it, tells you more about the future outcome. I can't immediately recall seeing one in the critical care literature; I think my mentor, Scott Halperin, did one in intensive care medicine with Hannah Walsh a couple of years ago that I need to look up. The biggest catch with mediators is that they explain some portion of your effect, and you can find something that is formally a mediator yet explains very little of your ultimate effect estimate. So when I think about their real utility, I think it is about finding a really good surrogate outcome, which, as I mentioned, is just another way of thinking about a mediator, and you need it to explain a large share of the variation in your outcome. What I see a lot, not in critical care but in other fields, especially the psychology literature, is a mediator that explains 5% of the effect, and my reaction is: so what? It doesn't mean much to me. So, thinking on the fly, the goal would be an intermediate outcome that explains a lot of your variation. If you think about long-term outcomes in the ICU, such as cognitive ability at one year, functional status at six months, or quality of life, and you could find something earlier down the road that is really predictive of that six-month or twelve-month outcome, that would be really powerful.
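As a rough sketch of what a mediation analysis quantifies, the example below is an illustration only (simulated data, hypothetical variable names, and the simple "difference method," which assumes a linear model with no exposure-mediator interaction and no unmeasured mediator-outcome confounding). It estimates how much of a total effect flows through an intermediate outcome.

```python
# Minimal sketch: proportion of a total effect explained by a mediator
# ("difference method"; assumes linearity, no exposure-mediator interaction,
# and no unmeasured mediator-outcome confounding). Simulated, hypothetical data.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 2000
exposure = rng.binomial(1, 0.5, n)                        # e.g., an ICU intervention
mediator = 0.6 * exposure + rng.normal(size=n)            # e.g., a 3-month functional score
outcome = 0.2 * exposure + 0.8 * mediator + rng.normal(size=n)   # e.g., 12-month quality of life
df = pd.DataFrame({"exposure": exposure, "mediator": mediator, "outcome": outcome})

total = smf.ols("outcome ~ exposure", data=df).fit().params["exposure"]
direct = smf.ols("outcome ~ exposure + mediator", data=df).fit().params["exposure"]
prop_mediated = (total - direct) / total

print(f"Total effect: {total:.2f}  Direct effect: {direct:.2f}  "
      f"Proportion mediated: {prop_mediated:.0%}")
```

A mediator that accounts for a large share of the effect is exactly the kind of surrogate outcome described above; one that accounts for only a few percent adds little.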
A surrogate like that would let you run clinical trials that do not need to follow people for nine or twelve months, which we know is really hard in the critical care world, because unfortunately many patients die and we end up with a lot of missing data. So if you could find a mediator, measured shortly after discharge, that is really predictive of long-term outcomes, that would be really powerful. I don't know what that is, and if anybody has ideas, we should write a grant together; I think that would be really cool.
Great, that was a really helpful response, thank you. That concludes our Q&A session. Thank you, Dr. Harhay. Thank you to our presenter and to the audience for attending. Again, you will receive a follow-up email with a link to complete an evaluation; the link is also listed in the chat box for your convenience if you do not wish to wait for the follow-up email. You only need to complete it once. There is no CME associated with this educational program; however, your opinions and feedback are important to us as we plan and develop future educational offerings. Please take five to 10 minutes to complete the evaluation. The recording of this webcast will be available on the MySCCM website within five business days. That concludes our presentation. Thank you all for joining us.
Video Summary
Dr. Michael Harhay gives a webcast on causal inference from observational data. He emphasizes the importance of having a clear causal question and understanding the subject matter. He introduces target trial emulation as a way to design observational studies that replicate randomized controlled trials. Confounding is discussed as differences between individuals in a study that can impact the outcome. Statistical modeling approaches to address confounding include multivariable regression, propensity score methods, and quasi-experimental methods. Dr. Harhay highlights the limitations, assumptions, and potential biases associated with each approach. He stresses the need for careful consideration of confounding variables and the use of tools like directed acyclic graphs to guide analyses. The transcript also mentions the challenges of including too many variables, the similarities and differences between propensity scores and multivariable regression, and the use of instrumental variables and difference-in-differences to control confounding. Sensitivity analysis, such as the E-value, is recommended to account for unmeasured confounding. Overall, the webcast provides an overview of causal inference methods and their application to observational data, emphasizing the importance of careful analysis and consideration of biases.
Asset Subtitle
Research, 2020
Asset Caption
Although observational studies are very common in the critical care literature, they are susceptible to many types of bias, making it difficult for critical care researchers and clinicians to interpret and apply the results to practice. This webinar will highlight the key steps involved in conducting rigorous observational critical care research. Topics to be covered include: an introduction to the potential outcomes framework and how this framework can be used to design an observational study, common sources of bias, strategies to control confounding (e.g., covariate adjustment, matching, propensity scores), approaches to confounder variable selection, including the use of directed acyclic graphs, and potential approaches to sensitivity analysis.
Meta Tag
Content Type: Webcast
Knowledge Area: Research
Knowledge Level: Intermediate, Advanced
Membership Level: Select, Professional, Associate
Tag: Outcomes Research
Year: 2020
Keywords
causal inference
observational data
causal question
target trial emulation
confounding
multivariable regression
propensity score methods
quasi-experimental methods
directed acyclic graphs
sensitivity analysis