The Role of AI/ML in Prediction Modeling
Video Transcription
Hi, and welcome to today's broadcast, The Role of AI and Machine Learning in Prediction Modeling. My name is Megan Zingler. I'm a clinical pharmacy specialist in the Neurocritical Care Unit at Trinity Health in Ann Arbor, Michigan. I will be moderating today's webcast. A recording of this webcast will be available within five to seven business days. Log into mysccm.org and navigate to the My Learning tab to access the recording. A few housekeeping items before we get started. There will be a Q&A session at the end of the presentation. To submit questions throughout the presentation, type into the question box located on your control panel. You will also have the opportunity to participate in several interactive polls. When you see a poll, simply click the bubble next to your choice. Please note the disclaimer stating that the content to follow is for educational purposes only. And now I'd like to introduce to you our speakers for today. Andrea Sikora is a clinical associate professor at the University of Georgia College of Pharmacy in Athens, Georgia. Brian Murray is a clinical assistant professor at the University of Colorado Skaggs School of Pharmacy in Aurora, Colorado. And now I'll turn things over to our first presenter. All right, hi everybody. Thanks for joining us. And thanks Megan for that introduction. Andrea and I are excited to have a fun and interactive talk today. And hopefully everyone will be able to come away feeling like they learned something. We do have three objectives that we're gonna try to cover today. So by the end of this talk, you should be able to review the strengths and limitations of traditional regression-based modeling, discuss advantages and key concepts of assessing prediction modeling approaches based on machine learning. And finally, we'll explore some use cases of artificial intelligence algorithms for intensive care unit event prediction. Before kicking things off, we wanted to use a couple of polling questions to get a sense of the audience and to help us tailor the content a little bit. So our first question for today is: how familiar are you with prediction model performance metrics, including measures such as area under the receiver operating characteristic or AUROC and positive predictive value? Okay, so it looks like the majority of our audience is somewhat familiar. All right, and then we do have one more polling question for you, kind of trying to get at the reason why you're interested in this presentation. So which of the following best describes your interest in today's presentation? Are you a bedside clinician wanting to know more about the application of AI and machine learning to clinical practice, or would you describe yourself as a clinical scientist wanting to know more about the latest use cases? Okay, so mostly bedside clinicians, but we've got a mix of other things in here, too. All right, great. So with that, we are going to jump into our first objective, which is reviewing strengths and limitations of traditional regression-based modeling. And we're going to start out with an introduction to why prediction modeling is so important in the ICU. And there are a number of ways in which prediction modeling impacts healthcare, spanning from the clinical care itself all the way to reimbursement. Benchmarking, clinical benchmarking, is one such example where patient data can be used to generate performance expectations, and those expectations can then be used as a measuring stick for hospitals, health systems, and even individual patient care units to drive quality improvement and pay for performance. 
An example, which we have here on the screen, is the HMS sepsis mortality model, which was a Blue Cross-funded sepsis collaborative quality improvement initiative in Michigan. This model is based on hospitals within the health system sending data, which was used to build this model for sepsis outcomes. Now hospitals submit their data to the sepsis registry, and in return, they receive real-time performance data and quality improvement initiatives, and that data is also used for pay for performance. Another example of the role of prediction modeling is an event prediction. Early warning systems give providers the chance to intervene and potentially prevent adverse outcomes. An example shown here on the screen is the Epic Deterioration Index, or the EDI, which has been implemented at some hospitals and health systems. This is a tool based on a logistic regression model that predicts the risk of either renal replacement therapy, ICU transfer, cardiac arrest, or death using 31 measures from the electronic health record. And then when a patient exceeds a certain threshold, the tool then sends an automated alert to the nurse and the physician caring for that patient. In one study, implementation of the Epic Deterioration Index along with a protocol for provider response reduced the risk of escalation of care by over 10% in patients at or around the alert threshold, but it's important to keep in mind here that a prediction model itself cannot improve outcomes. It has to be paired with appropriate response actions. And then finally, prediction models also potentially have a role in clinical decision making. So using a model built on large amounts of patient data, a prediction model may be able to identify the best intervention from a set of potential interventions for a given patient based on patient characteristics, thereby ensuring that the right patients are receiving the right treatments to optimize outcomes. Now a lot of the first part of this talk is going to focus on terminology and how that terminology determines your interpretation of model generation or model performance. And these terms are going to come up again throughout this presentation, so I'm going to introduce and define some of those terms now. For the purposes of this talk, a model is an equation that maps a set of independent variables to a dependent variable of interest. Explanatory modeling uses statistical models to test hypotheses about associations among variables, so why something is the way it is, while predictive modeling uses statistical models to estimate new or future information, so what's going to happen in the future. It's important to note here that prediction or correlation does not imply causation. So for example, just as you would predict that the sun is about to come up when you hear a rooster crow, that does not mean that the rooster is making the sunrise. So that leads us to causal inference modeling, which is specifically intended to model a cause and effect relationship between two variables and does so by incorporating causal pathways into statistical models. And finally, machine learning here refers to an algorithm's ability to learn new rules without human programming. Now, you obviously can't generate a model without data, and there is also terminology around that data that's being used. Training data is the data that is used to build a model, and because that model is built on that training data set, the model will almost always perform better on that training set than it does on future data sets. 
The testing data set is the data set that is then used to apply the model and evaluate that model's performance. Validation in this context can have two different definitions. It can either refer to evaluation of model performance on a completely independent data set, either from another institution or another temporal setting, or validation can refer to an intermediate step in between the training and testing phases where the model is fine-tuned to the data. So as you can see in this graphic here on the right, you train the model on the training data set, you evaluate the model on a validation data set, and then you tune or modify the model based on its performance on the validation data set before finally testing it on the testing data set. Data sets are composed of features or descriptors such as age, sex, etc., and these features may be either labeled or unlabeled. Now focusing on regression modeling, there are two general types that are considered traditional, and those are linear and logistic regression. We'll be discussing these more later. And there are a couple of potential outputs of regression modeling, including classification, wherein the output predicts the class of the data set based on the independent input variables such as does the patient have sepsis, yes or no, and regression, wherein a continuous output variable is predicted based on independent input variables such as predicting the length of hospital stay based on patient features. Another important term to know here is the beta coefficient, which describes the predictive power of any given independent predictor variable in the model. As for a change of one point in the independent variable, there will be a change in the dependent variable equal to beta. So as an example, if we are using SOFA score to model mortality, a beta coefficient of two would mean that for every one point increase in the SOFA score, mortality would be expected to double. There are a number of performance metrics that will often be reported with prediction models, and it's important to know how each affects interpretation not only of model performance but also how it relates to clinical application. So we're going to spend some time here making sure that these are clear. The first metric here is sensitivity or recall, which is calculated as the rate of correct positive predictions of an outcome divided by the number of actual outcomes that occur. Sensitivity is important for evaluating the ability of the model to predict true positive outcomes and essentially answers the question, if the outcome is going to happen, will the model predict it? Next is specificity, which is calculated as the rate of correct negative predictions divided by all non-outcomes. Specificity is important for evaluating the ability of the model to predict true non-outcomes and essentially answers the question, if the outcome is not going to happen, will the model predict that it is not going to happen? Precision or positive predictive value is calculated as the rate of correct positive predictions divided by the total number of positive predictions. So positive predictive value indicates the reliability of a positive prediction and answers the question, if the model predicts an outcome, will it occur? So note the difference here between positive predictive value and sensitivity. Negative predictive value is the rate of correct negative predictions divided by all predictions of a non-outcome and indicates the reliability of a prediction of non-outcome. 
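To make those four definitions concrete, here is a minimal sketch in Python that computes sensitivity, specificity, positive predictive value, and negative predictive value from a hypothetical two-by-two confusion matrix. The counts are illustrative only and are not taken from any study in this webcast; the transcript continues with accuracy and the F1 score below.

```python
# Hypothetical confusion-matrix counts (illustrative only, not from any study here).
tp, fn = 80, 20   # outcome occurred: correctly predicted vs missed
fp, tn = 50, 150  # outcome did not occur: falsely predicted vs correctly ruled out

sensitivity = tp / (tp + fn)  # if the outcome is going to happen, will the model predict it?
specificity = tn / (tn + fp)  # if the outcome is not going to happen, will the model say so?
ppv = tp / (tp + fp)          # if the model predicts an outcome, will it occur?
npv = tn / (tn + fn)          # if the model predicts a non-outcome, will that hold?

print(f"Sensitivity {sensitivity:.2f}, Specificity {specificity:.2f}, "
      f"PPV {ppv:.2f}, NPV {npv:.2f}")
```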
And then accuracy is simply the rate of true positive and true negative predictions divided by the total number of predictions and answers the question, what are the chances that the model's prediction is correct? And finally, the F1 score is a balanced metric that combines precision or positive predictive value and recall or sensitivity. And because these metrics are so important, we are going to practice with them a little bit. So these apply to prediction modeling, but they also apply to other things like diagnostic tests. And so we are going to use a simple example here of a MRSA nasal swab. These numbers are directly from a publication on the performance of the MRSA nasal swab for diagnosis of MRSA pneumonia. So you can see here that the sensitivity, or the chances that the test comes back positive if the patient has MRSA pneumonia, is 88%, or 22 divided by 22 plus 3. While the specificity, the chances that the test comes back negative if the patient does not have MRSA pneumonia, is over 90%. Both of these values are excellent, but they don't necessarily tell us how to apply the test in the clinical setting. So look what happens when we shift gears to negative and positive predictive value. The positive predictive value, or the chances that the patient has MRSA pneumonia if the test comes back positive, is only 35%, while the negative predictive value, or the chances that the patient does not have MRSA pneumonia if the test comes back negative, is over 99%. So the difference here between the positive predictive value and the negative predictive value and the sensitivity and the specificity is due to the base rate of MRSA pneumonia in this cohort, that is, the small number of total cases of MRSA pneumonia. Understanding this point is important for considering how best to apply this test in the clinical setting. It's great as a negative predictor, but it's not as useful as a positive predictor. But the performance of the test will change if it's applied to a cohort with a higher base rate or a higher prevalence of MRSA infections. These same considerations will apply to your prediction models. One more important measure of model performance that we're going to discuss is the area under the receiver operating characteristic or the AUROC. This is a graph that plots the true positive rate or the sensitivity against the false positive rate or one minus the specificity. And the idea here is that there's always a trade-off between specificity and sensitivity. So for any model, as you increase the accepted false positive rate, the true positives captured will also increase. Or to state that another way, you will gain more sensitivity as you sacrifice specificity. However, an ideal model would have high sensitivity even at high levels of specificity. So what we want to find is a model that optimizes performance by optimizing sensitivity across that range. A random classifier, as you can see plotted here as that dotted red line, will perfectly balance true positives and false positives. So a false positive rate of 20% will correspond to a true positive rate of 20%, false positive rate of 70% will lead to a true positive rate of 70% and so on. This leads to a 45-degree line with an area under the receiver operating characteristic of 0.5 or half of the total graph area. 
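As a hedged illustration of that last point, the sketch below fits a simple logistic regression on synthetic data (scikit-learn assumed to be available) and compares its AUROC with that of a random classifier, which lands near the 0.5 of the 45-degree line just described. The transcript then continues with what happens as a classifier improves beyond random.

```python
# Illustrative only: synthetic data, not ICU data, and not any model discussed in the talk.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
auc_model = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])

rng = np.random.default_rng(0)
auc_random = roc_auc_score(y_test, rng.random(len(y_test)))  # random scores: AUROC near 0.5

print(f"Fitted logistic regression AUROC: {auc_model:.2f}")
print(f"Random classifier AUROC:          {auc_random:.2f}")
```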
As we move the line further and further to the left away from that random classifier, we can see that performance is improving until we reach a perfect classifier, which has a perfect true positive rate regardless of the accepted false positive rate. And we'll have an AUROC of one, so that will encompass the entire graph area. Models encompassing a greater area under the receiver operating characteristic represent superior model performance. Here we're going to introduce some terminology related to different types of regression modeling. There are some here that should sound familiar. So linear regression is the simplest and models the relationship between the independent and dependent variable as a straight line. Multiple linear regression extends this principle and allows for modeling of a dependent variable based on multiple independent variables. Logistic regression is used to predict the probability of a binary outcome based on one or more predictor variables. You can see some other types of regression that add penalties based on different factors associated with the model to improve performance and reduce overfitting. Bayesian regression incorporates prior beliefs about model parameters and then updates those beliefs based on observed data. And then support vector machine regression is a type of supervised machine learning that can be used for both classification and regression tasks. So with that, we're going to talk about machine learning a little bit in this section. And as we talk about machine learning, one of the important concepts to understand and to help us interpret how the machine learning algorithm was developed is the difference between supervised and unsupervised machine learning. So what we have on this slide is an example of supervised machine learning, a very simple example. So what we have here is input data of stars, as well as annotation by a human trainer for this model that tells the model these are stars. So the model is now able to recognize these tagged data elements for what they are. And via that annotation, learns the different characteristics that make something a star. So then as we ask the question, if we feed it more data and say, is this a star? The machine is able to identify that data as a star based on the human training that that model has received. So like I said, this is a very simple model, but you can imagine how this might be applied to other situations, such as chest x-rays, EKGs, where a model is fed various EKGs that say this represents a STEMI. And then that model will be able to identify STEMIs on future EKGs. The alternative to supervised machine learning is unsupervised machine learning, in which a model learns from input data that is not annotated by human trainers. The unsupervised nature of this type of modeling allows the machine learning model to independently identify patterns in the data, which means that that model may identify patterns that humans are not able to identify. But explanation of those patterns may be limited by the black box effect of unsupervised model generation. And so we may not be able to determine how the machine gets to the outputs that it's giving us. Some other core predictions – sorry, core principles for prediction modeling include data quality and preparation. So ensuring high-quality data will help you have a high-quality model, and preparing that data for model input makes sure that the model is able to process that data. 
Appropriate feature selection – so making sure you know what elements need to go into your model, making sure you have all the important elements for predicting an outcome. Model selection – so here is just making sure that you have the right model based on the data and the outcomes that you're looking for. Thinking about the different evaluation metrics that are important for you – is sensitivity going to be the most important for the outcome that you're predicting, or do you really want to focus on positive or negative predictive value? Time series considerations and immortal time bias, including variables from the future – so just remembering that when you have a model, you're only going to be able to feed it data from the present time point. So if you're feeding your training model future data, that is not going to be useful for prediction. Model validation – so making sure that it works on various datasets, external datasets in particular. Feature importance and interpretability. Updating and retraining – so as things change, making sure that you change your model with that. Domain knowledge and integration, and then just remembering that prediction and causation or correlation and causation are not the same thing. There are some limitations of regression modeling, and we're going to go through some of those here. Most of these are based on some assumptions that we make as we are doing regression modeling. The first issue is that the model that you choose may not fit the shape of the data. So for example, if you're using a linear model to model non-linear data, that violates the assumption of linearity, and you're not going to get a good prediction from that model. The second assumption is homoscedasticity – sorry, that's a mouthful – which essentially refers to the spread of the data across various values of the predictor variable. And so if you have more variance in the data at certain values of the predictor variable, that means that you do not have homogeneity in your variance, and what that leads to is a model that is biased. And so finding different ways to model that data might improve performance. Another issue is multivariate normality, and so assuming that all the variables are normally distributed when that's not the case, again, can lead to errors and bias in your model. The assumption of independence, if that is violated by having variables in there that are not truly independent of each other, that can lead to model bias. Lack of multicollinearity is another assumption that can be violated. So imagine building a model with inputs of acute kidney injury and renal replacement therapy. Those things might not be completely independent of each other. They may be collinear. Or age and weight in children – as children get older, they tend to also get larger, and so those variables will also be collinear. So making sure that you don't have multicollinearity in your model (a small sketch of one way to check for this appears at the end of this passage). And then finally, outliers – obviously, significant outliers can contribute to bias and compromise model fit. All righty. So now we're going to move into the challenges of ICU data. So one of the things with ICU data is that there's a lot of it. So if you think about just a blood pressure reading happening on an A-line, you could have multiple blood pressure readings in an hour times 24 hours times multiple days. So now you add in the fact that we're not just doing a Chem 7, but probably adding a mag and phos, we're getting blood gases, et cetera. 
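Before the data-volume discussion continues, here is a minimal sketch of the multicollinearity check referenced above, using variance inflation factors from statsmodels on simulated data (the age-and-weight-in-children example). None of these values come from a real cohort, and the threshold mentioned in the comment is a common rule of thumb rather than a fixed standard.

```python
# Simulated data illustrating collinear predictors (age and weight in children).
import numpy as np
import pandas as pd
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(0)
n = 500
age = rng.uniform(1, 17, n)                      # years
weight = 3.0 * age + rng.normal(0, 4, n)         # tracks age, so the two are collinear
sofa = rng.integers(0, 20, n).astype(float)      # an unrelated severity score

X = pd.DataFrame({"const": 1.0, "age": age, "weight": weight, "sofa": sofa})
for i, name in enumerate(X.columns):
    if name != "const":
        print(f"VIF {name}: {variance_inflation_factor(X.values, i):.1f}")
# age and weight will show inflated VIFs (often flagged above roughly 5-10),
# suggesting that one of the pair should be dropped, combined, or otherwise handled.
```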
You can end up with datasets that are so big, Excel can't open them, and you need to be moving into R and SPSS and all of these other places. So you have a ton of data that is all relevant to some extent. So the important component here is that it does influence life-critical decisions. I was on a call just the other day, and they were asking me if I felt like we had to look at the MAP and the systolic. And I was like, well, yes, we do look at MAP and systolic. Even though they seem similar to someone who is maybe not working in the ICU as a clinician, you would know that sometimes those are different but still relevant values that are going to influence the type of decisions that we're making. On top of this, it is what we call nonlinear in nature. So it's not necessarily something that you can see very nice patterns in. It could be multimodal. On top of that, you can have numerical data, you can have time series data, as well as progress notes and everything else. And so the result of this is that you have very large, messy datasets. And this is very inherent to ICU data. Just a use case example to kind of give you guys a flavor for this. I have a graduate student that I've been working with who started off, he knew absolutely nothing about medications, and now I am very proud when he can pronounce vasopressor correctly and everything else in between. And this screenshot here is actually part of his oral written exam for his PhD qualifying exam. And one of the parts that he was talking about was just the sheer volume of data that we work with. And in particular, some of the stuff that my team and I are working on has to do with medication data. And so there is a side of us that wants to say, oh, I think we can just know about cefepime. We don't necessarily have to think about the dose or the frequency and how often it was given and when it was given. But those are all relevant variables too. And so not only if you were to take a look at a medication regimen that has 13 to 20 drugs on it, which is already quite a few variables, if you were to add in the dose, the frequency, the timing, as well as the patient's renal function, you can see how you have this exponential increase within the amount of variables that are in the system. So what happens with this is that as much as our inferential and traditional statistics are very powerful and can help us learn things within the ICU, there are areas where it would be nice if we had other methods. And so what I really liked about this study is that it kind of shows you how machine learning and traditional inferential stats can be a little bit different in what they show. So in this study, we had a few hundred ICU patients and we were looking at if we could predict fluid overload. And we had a group of 20 plus different variables that we were checking out. And when we did regression analysis, we found a number of variables that were significantly associated with fluid overload, things you might expect like SOFA score and age and things like that. But then, and then not shockingly, in univariate regression, you saw that medications were related to fluid overload. So more vasopressors increased your chances of having fluid overload as well as number of continuous infusions. And all of this makes total sense in the way that drugs are a key source of fluid overload within these patients, as well as potentially indicators of overall functional status or critical illness. But then in multivariate regression, that significance went away. 
And that felt very strange to us, in that how could the source or the cause of the outcome, fluid overload, not be significant? But that can happen for a number of reasons within a regression analysis. But then what was fascinating is that when we did some machine learning based analysis, we found that when we did a feature importance graph, which is this graph that you see on the right with the red and green and blue bars. And this shows you what features were the most important in this algorithm for being able to create the prediction model. And what we saw was that the medication regimen complexity score, which is basically saying how many drugs were on there, many of which are continuous infusions, was related as well as the number of continuous infusions. And this, I think, kind of makes sense as a clinician to say, okay, yes, those are two direct causes of fluid overload. And now they are showing up on feature importance. And then when we looked at the performance, you could see that if you looked at the AUROC curve, one of the supervised machine learning algorithms was higher in performance, not a ton higher, but nonetheless very present. And so what this speaks to is that potentially machine learning and AI offers us a different lens or window into that same set of data and potentially provides us different ways of analyzing this data. So now we have a poll question here of which of the following are limitations of traditional regression. So we have A, skewed by outliers, B, assumption of linearity, C, interpretability, or D, A and B, the first two options that are available to you guys. All righty, so D is absolutely correct in that the biggest issue that we struggle with with linear regression and logistic regression is that at some point, we are assuming a degree of linearity, as well as we risk having issues with being skewed by outliers. All right, so the question becomes, how can we use AI and how is it being used in the ICU setting? So there are a couple of different papers that have, or studies that have come out looking at how we can use this. So one of them was a prospective multicenter study of the TREWS machine learning early warning system for sepsis. And what's interesting about these different systems, there's another one for clinical deterioration that was published in a Nature journal. There's another one for cardiac collapse. And they're really very neat in their ability to correctly identify patients that are going to require intervention in terms of either sepsis, cardiac collapse, et cetera. What is interesting about them is despite maybe the statistical modeling power of these studies, what was fascinating in the systematic review is that they actually said, well, we're not sure if it's actually affecting patient care. And I think for the clinicians on the call, this is a very important thing to understand: right now there's a lot of excitement around AI and machine learning. You're seeing a ton of new papers that are coming out, machine learning prediction this, AI-based this. And I think the key question that you can keep asking is how did this actually affect patient care? So I was in a really interesting conversation maybe about a month ago now talking about this early sepsis warning system using machine learning that they implemented. And at first they saw no differences in outcomes. And actually the clinicians were kind of annoyed because it was just one more pop-up that was coming up. 
And so what they ended up doing is moving it over to kind of a tele ICU setting where they had trained physicians and nurses that were on staff who would look at the alerts. And then they would go look into the patients and say, oh yeah, okay, we believe this. We need to call the bedside clinician, nurse, et cetera to make an intervention and do something. And since they have started that component, they've actually seen reductions in mortality. And they think they've saved like thousands of patients with this. But I think this really speaks to a part that just because you have a good model and it predicts something doesn't necessarily mean that it's affecting outcomes. So the key question to some extent is what is the implementation science of this and what has truly been shown thus far? So this speaks a little bit to the importance of external validation. So within the concept of a model and a training model and then our validation model, so to some extent your model is learning everything based on that original data set. And so the question becomes how good is your data set and how representative is that data set of your patient population or any of the relevant patient populations. And there is again right now, because there is so much excitement around machine learning and AI, you will see a lot of studies coming out that are single center studies. Even that fluid overload paper I just showed you, although I find it exciting from a methodologic standpoint and to show a particular point, it is a single center study. And so the problem is that a lot of models, once they are subjected to a more rigorous approach that involves external validation, they're not necessarily as useful. Another kind of funny anecdote, but I think is relevant, is when I was at SCCM this last year, I was part of the snapshot theater and listening to a really, really cool machine learning based algorithm for predicting if you were gonna be re-intubated. And they had something like 50,000 patients, just an astronomical amount of patients, and like 10,000 different variables that they were pulling in, which is just so much more than you could imagine that linear regression or logistic regression could ever do, much less a person just trying to look at all that information. And they had come up with a really cool prediction model for if you were gonna be re-intubated. And I asked the question of, well, how does it perform against the RSBI? And they said, what's the RSBI? And I was like, yeah, the Tobin index, it's like the standard clinical benchmark that we use to know if someone's at risk of being re-intubated. And it became clear that they didn't know that that existed already, much less having compared it. And I think that that is a really important lesson and part of, I think one of the things that as a clinician, you can really look for is just because someone has made something really cool and complicated and used this fancy way of doing things, doesn't necessarily mean it's better than what's already being done. So one of the things when you're kind of reviewing different studies is can you look for how it's benchmarking against existing standards? And can the authors speak to that? Maybe there is no existing standard for that particular prediction event, but if there is, we should be speaking to that (a small illustrative sketch of this kind of external validation and benchmarking check follows below). This moves into our second objective, which is discussing advantages and key concepts for assessing prediction modeling approaches based on machine learning. 
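Before moving into that second objective, here is a hedged sketch of the two appraisal points just raised: evaluating a model on an external cohort rather than only on the data it was trained on, and benchmarking it against an existing standard. Everything here is simulated; the "existing score" is a single-variable stand-in for a benchmark like the RSBI, not an implementation of it.

```python
# Illustrative only: simulated internal and "external" cohorts and a crude baseline score.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

internal_X, internal_y = make_classification(n_samples=1500, n_features=8, random_state=0)
# A second, independently generated cohort standing in for an external site;
# because it is unrelated to the training cohort here, expect the AUROC to drop.
external_X, external_y = make_classification(n_samples=800, n_features=8, random_state=1)

model = LogisticRegression(max_iter=1000).fit(internal_X, internal_y)

auc_internal = roc_auc_score(internal_y, model.predict_proba(internal_X)[:, 1])
auc_external = roc_auc_score(external_y, model.predict_proba(external_X)[:, 1])
auc_baseline = roc_auc_score(external_y, external_X[:, 0])  # single-variable "existing score"

print(f"AUROC on training cohort:    {auc_internal:.2f}")  # optimistic by construction
print(f"AUROC on external cohort:    {auc_external:.2f}")  # the number that matters
print(f"AUROC of existing benchmark: {auc_baseline:.2f}")  # the comparison worth reporting
```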
This was a tweet that I found quite humorous, and it was talking about how the rebranding of linear algebra as AI is one of the most successful marketing campaigns. I've also seen another one about the rebranding of logistic regression as machine learning being a very successful marketing campaign. And the reason why I bring this up is again, I think that there is so much excitement that we're seeing AI-based this or AI-based that. And so it sounds like, oh, this is something we really need to be doing and really, it's gonna change care. And I do believe it's going to change care, but I think what's important is we still have to look under the hood a little bit and say, what are you truly doing? Is this different than how we have been basically doing data analysis for the last however many years, and what exactly are these models and how rigorously have they been developed? So what are some of the advantages for AI in the ICU? So as I mentioned, ICU data is heterogeneous, it's complex, it's often nonlinear. And AI has a big advantage in its ability to manage nonlinear patterns, non-monotone relationships, so things that have changing patterns over time. It also is capable of seeing things that maybe the human eye is not necessarily seeing. So there is a degree of a hypothesis-generating side to it that can be very helpful. And again, it has some big data capabilities, where oftentimes we're finding that more traditional regression just breaks down at that point. So that graduate student I was talking about that had the 99 factorial medications, part of his PhD thesis is basically gonna be looking at how can we use traditional stats? How can it manage such large volumes of variables? And it's really interesting to realize that this is truly an area of study and improvement. This is a really nice kind of comparison of AI versus traditional inference. And you can go through this table here, but I think that some of the kind of the key points that you can look at are considering how within AI, it has the ability to use different types of data. So images as well as numerical data sets. There's a new thing called HuggingGPT that's come out; its claim to fame is using multiple large language models that have different capabilities that are all coming together to solve a particular problem. So you might have a large language model that's very skilled in image analysis, another large language model that's very good at text summarization, all going to summarize or solve some particular healthcare-related problem. There are certain issues in my opinion with AI in terms of proprietary algorithms not being necessarily available, how do we do data reporting? So thinking about is this an algorithm that you can find on GitHub, are they willing to share that type of code? Those are gonna be things that I think we really need to be able to see and understand of how did those models come to be. I also think there's the classic thing from meta-analyses of junk in, junk out. I think this is still very true in artificial intelligence. So if you have a data set that's not representative of your population, a data set that contains bias, the model is going to take that and learn from that and might not be exactly what you're hoping for. I do think the pattern recognition component is one of the more exciting elements of where artificial intelligence can help us in the ICU. 
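To illustrate the nonlinear-pattern advantage described above, here is a small sketch on simulated data (scikit-learn assumed) in which a plain linear regression misses a U-shaped relationship that a tree-based learner captures. This is only a toy contrast, not a claim about any specific ICU model discussed in the webcast.

```python
# Simulated U-shaped (nonlinear, non-monotone) relationship between one predictor and an outcome.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=(2000, 1))
y = x[:, 0] ** 2 + rng.normal(0, 0.5, 2000)      # U-shape plus noise

x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=0)

linear = LinearRegression().fit(x_train, y_train)
boosted = GradientBoostingRegressor(random_state=0).fit(x_train, y_train)

print(f"Linear regression R^2: {r2_score(y_test, linear.predict(x_test)):.2f}")   # near zero
print(f"Gradient boosting R^2: {r2_score(y_test, boosted.predict(x_test)):.2f}")  # close to 1
```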
This was an interesting study where essentially we were looking at if we could use an unsupervised analysis of the medication administration record in the first 24 hours to be able to see different patterns. This is one of the first times that we've ever taken a look at the whole MAR for an ICU patient within that 24-hour period. And we tried to match this up against a common data model that was able to give us more information for those drugs. So it wasn't just that it was cefepime, but we would know all these different kinds of components about cefepime, that it's renally adjusted, that it's a beta-lactam, that it's an antibiotic, and so forth, and be able to try to find different patterns. And so what this figure is showing here is that there are different distributions of different medication clusters and they have kind of different overlying patterns within the various clinical outcomes. So you could say here that patient cluster five had relatively speaking less serious outcomes compared to clusters one and four and then go take a look at the medication clusters and say, hey, does this make sense or what are we seeing within this? I think what's interesting about this is that these results did show that there are patterns of medication use that seem to relate to outcomes that are not necessarily something that the human eye can see. What I would say about this is that when we reviewed the medication clusters, we looked at them and said, these don't necessarily make clinical sense to us. That can mean a couple of different things, one of which could mean that to some extent an AI algorithm, if you download R and put some numbers in, you're gonna get something out regardless of its meaning. But on the other side, it might say, hey, there's a signal here that is worth unpacking in more detail. And so this begs the question of maybe 1,000 patients isn't enough and we need to have more patients and to have more finely tuned clusters and so forth. And so again, this is kind of getting towards can you find new patterns? What I think is really interesting about this is that this is really one of the first times that this degree of granularity within medication data has ever been even attempted to be looked at. A kind of an iteration of that concept has been repeated with the outcome being fluid overload, looking at the medication administration record in the first 72 hours of an ICU patient stay. So these very colorful graphs that have very small, hard to read fonts, this is actually showing you all the different drugs that patients received in the first 72 hours of their ICU stay. And you can see the timing or temporal component as well as what the drugs were. And then there was an unsupervised analysis. So essentially unsupervised is just looking in the data and saying, hey, we see things that might be interesting, what do you think? And what it found was that a particular cluster of medications, cluster seven, seemed to be associated more frequently with fluid overload. And when we put this into a prediction model along with APACHE score as well as diuretics, we found that it did improve that model. And that model was pretty good at predicting fluid overload. And so what this speaks to is, is there a way that someday we could be basically integrating different AI algorithms into the electronic health record with the goal that we're gonna be providing meaningful predictions that you can then act upon. So I think this is very interesting and exciting in terms of kind of the concept. 
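As a hedged sketch of the kind of unsupervised step described here, the code below clusters simulated binary medication-exposure flags with k-means and then tabulates an outcome rate by cluster. The actual medication administration records, common data model, and clustering method used in these studies are not reproduced; the drug columns and outcome flag are placeholders.

```python
# Simulated exposure flags only; drug_0 ... drug_29 are placeholders, not real MAR data.
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

rng = np.random.default_rng(2)
n_patients, n_drugs = 1000, 30
exposures = pd.DataFrame(
    rng.integers(0, 2, size=(n_patients, n_drugs)),
    columns=[f"drug_{i}" for i in range(n_drugs)],
)

clusters = KMeans(n_clusters=7, n_init=10, random_state=0).fit_predict(exposures)
outcome = rng.integers(0, 2, n_patients)  # placeholder outcome flag (e.g., fluid overload)

# Outcome rate by medication cluster, the kind of pattern described in the talk.
print(pd.Series(outcome).groupby(clusters).mean().round(2))
```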
What was nice about this study was that cluster seven did make a lot more sense than that first study I was showing you. It had a lot of IV drugs, not surprisingly, but it had a lot of things you would think of that have larger volumes, things like vancomycin and vasopressors. So to kind of summarize some of these concepts, when you're evaluating artificial intelligence or machine learning studies as a clinician, I think one of the key things you're gonna ask is the choice of algorithm. Did they pick this just kind of randomly? They wanted to see if that particular method worked, or are they able to say like, hey, we've seen that XGBoost works particularly well with this type of data, and so we're using it again. You ideally are looking for some type of rationale. What is the dataset quality? Again, junk in, junk out. So looking at what's in there and then the level of granularity. One of the things that really struck me was a really neat paper in Nature about cardiac collapse. They included all sorts of variables, and among them they did include some drugs. And generally speaking, the drug categories made sense to me, but then one of the things they did is they made some kind of interesting assumptions about the half-life and period of activity of different drugs. And when I was looking at the supplemental file in the Excel doc, I couldn't necessarily find how they came up with those particular durations of action for ACE inhibitors versus propofol and so forth, and I was also left wondering, why not all the other drugs that potentially affect cardiac collapse? So on the upside, the algorithm worked. It's very good at predicting cardiac collapse, but the downside is it doesn't necessarily have all the representative data points that we would think about, and is maybe either kind of over or under prioritizing things that we think are relevant. Again, does that AI model work on multiple sources of data? So you're really trying to ask, what kind of external validity does this have? Does it meet the FAIR criteria of findable, accessible, interoperable, and reusable? And what type of common data models are supporting this? Sometimes it's okay if you see a single-center study where what they're really trying to emphasize is methods development. Hey, before we undertake a huge semester or two-year-long project taking this on, we just wanted to know, generally speaking, does this algorithm work for this type of data? There's definitely much to be said for that, but when you start thinking about clinical, we're gonna bring this into our hospital, you wanna make sure that they can speak to that. Understanding how the investigators clean the data is really important. We were in a conversation yesterday trying to figure out, if we were gonna talk about people that had vasopressors versus did not have vasopressors, and look at fluids that they received prior to those pressors, what do we think is the time window? Is it three hours? Is it six hours? Is it 12 hours? Is it at the time of ICU admission? Is it at the time of ER admission? What if they were sitting in the ER for three days? How does that play into this? And at first I was like, maybe this is just me, but then I got into another conversation recently, and someone's like, oh my gosh, trying to decide about just the onset of sepsis, and what we're gonna use as the definition for the onset of sepsis is such an area of controversy. 
Are you starting it at the point of giving antibiotics? Are you starting at the point of meeting Sepsis-3 criteria? Where do pressors play into this? And so, really understanding how people clean the data and define their endpoints can really make a big difference in terms of how you interpret that data. And again, the final kind of important part is, was it clinically implemented? Just because you have a cool concept doesn't mean that it necessarily works in the real world. The way I think about this is, you know, seatbelts. Seatbelts certainly save lives, but there was a long time when people didn't wear seatbelts. So you have to figure out how to bridge the gap of just because you have a cool toy or a cool thing, how do you get people to actually use it? So on the flip side, what are some of the limitations of AI in the ICU? To some extent, there's a black box effect. We're not even entirely sure how AI works to some extent. We just know that it does. So all of a sudden, you just kind of have these variables for this model that works. And that feature importance graph I showed is really neat to kind of see, okay, these were the things that were the most relevant. But to some extent, there's a side of magic to it that can make people a little bit more nervous. Even understanding how can you be sure, I was asking this question the other day about large language models. How can you be sure that it's going to pair the right dose with the right drug every time? Like, how is it always gonna know it's aspirin 81 or aspirin 325? And they're like, oh, it just does. And I was like, you can't program it to just know that those are the right answers. Like, you just have to trust it. And they were like, yeah. And it was interesting kind of seeing the pharmacist side of me versus the computer scientist. Their point was like, it's fine, it works. And I was like, I don't know. Like, that kind of makes me nervous that you can't guarantee this to me. Well, another issue is lack of high quality data sets. To some extent, when you start talking about the size and the granularity and having the correct timestamps and being able to pair that with progress notes and everything else, you're more or less trying to download the entire electronic medical record. And that is time consuming and expensive and hard to move around. You have people working behind firewalls and different things like that. So this is really kind of a known issue within critical care in general. Another component that is challenging for AI and various types of machine learning is the ability to manage temporal or trajectory-based data. How does it understand that 1 p.m. is before 2 p.m., which is before 3 p.m., and that therefore your serum creatinine is going up at every interval? How does it understand that trajectory? And that is a lot easier said than done. And there's a lot of kind of methods development around those types of questions. Another area within this is alphanumeric combinations. So can it handle, again, I like to use aspirin 81 or aspirin 325 versus acetaminophen 325. 325 is the same number for aspirin versus acetaminophen, but they mean different things in a clinician's mind, one being a higher dose and one being a lower dose. Or another way to think about it is acetaminophen 325 versus ibuprofen 200. Technically, 325 is more than 200 in a numerical sense, but we would consider them both starting doses. These are things that we can't necessarily reliably code in at scale right now. Here's our next polling question. 
What are some of the limitations of AI? Awesome. So yes, one of the biggest limitations is the interpretability of the algorithm itself. And does it have that clinical trust associated with it? One of the kind of key things that we talk about a lot is, have we thought about the concepts of human-machine teaming and brought people in at the various levels to really make sure that we feel good about the clinicians that are using this particular algorithm or model or product, if you want to think of it that way. So, as I mentioned, the implementation studies are essential. So you're really looking for people to move from methodologic development, which is model development, to silent pilots. They deployed it, and they just saw if it worked in a real-life setting, to actually robust implementation studies. And you're really looking to have people that have robust oversight of this. So, OK, you had a sepsis alert. You had 1,000 alerts go out, and we missed one. What does that mean? Or there was someone that we thought had sepsis and didn't, and we did all these things to them that didn't need to happen. What is the oversight that's going into those types of models and prediction alerts? Right now, within the ICU, it's very notable that only about 1% of all studies have gone through that type of implementation science. We have been studying fluid overload right now. We thought fluid overload was a nice, relevant thing. Lots of people have it. There's a possibility that we could prevent it with intervention, and therefore making early recognition important. What has been interesting to set this up is, again, you're going from that initial model development, trying to do multicenter and external validation, looking at, should we be trying an unsupervised versus a supervised machine learning approach, bringing in some other type of causal inference modeling, which can also have an AI side to it, before moving into, ideally, silent pilots and interventional analysis. I think what's important to realize is this takes time and even years to develop this in a meaningful way. I think with some of the rush of, let's try to get things out there, there's a side that we tend to miss these last two very key steps. With this, we're getting towards the end, and I want to be able to open this up for more questions. These slides will be available, and we can talk about this in more detail. To me, I think where I would like to see, and what I am excited about what AI is going to be able to do, is if it's going to be able to provide us quantitative and concrete predictions that you can make decisions off of. You have a patient, you're trying to decide if we should give fluids or not give fluids, and what does that mean? What if you had the ability to say, well, if I give fluids, it confers this risk of fluid overload versus this risk of prolonged mechanical ventilation. How would you maybe change your care as a result of that? I think our ability to think about AI providing us that type of assistance at real time at the bedside in a quantitative way is really the next step. These are the type of things that we just sit there on rounds and look at each other and go, I think this is what's going to happen, but what if we had a more numerically based way to think about that? With that, I'll open it up for any questions that you guys may have, and very excited to have you guys here today. Thank you, Dr. Sikora and Dr. Murray. 
I'll allow some time for people to start putting some questions in the chat box, but I'll kick things off. Dr. Sikora, you talked about specifically your fluid overload that you've been studying and things like that. Can you discuss any challenges you've faced whenever you've been doing this research with AI machine learning that might be different from more traditional methods of research you've done in the past? Sure. So many things that we have run into. I think one of the ones that's really interesting is how do you pre-process and clean the data in a way that is easy to interpret and to understand? Even something as simple as you want to include bicarbonate in your model, are you going to use the absolute value of 10, 20, 30? Are you going to do the low values or the high values? Even within ICU care, we tend to have two values that bother us. A white blood cell count less than four is a problem, but also greater than 14 is a problem. What do you do with between four and 14? Is that considered normal? Thinking about how to approach that in a systematic way such that you could say even like all of the studies that just my team has done, we're going to define this the same way every time, much less working with other institutions. I might have, let's say, a fluid overload model, and then you develop one. But if you've made different assumptions about the same type of data, it means that we maybe can't necessarily compare them entirely, or it's going to be how do you integrate that into a system? Epic might be very good at giving you a continuous value, but not as good at dichotomizing something. How do you think through some of those things? I guess one of the areas that we've run into is the concept of common data models, and how do we make everything we do very reproducible? That is not necessarily a super fun part. Of course, it's much more fun to get cool tables with significant p-values and things like that than sit there and think about, have I made this Excel doc very clean and easy to use for the future? Yes, those are all very interesting dilemmas and challenges to think about. We have another question in the chat here that says, how do you see SCCM helping develop the large data sets required for AI? I would love to take that. One of the coolest groups that I've gotten to be a part of recently is a discovery group called Data Outcomes and Definitions, and another one about data harmonization. There's actually going to be two different papers coming out by those groups trying to set some standards. It seems so simple in a way, but the one was basically like, what do we think should go in a demographics table for ICU studies? Age, sex, Apache score, what is truly essential? The idea behind that is then if we were all doing a good job, and by all, I mean members of SCCM doing research, much less doing research through SCCM, where we all had the same data we were consistently collecting, it would allow us to do so much better model validation and testing from that perspective. That's a really cool concept. Then you get into the idea of where do you store this data, and how do you transfer it safely? I had another conversation with this group, and they have a startup funded by NASA, National Science Foundation, looking at if they can use blockchain technology to make data sets safe. They're trying to show me the back end of their blockchain situation to make it HIPAA compatible. 
It was mind-blowing to me, but at the same time, those are the things that are really neat that I think SCCM is involved in, and I hope that they're going to do that. I guess what I would say is maybe helping us meet the FAIR criteria or FAIR standards for data sharing. I think that's all great and exciting. We have another question here that says, what type of disciplines do you have on your team to do these kinds of projects? Are you doing them with a small team of pharmacists? Brian and I are working on a team together that does have pharmacists on it, but it also has physicians. It has multiple computer scientists. It has biostatisticians. It's important to realize that all computer scientists are not made the same. One has a particular interest in causal inference methodology. Another one has interest in large language models. Another, I think of as someone who really understands critical care stuff. The two of them barely can pronounce basic drug names because they're just not familiar with it. The third can rattle off to use SEPSIS-3 criteria, despite having no formal ICU training. The same with statisticians. They all have different areas of interest within that. One is more epidemiology-based. One has more of a causal inference component and stuff like that. It's definitely a multidisciplinary, multi-professional team. That is great insight. Another question we have here is, how do we control for inaccurate, missing, or incomplete data for quality of data sets? Brian, I'm taking all the questions for you, but I'm going to leave this ventilator asynchrony one for you. Data missingness is a huge issue. How do you manage data missingness? One of the classic ones we talk about is even within SOFA score, if you don't have Glasgow Coma Scale available, how do you manage that? There are standardized ways. One of the things I remember from when I took a stats course, and I kind of hated the professor for it, but also respected him. He was very, very rigorous about – a lot of times, you'll see people, and they'll have the age, and it'll say – or maybe I'll do sex. 50 out of 100 people were – or 51 out of 100 were male or something like that. But then you really look up, and actually, the total number of patients in the study wasn't 100. It was 104. What it's kind of saying is, oh, we didn't actually know about four of the patients at all. His point is that at every table, you should be having the values of what you know, and then actually write out the missingness of that data. There were four that we don't know, or this was only out of 80 patients or something like that. Then to be very clear about if you're using multiple imputation or what type of missingness that you're dealing with. We have a very large multicenter study going on called Optive. Right now, we are paying a group to basically go through the data that's been collected and try to evaluate the missingness. It's interesting. We got a $30,000 invoice for that, to have someone rigorously look through all of this data and see if we can't reduce the missingness. It was kind of embarrassing, but I was one of the ones that had missing data. I just didn't do one of the forms for a couple of the days. Going back and trying to click through and do those things. It can be very well-meaning why that data is not there, but in the end, it can very much influence what's going on. I would say that's, again, not one of the super fun parts about big data, but is a very, very necessary thing. 
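As a minimal sketch of the missingness audit described in this answer, the snippet below counts non-missing values per variable against the full denominator and then imputes explicitly rather than letting incomplete rows drop silently. The data are made up, and multiple imputation (for example scikit-learn's IterativeImputer) would be the more rigorous next step when missingness is substantial.

```python
# Made-up values for illustration; gcs stands in for the missing-GCS example above.
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

df = pd.DataFrame({
    "age":  [64, 71, np.nan, 58, 80],
    "gcs":  [15, np.nan, 9, np.nan, 14],
    "sofa": [4, 7, 11, 6, np.nan],
})

# Report denominators explicitly, per variable, rather than hiding the missing patients.
for col in df.columns:
    print(f"{col}: {df[col].notna().sum()} of {len(df)} patients with a recorded value")

imputed = pd.DataFrame(SimpleImputer(strategy="median").fit_transform(df),
                       columns=df.columns)
print(imputed)
```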
I think what you'll find is if you really read into some of the different papers, you'll see that they don't have very clear methods on that. I think if you're ever reviewing something like that, it's very reasonable to say, I want to have more understanding of how you managed missingness and how you did preprocessing and stuff like that. Yes, definitely sounds like an opportunity for us to use our critical thinking skills when analyzing these. And then I'll take it to Dr. Murray about the mathematical question. Someone's asked about mathematical models to analyze ventilator asynchrony by AI. Yeah, there are a couple of different groups interested in this topic, including one here in Colorado, and it does seem like a perfect use case for AI and machine learning because there is a very, very large volume of data. I think even if you talk to those groups and individuals, they'll tell you that it's still in the early stages. The monitoring required is above and beyond what we are typically using in the ICU. So patients typically need an esophageal balloon in addition to the ventilator waveforms. So it's very resource and time intensive. And we're still not really sure which specific types of ventilator asynchrony actually have an impact on patient outcomes. So which ones need to be intervened upon versus which ones are just artifacts? Nor do we know the appropriate ways to intervene on every type of ventilator asynchrony. So while this is an area of interest, it's still in the early investigative stages. Very interesting. Looking forward to more to come with that. And then the last question we have here today says causal inference was mentioned a few times. This could be considered a broad label. What methods are you referring to when implementing causal inference methods by a predictive model, or what methods have you used? Yeah, that is a great point. So causal inference is kind of a very broad bucket of things that we are going to be wanting to do. I think one of the things that we're starting to look at is trying to even understand. So we were working on a study with heart rate variability and how can we use heart rate variability to improve sepsis prediction. And one of the questions that came up was, is heart rate variability a cause of mortality, or is it an indicator of critical illness that is then related to mortality? And this was kind of an interesting area for me to look at. And so, you know, drawing the DAG for that or thinking about that. The very initial study that I was kind of referring to very briefly within the fluid overload space, the first study that we were talking about, is just propensity score matching. So nothing super fancy at this point. I say nothing super fancy just yet; it's just sitting in my inbox, thinking about what we should do to kind of advance those methods and think about it. And the one computer scientist keeps asking when I'm going to give him a good data set and he can do some fun machine learning stuff with it. And I just said, OK, maybe after this call. Great, thank you so much, Dr. Sikora and Dr. Murray for your time and insight. That concludes our Q&A session for today. And thank you to the audience for attending. Please note the next upcoming webcasts are listed here. Again, this webcast is being recorded and the recording will be available to registered attendees within five to seven business days. Log into mysccm.org and navigate to the My Learning tab to access the recording. And that concludes our presentation today. 
Thank you, everyone.
Video Summary
In the webcast "The Role of AI/ML in Prediction Modeling," Megan Zingler, a clinical pharmacy specialist, moderates a discussion focusing on the application and future of AI in critical care, particularly in prediction modeling. The webcast offers insights into the strengths and limitations of traditional regression-based methods compared to AI and machine learning approaches.

Speakers Andrea Sikora and Brian Murray, from the University of Georgia and the University of Colorado, respectively, address three key objectives:
1. Reviewing the strengths and limitations of traditional regression-based modeling.
2. Discussing prediction modeling approaches based on machine learning.
3. Exploring AI algorithms for intensive care unit (ICU) event prediction.

The presenters illustrate the utility of AI through examples such as sepsis early warning systems and the Epic Deterioration Index, emphasizing how AI can predict critical events like ICU transfers and patient deterioration. They also highlight the importance of external validation of models and of ensuring they are thoroughly tested against established clinical practices.

Key principles in evaluating model performance are explained, including sensitivity, specificity, and the area under the receiver operating characteristic (AUROC). Additionally, the importance of understanding and handling the large, complex data sets typically found in ICU settings is underscored, along with the necessity of interdisciplinary collaboration involving computer scientists, biostatisticians, and clinicians.

The webcast concludes with a discussion of the challenges of implementing AI, such as data quality and interpretability, and the critical need for implementation science to study how AI tools impact patient care in real-world settings. Questions from the audience further engaged with topics such as data preprocessing, multidisciplinary teamwork, and causal inference.
Asset Subtitle
Professional Development and Education, 2024
Asset Caption
Review recent computing trends with artificial intelligence (AI), machine learning (ML), and natural language processing as they relate to research, education, and clinical practice in the intensive care unit. This webcast explores how to appraise this technology, the pros and cons of AI, the potential for biased algorithms based on training data, appropriate critiques of AI/ML methods, and future methodologic advances.
Learning Objectives
Review strengths and limitations of traditional regression-based prediction modeling
Discuss advantages and key concepts of assessing prediction modeling approaches based on machine learning
Explore use cases of artificial intelligence algorithms for intensive care unit event prediction
Meta Tag
Content Type
Webcast
Knowledge Area
Professional Development and Education
Membership Level
Professional
Membership Level
Select
Tag
Innovation
Tag
Professional Development
Year
2024
Keywords
Webcast
Professional Development and Education
Innovation
Professional Development
2024
Professional
Select
AI in healthcare
prediction modeling
machine learning
critical care
sepsis early warning
Epic Deterioration Index
model validation
ICU event prediction
interdisciplinary collaboration
implementation science