Deploying AI for Critical Care: Learning to Make a Difference
Video Transcription
Great, thank you very much, and thanks to the organizers for allowing me to speak today. So these are my disclosures. In addition, my research funding includes NIH funding and also a grant from the Department of Defense. So my goals in my talk today are to first discuss considerations when taking machine learning models to the bedside. And these are going to be general ideas and frameworks that you could apply to any type of machine learning algorithm that you might want to implement. I'm also, along the way, going to provide some specific examples from our group, where we've implemented an early warning score for clinical deterioration in several health systems.

So if you look in the literature, you can see that there are models everywhere. And in fact, you see this also, I think, at this conference and at many of these conferences now, where machine learning models are becoming more and more popular. Their growth is exponential. But unfortunately, despite the fact that we're developing more and more models, they are not being implemented clinically. So I think this article summarized it nicely where they say the typical life cycle of an algorithm is to train on historical data, publish a good receiver operating characteristic curve, and then collect dust in the model graveyard. This is showing similar data in a different way: if you look at the very top, the circles are the number of publications related to model development. And as you go down on the graph, clinical outcome evaluation is where we really want to go. But unfortunately, what we're really seeing is mostly more and more models being developed, but again, very few implemented clinically.

So what are some of the barriers to model implementation? Well, I changed this slide recently because ChatGPT came out, and who better to ask about the barriers to artificial intelligence implementation than an artificial intelligence algorithm that has been implemented on the web? So according to ChatGPT, some of the barriers include data privacy and security, and data quality and availability. For example, respiratory rate is always 20 in almost all of our patients, until it's not, and then they're in trouble. The data are often also siloed. We have data in our ventilators, in our telemetry, from other devices that may not interface well with the electronic health record. Furthermore, lack of standardization is a big problem. Even if you're on the same electronic health record company, if you try to implement your model in other health systems, you'll find that their variable names and other data are actually very, very different, and it can become very challenging to implement these systems across a wide range of healthcare systems. Ethical considerations are also very important, such as bias and fairness of your models. And when you look at models that are getting closer to human expert-level accuracy, a lot of those models are very, very difficult to interpret. So model interpretability can be challenging, especially in clinical scenarios where some of these can be life and death decisions. Finally, adoption and integration are a challenge. There are challenges with trust of the algorithm. Integration, meaning how you're actually going to implement your model in your clinicians' workflow, can be a challenge as well. One of the additional things that I had in my earlier version of this slide, before ChatGPT, was financial considerations.
So it can actually be quite expensive to get the personnel needed, develop the graphical user interface, and then implement these models as well. So there are a lot of barriers to implementing AI algorithms in healthcare. With that, though, groups have started to develop frameworks to help us be better prepared to implement these models. For example, Verma and colleagues presented a framework in a paper a couple of years ago now, where they discussed first the exploration phase, which oftentimes can be the most important phase. You need to identify the problem, identify the gaps in care, and figure out whether an AI algorithm can actually help. You then want to establish a multidisciplinary team, understand the workflow, and really envision the future state where the algorithm is working alongside the clinicians, and try to figure out how best to get there. Then and only then, when you think this is actually feasible and potentially helpful, do you move on to the machine learning design and testing phase, which includes testing in a silent environment alongside what the clinicians are doing to make sure, again, that it adds value. And finally, you're then implementing and evaluating, and potentially continuously evaluating, your system in a learning healthcare system-like framework.

So for the rest of my talk today, I'm gonna focus on a couple of key aspects of this framework that I think are very relevant to implementing machine learning algorithms successfully in practice. I think the first and maybe the most important thing is to actually ensure that your model is going to add value to clinical practice. It's very easy to go and download MIMIC and develop a model that predicts mortality with an AUC of 0.9, but you need to ask yourself and your clinicians: is this actually gonna help you treat your patients better in some way? And so the model should really focus on a gap or an inefficiency in care. And the predictions need to be more accurate than current practice, right? If your clinicians already know what to do, or they're already giving patients early antibiotics, an algorithm that says, hey, give early antibiotics, may not be helpful if they're already doing it correctly. In addition, when you turn the model on in silent testing, you wanna make sure that it continues to be accurate. Sometimes what you'll see is that a model can look very accurate in retrospective data, but then when you run it prospectively, the model breaks down.

So our clinical task of interest was to identify patients who are at risk of clinical deterioration, which we defined as need for ICU transfer, cardiac arrest, or death, for patients who are outside the ICU, and we had a rapid response team in our hospital. And so what we wanted to see and understand was how our rapid response team was doing at identifying these events, and whether an algorithm we developed could actually identify these patients earlier and more accurately. So here, this is the percent of events identified along the y-axis, so again, ICU transfer, cardiac arrest, or death, and the x-axis is the hours before the event. And unfortunately, what we found was that our rapid response team was getting called about an hour before the event happened. So not nearly enough time to actually come in and intervene and potentially prevent these events from happening. So because of this, we developed a machine learning algorithm, which we call eCART, because they call Dr. CART overhead when a cardiac arrest occurs in our hospital.
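Before the silent-testing results, here is a minimal sketch, in Python, of the kind of silent-mode evaluation described above: the model is run prospectively but its output is hidden from clinicians while its discrimination and lead time are checked. The file and column names (silent_period.csv, score, label, event_time, first_alert_time, rrt_call_time) are hypothetical placeholders for illustration, not the actual eCART pipeline.

# A minimal sketch of a silent-mode evaluation: the model runs prospectively, its
# output is hidden from clinicians, and accuracy and lead time are checked offline.
import pandas as pd
from sklearn.metrics import roc_auc_score

# Hypothetical extract: one row per patient episode during the silent period, with the
# maximum model score, label = 1 if ICU transfer, cardiac arrest, or death occurred,
# and timestamps for the event, the first alert-threshold crossing, and the RRT call.
silent = pd.read_csv("silent_period.csv",
                     parse_dates=["event_time", "first_alert_time", "rrt_call_time"])

# 1) Does the retrospective discrimination hold up prospectively?
print("Silent-period AUC:", roc_auc_score(silent["label"], silent["score"]))

# 2) How much warning does the model give compared with the rapid response team call?
events = silent[silent["label"] == 1]
model_lead_hr = (events["event_time"] - events["first_alert_time"]).dt.total_seconds() / 3600
rrt_lead_hr = (events["event_time"] - events["rrt_call_time"]).dt.total_seconds() / 3600
print("Median model lead time (hr):", model_lead_hr.median())
print("Median RRT call lead time (hr):", rrt_lead_hr.median())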
And what we found was that our algorithm, during silent testing, was able to identify these patients a median of about 30 hours prior to when the rapid response team was being called. If you think about how long it takes for you to go see the patient, call the consults, run some tests, start some antibiotics or other interventions, and hopefully have those things start making some improvements, having a little more than a day of lead time was something we were very excited to see.

So the next thing, once your model appears to be accurate, is to figure out how you would like your model to actually run in clinical practice. You could potentially run it within the EHR platform itself via the EHR vendor. You could also integrate it into the EHR with a third-party vendor. In addition, you could potentially develop an app for your phone, or you could develop an app for some other device where you have to log into a separate system. And this decision will impact the downstream flexibility of your algorithm, including the user interface, because some EHR vendors have certain stipulations for what your user interface may look like. For example, maybe you can only have a BPA show up, or other things like that. So this decision is really important, and the workflow is also important. What we did initially was develop, through the University of Chicago, an iPad app for this. But what we found was that you had to log in separately from the EHR, and people would actually rather have it integrated within the EHR, because then they're in the same place where they're taking care of their patients; they don't have to log into a separate system. And we found that utilization was much, much higher when we implemented it as an integrated system within the EHR.

So next, once you figure out how you're gonna run your algorithm, you need to think about how you want your users to interact with the algorithm. When you talk about graphical user interface design, it's again often limited by the technology. It's important, if you can, to collaborate with human factors experts, because you can do things like iterative redesign of the graphical user interface. And there's actually quite a robust literature on some of the things you might wanna show to your clinicians. In addition, displaying the model variables, predictor explanations, and trends over time is often important. These are things that we found our users really like. Another thing to consider is that the FDA and others may have some rules related to what you have to show clinicians in terms of explainability of your algorithm. So then, going from the model to the GUI, this is an example of what our model looks like integrated in the EHR. You can see in the center the trend of the score over time. So as a patient gets into the yellow zone and then the red zone, we're suggesting that clinicians do certain things for their patients. We also have all the variables that are in the model in one place, and the ones that are actually driving the high score are highlighted. So now you know why you're getting called for this patient, and you don't have to click five different places to find all the different variables. In addition, I think it's important to understand that just showing the score itself will not improve outcomes in the vast majority of cases. For example, this is a study of a similar type of algorithm, where they were looking to use an early warning score and showed the early warning score to the nursing staff.
And what they found was that just showing the score itself, without any guidance about what to do next, did not improve outcomes. So what you need to do next is think about workflow analysis and usability. Now, how many of you in the room have these BPAs that pop up because your patient might be septic? Like, all the time, they are always septic. You know, you walk up a flight of stairs, you might be flagging for sepsis. So you really need to think about understanding what your users really need and what they will actually wanna use. And so when thinking through workflow analysis and usability, you need to understand when they actually need the model output, at what point during their workflow. The model output should drive specific actions, and again, those actions should be driven by the gaps in care you identified at the very beginning of this process. The clinical decision support tool should make clinicians' work life easier and not more difficult. Again, having something that's integrated so that it's right in front of them helps; for example, we do one-click ordering of all the orders within the workflow itself. So now the clinicians, again, don't have to click through a bunch of places for their labs and vitals and other information, and they don't have to click through all these different places when they're putting in the orders. It's all in one place.

So this is an example of what our pathways look like. One of the things we do for high-risk patients is screen for sepsis. You could potentially then activate the rapid response team within the workflow. You can also order lactates with one click as well. And with that, we also are collecting information on how often the clinicians are actually worried about the patient, because that may be something important as we think about improving the algorithm in the future. Most of the improvement that we've seen has come from combining both the score and the workflows with the suggested actions. So for example, when we implemented this in a four-hospital healthcare system in the Midwest, we found that, comparing the pre-implementation period with just showing the score itself without the workflows, there was only a small difference in sepsis-related mortality. However, when we then added in the workflows with the suggested actions, as well as the one-click ordering, we had a significant 35% decrease in overall sepsis mortality from baseline and a number needed to treat of 21 to save one life. In addition, we also had a 25% decrease in overall mortality in those same units. So it wasn't just that they were identifying sepsis itself more often, for example in less severe cases; overall mortality also decreased.

So when you're then finally thinking about the outcomes and process metrics that you might wanna study, I think that first and foremost, you wanna be able to measure the usage of the tool. If you don't find a difference, it might be because no one's using your tool, right? So it'd be important to understand that. In addition, identifying the process metrics that may drive the change in the outcome is also important. And then, if you can, try to collect as much of this information as possible within the EHR. So again, measuring utilization is important, and we do this automatically with our tool. And you can see, this is just the implementation over the first few months at a hospital.
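As a quick aside on the arithmetic behind the number needed to treat mentioned above: NNT is the reciprocal of the absolute risk reduction (ARR), that is, NNT = 1 / ARR, where ARR is the baseline mortality rate minus the post-implementation mortality rate. An NNT of 21 therefore implies ARR = 1/21 ≈ 0.048, or roughly 4.8 fewer deaths per 100 patients in the target population. If that ARR and the reported 35% relative reduction refer to the same cohort and outcome, they would together imply a baseline sepsis mortality on the order of 0.048 / 0.35 ≈ 14%; the actual baseline rate is not stated in the talk, so this is only a back-of-the-envelope consistency check.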
Coming back to that utilization chart, you can see that for our red patients, utilization of the tool is well above 90%, approaching 100% for the very sickest patients. And again, if you compare that to the probably 95% rate at which you're hitting the ignore button for your BPAs, we were very excited to see that.

And then finally, when you're looking at outcomes, thinking about the study design is also very important, I think, but sometimes a challenging decision as well. Of course, randomized controlled trials are considered the gold standard. A stepped-wedge design is another one to consider. Regression discontinuity, which I'm not gonna talk about here, is something you can consider if you have a risk score where you're activating at a certain threshold. And interrupted time series is probably one of the more common ways of evaluating these systems. Randomized controlled trials can be challenging with these types of systems. One issue is the high cost to create the tool, get the model running, and do the implementation; you finally convince the C-suite that this model is amazing and we should turn it on, and then you say, oh wait, one last thing, we just wanna turn it on for half the patients, and they say, well, we just put all this money in and you made us believe in it, and now we're not gonna use it for half the patients in the hospital? I'm not sure about that. So that can be one of the barriers. Another is, why study the thing at all? Again, you've convinced them this is a common-sense intervention: you're identifying critically ill patients earlier, or identifying septic patients for antibiotic use. Doesn't that seem like common sense? Do we really need to randomize? Contamination from the training effect can also be important: as clinicians learn about the model and the important variables in the model, that can make them think about sepsis or deterioration more, and maybe that could contaminate the effect that you're seeing in the control group. And then also, how are you gonna randomize? By patient, by clinician, by ward, by hospital? Again, these can be challenging decisions.

Interrupted time series, as I mentioned, is probably one of the more commonly used approaches because of some of the challenges I mentioned earlier. This is from our publication from last year, where we implemented our eCART score, and you can see that in the baseline period you essentially get a trend line. You then get the immediate shift with mortality going down after implementation, and then you also get to study what the longer-term learning effect, if you will, of continuing to use the tool looks like. So if you look in the literature, even though there are only a few examples of these tools being implemented over the last few years, there are some early signals that machine learning tools can improve outcomes. I think one of the landmark studies was the one by Gabriel Escobar and colleagues in the New England Journal from 2021, where they implemented a similar tool in a stepped-wedge trial design at Kaiser, and mortality within 30 days of an alert was lower in the intervention cohort compared to the comparison cohort. In addition, in the study that I mentioned earlier from our group, we also showed a very similar number needed to treat and effect size with our score, with a significant effect both in a before-and-after design as well as with the interrupted time series adjustments.
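For readers who want to see what such an interrupted time series analysis can look like in code, here is a minimal segmented-regression sketch in Python. The monthly aggregation, the file and column names (monthly_mortality.csv, month_index, mortality_rate, post), and the simple least-squares specification are illustrative assumptions, not the model used in the published analysis.

# A minimal sketch of segmented regression for an interrupted time series:
# a baseline trend, a level change at go-live, and a change in slope afterward.
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical monthly aggregate: month_index (0, 1, 2, ...), mortality_rate, and
# post = 1 for months after the tool went live.
monthly = pd.read_csv("monthly_mortality.csv")

go_live = monthly.loc[monthly["post"] == 1, "month_index"].min()
monthly["time_after"] = (monthly["month_index"] - go_live).clip(lower=0)

# month_index captures the baseline trend, post the immediate shift at implementation,
# and time_after any change in slope (the longer-term effect of continued use).
fit = smf.ols("mortality_rate ~ month_index + post + time_after", data=monthly).fit()
print(fit.summary())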
So in conclusion, many machine learning models are developed every single year, but few have been deployed for clinical care. Multidisciplinary engagement, ensuring that the model adds value, and thoughtful user interface design are critical, and accurate machine learning models prompting the right actions for the right patients at the right time can potentially improve outcomes for our patients. So I wanna acknowledge our team, and thank you all for your attention today.
Video Summary
In this talk, the speaker discusses the challenges and considerations when implementing machine learning models in clinical care. Although there is a growing number of machine learning models being developed, very few are actually being implemented clinically. The speaker highlights several barriers to implementation, including data privacy and security, data quality and availability, lack of standardization, ethical considerations, model interpretability, and adoption and integration challenges. To address these challenges, the speaker presents a framework for successful implementation, which involves exploring the problem, establishing a multidisciplinary team, designing and testing the machine learning model, and implementing and evaluating the system. The speaker also emphasizes the importance of ensuring that the model adds value to clinical practice, understanding the workflow and usability of the model, and measuring usage and outcomes. The talk concludes by suggesting that accurate machine learning models that prompt the right actions at the right time have the potential to improve patient outcomes.
Asset Subtitle
Professional Development and Education, 2023
Asset Caption
Type: year in review | Year in Review: Anesthesiology (SessionID 2000001)
Meta Tag
Content Type
Presentation
Knowledge Area
Professional Development and Education
Membership Level
Professional
Tag
Innovation
Year
2023
Keywords
machine learning models
clinical care
implementation challenges
data privacy and security
improve patient outcomes