Thought Leader: Data Science and Critical Care
Video Transcription
I'd like to give you a quick background on him. Dr. Clermont is a professor of critical care medicine, mathematics, chemical engineering, and industrial engineering at the University of Pittsburgh. He is a co-founder of the Society of Complex Illness and co-chair of the Joint Task Force on Data Science. His research, funded by the NIH, NSF, and DoD, explores engineering and data science approaches to disease processes such as multiple organ failure, as well as event forecasting and treatment optimization. His most recent work focuses on data science methodology, including the integration of human knowledge, mechanistic physiology, and artificial intelligence. Dr. Clermont, we look forward to hearing your talk.

Thank you, Dr. Wong, for this very kind introduction. I would also like to thank the organizers for the privilege of presenting at this thought leader session on data science and critical care.

You are hearing of data science as a general term. You are also hearing of artificial intelligence and machine learning. Globally, data science is a multidisciplinary field that encompasses AI, which itself encompasses machine learning, as a set of tools to extract value from data. The purposes of data science, AI, and machine learning are essentially to extract value in the form of diagnosis (clustering analysis, for example, is a form of diagnosis), for predictive purposes (what is going to happen in the future given the current situation), or for prescriptive purposes (what should I do medically, clinically, given the data that I currently have).

You are all aware of the famous Dunning-Kruger effect from Psychology 101: as you are introduced to a new discipline, you instantly believe that you know everything there is to be known about it. For those of us who are bedside clinicians, this clearly happened when we first tried our hands at ultrasound machines and thought we could do everything. Then we enter a phase where we realize that we actually don't know that much; as our competence increases, our confidence actually decreases, until we finally reach a plateau where, as we become genuinely competent, our confidence in what we do increases again.

This kind of cycle, the Dunning-Kruger effect, is certainly playing out in artificial intelligence as well. As you can see on this slide, provided by Gartner, this is the hype cycle for artificial intelligence, the same shape of curve, and on that curve are different techniques that all relate to AI. I want to point out a few things. First is the sheer number of techniques, and I know you can't read them all, that are grouping around the theme of AI. You can also see, for example, well-known applications such as self-driving vehicles at the bottom of the valley of despair: we now realize this is not going to be as easy as we thought, but we do understand what the challenges are, and I think we will slowly get there over the next decade or so. Then there are techniques such as, and I want to point you to the bottom right arrow, physics-informed AI: how do you leverage all this nice knowledge that we have, the laws of physics or the laws of chemistry, within AI? We know there is potential there.
And of course, we are going to hit the peak of the hype and say we can solve everything we care about, before finally settling down into proper use, or properly identifying niches for AI. The same kind of cycle is also provided to us by Gartner for healthcare, and I do want to point out some things you have heard about that relate to it. For example, toward the bottom left of the right side of the graph is autonomous monitoring, which can be done either remotely or at the bedside and which provides you with recipes for what to do when certain things are observed. We clearly see the potential of extracting information from pervasive monitoring, but we are only beginning to do that. On the other hand, critical care alerting sits at the peak of the hype right now: we think we are really good at putting together models, but we are slowly realizing that there are limitations to what we can do, and we are still trying to find out what kind of data we actually need. And at least those experts believe that healthcare interoperability is on the slope of enlightenment. I beg to differ with that assessment at the present time; I think we are still in the valley of despair. That is my personal take.

The traditional AI life cycle essentially goes as follows. You take big data, generate a predictive, diagnostic, or prescriptive model, and then try to integrate it into the workflow: EHR integration, then silent deployment and prospective evaluation, then clinical integration and bedside deployment, then evaluation of usability and trustworthiness (we will come back to those terms later in the talk). Finally, you come to the realization that your wonderful system is being considered for rollback by the administration because it is not doing what it is supposed to be doing. Then you bring humans into the loop and ask how to modify it to create the next generation of the model. This is essentially what we are seeing these days. Commercialization, in my opinion, comes a little too early in the process, which contributes to the hype but also contributes to exhaustion with those systems.

We really should think differently about the AI life cycle: in terms of well-prescribed periods of time, careful consideration of the types of models, how we articulate models together, and how humans should be brought into the loop early in the process. How do we ensure that our models, as they are being created, are fair and secure, with those elements integrated into the performance metric? We should be considering how much data we actually need to extract value, how best to extract knowledge from small data (because not all data sets are big, unfortunately), and how to maximize sustainability from the get-go.

So how close is AI to the bedside? I simply want to point out a fairly recent paper by our European colleagues that counted publications describing applications of AI. Fully 95% presented a model, sometimes externally validated, with a potential application. Less than 1% of papers described a bedside implementation of a model and a usability assessment. So this is still very much nascent; we are still pretty far from models routinely reaching the bedside, and I am sure Dr. Churpek will have a lot more to say about this.
So I will not spend a whole lot of time on the obstacles to bedside deployment, because we will hear more about this, but consider the apparent loop: how do you leverage real-time data and generate model predictions or diagnoses from it? This is challenging. We know how to do this on retrospective data, to a certain extent; doing it in real time is a challenge. And once you have done that and are able to deploy your models, whom do you tell, how do you tell them, and how do you evaluate usability and trustworthiness? These themselves present a whole series of challenges. As we proceed toward the Q&A, I have put out some questions here about what could be relevant considerations related to those obstacles. The one I would like to point out is that, as you contemplate rolling out systems, you really have to be in sync with your institution's priorities. Do you have go-to people in your C-suite, or is there an AI planning or development group in your environment that you need to consult or that will be the actuator of whatever ideas you have? This level of integration and knowledge is of paramount importance before you even start thinking about local deployment.

So let's talk about trustworthiness a little. AI systems must be available when you need them, reliable in terms of providing information that you can actually trust, and secure. These are the basic components. The National Institute of Standards and Technology recently put out a white paper, not specific to healthcare AI but addressing AI in general, in which the components of trust are explained and translated into an equation, which you see here outlined in gray: user trust potential times perceived system trustworthiness. A user's trust potential is the prior belief of any given user in the technology; it is a function of familiarity, education, prior experience, and so forth, and we are going to approach this mostly, I think, by improving education. Perceived system trustworthiness typically has two components: the user experience, that is, how you communicate the information, and how good the technical trustworthiness is.

Trustworthiness can be expressed along a number of domains, from accuracy, reliability, and resiliency to bias (or lack of bias), security, explainability, and so forth. Depending on the type of system you are building, you may value some of those domains more than others. For a music selection app, it is very important that it reflect your choices and preferences really well, and you will have full trust in the system if it does that for you; you really don't care whether it is a safe app, or at least most of us don't. However, if you are doing medical diagnosis, you may value accountability, which is essentially the ability to convey information that explains what the system is trying to do: I am not going to trust a system that does not tell me why it is telling me what to do. Therefore, given comparable characteristics, you may or may not trust a system depending on its eventual use. And you definitely want to think about how good your forecasting system is, the intrinsic trustworthiness of the system, of course.
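Restated compactly, the relationship described on that slide is roughly the following (this is a paraphrase of the transcript's description, not the exact notation of the NIST white paper):

$$
\text{perceived user trust} = \text{user's trust potential} \times \text{perceived system trustworthiness}
$$

$$
\text{perceived system trustworthiness} = f(\text{user experience},\ \text{technical trustworthiness})
$$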
The canonical example is the Epic sepsis sniffer, as it was rolled out in that particular system a couple of years ago now. It is amazing how quickly this all happened: only a few months ago, this paper by Andrew Wong's group evaluated the Epic system across many hospitals and concluded that it is probably not ready for prime time; I am putting the conclusions of that paper gently here. I do want to say, though, that generalizability of a model, meaning that wherever and whenever I deploy it I should get reliable results, is a wonderful and desirable aspiration. Ultimately, however, it is of paramount importance that a system perform well in your environment, because that is where it will be deployed. Therefore, I would point you to a very short paper by Joseph Futoma and Leo Celi in The Lancet Digital Health on the myth of generalizability, which balances what is good locally against what is desirable in terms of evaluating those systems across environments.

Speaking of learning across environments, a very topical subject of research right now is how to do this optimally: do you pool data together, or do you exchange models and parameters? You will hear about the notion of federated learning. It is nascent, and it gets around some of the security concerns, but it is really on the edge of what is currently being done, and I would certainly keep your eyes open for it. You will also hear a lot about how to evaluate bias in AI. These models can definitely generate, or come packaged with, bias, and there is a set of tools now being put forth that should be applied whenever you design a model, to guard against it.

On trustworthiness and explainability, and explainable AI: which factors contribute to the prediction, and does the model give any hint as to cause and effect? I will jump right to the conclusion here with a simple analogy. An explanation that you like, that appeals to the clinician, may be like milk chocolate: you understand it. A deeper explanation, maybe not as appealing, is more like dark chocolate: it is closer to the truth. Maybe you need different models to create both the truly explainable version that clinicians will understand and the models that perform really, really well. Can I believe this forecast? Again, with respect to trustworthiness, I will direct you to the papers here; this is clearly at the cusp of what is currently being done. How important is it that a model be explainable, and what constitutes a sufficient explanation, remain open questions. With these thoughts put to you, I am looking forward to the discussion section, where we can explore some of these things in greater depth. Thank you very much.

Thank you so much for your talk, Dr. Clermont. Now let's move to our next speaker, Dr. Matthew Churpek. Dr. Churpek is an associate professor of medicine in the Division of Pulmonary and Critical Care and an affiliate faculty member in the Department of Biostatistics and Medical Informatics at the University of Wisconsin, Madison. His data science laboratory uses machine learning methods, such as natural language processing and deep learning, to identify patients at risk for clinical deterioration, sepsis, acute kidney injury, and other syndromes of critical illness.
His research has been supported by a K08 from the NHLBI, an ATS Foundation Recognition Award for Early Career Investigators, a Department of Defense award, and R01 grants from NIGMS, NIDDK, and NHLBI. Dr. Churpek.

Thank you for that kind introduction, and thank you to the organizers for inviting me to speak today. These are my disclosures, including research funding; the University of Chicago has a patent pending for machine learning algorithms for clinical deterioration, but I have no commercial interests related to this talk. Today my goal is to cover the various aspects of a clinical predictive modeling project and to provide insights based on our lab's experience developing and implementing machine learning models.

The first thing I think is important is that you really should begin with the end in mind. When you develop the framework for your machine learning project, you should start by asking questions like: Why are you building the model? Who will be using it, and what will they be using it for? Where will they actually need the model in their clinical workflow, and when will the model need to run? For example, our lab focuses on the early identification of critical illness on the wards, and we often activate rapid response teams based on these models. The way we answered these questions was that we were building the model to improve the early identification of critical illness; the model would be used by rapid response teams to help triage the high-risk patients they want to see; it should be viewable while they are walking from patient to patient; and it should be a real-time model that runs 24 hours a day, seven days a week. With that in mind, we could work backward: we wanted to develop an iPad app so the team could see the patients on a list and know whom they should see next. That drove specific decisions in the modeling itself, for example wanting a parsimonious model, because we knew it would have to be viewable in an iPad app.

The second thing I learned is that there are several ways to identify your patient population, and they can affect how your model works. For example, how might you identify a cohort of patients with sepsis? You can certainly use billing code data. There are also electronic signatures of infection, such as whether there are orders for antibiotics or cultures. There is also manual chart review. These have different trade-offs. In a study of over 5,000 patients across six hospitals, we compared manual chart review to various billing code and EHR definitions of sepsis, and what we found is that the sensitivity of these definitions varied widely: from any antibiotic order all the way to the Rhee criteria, the sensitivity and specificity differed quite a bit. So how you identify your patient population can affect how your model works and whether you are identifying the patient population of interest.
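As a minimal, hypothetical sketch (not the actual definitions used in that study), here is one way an "electronic signature of infection" style flag could be computed from order data, assuming pandas and invented column and file names:

```python
import pandas as pd

# Hypothetical orders table: one row per order, with columns
# patient_id, order_time (datetime), order_type in {"antibiotic", "blood_culture", ...}
orders = pd.read_csv("orders.csv", parse_dates=["order_time"])  # illustrative file name


def suspected_infection(orders: pd.DataFrame, window_hours: int = 24) -> pd.Series:
    """Flag patients with an antibiotic order and a blood culture within
    `window_hours` of each other, one common proxy for suspected infection
    (the window, column names, and order types here are all illustrative)."""
    abx = orders[orders["order_type"] == "antibiotic"][["patient_id", "order_time"]]
    cx = orders[orders["order_type"] == "blood_culture"][["patient_id", "order_time"]]
    merged = abx.merge(cx, on="patient_id", suffixes=("_abx", "_cx"))
    within = (merged["order_time_abx"] - merged["order_time_cx"]).abs() <= pd.Timedelta(hours=window_hours)
    return merged.loc[within, "patient_id"].drop_duplicates()


flagged = suspected_infection(orders)
print(f"{len(flagged)} patients meet the illustrative suspected-infection definition")
```

Different choices of time window, required order types, or added billing codes will shift sensitivity and specificity, which is exactly the trade-off described above.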
So next, how do you choose your predictor variables? I think we all know that with EHR data there is an enormous amount of information available to us, especially in critical care: images, clinical notes, vital signs, medications, and much more. A few important points I have learned along the way. First, parsimonious models are often preferred for medical applications, and you can do that variable selection at the beginning, before you even do the modeling, or during the modeling process itself, for example using a lasso or elastic net. Even if model interpretation isn't important, consider the cost of mapping all of the EHR variables you want to use into your real-time system; that can take quite a bit of time if you have hundreds of variables. In addition, the oft-quoted rule of ten outcomes per variable does not apply to complex machine learning models; these models often need hundreds and sometimes thousands of outcomes per variable to be stable. Finally, confirm beforehand which variables are actually available in real time in your environment. You don't want to end up with a really accurate model only to realize that some of its important variables aren't available in the real-time system where you want to implement it.

The next thing I learned concerns sample splitting for training and testing, and Dr. Clermont touched on this a little. There are certainly many ways to do it. One important point is that your reported accuracy results should come from a completely untouched test sample: you shouldn't use that test sample for anything, including looking at correlations or doing variable selection. In addition, if you are really thinking about implementing your model in real time, you want to match the splitting method to how you plan to implement the model. If you want to implement the model at the same site where you are developing it, then a temporal validation, where you develop the model on earlier patients and validate it on newer patients, matches exactly how you would use your model in practice. If you are more interested in generalizability and allowing your model to be implemented at new sites, then geographic validation at new sites is more important.

The next tip is that you don't need hundreds of different machine learning algorithms, or knowledge of all of them, when developing your model. There are many machine learning models available, but a few of them consistently rise to the top when you look at model accuracy. For example, one algorithm comparison study looked at a number of machine learning algorithms across many different datasets and found that the random forest algorithm, an ensemble of simple decision trees, achieved over 90% of the maximum accuracy in the vast majority of the datasets. And when we compared different algorithms for clinical deterioration, predicting which ward patients are going to have a cardiac arrest, die, or go to the ICU, we similarly found that models like random forests and gradient boosted machines, again ensembles of decision trees, had the highest discrimination by AUC compared to the other algorithms.
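As a minimal sketch of two of the ideas above, temporal splitting and comparing a small set of candidate algorithms by AUROC, assuming scikit-learn and a pandas table with hypothetical column names (not the actual eCART pipeline):

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.metrics import roc_auc_score

# Hypothetical modeling table: one row per encounter, sorted by admission time.
df = pd.read_csv("encounters.csv", parse_dates=["admit_time"])  # illustrative file
features = ["age", "heart_rate", "resp_rate", "sbp", "creatinine"]  # illustrative predictors
outcome = "deterioration"  # illustrative binary outcome column

# Temporal validation: train on the earliest 80% of encounters, test on the most
# recent 20%, mirroring deployment at the same site going forward. The test
# period is touched only for the final evaluation.
df = df.sort_values("admit_time")
split = int(len(df) * 0.8)
train, test = df.iloc[:split], df.iloc[split:]

models = {
    "logistic regression (baseline)": LogisticRegression(max_iter=1000),
    "random forest": RandomForestClassifier(n_estimators=500, random_state=0),
    "gradient boosting": GradientBoostingClassifier(random_state=0),
}

for name, model in models.items():
    model.fit(train[features], train[outcome])
    pred = model.predict_proba(test[features])[:, 1]
    print(f"{name}: AUROC = {roc_auc_score(test[outcome], pred):.3f}")
```

In practice you would also tune hyperparameters within the training period only, for example with time-aware cross-validation, so that the later test period stays completely untouched.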
Now, in many applications over the years, deep learning has become more and more popular. These are essentially stacks of neural networks that can learn, for example, from the pixels of an image, then shapes, and then finally cars, or learn how to read x-rays, CT scans, or other images. What you typically see is that as the amount of data increases, performance keeps improving with deep learning compared to older algorithms. So if you have a very large dataset and a complex problem like computer vision, NLP, or audio translation, deep learning can be the way to go. In general, though, I would recommend that you focus on a few algorithms based on your study goals. Logistic regression should really be the starting point and the baseline for most applications. Try to understand the nuances of these models really well: what hyperparameters they have, and how imbalanced or missing data affect your model. In many medical applications, the final model you use should be the most interpretable model that has acceptable accuracy.

Next, when you think about measuring the success of your model, it is important to remember that it is more than just the AUC, more than just the area under the receiver operating characteristic curve. If you think about how decisions are made in medicine, they are typically tied to a threshold: a D-dimer above a certain threshold, order a CT scan; a hemoglobin A1c or blood pressure above a certain threshold, start medications. So there is often one threshold, or only a few, that you are interested in when you think about implementing your model and suggesting actions. One of the graphs I have found very helpful is the number needed to evaluate: the number of patients you need to evaluate to detect one event is on the Y-axis, the sensitivity of your tool is on the X-axis, and the different models are shown in blue and red. Based on the number needed to evaluate that you are willing to accept, this helps you pick the threshold and which model to use. Importantly, you really have to think about the threshold you will use for clinical action, and different models, even with the same AUC, can have different accuracy around those thresholds.
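As a minimal sketch, the number needed to evaluate at a given threshold is simply the reciprocal of the positive predictive value. Assuming scikit-learn and hypothetical arrays of true labels and predicted probabilities, a curve like the one described could be tabulated as follows:

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

# Hypothetical placeholders: binary outcomes and model probabilities.
y_true = np.array([0, 0, 1, 0, 1, 1, 0, 0, 1, 0])
y_prob = np.array([0.1, 0.2, 0.8, 0.3, 0.6, 0.9, 0.2, 0.4, 0.7, 0.1])

# precision = PPV and recall = sensitivity at each candidate threshold.
precision, recall, thresholds = precision_recall_curve(y_true, y_prob)
nne = 1.0 / precision[:-1]  # number needed to evaluate to find one true event

for thr, sens, n in zip(thresholds, recall[:-1], nne):
    print(f"threshold {thr:.2f}: sensitivity {sens:.2f}, number needed to evaluate {n:.1f}")
```

Plotting nne against recall reproduces the style of curve described above; reading off the sensitivity you need gives both an operating threshold and the workload it implies for the rapid response team.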
Next, I do agree that, in general, it is an open question in which medical applications model interpretability is important. But for the work we do, identifying and triggering on high-risk patients, we have found that model interpretability is really important, because clinicians want to understand why they are being called to the bedside of these patients. There are global measures of importance, where you can see the most important variables for the model as a whole, for example coefficients or other variable importance metrics. Within the model you can also look, for example, at how risk on the Y-axis varies with age along the X-axis; unfortunately, what we found is that no matter how you model it, turning 40 really is bad news. And then for individual predictions, you can see why you are being called about a particular patient. For example, this figure shows an individual patient, with time along the X-axis and the different variables along the Y-axis, and on the right, highlighted in red, are the parts of the patient's trajectory that were most important for that individual prediction. There are other methods, like LIME and SHAP, that are also becoming quite popular for patient-level variable importance.

Ultimately, though, you need to go from your model to a graphical user interface, and I would highly recommend having a multidisciplinary team that includes human factors engineers and experts. For example, this is one of the graphical user interfaces developed for one of our models: you can see the trajectory of the patient's values over time, all the variables that go into the model, and, highlighted in red and yellow, the variables driving the score. You can see a lot of information in one place, and ultimately this is how your clinicians will interact with these models.

The next thing we learned is that model alerts really need to be tied to specific actions. For example, we have developed clinical pathways, in which clinicians and clinical experts define what to do for high-risk patients. These can include reassessing the patient, doing sepsis screens, and recommending specific actions such as more frequent reassessment or ordering lactates. What we found is that the greatest improvement in patient outcomes comes from combining the model with the pathways. For example, when we implemented our early warning score, which we call eCART, in four Midwestern hospitals, sepsis mortality went down only slightly between the pre-implementation period and the partial implementation, where we just showed clinicians the score. Once we combined the score with the workflows on the previous slide, it was really the combination of both that drove the actions that actually decreased mortality: we found a 35% decrease in the relative risk of sepsis mortality, and a 25% decrease in mortality overall on the wards.

Next, and related to these outcomes, is what you want to measure during your model implementation. You should certainly measure the process metrics you think will drive your outcomes, and you should talk to your clinical champions as well as your statisticians about which outcomes to measure. One of the nice things about machine learning applications is that, because they are often embedded in the electronic health record, you can measure utilization of the model itself. For example, this is how we monitor utilization of our tool: the percentage of patients who actually go through these clinical workflows is on the Y-axis, and time since the start of the implementation period is on the X-axis, and you can see that over time people are using the tool more and more. This is very helpful, because if outcomes aren't changing the way you had hoped, it is a very different story if utilization is really low, meaning people aren't using your model, than if utilization is high but perhaps driving the wrong actions. So measuring utilization is really, really important during the implementation of your machine learning algorithms.
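As a minimal sketch of that kind of utilization monitoring, assuming pandas and a hypothetical alert log with invented column names:

```python
import pandas as pd

# Hypothetical alert log: one row per model alert, with a flag indicating whether
# the recommended clinical workflow (e.g., a sepsis screen) was documented.
alerts = pd.read_csv("alerts.csv", parse_dates=["alert_time"])  # illustrative file
alerts["month"] = alerts["alert_time"].dt.to_period("M")

# Percentage of alerts per month for which the workflow was actually completed.
utilization = alerts.groupby("month")["workflow_completed"].mean().mul(100).round(1)
print(utilization)  # a rising trend suggests growing adoption; a flat, low trend suggests an implementation problem
```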
Last, I just want to touch briefly on choosing your study design. There are of course many designs: the randomized controlled trial; stepped-wedge designs, where, for example, you randomize by ward or by hospital and implement during different periods of time so that you have concurrent controls; interrupted time series designs; and designs like regression discontinuity. One thing we learned is that randomized controlled trials can be challenging to do with machine learning models. Especially early on, and I think this is still true in many settings, there are high costs to go from creating your model to creating the graphical interface, to embedding and implementing the actual tool. Once you have put in all that effort, gotten people excited, and gotten the system running, people often say: do we really want to randomize so that only half the patients get it? We put in all this effort, why don't we just turn it on for everyone? So you can get that pushback. Similarly, there is the pushback that this seems like common sense: I want to identify patients with sepsis, this tool identifies patients with sepsis, so isn't it just common sense? Do we really need to study this? That is sometimes what you will hear as well. In addition, there is always contamination and culture change, the training effect of teaching people to respond better to clinical deterioration or whatever your intervention of choice is. And finally, there is always the question of whether you should randomize by patient, by clinician, by ward, or by hospital. So there is a lot of complexity involved.

So sometimes people use alternative approaches. For example, in a study we just had accepted, looking at our eCART score implementation across four hospitals, we used an interrupted time series design. In the pre-intervention period, we found that mortality was actually going up for the highest-risk patients. Immediately upon implementation, mortality went down; that is the immediate effect of the tool. And then over time, the outcomes continued to improve, which suggests that not only do you get an immediate effect, but as people learn how to use your tool, its impact continues to grow. One assumption of these models, which is very important now in the time of COVID, is that your patient population is not supposed to change during the study period. It becomes more difficult to use this type of design when, for example, you are surging with COVID early and not surging later; that may violate some of the assumptions of these models.

Lastly, it is worth knowing about a method called regression discontinuity. If you have a specific threshold and you plot your model's actual survival against predicted survival on the X-axis, a perfectly calibrated model should follow the dotted line going up. After the intervention, what you should see is that actual survival rises above predicted survival for patients beyond the threshold at which you activate your score. So regression discontinuity is just another method you could potentially use, for example, if you don't have the opportunity to randomize or to have prior controls.
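Circling back to the interrupted time series design mentioned above, a minimal sketch of a standard segmented regression, assuming statsmodels and a hypothetical monthly aggregate table (not the actual eCART analysis):

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical monthly data with columns: month_index, mortality_rate, post (0 before go-live, 1 after).
df = pd.read_csv("monthly_mortality.csv")  # illustrative file
golive = df.loc[df["post"] == 1, "month_index"].min()
df["months_since_golive"] = (df["month_index"] - golive).clip(lower=0)

# Segmented regression: baseline trend, immediate level change at go-live,
# and change in slope after go-live.
model = smf.ols("mortality_rate ~ month_index + post + months_since_golive", data=df).fit()
print(model.summary())
# 'post' estimates the immediate effect of the tool; 'months_since_golive' estimates
# the additional change over time as clinicians learn to use it.
```

A rigorous analysis would also account for autocorrelation and, as noted above, for shifts in the underlying patient population during the study period.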
So in conclusion, bench-to-bedside translation of machine learning models is complex and difficult, and working backward from the problem, multidisciplinary engagement, and thoughtful experimental design are key. Combining machine learning models with the right interface, for the right users, caring for the right patients at the right time, can lead to improved outcomes. I want to acknowledge all of the collaborators on this work, and thank you all for your attention today.

Dr. Churpek, thanks so much for your very insightful talk. I'd like to open up this time for some discussion between our two panelists. Why don't we open the session by asking each of our panelists what the top two or three takeaways from their talks would be. Dr. Clermont, would you be so kind as to start, please?

Yes. The first takeaway is that there is a clear indication that the emergence of data science in critical care is going to be a multidisciplinary undertaking, and that the role of clinicians is very important; that is probably point number one. Point number two is that this is still very much a field in emergence, and therefore there is a high potential for some of it to fail and not fulfill expectations. What I would urge people to do is to be patient and to realize that this is a generational undertaking; it will take a generation before we are fully confident in the ability of AI to help us as clinicians. The third point is that there should be a very strong push right now to boost trustworthiness in those systems. We went through the multiple components of trustworthiness, but it includes how we talk to bedside clinicians and to the multiple stakeholders, which include patients and their families. So those would be my three takeaways of where we stand.

Thank you so much, Dr. Clermont. Dr. Churpek, if you'd be so kind as to give your top two or three takeaways.

Yes, and I certainly agree with Dr. Clermont. First and foremost, the translation of machine learning models to clinical practice is a very, very challenging and difficult endeavor. In machine learning projects, we often talk about how the work is 80% data cleaning and pre-processing and maybe 20% modeling. I think you could shrink that modeling piece to maybe 10 to 15% of the effort; the rest, the other 85%, is how you implement this in practice and how you actually change clinician behavior. I think that is part of the reason we have seen a lot of papers on models but very few papers on actual implementations. For me, one of the most helpful lessons has been that you really should work backward from a problem: start by talking to your clinicians, figure out the actual problem you are trying to solve, and understand their workflow. Once you have that understanding, you can start looking at whether a model could be helpful and whether it is better than usual care, and then develop the model and ultimately the system. If you do that and bring in a multidisciplinary team, I really do think we have the potential to improve patient outcomes using data science.

Thank you so much. And I think you both really allude to this.
So why don't we dig a little deeper into that? Honestly, today the promise of AI hasn't really affected care in a meaningful way, and Dr. Clermont, you suggested it may be the next generation of doctors, or of healthcare in general, that sees this. So if we look forward 10 or 15 years from now, where do you both see AI being used the most?

I think Eric Topol, our cardiologist colleague, says that doctors with patterns will reap the fruits of AI earlier than others, so radiologists, pathologists, and ophthalmologists, who have to interpret images to predict and diagnose, will benefit from more mature technology. I think this is where we are going to see significant improvements; there have already been landmark papers on how AI can augment clinical acumen for reading mammograms, for example, and you know which paper I am referring to. So I believe this is not just a low-hanging fruit but one of the lowest-hanging fruits in AI. The kinds of things Dr. Churpek and I are trying to do, helping to react to or prevent bedside crises through the analysis of complex patterns, are showing early, encouraging successes, but there is still a lot of tinkering to be done for those systems to mature. And if I want to project a little further into the future, what about telehealth, and how we monitor these patients at a distance and try to prevent crises before they happen? I could go on a lot longer about this, and maybe some of the later questions will allow me to circle back to it.

I certainly agree that it feels like radiology, for example, where there have already been publications of computer vision systems with human-level accuracy; you can certainly see this AI-physician partnership when it comes to interpreting x-rays and other imaging. Another area where we could see more rapid implementation is where tools already exist. For example, in the rapid response area where I work, tools like the modified early warning score are used across the country, as are tools to identify patients who should get a CT scan for PE. If you can develop a more accurate tool that replaces something already in a clinician's workflow, that will probably have a higher rate of adoption than something completely new, for example where you are trying to get clinicians to treat different patients differently based on characteristics for which no tool currently exists. So replacing existing tools is probably the next phase beyond computer vision and similar applications, and after that will come the genuinely new tools that have nothing to do with what we currently have, which will probably be the hardest to implement globally, at scale, across the country and ultimately, hopefully, around the world.

That's really helpful to hear, and you have actually broached the way into our next set of questions. If we are seeing this opportunity and these next steps, consider that this is a relatively large conference with a wide range of data science expertise.
So for those who want to get started with some simple, implementable, inexpensive steps, or even for those who might not yet understand data science or databases, how can we widen the group of people interested in pursuing this work? Thank you.

Yes, I think there are a lot of different ways to get involved. In hospitals, hospital systems, and universities, groups of data scientists and governance bodies around the clinical algorithms implemented in the hospital are becoming more and more common, so that is certainly one way to get involved at the institutional level. I also think that if you are a clinician and you see in your daily practice that there are gaps in care that could potentially be helped by an algorithm, then, as Dr. Clermont mentioned, you can build a multidisciplinary team: reach out to other people at your hospital or university who you know are interested in data science, and essentially create a group of people with a common goal of improving something. That can go a long way toward making ground-up improvements in your health system using data science. There are also many resources from SCCM and other organizations that can be very useful for learning more about data science in critical care. So there are many ways to get involved, many opportunities, and there will be many more in the future.

I agree with Dr. Churpek. I only want to emphasize the paramount importance of identifying local champions in your own environment. They are part of the C-suite, part of the IT and EHR teams, and clearly your fellow clinicians. Who else is working on this with you, who may know more than you do and have more experience than you do in your own environment? Ultimately you want to develop solutions for the benefit of your own patients, so looking around you and asking who does this in your environment is very, very important.

That's very true, and thank you both for emphasizing the need to bring in all stakeholders and make sure the involved parties are on board; it is always more than just the clinician, for example. As we navigate all these models, there have been whispers about incorporating physiologic data, whether from bedside monitors in the ICU or even wearables out in the field. So what about having additional data? As physician champions suggest that more data is always better for these models, how do we approach this? Is it true? Do we think physiologic data will change, or how might it change, how we implement data science in critical care today?

All right, I'll field this one first. This is a multi-layered problem. There are publications and indications out there that increased granularity of data may benefit some types of patients with some types of problems, such as emerging risk of imminent crises like hypotension or cardiorespiratory instability; you might see early indications of these in physiologic data, and there are some commercial solutions based on this.
I also see that, in the future, it might be the only way to non-invasively monitor patients at a distance. As you wear your Fitbit, another wearable, or a patch, you might only have an EKG or a pleth signal from which to derive useful predictive information. So I think it is here to stay. I have to concede, though, that there are not that many clinical trials or clinical evaluations of the incremental benefit of high-frequency monitored data available just yet; the evidence is building, but there is not much out there. I also want to mention that many of us are thinking about the prescriptive use of AI, that is, what should I do given the information I have right now? This is not easy to do. If you think about the AI Clinician paper by Komorowski and the several follow-up papers, they are all data-driven models built on rather coarse data, and that has been one of the problems with such approaches. A physiologist at the bedside will say, well, I know that giving fluid will only work if the patient is fluid responsive, and that vasopressors will typically increase peripheral resistance. How can you get that from EHR data? Maybe if you actually had high-density physiologic data, you could develop models that you cannot develop without it. This is not a topic for today, meaning we would need another hour to discuss it, but there might be niches where physiologic data is of particular value, especially with respect to prescriptive modeling.

Right. I certainly think that, at least with current technologies, it is very context-dependent: it depends on the patient population of interest and the prevalence of the outcome of interest. With continuous monitoring, if your patient population has a very, very low risk of the outcome, you may end up with a large number of false positives. I think that is one of the challenges, whereas at least some of the earlier studies suggested these tools could be helpful, for example, in a step-down population for cardiorespiratory instability, as opposed to all comers in the hospital, some of whom may not be very sick. For example, we did a study a couple of years ago where we compared a continuous physiologic monitoring device to the data in the EHR, and we actually found the EHR data was more accurate for prediction. Part of it is what goes into the EHR: if you look at respiratory rate, which seems to be 20 all the time for almost all your patients, the documented value likely reflects, in part, the clinician's worry about the patient and whether they actually sat there and counted. If they record an odd number, you know they counted, and maybe you should be worried, because they were worried enough to count rather than just put down 18 or 20. So there are certainly a lot of nuances in terms of where this will be valuable, but I do think there are situations where real-time physiologic data could be enormously valuable, and we will see more and more in the future about for whom, and during what period of their stay, it might be most helpful.

That's an incredible insight.
I know we're running short on time, but perhaps one or two last questions. One issue you have both highlighted is that we don't actually know what is going through the minds of the staff when patients are decompensating, or indeed throughout the entire stay. So besides relying on the EHR and diagnosis codes, how do you tell, or get insight into, what someone is thinking from the data that you see? I think you have been hinting at this; could you be a little more explicit?

Right. There are certainly some ways to get that insight. The patterns of orders and other actions captured in the EHR are one way to get a better understanding of clinician thinking. One of the things we have done is to collect those data directly: for example, collecting clinicians' assessments of the likelihood that a patient will deteriorate in the next 24 hours on a Likert scale, then seeing how that Likert score compares to what the model is saying, and finally adding the two together to see whether there are complementary improvements in prognostic accuracy. If there are, you can develop systems where, if you have a workflow of actions you want clinicians to take, you add that Likert scale into the actions and use it to supplement the model, improving the sensitivity and false-positive profile of the model by incorporating physician intuition into the algorithm itself.

Yes, and I think that as part of the usability assessment of the system, you need to conduct focus groups. This doesn't necessarily tell you what a bedside clinician thought about any particular case at any particular time, but it certainly provides insight into the nature and quality of the interaction with the system, general trustworthiness in the system, and areas of friction, that is, why I don't trust this system. What comes to mind right now are those glucose protocols where, if you ask the nurses whether they are following the protocol, they will say, well, not when the glucose gets low, because I know that patient will go hypo if I follow the scale as it tells me to. That is useful information for the designers, so post-hoc evaluation is important. I would also emphasize that I agree very much with the analysis of orders: orders translate clinicians' thoughts, and therefore an intent. A careful look at what the bedside clinician actually did when your system said "do this," or presented a risk score, especially during silent deployment ahead of clinical deployment, can be a very useful way to decide on thresholds and on many other aspects of how to operationalize your bedside system.

And the field of natural language processing is certainly evolving quite a bit, with a number of papers now on NLP in the medical domain. As long as clinicians' notes are not just copied and pasted forward over and over without much change, if you have notes of reasonable quality, you can also gain at least some insight into what clinicians are thinking from the text itself.
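As a minimal, purely illustrative sketch of that idea, assuming scikit-learn and a handful of hypothetically labeled notes (1 = the note expresses concern about deterioration), a simple bag-of-words baseline could look like this; real clinical NLP pipelines are considerably more involved:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical, hand-labeled examples.
notes = [
    "patient appears comfortable, vitals stable, continue current plan",
    "increasing oxygen requirement overnight, will reassess frequently",
    "family updated, no acute events, ambulating in hallway",
    "worsening tachypnea and confusion, consider ICU evaluation",
]
labels = [0, 1, 0, 1]

# TF-IDF features plus logistic regression: a simple, interpretable text baseline.
clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(notes, labels)

new_note = ["rising respiratory rate, nurse concerned, notify rapid response team"]
print(clf.predict_proba(new_note)[0, 1])  # estimated probability the note expresses concern
```

In practice you would need far more labeled notes, careful handling of negation and copy-forward text, and validation against clinician judgment, which is exactly the caveat raised above.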
Natural language processing is another technology I think we are going to see more and more of in the future in terms of AI applications in medicine.

Definitely. One last parting question as we finish out this session, and thank you both again for your time. I realize that all three of us are from large academic medical centers, and a large part of our audience will not necessarily be at an academic center. So how do you envision making data science more accessible to non-academic medical centers, reducing disparities in knowledge, and being thoughtful about its role in the overall healthcare landscape?

Thank you for this question, Dr. Wong. This is such an important consideration, because non-academic centers, and the families and patients who attend those facilities, probably constitute the majority of the people we want to impact 10 to 15 years from now. I do believe it behooves us, as movers in the field, to promote an agenda that allows the development of tools that are applicable in resource-limited, or at least not resource-rich, environments. As we develop tools, there are going to be gold tools, silver tools, and bronze tools, and it behooves us to evaluate the impact, the performance, and the fairness of each as we develop such systems.

And I would just add that the exciting thing we have seen over the last few years is that major societies like SCCM are getting more and more involved in, and excited about, developing data science portfolios through seminars and educational opportunities. Developing the next generation of data-science-aware and data-science-educated clinicians in critical care, and in medicine in general, can in part be done through these society efforts to better educate all of us about the opportunities and the latest developments in data science. To even think about implementing these tools, you need a broad understanding of what is available, how it can be useful, and how to make the right connections to figure out how you might implement it in your own hospital. All of that, along with developing collaborations and mentorship opportunities, can happen at least in part through these societies, so I think that is an exciting wave we are going to see much more of in the future.

Great. Dr. Clermont and Dr. Churpek, thank you both so much for your time and for sharing your wisdom in this presentation today. With no further ado, we'll finish out the session. Thank you so much. Thank you.
Video Summary
Dr. Clermont and Dr. Churpek delivered presentations on the use of data science and AI in critical care. Dr. Clermont emphasized the importance of multidisciplinary collaboration and the need to increase the trustworthiness of AI systems. He also highlighted the hype cycle of AI and the challenges of implementing AI in healthcare. Dr. Churpek discussed the translation of machine learning models into clinical practice and shared insights on the various aspects of a predictive modeling project. He stressed the need to work backward from the problem, involve a multidisciplinary team, and focus on implementation. Both speakers discussed potential future applications of AI in healthcare, including AI in radiology and the remote monitoring of patients using physiologic data. They also offered suggestions on how to widen the group of people involved in data science, including engaging clinicians and forming collaborations, and highlighted the importance of involving local champions and conducting focus groups to gain insight from clinicians and improve the usability of AI systems. They concluded by encouraging the audience to embrace the challenges and opportunities of data science in critical care and to work toward improving patient outcomes.
Asset Subtitle: Professional Development and Education, 2022
Asset Caption: Learning Objectives:
- Define data science
- Describe the life cycle of artificial intelligence
- Discuss potential reliability and faults of artificial intelligence
Content Type: Presentation
Knowledge Area: Professional Development and Education
Knowledge Level: Foundational, Intermediate, Advanced
Membership Level: Select
Tag: Innovation
Year: 2022
Keywords: data science, AI, critical care, multidisciplinary collaboration, trustworthiness of AI systems, machine learning models, clinical practice, remote patient monitoring, patient outcomes