Deep Dive: An Introduction to AI in Critical Care ...
Introduction to AI and Machine Learning
Video Transcription
So I'm Ankit Sakhuja. I am an intensivist at Mount Sinai. And I also co-lead the AI lab at Sinai. So I see it from both sides, both from the R&D perspective and from the operational perspective, as we develop multiple AI models and try to integrate them into operations and patient care. Now in the next 45 minutes or so, I'll try to give you a very broad bird's-eye view of what AI is about, to have a foundation for the rest of the discussions that we will have over the next four hours. But again, if I'm speaking too fast or if there are questions, please feel free to stop me any time. All right, so the big question is, why do we even have this course? Why do we care about AI? And as Leo pointed out, we care about AI because AI is in basically everything that we do right now, right? From multiple appliances that we have in our houses, in our institutions, the cell phone that we carry in our pockets, many GPSs, self-driving cars, right? A lot of that runs on AI. So it's very important for us to understand both what AI is and how it's going to get integrated into our daily lives, both as consumers on a personal level and as consumers and generators of data on a professional level, right? Now, you can see over here on this graph how the publications in AI over the last 10-ish years have exploded. And this shows various types of AI algorithms and publications through that. But this gives you an idea of how much interest there is in the utilization of artificial intelligence at this point, right? So the basic question is, what is artificial intelligence, right? And I asked this question of my seven-year-old. And believe it or not, he knows more about ChatGPT than I do. But he was like, you know, Dad, it's the science of making intelligent machines, right? So it's a very simple answer. But then that makes you think, what is intelligence, right? Because if we understand intelligence, we can understand how to create these intelligent machines.
Now, intelligence has been defined in various ways. And these are all different characteristics of intelligence. But if you really think about it, intelligence, at the core of it, is being able to learn from experience, right? We all have different experiences when we are trying to learn something new or even doing something which is mundane or operational for us in our day-to-day lives. We learn from that experience. And then we can apply it in a new situation. That's intelligence, right? So you can say that artificial intelligence is just that, when we can create machines that can do that, right, that can learn from their experience and can, by themselves, apply that experience to new situations to get to a desired outcome, right? So in a way, they can do these complex tasks that we do autonomously to a point, right? Now, having said that, let's talk a little bit about different kinds of AI. And in the world that we are in now, with how fast-paced AI is growing, there is probably a new type of AI every month, right? But I like to simplify things because it just makes it easy for me to process things. And when we do that in AI, we can put most of the AI algorithms that are out there into three big buckets, right? The first one is machine learning. Then there is this term deep learning, which I'm sure a lot of you have already heard. And then there is this big bucket of natural language processing. And then there is a lot of crossover in between to create a lot of AI algorithms, right? So let's parse these through a little bit together. So machine learning. Again, in machine learning, all we are trying to do is we are trying to allow the computers to learn from their own experience, right? Now, we still have to do programming. I run an AI lab. We do programming day in and day out in there, right? But it's about how much programming we have to do and how much the machines can figure out themselves.
Now, if we delve deeper into it, there are three different kinds of machine learning algorithms that are out there. We have what we call supervised learning, unsupervised learning, and then reinforcement learning. So let's talk a little bit about all these algorithms and the various names that are out there. So supervised is, I would say, the most reflective of what we would do with various statistical techniques, right? Say, for example, regression models. They would fall under supervised learning. So what exactly is that? Think of it this way. You have a basket of fruits, right? There are some apples, some oranges, some bananas in there. And you want to write a program, or you want to create a robot, that can segregate these fruits out accurately, right? So one of the ways to do this is you tell the machine that, you know, this is what the apple looks like. This is what the orange looks like. This is what the banana looks like. Now, if this were a data set, the information you'll be providing the machine will be maybe the shape of the fruits, the size of the fruits, their weights, their color, their texture, their taste, yada, yada, yada. Those will all be your variables or features in there, right? And then you have provided these labels that, you know, this is what the apple is. This is what the orange is. This is what the banana is. Now, can you create a model which can accurately predict what fruit is what, right? And the machine will do something like this and start to come up with a solution. And maybe this is the solution. And then we're like, well, this looks OK. But, you know, it can probably be better, right? Because there are some misclassifications in here. So now what the machine will do is it will train again and try to correct these errors, right, where it misclassified an orange as an apple, for example, over here, right, or an apple as a banana. So it's trying to see how close or how far what it predicted is from the ground truth.
And it will take this experience into the next iteration of training and try to make its predictions closer and tighter to what the real ground truth is, right? So it will run the model again. And maybe, you know, it'll come up with an answer like this. And you'll keep doing it multiple times. It is not unusual for these models to train for hundreds, sometimes thousands of iterations. This depends on how complex the model is, how complex the data is, right, until you get to a point where the result makes sense. Rarely is it this clean, right? There's always some error. Nothing is 100%, as we will see in a few minutes here. But this gives you a broad idea of how these supervised learning algorithms work. Sure. So, I'll talk about this in a little bit, but let me repeat the question on the microphone first. Okay. So, the question is that these corrections that are being made, are they being made automatically by the machine, or are they being prompted by the human every time? So, I'll answer this question in a little bit in the talk, but it is a little bit of a mix of both, right? All right. So, coming back, the magic of supervised learning, though, is not that it was able to classify this basket of fruit that we had now, but that if you give it a new basket of fruits with, you know, a very different number of fruits, it can still do that with good performance, right? So, that's the magic of this machine learning, that it can take the experience that it has garnered with training, and then apply it in new settings. And as I said, one of the very simple machine learning algorithms is regression, which, you know, most of us have learned since med school. That is exactly how we can use this.
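To make the fruit-basket idea concrete, here is a minimal sketch of a supervised learner in Python. This is not the algorithm used in the talk; it is a simple nearest-centroid classifier, and all the feature values (size in cm, weight in g) are invented for illustration. Trained on a labeled basket, it can then label a fruit it has never seen.

```python
# Toy supervised learner for the fruit-basket example: learn one "prototype"
# (mean feature vector) per labeled fruit, then label new fruits by the
# closest prototype. Features and numbers are invented for illustration.

def train(samples):
    """Learn the mean feature vector for each label from (features, label) pairs."""
    sums, counts = {}, {}
    for features, label in samples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, v in enumerate(features):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc]
            for label, acc in sums.items()}

def predict(centroids, features):
    """Label a new fruit by the closest learned prototype."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(features, c))
    return min(centroids, key=lambda label: dist(centroids[label]))

# Training basket: (size in cm, weight in g) -> fruit label
basket = [
    ([7.0, 150.0], "apple"), ([7.5, 160.0], "apple"),
    ([6.0, 130.0], "orange"), ([6.5, 140.0], "orange"),
    ([19.0, 120.0], "banana"), ([20.0, 125.0], "banana"),
]
model = train(basket)

# The "magic": it generalizes to a fruit it has never seen.
print(predict(model, [7.2, 155.0]))   # an apple-sized, apple-weight fruit
```

Real supervised models differ in how they draw the boundary between classes, but the loop is the same: fit on labeled examples, then predict on a new basket.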
Now, another machine learning algorithm, which I think is very, very important, and if you pick up 10 different papers, you know, you're bound to run into it, is something called decision trees. So, the best way to understand it is, I like to play golf, right? And to play golf, I have to look at two things. One, that my wife is in agreement that I can go and play. That's, I guess, the more important thing. But then I have to look at the weather, right? Now, if the weather, as it turns out, in New York City is cloudy, it's a good time for me to go out and play. But if it turns out to be sunny, it can get hot and humid, right? So I need to see how humid it is. Do I really want to go out and play golf, right? Or if it is raining, then how windy does it get? So these are all the things that can dictate my decision path, right? So at every node of this flowchart or this decision tree, it tells me, it can predict, whether it is good for me to go out and play or not. But now say I want to have a round of golf while I'm here in Orlando, right? So one of the things I can do is I can build another decision tree for playing in Orlando. Or if I go to LA, I can build another tree over there. And what if I want to play in Chicago? I can build another tree over there, right? But this way, I just have to keep building different trees and keep looking at different trees for every city that I'm going to go and play in. A way around this might be that, you know, we can take all these trees together and aggregate the results out of them, right? That gives a generalized flowchart for me to be able to make a decision about something that I'm trying to predict. Here it's being able to play golf, but it could be about something else, right? Another way to do this might be that, you know, I take the tree that I built in New York, and then I refine the tree to be able to accurately predict if I can play in Orlando. Or as a next step in Chicago, or as a next step in LA, right?
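The golf example can be sketched in a few lines: each decision tree is just nested if/else rules, and aggregating several trees by majority vote is the forest idea in miniature. The rules, cities, and thresholds below are all invented for illustration.

```python
# Each decision tree is just nested if/else rules on the inputs; a "forest"
# aggregates several trees by majority vote. All rules here are invented.

def ny_tree(weather, humidity, wind):
    if weather == "cloudy":
        return "play"
    if weather == "sunny":
        return "play" if humidity < 70 else "skip"
    return "play" if wind < 10 else "skip"   # raining: depends on wind

def orlando_tree(weather, humidity, wind):
    # Hotter city: stricter about humidity when sunny.
    if weather == "sunny":
        return "play" if humidity < 60 else "skip"
    return ny_tree(weather, humidity, wind)

def la_tree(weather, humidity, wind):
    # Rarely rains: mostly decided by humidity.
    return "play" if humidity < 75 else "skip"

def forest_predict(weather, humidity, wind):
    """Aggregate the trees' votes: the random-forest idea in miniature."""
    votes = [t(weather, humidity, wind) for t in (ny_tree, orlando_tree, la_tree)]
    return max(set(votes), key=votes.count)

print(forest_predict("sunny", 50, 5))   # all three trees vote "play"
```

The other refinement strategy described, building each new tree to correct the previous tree's mistakes, is the intuition behind boosting; real libraries learn the split rules from data rather than hand-coding them.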
So I can continue to refine this tree based on how the tree was performing in the previous city, or on a previous data set, right? So these are two very powerful machine learning techniques, which are very, very commonly used. The one on the left side is called a random forest. It's a random forest of various trees, right? The one on the right side is gradient boosting. Again, two very common, very powerful machine learning techniques that, if you read machine learning papers, you're bound to see. But at a very, very simplified and basic level, this is what is happening under the hood. Now, an example of this is the MUST-Plus classifier, which was developed at Sinai. This is a random forest classifier. So again, you know, it's a random forest of trees, right? And what it is used for is to predict which patients in the hospital are malnourished. And it sends out an alert to the registered dieticians for them to be able to come and evaluate that patient and give recommendations for that patient, right? And as you can see over here, the model performs better than the non-machine-learning classifier, the MUST classifier. All right. So moving on, the next type of machine learning is something called unsupervised learning. So we go back to that same basket of fruits. And now I'm like, you know, I'm not feeling it today. I'm not going to tell the computer what's what. Let it surprise us and segregate out the different kinds of fruits. And you'd be surprised how good the machine can be. And again, this is an extreme example. Nothing is a hundred percent, but it can still segregate out or, you know, develop these clusters of observations with similar characteristics, right? If this is, say, patients with sepsis, ARDS, AKI, it can identify those, right? It can identify homogeneous groups of patients that have, you know, some homogeneous characteristics among a wide variety of disease states, right?
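One classic unsupervised algorithm, not named in the talk but a good illustration, is k-means: given only unlabeled values, it finds k groups of similar observations. Here is a minimal one-dimensional sketch on invented fruit sizes, with no labels provided.

```python
# Sketch of unsupervised learning: 1-D k-means on fruit sizes. We never tell
# the algorithm which fruit is which; it just finds k groups of similar
# observations. Sizes and k=3 are invented for illustration.

def kmeans_1d(values, k, iters=20):
    centers = sorted(values)[:: max(1, len(values) // k)][:k]  # crude init
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centers[i]))
            clusters[nearest].append(v)
        # Move each center to the mean of its cluster.
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

sizes = [6.0, 6.2, 6.4,      # (secretly) oranges
         7.4, 7.6, 7.8,      # (secretly) apples
         19.0, 19.5, 20.0]   # (secretly) bananas
centers, clusters = kmeans_1d(sizes, k=3)
print(sorted(len(c) for c in clusters))   # three groups of three
```

Note that the algorithm only returns anonymous groups; as the talk says next, it cannot tell you that a group "is" oranges, because it was never given those labels.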
Now here it won't be able to, in this example, tell you that, you know, this is an apple, this is an orange, this is a banana, because those are made-up terms, right? We didn't tell the machine what's what. So it'll just tell us that, you know, these are three different groups that are there, and we can call those groups whatever we want, right? And as I was saying, these techniques have been used for various disease states, right? And not only that, we can use these same machine learning techniques to even identify various temporal patterns, which are very important in the setting that we practice in, in ICUs, right? Everything that we do, all the patients that we interact with, all of the data, it's all time series data. And we can use these same machine learning techniques to parse out temporal data. For example, over here, we took patients with sepsis who developed AKI, and we looked at their creatinine trajectories, and we found with machine learning that there were eight different creatinine trajectories that these patients had. And not only that, their outcomes were different in terms of persistence of acute kidney injury or mortality over time, right? So these can be very, very powerful techniques. Now, one of my favorite machine learning techniques is reinforcement learning. This is very different from anything that we have talked about so far. Now, I think the best analogy for this is that, let's say this kid wants to walk from this table to this chair, right? And the kid's parents are like, you know, enough crawling, you've got to walk. So what will she do? She'll take one step, right? And if she doesn't fall, the parents are going to clap, right? That gives the kid positive reinforcement that she did something good. She's going to learn from that and continue to do that. But what if she took a misstep? She's going to fall, right? Boo-boos. Negative reinforcement. Kids learn fast, right?
So she's going to know now that, you know, if she put her left foot forward and it made her fall, next time she has to put the right foot forward. She cannot put the left foot forward again. She cannot wiggle her hands in a way that will make her fall. She has to make sure that her head is well positioned, right? So all these things become important. And based on this positive and negative reinforcement, she learns to walk. But if you think about it, this is exactly how we learn when we are learning anything new. So say, for example, if you're taking a test, if you do well, you're like, okay, next time when I take a test, I'm going to do exactly what I did. If you don't do so well, then you re-evaluate. Maybe, you know, did I not use the right material? Did I not understand the material? Did I not give it enough time? And you reassess and you learn from your mistakes. So very similarly here, this kid is learning how to walk by interacting with the floor and the environment, right? Now, this reinforcement learning, again, is something that is coming up more and more, and you are bound to get into some of these papers. So some of the terminology that I just want to introduce here: this kid in this example is the agent, right? The kid is the one that is taking these actions. The environment in this case is the floor, the table, the chair. But if this was a patient data set, it would be the clinical characteristics of the patients. The state is the current characteristics of, you know, whoever you are studying. So in this case, it's going to be the position of the legs of the kid, the position of the arms of the kid, the head position. If it's a patient, then it'll be, you know, what are the current characteristics of the patient, right? That will become your state of the patient. The actions are, you know, the kid is taking an action. The kid can take a step forward, take a step back, left, right, right?
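The agent/environment/state/action vocabulary above, plus a simple reward, can be sketched as tabular Q-learning, one standard reinforcement learning algorithm (not necessarily the one used in the study mentioned later). Here the "kid" walks a tiny one-dimensional world from the table (position 0 side) to the chair (position 4); reaching the chair earns +1 (clapping), stepping off the left edge earns -1 (a fall). All numbers are invented for illustration.

```python
import random

# Tabular Q-learning on a 1-D "walk to the chair" world. States are positions
# 0..4; actions are a step left (-1) or right (+1); reaching the chair (4)
# gives reward +1, falling off the left edge gives -1. Invented illustration.

random.seed(0)
N, GOAL = 5, 4
Q = {(s, a): 0.0 for s in range(N) for a in (-1, +1)}
alpha, gamma, eps = 0.5, 0.9, 0.2   # learning rate, discount, exploration

for _ in range(500):                 # many small "walks" (episodes)
    s = 1                            # the agent starts near the table
    while 0 <= s < GOAL:
        if random.random() < eps:
            a = random.choice((-1, +1))                   # explore
        else:
            a = max((-1, +1), key=lambda m: Q[(s, m)])    # exploit
        s2 = s + a
        r = 1 if s2 == GOAL else (-1 if s2 < 0 else 0)    # the reinforcement
        future = 0 if (s2 < 0 or s2 == GOAL) else max(Q[(s2, -1)], Q[(s2, +1)])
        Q[(s, a)] += alpha * (r + gamma * future - Q[(s, a)])
        s = s2

# After training, every interior state prefers stepping toward the chair.
print(all(Q[(s, +1)] > Q[(s, -1)] for s in range(1, GOAL)))
```

The model never sees a labeled "correct step"; it becomes, as described next, reward hungry, learning which action in each state leads to more positive reinforcement over time.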
And the rewards, so what we do with these reinforcement learning models is we try to train these models so that the model wants to get more and more positive rewards. The model becomes more and more reward hungry, right? Just like that kid, she wants positive rewards, not negative rewards, right? Now this is an example of a reinforcement learning algorithm that we trained, where you can see on the left side of the screen that the total rewards the model was getting were higher than the rewards the clinicians got. This is all in a retrospective data set. But what is very interesting, on the right top half of the screen, is what happens when the model says to give more medication. Here, what we specifically tested in our study was insulin dosing in critically ill patients. So when the model said give more insulin, those patients actually had much higher glucose levels, and their time in range of 70 to 180, so, you know, the time within range for glucose levels, was much lower, right? So the model is trying to identify these different patterns and make recommendations for personalized clinical actions. All right. Moving on to deep learning. So I'm sure all of you or most of you have heard this term deep learning. What exactly is that? So deep learning is based on the concept of, you know, a neuron. And just as a neuron takes inputs, processes them, and comes up with an output, deep learning does exactly that. We have an artificial neuron, which takes inputs and comes up with outputs, right? Now this artificial neuron is sometimes called a perceptron, so you might read some papers which talk about perceptrons, but that's exactly what that is. Now if you look at it, you know, more closely, this artificial neuron or this perceptron really is just a mathematical function, right? Now let's go back to our basket of fruits. So in this iteration of training, we have various features, right?
We have the shape of the fruits, we have the color of the fruits, we have the size of the fruits. Let's say, you know, we take these three features and we are training this artificial neuron now to recognize what fruit is what. So the way the neuron will recognize it is, it is going to randomly assign importances to all these features, right? And it's going to train itself to see what the results are, just like with that basket of fruits. So for example, over here, it's giving the highest importance to the second feature, the lowest importance to the third feature. Then in this iteration of training, what it does is, it just sums up each feature multiplied by its importance level, right? So it takes this weighted sum, and this sum is what dictates whether this neuron should fire or not, just like a biological neuron, right? Whether the neuron should fire or not, we can dictate that by encoding that information in this f of x, the function over there, something called an activation function, right? Not only that, once it is decided that a neuron should fire, we can dictate what sort of output we want, right? We might just want to know, say, if we are predicting mortality, we might just want to know whether there is going to be mortality or survival, so zero or one, right? Or we might want a probability distribution. We might want to get some actual values there, right? So depending on what our output is, we can encode that to a large extent in this activation function and dictate what sort of output we want from this artificial neuron, right? Now in real life, one neuron is seldom enough, right? And that's the case in AI. We need more neurons, right? So in reality, when we are building these deep learning models, there are layers of these neurons. And typically, the first layer is just taking the input variables. It is passing them on to this, what we call a hidden layer, where all these complex calculations happen.
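Before stacking layers, the single artificial neuron just described fits in a few lines: a weighted sum of the inputs passed through an activation function. The weights, inputs, and the choice of a sigmoid activation here are all illustrative assumptions, not the talk's actual model.

```python
import math

# One artificial neuron (perceptron): weighted sum of inputs plus a bias,
# passed through an activation function. Sigmoid is used here as one common
# choice; it squashes the output into (0, 1), like a probability.

def neuron(inputs, weights, bias):
    z = sum(x * w for x, w in zip(inputs, weights)) + bias  # weighted sum
    return 1 / (1 + math.exp(-z))                           # sigmoid activation

# Three fruit features (shape, color, size, already encoded as numbers),
# with the second feature given the highest importance, the third the lowest.
features = [0.5, 0.9, 0.2]
weights  = [0.8, 2.0, 0.1]
print(round(neuron(features, weights, bias=-1.0), 3))   # -> 0.772
```

Swapping the activation changes the output type: a step function gives a hard zero-or-one answer, a sigmoid gives a probability-like value, and a linear output gives an actual number, which is exactly the flexibility described above.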
And then the result is passed on to the output layer. And depending on how we want to see the result, those outputs come out, right? Now when we have two or more hidden layers, that is something called a deep neural network. So when we're talking about deep learning, we have a lot more of these hidden layers where all the processing is happening. But again, in critical care, let's say somebody is hypotensive, right? Can I treat a patient who's hypotensive just knowing that person is hypotensive? Sure. But it'd be better to know what the blood pressure of this person was just before that. What were the treatments given, and how did that patient respond, right? If I know all that, I'll be able to treat that person maybe even a little better than what I would in a vanilla flavor, right? So what that means is that this multilayered perceptron that we have may not be enough, because we need to know what is happening at a previous time step, right? So that is where these recurrent neural networks come in, where these hidden layers, where all this magic is happening, all these complex calculations are happening, are not just getting inputs from the input layer, but they're also getting inputs from themselves at the previous time step, right? So they are actually learning from what happened, by their calculations, in the previous time step. But this is, in a way, a very short-term memory, right? They just know what happened in the last time step. In reality, we need a longer-term memory, because we want to know what happened a few time steps back to be able to make meaningful decisions. That is where these LSTMs, the long short-term memory networks, come in, right? Now these can be very powerful. For example, this is a model we trained to identify patients who have major adverse kidney events after acute kidney injury. And this is based on an LSTM.
And what you see over here is that this model performs much better than Cox proportional hazards models in being able to predict these events. And the only real difference is that this is based on an LSTM model and can incorporate time series data better than Cox proportional hazards models, right? Now in the last few years, I mean, ChatGPT has been enormously effective, right? And ChatGPT, DeepSeek, all these models are built on an architecture known as transformers, right? Now these transformer models, they have a lot longer-term memory built in. And not only that, they can understand the context of things, right? If you write something in ChatGPT, you can write a long paragraph over there. It can still understand what you were trying to say in the first line and how those words make sense and are conceptually related, right? So this transformer model can help with all of that because of the complex mechanisms that are embedded in these models. Now you can also use these same transformer models for various prediction tasks in the ICU. For example, we developed a transformer model to identify which patients are being underfed in the ICU. And as it turns out, the model's performance is very decent in doing these tasks, right? But again, these transformer models are, I would say, the backbone of a lot of the large language models that are out there and a lot of the work that is happening. So if you read the literature now, you are bound to hear about transformer models, and this is exactly what they're talking about. Now one other type of data that we frequently encounter within ICUs is imaging data, right? And the question is, how do we have an AI model read these images? And if you think about it, all these images, we can break them into pixels, right? These small, small squares. And for all the linear algebra buffs out here, this then just becomes a matrix, right? Which is very easy for machines to read.
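The pixels-to-matrix idea can be made concrete with a tiny invented "image" as a grid of intensities, plus one small filter slid across it, which is the pattern-scanning operation at the heart of a CNN. The image, kernel, and values below are all made up for illustration.

```python
# A tiny "image" as a matrix of pixel intensities, and a 2x2 filter slid
# across it: the pattern-scanning "magnifying glass" of a CNN. Invented data.

image = [
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
]

# A filter that responds strongly to a vertical dark-to-bright edge.
kernel = [[-1, 1],
          [-1, 1]]

def convolve(img, ker):
    kh, kw = len(ker), len(ker[0])
    out = []
    for r in range(len(img) - kh + 1):
        row = []
        for c in range(len(img[0]) - kw + 1):
            # Overlap the filter at (r, c) and sum pixel * weight.
            row.append(sum(img[r + i][c + j] * ker[i][j]
                           for i in range(kh) for j in range(kw)))
        out.append(row)
    return out

fmap = convolve(image, kernel)
print(fmap)   # the big values mark where the edge pattern was found
```

A real CNN learns many such filters from data and, as described next, repeats this scan-and-aggregate step layer after layer until it can label the whole image.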
All the machines have to do is figure out the information encoded in this matrix. We can do this with specialized machine learning or deep learning programs called convolutional neural networks, or CNNs, right? Now what is happening under the hood in there? It takes a magnifying glass. The CNN model takes a magnifying glass and starts to look for patterns within these pixels. It identifies and extracts these important patterns throughout the image that we are trying to run, right? Then it tries to pool that down even more. You know, aggregate down to even more important structures that are there. And then it creates a representation of the image. But again, just like with the basket of fruits that we had, this gets done multiple, multiple times. And ultimately the machine starts to learn whether that chest X-ray was normal or was it not normal. Or if you have images of cats and dogs, which are cats and which are dogs, right? An example of this in motion is this model that we developed in our lab where we took 12-lead ECGs. And all we did was we fed them into a CNN model, just like an image. And it was able to predict right ventricular ejection fraction calculated on cardiac MRIs very well. And it validated out very well. So these can be very, very powerful models, right? OK. Moving on to NLP. So NLP, I don't think anybody is a stranger here to NLP in this day and age, where we all interact with various NLP programs and large language models. At the core of it, NLP is training the computer, training these AI models, to understand and generate human language. As it turns out, if you go back a few years, NLP wasn't very advanced. If you gave it these three sentences, all it could do was just count the number of different words that are there. And we were very happy with that, that we could do that. Then it took a while, but it was able to understand the relationships between words.
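That early just-count-the-words stage of NLP is a one-liner today. The three sentences below are invented stand-ins for the ones on the slide, nodding to the tiger-is-a-dog's-name example that comes next.

```python
from collections import Counter

# The earliest NLP stage described above: simply counting words across
# sentences (a bag-of-words view). Sentences are invented stand-ins.

sentences = [
    "tiger is a big cat",
    "golden is a friendly dog",
    "my dog tiger chased a cat",
]
counts = Counter(word for s in sentences for word in s.split())
print(counts["tiger"], counts["dog"], counts["a"])   # 2 2 3
```

Notice that counts alone cannot tell that "tiger" here names a dog; that kind of contextual understanding is exactly what the later word-relationship and transformer stages add.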
It was able to understand that tiger and golden are close in meaning in these three sentences that we have over here. And now we are at a point where it can actually understand the context that you are building into these sentences, right? It can understand that tiger is not an animal when we are talking about tiger here. It's actually the name of the dog. So that's pretty powerful if you think about it. Now, various NLP applications have been used for a while for text classification. Like the spam filters on your emails, they run on NLP applications. A lot of chatbots can run on these NLP applications. Over two-thirds of all the health care information that is there is in text form, right? Something that has not been looked into much until now, because we didn't have tools to effectively mine that information. But at Sinai now, we are routinely extracting this information from the notes to be able to utilize it for various R&D tasks, again, using NLP and large language models. And then sentiment analysis, again, what is the sentiment of something that is written, right? Using NLP and large language models, we can achieve that now, because a lot of these models can understand the context and the basic sentiment behind the text that is there. Now, in this day and age, you have to know a little bit about large language models. I'm going to speak very, very briefly on that. Now, these are, in a way, a cross between NLP and your deep learning, especially transformers, right? They are trained on vast amounts of data. And because they use these transformer architectures, which are very, very powerful architectures, they can understand. They can not only remember what was written, they can understand what was written, why it was written, and they can create context around it. And because they have been trained so broadly, they can be adapted for a wide variety of tasks.
For example, this study over here showed that these large language models can be very effective in summarizing various portions of patients' charts, including parts of the hospital course, parts of the questions that patients ask clinicians, and parts of imaging reports, right? So from an operational standpoint, one of the things for which it seems very interesting to use these large language models is coding and billing, right? A lot of it is mundane work. We should be able to use a large language model to automate a lot of this coding. So we were very interested in it, and we looked at it. What we did is we evaluated various large language models and used some de-identified patient notes, which we fed through them. And then we had two certified coders who coded independently. And we wanted to see how good these large language models were in comparison to the certified coders. As it turns out, they weren't very good. As you can see over here on the right side of the screen, the concordance was horrible. ChatGPT, so GPT-4, was best, but still pretty bad. So why is that? So we went in and we dug into some of these patients' charts and the coding done by the LLMs and the coding done by the certified coders to see why that was happening. And as it turns out, there were some usual issues of hallucinations and misdiagnoses and whatnot. But what was very interesting was two specific coding guidelines. One, where if, say, somebody's sodium is 134, 133, and a physician didn't call it out in their note, the model picks it up and codes for it. Which, based on coding guidelines, you can't. The other was that the model very frequently would code for both the parent condition and its presenting symptoms. So for example, if somebody comes in with cough and fever and has a pneumonia, we can only code for pneumonia. But the model would code for cough, fever, and pneumonia, which, again, is against the coding guidelines.
So this shows you, again, the power of these large language models, but that we are not there yet, where they can be very seamlessly integrated into workflows. All right. Now, I want to- I have a question for you. Yes. I just want to make a comment. On the hyponatremia, one interesting thing. Do you want to use the mic? The hyponatremia example is very interesting, because coders were taught that they can't pick it up unless we wrote it. But the machine knows that the definition of hyponatremia is a sodium below the reference range. And so I'm wondering if there's a lot of catch-up that has to come from regulation to where we need to be going. Because these rules were built when we operated on paper, and now we have all this technology that's evolving so quickly, and we're outstripping the regulations. So it's just an interesting concept to think about, because the model isn't wrong. It's just that the regulations haven't caught up. Absolutely. That is 100% true. All right. So in the last 10 minutes or so, I want to talk very briefly about, when you're reading papers on these various AI algorithms, or you're evaluating these AI algorithms on your own, what are some of the nuts and bolts of that. So one of the first things is that when we are developing the model, we want to make sure that the model understands the structure of the data. There was a question earlier in the talk about how we can say that the model is learning the data just well enough, that the error term is decreasing. So the way we would do that is we want to make sure that the model learns the data enough that the error function is low, but that it can still generalize well. So say, for example, let's go back to that example of the basket of fruits that we had. Now, that had a variable when we were talking about it. Let's say it has a variable of the weight of the fruit, or let's say the size of the fruit.
And let's say the basket of fruit had oranges which were all between 5 and a half to 6 centimeters in diameter. If it starts to learn that oranges are only 5 and a half to 6 centimeters, when it goes to the next basket of fruits where oranges are, say, 8 centimeters, it will never pick up those oranges. So it's not going to generalize well. So for example, in that case, it will look like the example on the left side. So it's not just learning the structure of the data. It's starting to memorize the data. So if it starts to memorize the data, just like if there is a test, and I just cram through the coursework and go and vomit whatever I learned yesterday in that test, as long as those questions are exactly the same, it'll be phenomenal. But if the questions are testing concepts, I won't do very well. That's, in a way, exactly what that is over there. Now, how do we identify that? While we are building the models, we need something real time to identify this error, or this loss function, as we would call it. And the way we would monitor that is we would monitor these loss curves. So the blue one over here is the loss curve for the model that we are training. And what it shows over here is that initially, the loss was a little high. There was error. And as the model trains over more and more iterations, that loss comes down and stays down. And the same thing happens in the validation set, in this orange line over there. If the model is overfit, we will see that the loss, or this error, stays very low when we are training, but this error rate is very high in a validation or external data set. That tells us that this model is starting to memorize the data rather than just learning the structure of the data. And if the model hasn't even understood the data that it is being trained on, the loss will look very high. So these are the ways that we can look at it.
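Reading the loss curves described above can itself be sketched in code: given training and validation loss per iteration (numbers invented here), a simple rule distinguishes the three situations, learning well, overfitting, and underfitting. The 0.2 cut-off is an arbitrary illustrative threshold.

```python
# Sketch of reading loss curves: training and validation loss per iteration
# (all numbers invented), with a crude rule for the three situations.

def diagnose(train_loss, val_loss, low=0.2):
    t, v = train_loss[-1], val_loss[-1]
    if t < low and v < low:
        return "fits well"   # both losses come down and stay down
    if t < low <= v:
        return "overfit"     # memorized training data, fails on new data
    return "underfit"        # never learned the structure at all

good_train = [0.9, 0.5, 0.2, 0.1];  good_val = [1.0, 0.6, 0.3, 0.15]
over_train = [0.9, 0.4, 0.1, 0.05]; over_val = [1.0, 0.8, 0.9, 1.1]

print(diagnose(good_train, good_val))   # fits well
print(diagnose(over_train, over_val))   # overfit
```

In practice, frameworks log exactly these two curves during training, and techniques like early stopping automate the same comparison.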
And we can encode a lot of this information into the code that we are running. Now, another very important evaluation concept to understand is the classification threshold. So let's say we're building a logistic regression model to identify whether an email is spam or not. Let's set aside that we'd really need NLP and things like that; for argument's sake, we have a data set with various features of whatever is in the email, and we are trying to decide whether it's spam or not. The output we are really interested in is: is it spam or not spam? But that's not how the model thinks. What the model thinks is: what is the probability of this being spam? And then we create these artificial filters where we say, if the probability is higher than this, it's spam; if the probability is lower, it's not spam. Now, in a lot of these models, the default threshold is about 50%. If the probability is 50% or more, the model labels it as spam; if the probability is less than that, it says it's not spam. So say we have a data set in which these are the probabilities of each email being spam. Using the default threshold, the model will say the first three are spam and the rest are not. But think about it: if you're deploying this in a real setting, depending on what your outcome is, you may want to be very, very specific, because if you're wrong too many times, people might just stop looking at the alerts you're trying to give them. So you might say, I want this classification threshold at about 70%, so that only what is really spam comes through. Or you might say, I want to cast a really broad net, because I don't want to miss anybody. So you decrease the classification threshold and you catch more, but you might also catch more flagged outcomes which may not be real.
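The thresholding step described above can be sketched in a few lines. This is a toy illustration (the function name and probabilities are mine, not from the talk): the model outputs probabilities, and the threshold alone decides how many emails get flagged.

```python
def classify(probs, threshold=0.5):
    """Turn predicted spam probabilities into labels at a chosen threshold."""
    return ["spam" if p >= threshold else "not spam" for p in probs]

# Hypothetical model outputs for five emails.
probs = [0.92, 0.75, 0.55, 0.40, 0.10]

print(classify(probs))        # default 0.5: the first three are flagged
print(classify(probs, 0.7))   # stricter: fewer, more confident alerts
print(classify(probs, 0.3))   # looser: casts a broader net, more false alarms
```

Note that the model itself is unchanged across the three calls; only the cutoff applied to its probabilities moves, which is exactly the trade-off between alert fatigue and missed cases.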
And you can identify those by creating a two-by-two table like this, where all you're doing is comparing what the model predicted versus what was real. That tells you whether each prediction is a true positive, false positive, true negative, or false negative. Simple, basic statistics 101. Now, this gives us an opportunity to look at some other metrics. For example, how accurate is the model? Accuracy is your true positives plus true negatives over all predictions. So if accuracy is 1, which it almost never is, it's a perfect prediction. But this is a slippery slope. Say, for example, we have a data set of patients with sepsis, and in that data set, for argument's sake, mortality is only 5%. Most of our outcomes are like that, between 5% and 30%; it's rare that mortality or some other outcome is seen in 50% of patients. Now, let's say you want to build a model to predict mortality, and the model that ends up getting built is the most optimistic model in the world. It never says anybody will die; the only thing it ever says is that the patient will not die. So think about it: what's the accuracy of this model? 95%. And the model is garbage, right? It predicts nothing, with 95% accuracy. So when you're looking at these metrics, you have to keep an open mind. That is where some of the other metrics come in, which can be very helpful. One is called precision, or positive predictive value: of all the positive predictions the model made, which were actually true? Another is recall, or sensitivity: of all the true labels, which ones did the model actually pick up? And specificity along those lines.
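The "most optimistic model" trap can be made concrete with a few lines of code. This sketch (names are mine, not from the talk) builds the two-by-two table for 100 septic patients with 5% mortality and a model that always predicts survival:

```python
def metrics(y_true, y_pred):
    """Confusion-matrix metrics; 1 = died (positive class), 0 = survived."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    n = len(y_true)
    return {
        "accuracy": (tp + tn) / n,
        # Precision is undefined when the model never predicts a positive.
        "precision": tp / (tp + fp) if (tp + fp) else None,
        "recall": tp / (tp + fn) if (tp + fn) else None,
        "specificity": tn / (tn + fp) if (tn + fp) else None,
    }

# 100 patients, 5% mortality; the model predicts survival (0) for everyone.
y_true = [1] * 5 + [0] * 95
y_pred = [0] * 100
print(metrics(y_true, y_pred))
# accuracy 0.95, precision undefined, recall 0.0, specificity 1.0
```

The 95% accuracy looks impressive until you see that recall is zero: the model never catches a single death.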
So let's look at our most optimistic model, which has 95% accuracy but never predicts the outcome. The precision is undefined, recall is zero, and specificity is 100%. So if you don't evaluate some of these other metrics, you can sometimes think a model performs really well when in reality it does not. All right. One other metric, which I think a lot of people have a love-hate relationship with, is the area under the receiver operating characteristic curve. Now, what does this really do? All it does, in this graph down here, is tell you how well the model can separate these orange circles from these purple squares. That's all it does. So how do we create it? On the left side of the screen, you see the distribution of predictions for the two outcomes. And on the right side of the screen, this red dot that you see moving is mapping out the true positive rate and the corresponding false positive rate at each threshold, as the vertical line moves on the left side of the screen. That is how you create this area under the curve, and the higher the area under the curve, the better. When this graph starts over here, you can see the model wasn't really separating positive and negative outcomes at all; the area under the curve wasn't very good, sitting at the reference line. But as the model is able to separate them out more, the area under the curve goes higher and higher. But there's a catch. Let's say we have a highly imbalanced data set, where the outcome is seen in a very small number of patients, which is true of most of our data sets: a 5% mortality rate, or a 5% AKI rate, or a 5% ARDS rate. The area under the curve for that same model doesn't really change no matter how imbalanced your data set is.
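One way to see what the AUROC measures is its rank interpretation: it equals the probability that a randomly chosen positive case gets a higher predicted score than a randomly chosen negative case. A small sketch under that interpretation (function name and scores are illustrative, not from the talk):

```python
def auc(scores_pos, scores_neg):
    """AUROC as the probability that a random positive case scores higher
    than a random negative case (ties count as half a win)."""
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in scores_pos
        for n in scores_neg
    )
    return wins / (len(scores_pos) * len(scores_neg))

# Well-separated scores give an AUC near 1.
print(auc([0.9, 0.8, 0.7], [0.2, 0.3, 0.1]))  # 1.0
# Heavily overlapping scores give an AUC near 0.5 (the reference line).
print(auc([0.6, 0.4], [0.5, 0.5]))            # 0.5
```

This also makes the catch visible: the calculation only compares ranks between the two classes, so adding many more negatives with similar score distributions leaves the AUC roughly unchanged, even though precision can collapse.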
But as we saw, precision and recall can be very important to look at. So if you plot precision against recall, you can see how that curve changes the more imbalance you have in your data. It can give you another set of eyes, another way to evaluate how good your model is, especially if your data is highly imbalanced. All right. Now, very briefly: what do we do if we have imbalanced data, which is very, very common? One thing we can do is undersample the class we have in higher quantity. We usually don't do that, because we don't like to throw away data. The other thing we can do is oversample the minority class. The problem we could end up with there is that we are using the same data again and again and again to get rid of the imbalance, so we will start to overfit our model. So sometimes we will do what you've been taught since med school never to do: we will create our own data. It's a little more complex than that, but we'll create synthetic data that is very similar to the minority-class data in the imbalanced data set. Now, this is my last slide, and I just wanted to introduce some of the available ICU data sets. In critical care, we are very lucky to have so many data sets with a lot of granular data available. One of the prime examples is the MIMIC data set, which is Leo's baby. It has many years of very granular, very good quality data from Beth Israel. So if you are thinking of dipping your toes into developing machine learning models, that may be a good data set to practice on and hone skills on. Another is the eICU data set, which was developed by Philips and has data from over 200 hospitals across the US. It is a very, very good data set.
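Random oversampling, the middle option above, is simple enough to sketch in a few lines. This is an illustrative toy (the function name is mine), using the same 5%-mortality sepsis numbers; note the comment on why reusing the same few points invites overfitting:

```python
import random

def oversample_minority(majority, minority, seed=0):
    """Randomly re-draw minority examples (with replacement) until both
    classes are the same size. Simple, but recycling the same few points
    over and over is exactly how oversampling can lead to overfitting --
    which is why synthetic-data methods were developed as an alternative."""
    rng = random.Random(seed)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return majority, minority + extra

# 95 survivors vs. 5 deaths, as in the 5%-mortality example.
survivors = ["survived"] * 95
deaths = ["died"] * 5
maj, mino = oversample_minority(survivors, deaths)
print(len(maj), len(mino))  # 95 95 -- balanced, but only 5 unique minority cases
```

Synthetic-data approaches instead generate new minority-class points that resemble, but do not duplicate, the originals.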
But because there is so much heterogeneity across the hospitals, you will have to see how good the data is for whatever you're trying to study. And then we have these three data sets from different places in Europe: SICdb is from Austria, HiRID is from Switzerland, and the AmsterdamUMCdb is from Amsterdam. Again, these are all very, very good data sets. Each has some strengths and some limitations, depending on what data is available and where it is from. But they all provide very granular data to work on. All right. Thank you.
Video Summary
Ankit Sakooja, an intensivist and AI lab co-leader at Mount Sinai, provides an overview of artificial intelligence and its integration into healthcare. He explains the importance of understanding AI, as it's increasingly pervasive in daily life and professional settings. Sakooja outlines three main AI categories: machine learning, deep learning, and natural language processing (NLP).

Machine learning, with types such as supervised, unsupervised, and reinforcement learning, involves training computers to learn from experience. Supervised learning uses labeled data to train models, while unsupervised learning uncovers patterns in unlabeled data. Reinforcement learning focuses on decision-making through trial and error, optimizing for positive outcomes.

Deep learning, rooted in artificial neural networks, processes inputs through complex computations to generate outputs. It can extend to recurrent and long short-term memory networks for tasks involving time-series data, and applies to tasks like interpreting medical images with convolutional neural networks (CNNs).

NLP involves training AI to understand human language, with recent advancements enabling models to grasp context and sentiment. Large language models, built on transformer architectures, further enhance text processing and task adaptability.

Sakooja emphasizes the importance of evaluating AI models using metrics like accuracy, precision, and recall. He also highlights available ICU datasets, such as MIMIC and eICU, for developing and honing AI models in medical settings.
Keywords
artificial intelligence
machine learning
deep learning
natural language processing
healthcare
AI models
ICU datasets
Ankit Sakooja
© Society of Critical Care Medicine. All rights reserved.