Go Big or Go Home: An Overview of Research Using Critical Care Databases
Video Transcription
Hello and welcome to today's webcast, Go Big or Go Home: An Overview of Research Using Critical Care Databases. Today's webcast is brought to you by Discovery, the Critical Care Research Network at SCCM, in collaboration with the Clinical Pharmacy and Pharmacology Section. My name is Emily Owen. I'm a Critical Care Clinical Pharmacy Specialist in the Surgical Burn and Trauma ICU at Barnes-Jewish Hospital in St. Louis, Missouri, and I will be moderating today's webcast. I have no disclosures. This webcast is being recorded. There is no CE associated with this educational program; however, there will be an evaluation sent at the conclusion of this program. Your opinions and feedback are important to us as we plan and develop future educational offerings. Please take 5-10 minutes to complete the evaluation. Thank you for joining us.

A few housekeeping items before we get started. There will be a Q&A at the conclusion of the presentation. To submit questions throughout the presentation, type into the questions box located on your control panel. This webcast also includes interactive polling questions that will appear on your screen during the presentation. Please click on the radio button next to the answer choice to pick your response.

And now I'd like to introduce our speaker for today, Dr. Omar Badawi. Dr. Badawi is the Director of Medical Device Safety for the National Evaluation System for health Technology (NEST) Coordinating Center, where his primary role is the development of an active surveillance program for medical device safety. Prior to NEST, he led the research for developing product-related predictive algorithms and decision support tools for acute care telehealth systems at Philips Healthcare and was the program manager for the Philips EICU Research Institute. He is also an adjunct assistant professor at the University of Maryland School of Pharmacy and a research affiliate at the Massachusetts Institute of Technology. And now I'll turn things over to our presenter, Dr. Badawi. Please take it away.

Thank you, Dr. Owen. Appreciate it. I'll get started here with a couple of quick disclosures. I currently don't have any conflicts of interest. However, as Dr. Owen mentioned, until recently I was an employee of Philips Healthcare, where I was the program manager for the EICU Research Institute, and we'll hear a bit about that data during this session. Those are my affiliations, already mentioned.

So, quickly going over the objectives today: at the end of this, I'd like all of you to be able to describe the advantages and disadvantages of available critical care databases, feel comfortable generating a valid study design for a given database, and understand some of the analytical challenges with secondary data and common approaches to handling them.

So first I want to start talking about secondary data for research, and really that's the focus. With all these databases, you have to remember that the data originated from clinical practice. It was not collected for research; it's what you are doing every day in your ICUs, and those records of what you're documenting, with the good, the bad, and the ugly of how things are charted, are what end up being used as secondary data. So it's really important to keep that in mind as you're looking at that data and using it. But it allows for a unique type of cohort study, which isn't really accurately reflected by the terms prospective or retrospective.
I'm not one to really get into semantics, but you will see people feel strongly about some of these terms, and I just want to point out where some of the ambiguity lies. These are often called retrospective studies, but the collection of the data is not really subject to recall bias, which is often considered a characteristic of retrospective studies, because you're really prospectively collecting that data during care. And on top of that, when people refer to cohort studies, they're usually talking about groups of patients that are followed over time, and in these ICU databases, it's really kind of a unique cohort. We follow them over time, but generally a pretty short amount of time in their ICU or through their hospitalization, and then patients enter and leave that cohort continuously as they go home and other patients come in. So it really is kind of a unique type of study group, and I think some of the verbiage used in other studies doesn't apply very well for us.

Some other unique issues to think about, which I touched on earlier: in traditional observational studies, the data collection process is designed a priori. It's fairly standardized, generally well documented and understood. During routine clinical care, there's very little standardization, virtually no documentation, and the next two are really, really important; I can't stress this enough. The process of how things are done may change over time, either gradually or suddenly, and that may be because of clinical care, or it may be documentation practices where care is still the same but the way they use their EHR system changes. There are a lot of potential things that can happen that aren't obvious, so you really should never assume that you have a full understanding of the data you're using, and I really advise a lot of humility and double-checking, triple-checking as much as possible when you're doing this.

Now, we talk about these big databases for critical care, and they're often referred to as big data, and I bring this up because there are a lot of different ways big data is described. One of these is through what they refer to as the Vs of big data, and here's a sample of many of those. This doesn't quite capture exactly what we're working with in these databases, but they are still large databases, and they meet some of these. For example, when we're talking about volume, obviously these are big databases. Variety is something that is common in these, where we have a lot of different data types that might be brought together. Velocity is something that you won't really see here, right? That's the speed with which data is accessible; normally when you're talking about big data, you might be talking about really high-frequency data that's near real-time. That's not what we're seeing here. Just keep that in mind, another little subtlety there.

Now to get into it, what we're really going to focus on are some of the open critical care databases. I put open in quotes because there are still some rules around using them, but basically these are the most common ones out there. I'll start with the MIMIC database, and I'll leave these links or web addresses on the slide for a minute if any of you want to copy them down. The MIMIC database, which I'll talk about first, comes out of Beth Israel Hospital in Boston, and they've really led the charge with this. The EICU Collaborative Research Database I'll go into next.
Then the newest of those is the Amsterdam UMC database. I won't really get into the HiRID database, but I wanted to mention it here because it's also a critical care database. It was a little more focused in terms of its development, really around a specific project for predicting circulatory failure, but it does have some high-resolution ICU data from about 34,000 admissions from an ICU department in Bern, Switzerland. So that could also be useful. And then the group that does MIMIC hosts it at the PhysioNet site. If any of you go there, you will find there are quite a few databases out there for a variety of different domains that you may find useful. So that's a great place to visit and take a look at and explore.

So, a quick history of MIMIC. It really started back in the late 90s with George Moody and Roger Mark from MIT, who developed this Multiparameter Intelligent Monitoring in Intensive Care database. It was really a pilot study using Beth Israel Deaconess Medical Center, seeing what they could do in terms of de-identifying and making some of that data in the ICU usable for research, obviously way ahead of their time. That ended up launching into what became the future versions of MIMIC, where they established and received funding primarily through the NIH from 2001 on. And that group, the Laboratory of Computational Physiology at MIT, has been very active in keeping this initiative going. I actually bring this tweet up from Alistair Johnson, who has recently left the lab after working there for years. One of the big initiatives he worked on with the rest of the team is releasing emergency department data from Beth Israel. So you can see this tweet was actually from just less than a couple of weeks ago, that they released about 450,000 emergency department stays from that hospital that you can actually link with the hospital and ICU data from MIMIC-IV. So a really great resource, a lot of information there, and this has really been the most widely used database in all of, I think, critical care research.

This is a reference to the paper describing the MIMIC-III database, which was obviously the third iteration update, and this was all focused on ICU patients. MIMIC-IV has some of these expansions that we talk about. But this gives you an idea of how the databases can be a bit complicated for your typical clinician researcher who may not have the data science skills. You have hospital data coming from multiple different types of ICUs. There's bedside monitoring data, vital signs, waveforms, alarms. You have the charting with fluids, medications, progress notes, test orders, billing, et cetera. So all these different tables of data, and then they can actually link with some other things like the Social Security Death Index if they want to try to identify long-term mortality in some patients. That all turns into their data archive, which then goes through a de-identification process, date shifting, and is converted to make the MIMIC database.

I highlight date shifting here because this challenge of de-identification and what to do about it is something you'll find addressed in different ways in different databases. What MIMIC has done is date shifting, so they have a way of basically pushing dates randomly into the future that allows them to retain seasonality and also the context between one stay and another.
But you also lose the fact that you don't know when you're looking at particular data, maybe exactly what year it is, and being able to tie that back with other things happening externally. I don't want to belabor just statistics here, but you can see some information about the population, almost 50,000 admissions, about 53,000 ICU stays that represent about 38,600 unique patients. All this data, excuse me, is published in an open access paper by the group at MIT, and you can see some of the information there. Key thing maybe to look at that's interesting, ICU mortality, as we compare with some of the other databases, 8.5% and 11.5% hospital mortality. As we look at the data structure for MIMIC, as I kind of alluded to earlier, there's five modules. There's the core, which has all the patient stay information. There's the hospital level data for patients. There's the ICU level data, which is really identical in structure to MIMIC3, as I mentioned earlier. They've added the emergency department data. Really great addition as well is this chest X-ray information that was brought in, I believe, about a year or so ago. So now you actually have imaging data that can be used and linked. Then there's asterisks around the note data because they actually have free text clinical notes from those patients that have run through a de-identification script. But as many of you would probably guess, that is very challenging process and not perfect. And so access to that is a little more restricted. And so there's a separate process in terms of being able to go through IRB or some secondary review for access to that. I should note for MIMIC and EICU, which I'm going to go into next, both of these are managed under the same PhysioNet and MIMIC access agreement. So you get credentialed to become a user and be able to access the data. You have to go through a simple human subjects research certification online, which many of you may have already done for other reasons. Provide that and detail out and sign a data use agreement that you will not try to re-identify any patients. If you find any patient identifiers, you'll report it back and destroy any data you might have. And that you will also share your code back with the community if you're going to publish things. And I'll get into a bit of that later. So some of the strengths and weaknesses about MIMIC. Strengths, as I mentioned earlier, it's been around a long time, and there's a really well-developed set of documentation and code base. Like I said, literally thousands of publications out there on MIMIC over the years that are very helpful. They do have waveform data, which, you know, isn't for all patients at all times, right? But there are episodes when waveforms are recorded for patients, and those are archived and available for analysis. So if you're really looking for that, this is a great place to get that type of information. As I mentioned, the Lab of Computational Physiology at MIT is a dedicated team for maintaining MIMIC with some of these expanding data sources. Very large set of reproducible code that's shared from others. And again, I'll talk more about that process later. And like I said, you could potentially get access to caregiver notes. Some of the weaknesses, it is only a single center, right? It does represent a very large, prestigious academic center in the Northeast U.S., which is very important, but also maybe not representative of everything going on in health care. 
Somewhat smaller sample size compared to what we'll see with what's available from the EICU dataset, and diagnoses, in typical fashion, are available as they're documented and finalized at the end of the stay, so they're not really time delimited.

Now, if we go into the EICU Collaborative Research Database, I'll talk a bit about the history there. I was part of the group that helped form the EICU Research Institute back in 2008, and it really was part of this telecritical care network that was already sharing data for quality performance benchmarking. External access was fairly limited. If you were within that research network, as you'll often see with these research networks, there was an easier path for data sharing; outside of that, there was always a path, but it was a little more limited. What we did is partner with MIT to release a subset of that data. My guess is there are now at least six million patients in the full EICU Research Institute database, obviously much less than that back in 2015, 2016. And so about 200,000 patients were released for public use alongside MIMIC. And like I said earlier, it's aligned with MIMIC for the data use agreement, so they're basically under the same process, which could streamline that.

If you look a little bit at the data there, the population, you'll see it's not too different in a lot of ways. Fairly diverse population. Certainly a lower ICU mortality, about 5.4% on average, and about 9% hospital mortality. But again, this covers a wide range of hospitals. And what you can see here, and this is an example of some of the nice things you get with crowdsourcing and some of the other groups, is that the MIT team put this together using something called SchemaSpy. If you go to their website, you see a GitHub repository. For those of you not familiar, GitHub has other uses, but it's basically a method for collaborating in data science and research: sharing code, version control, et cetera. Most data scientists will share their code on research projects they do through their own GitHub repositories, and they can make them private or public. And so the MIT team has a lot of public repositories that are accessible. With this one, you can basically scan it without having to have any real data science skills. You can go in and look at the SchemaSpy output, and it starts telling you about some of the tables in there, and you can actually look at the relationships and dive into some of these tables for more detail. But you can see here, 208 unique hospitals in the database, so a wide variety. There are some very small critical access hospitals, some very large tertiary centers, a very big spread.

One thing you might notice is these children and parents columns. The patient table here is clearly the parent table, and the other 29 tables are all children, and you can see these all tie back to the patient. A couple of interesting things here really stand out. APACHE is part of the EICU program. Most sites are receiving APACHE IV and IVa predictions, and those are brought into this database. So essentially, you do have a lot of that very rich risk adjustment through APACHE available in this database without extra work needed. And then the other really nice thing here is the vital signs tables, especially the vital periodic table. And what periodic data means is that it's routinely coming through.
And so there's an interface with the bedside monitors where vital signs are coming through on one-minute averages for every patient, and then they're archived as five-minute medians. So what you see here is for most patients, you'll have every five minutes throughout their stay their vital sign records. So almost certainly that covers their heart rate, their respiratory rate, their SAO2. And then if they have an A-line, you would also have their blood pressure data. If not, you might have more of the intermittent data that you would get through A-periodic vital signs. This is a little bit of how that data is structured. The unique thing about the EICU database is everything is centered on this patient unit stay ID. And so every time a patient is admitted to an ICU, they get a new unit stay ID. And that becomes that primary key. And what primary key means is exactly what I was showing in that schema spy image. That is how you link to all the other tables. So if I go back and I want to find out the admission, excuse me, diagnosis of a patient, the way I find that out is I will use that patient unit stay ID to go to the admission diagnosis table and then find the admission diagnosis for that patient. For each patient who might have multiple unit stays and different IDs for their patient unit stay IDs, they'll have one health system stay ID, which is generated per hospitalization. So again, you come to the hospital, you get an ID, you go to an ICU, you get another ID, you transfer to a different ICU, you get another ID. That's inherent in the system. One of the things the group with MIT did was also put an algorithm together to create a unique patient ID that makes an attempt at linking multiple hospitalizations together. This is, we tested this. This works pretty well, but it's not flawless and doesn't necessarily cover when they get admitted to different hospitals. But it may help in some cases where you're trying to find a patient who's had multiple hospitalizations. And if you look at the paper we published on this describing it, this is kind of a description and an image of what you might be able to piece together on a patient. And you can see, you can really get a timeline here of a patient's vital signs, their labs. Obviously, there's a whole lot more. There's 29 tables there. So a lot of information is brought in, and you can track that over time and use for your research. Really important thing, remember I talked about date shifting and MIMIC. Well, in the EICU database, they don't do date shifting. What is done is the de-identification process converts each ICU admission into time zero. And so what happens is all of your data in the database will actually be recorded in minutes from that unit stays admission. Which means, let's say you have a hemoglobin level drawn 60 minutes into your ICU stay. Well, for that patient and that patient's ID, you would have at time 60 this hemoglobin of 8.2. Now, let's say they leave the ICU and they go to a different ICU. And then two days later, they have another hemoglobin, or let's say they're admitted two days later, but 60 minutes into that stay, they have another hemoglobin drawn. That hemoglobin will have a time offset of 60 minutes as well. But you would have to sequence them to see that they actually happened completely different stays. So everything's organized around a unit stay, and it does take a little bit of work to make sure that when you're looking at data across a hospitalization, you've lined it up correctly across the stays. 
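As a minimal sketch of the offset structure just described (this is not from the webcast), the following Python snippet shows how one might re-base eICU-CRD observations from per-unit-stay offsets to a single hospital timeline. It assumes the eICU-CRD CSV extracts (patient.csv, vitalPeriodic.csv) and the column names from the published schema (patientunitstayid, patienthealthsystemstayid, hospitaladmitoffset, observationoffset); verify file names and columns against the version you download.

```python
# Sketch: aligning eICU-CRD periodic vitals across unit stays (assumed schema).
import pandas as pd

# One row per ICU (unit) stay; hospitaladmitoffset is the hospital admission
# time in minutes relative to that unit admission (usually negative).
patient = pd.read_csv(
    "patient.csv",  # distributed file may be gzipped; adjust the path
    usecols=["patientunitstayid", "patienthealthsystemstayid",
             "uniquepid", "hospitaladmitoffset"],
)

# Five-minute-median vital signs; observationoffset is minutes from the unit
# admission of the stay the observation belongs to.
vitals = pd.read_csv(
    "vitalPeriodic.csv",
    usecols=["patientunitstayid", "observationoffset",
             "heartrate", "sao2", "respiration"],
)

# Join on the primary key, then express every observation as minutes from
# hospital admission so measurements from different unit stays within one
# hospitalization line up on a common axis.
merged = vitals.merge(patient, on="patientunitstayid", how="inner")
merged["minutes_from_hosp_admit"] = (
    merged["observationoffset"] - merged["hospitaladmitoffset"]
)

# Example: heart-rate trajectory for one hospitalization across all its stays.
first_hosp = merged["patienthealthsystemstayid"].iloc[0]
one_hosp = (
    merged[merged["patienthealthsystemstayid"] == first_hosp]
    .sort_values("minutes_from_hosp_admit")
)
print(one_hosp[["patientunitstayid", "minutes_from_hosp_admit", "heartrate"]].head())
```

The point of the re-basing step is exactly what the speaker describes: two labs or vital signs can share the same offset value yet belong to different unit stays, so sequencing on a hospital-level clock is what keeps them in the right order.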
It can easily be done because with each patient in the patient table, you know when the hospital admission happened. And so with just some simple math, you can align everything. But it is a little bit of work that needs to be done and that can sometimes trip people up. So some of the strengths and weaknesses. It's a pretty large sample. Like I said, about 200,000 patients. It's very heterogeneous, which in general, we like to think will make things a little more generalizable. The periodic vital signs are quite valuable just to be able to track at that resolution. The data scheme is standardized across all the hospitals using it. You do get that Apache risk adjustment, which very nicely comes with a primary reason for admission that's clearly documented in most patients. So that's hard to get in your typical EHR, but because Apache's used in these patients, somebody went through the trouble to say this was the primary reason they're here, and that can be very useful for research. You know, there's also some structured care plan data. Data before and after the ICU stay can still come in through interfaces. So labs that happen while they're in the general wards will come in. And when they're documenting on the problem list and diagnoses, those are time-delimited. So they can be mapped to ICDs, but you'll know when those were charted, with the caveat that there may be some quality issues in terms of sensitivity of documenting those all thoroughly that I'll get into a little bit. And then, like I alluded to, there is a larger database that there is some potential to potentially collaborate and work on that larger database for particular studies when needed. Weaknesses, it's newer, although not that new, so more code is becoming available, but certainly less than MIMIC. Big thing, variation in documentation practice. The remote telecritical care systems are secondary to the EHRs, right? That's not where the primary documentation happens. And so what you do find is there's quite a bit of variation in what is charted and how reliably that is. So as I alluded to with, say, a diagnosis or problem list, some staff at different EICUs may be very reliable about making sure those are all put in, and others, they don't bother, and others, it's sort of haphazard and in-between. So you can't just blindly assume that the data there is really all useful, but it's also, I think, unfair to assume that none of it's useful either, but it takes some work to get to what really is usable, and you've got to be very careful about that. Not all EMR data is available. Specifically, imaging data, radiology, cultures and sensitivities is probably the biggest gap if you want to look at microbiology information. There's very little of it there. Okay, I'm going to pause for a sip of water real quick, and then jump into the Amsterdam UMC database. Okay, so this is the newest database, and it's great. Actually, just this month, again, out of the presses, there is a great paper in critical care medicine covering this database and how it was put together, and so really recommend everybody reading that and taking a look at this database. So it was released last year. It comes from the Amsterdam University Medical Center. 
So, again, it's a single center, but they covered quite a long time span, 2003 to 2016, and they really put a lot of effort into adhering to the GDPR, the very strict European standards around privacy that came out, I believe, in 2018 and that make data sharing and research quite complicated. They went to great lengths to make sure they met that standard in addition to meeting HIPAA standards, and they also put some effort into making sure their model was simple and simplifying the underlying data.

So, a little high-level view here. They've got a variety of databases, their EHR, some other access databases, and they put that into a data lake, do some de-identification, and in the end, they end up with this modified set of seven tables that are available for analysis. One of the key things they did, which was also something done in the EICU database, is some denormalization. What that means is that when you use your EHR or some of these databases in clinical practice, the way data is stored there is really for efficiency of operating and taking care of patients, so you would not see a table structured like this where it says heart rate in the vital sign table. What they would have is an item ID which says 6640, and 6640 correlates with heart rate, and there would be another table that tells you 6640 is heart rate and 6642 is ABP systolic. So what they did is they said, well, this is going to be really hard for people to use, because they pull out this table and all you see is a bunch of codes, so we're going to join that and bring in the text so this is more readable for users. And you can see here in this table, for this particular patient at time zero, there was a heart rate of 65, and also at time zero, there was this systolic of 66, and then you have 1 o'clock and 2 o'clock, and you can see how this is organized. So it's a little bit easier because the data is here, but you can also see that when you're dealing with detailed time series data, this can get quite cumbersome for a typical person to use who's maybe not a little more savvy in terms of managing data and databases. This is why we often partner with database engineers and other researchers who have expertise in this to help manipulate the data and get it into a format we can analyze. Excuse me. Skipped over that.

And this is also a slide I won't spend too much time on. It's also in the paper, but it gives you an idea of what data is available and in how many patients. You can see here at the top, heart rate, SpO2: nearly 100% of patients have those during admission, all the way down to temperature, cardiac output, intracranial pressure; these things are much less frequently available, which shouldn't surprise anybody, really. Same thing here with some of the things documented, chest drain, pupillary size; those things are going to be less common than urine output and heart rhythm. You do see APACHE diagnoses in about 30% to 40% of patients, so that's helpful, and the nursing activity score in what looks like about 25% of patients, also helpful. And then, as usual, labs. Actually surprising: quite a bit of lactate in that era. I don't think we see the same rate of lactate being drawn in U.S. hospitals, but, obviously, that's been changing over time.
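To make the denormalized, long-format layout just described concrete, here is a minimal sketch, not from the webcast, of reshaping that kind of table into a wide time series with pandas. The column names (admissionid, time, item, value) are illustrative of the structure on the slide, not the exact AmsterdamUMCdb schema, and the values are toy data.

```python
# Sketch: pivoting a denormalized long-format vitals table into a wide one.
import pandas as pd

long_vitals = pd.DataFrame({
    "admissionid": [1, 1, 1, 1, 1, 1],
    "time":        [0, 0, 60, 60, 120, 120],          # offset from admission
    "item":        ["Heart rate", "ABP systolic"] * 3,  # readable item text
    "value":       [65, 66, 68, 70, 72, 74],
})

# Pivot: one row per (admission, time), one column per charted item.
wide = (
    long_vitals
    .pivot_table(index=["admissionid", "time"], columns="item", values="value")
    .reset_index()
)
print(wide)  # one row per time point with 'Heart rate' and 'ABP systolic' columns
```

This is the kind of reshaping step a database engineer or data scientist on the team would typically handle before the clinical analysis starts.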
And then these are some of the medications, especially infusions, that you can see charted, and, obviously, they're very common to use: propofol, morphine, norepinephrine, much less likely cefotaxime and haloperidol. And, again, this is just an image to show how you can piece that data together and visualize a patient's trends in vitals and superimpose that over some of their labs and cardiac output with nursing assessments, medication administrations, fluid I&Os, and some other interventions like mechanical ventilation, CVVH, Foley catheter use. So quite a bit of information that you can piece together and use. Again, this doesn't happen very easily, but if you go through, and there's code shared on how to do these things, you can take shortcuts and get there.

So a little bit about their population, about 20,000 unique patients. One thing that's nice here is they have these high-dependency units, or what we might refer to as step-down units, which you don't see in the publicly available EICU database. And I don't believe there are many of these in the MIMIC database, or at least there weren't in MIMIC-III. And then you can also see there's another feature here that's quite helpful, which is long-term mortality, death less than a year after discharge. So that's something you can't get very easily in some of the other databases.

So that's one of the strengths. It does span many years. It's really the first major database to provide a European perspective. It has the high-dependency units. Physiology can be up to one-minute resolution, which is really helpful. And it's a little simpler structure, only about seven tables. Weaknesses: it's relatively new, just really came out. Again, it's single center, but still quite important, and I don't want to knock that. One thing I will note, probably because I'm a pharmacist, is that they do have a stipulation that an intensivist is required for access. So that's something that could be a little bit of a roadblock if some other disciplines are trying to do research. But then again, I don't want to knock this too much, because I think we're all for collaborative research and having a broad team, and hopefully there would be intensivists and pharmacists and other professionals from other disciplines together in teams. But if you don't have that, it can be a little bit problematic. Smaller sample size than MIMIC and EICU. Overall, still, I think this is a great addition that people have really been excited about.

So now let's take a quick pause, and I'm going to ask a quick poll question or two. What database would you choose for some of these study themes? These aren't exactly studies, but I'm going to go one at a time. The first one is if you want to look at the association of specific medications with QT prolongation. And I think the poll will show up now, and I'll give you guys a minute to vote. Okay, so, interesting. So, for the association of specific medications with QT prolongation, we had about 52% say they would prefer the EICU database, which is a little bit interesting. I think one of the values of the EICU database is that it has the five-minute medians for vital signs. However, it doesn't have waveforms or some of the waveform or bedside monitoring alarms that MIMIC does. So, in this case, actually, MIMIC would be a better choice, because there you can actually look directly at some of the waveforms and get QT prolongation. All right, next question.
So, that was good. I stumped some of you. Next: if you want to look at variation in the likelihood to receive invasive versus non-invasive ventilation across hospitals. Okay, good. So, on this one, we have the EICU Collaborative Research Database at 91%, and that's great, because that is the database that covers over 200 hospitals, and you can really look at patterns of care across different hospitals. Obviously, if you wanted to look at all three of these, you could compare with MIMIC and Amsterdam, but in one database, the EICU one would be the best choice there.

All right, and the next question is changes in sedation practice over a five-year period. All right, so here, all right, that's good. The EICU would not be a great choice here, especially if we're talking specifically about the Collaborative Research Database, which only covers a two-year span. MIMIC does have a longer history, so that may be an option. It could get a little bit complex with some of the older versions and putting them together, but the Amsterdam UMC database clearly spans 2003 to 2016, and that's in one database, so that one would be a good choice.

And let's see what we have next. Okay, physiologic response to fluid bolus. All right, let's see what the choices are. Okay, this is actually not too bad. So, physiologic response to fluid bolus, you can actually do this in any of these databases. I would think that MIMIC may be a little more challenging because, in general, the resolution of the physiology is not as high as in the Amsterdam and the EICU databases, but there may be some choices there. So I think, in general, if you're looking sort of minute to minute, every five minutes, every minute, EICU and Amsterdam will give you more information there. Also, the one I alluded to earlier, the HiRID database, may be useful as well. So that may be another one to think about.

Just one more question. Rates of de-escalation of antimicrobial therapy based on culture findings. OK, and what do we have? Good, so definitely MIMIC is a very good choice for this. EICU is definitely not a good choice for this; that one does not have culture data in any sort of robust way. The Amsterdam database, I believe, may have some. To be honest, it may take some digging, because they have tables with the lab values, and some of those may come through under the free text. If you go through their schema, it's not explicit what's in there. So that may be one that's worth investigating and verifying. But certainly MIMIC and likely Amsterdam would be good choices there.

OK, great, so now let's move on to some of the key principles to remember. As you can figure out, there really is no best database. You really need to think about what's good for your study. The process is very iterative. This isn't like a prospective randomized trial; you can't plan for things you don't know. You really need a diverse research team. You don't have to do everything yourself. I think a lot of people are intimidated by the complexity of this data. You don't have to be a SQL guru to do this yourself. And I'm going to talk a little bit about starting simple: broad and shallow versus narrow and deep. I think we're running a little bit low on time, so I'll go quickly through here. But there's a great book, Secondary Analysis of Electronic Health Records. It's open access, and there's a chapter on how to do this that any of you can go access and read. But I think we talk a lot about feasibility.
And what I talk about here is two approaches, broad and shallow, which is, OK, I want to look at this particular topic, and I don't even know if they exist in the database or not. Do we have patients with those diagnoses? Or have they ever been exposed to that medication? You don't even know if that exists, then you really just want to do a cursory look and confirm the data is available. On the other hand, when you're really looking at your research process and your operations and how you're going to do it, you probably want to do something more narrow and deep. And I'll give you an example of this. This is really taken from Agile methodology that we use in software development. And in the old way of doing things in software, they call it waterfall, but this is also how you might design your prospective trials. You start off by doing all your background work, you design your trial, you go implement it, you run it. Now here, this is a matter of coding. So you've designed what it is you want to code and query, you go and do that, you test to make sure you do it right, and then you're done. In Agile, you break it into vertical slices. And so I think the best way I can articulate this is when I'm building a predictive model, and let's say if you were doing it all up front, you might have identified 20 variables that you want to put in your model. And then you would do all this analysis and design and do everything all at once. Well, in Agile, you might start and say, I'm just going to start with one variable. And I know my model is not going to be valuable when all I have is heart rate in it, predicting mortality. But it's going to help me learn how to manage the database and how to query it and how to align my outcomes and how to put a QA process to make sure we're coding correctly and make sure we know how to do some exploratory data analysis and look for bad data. And how are we going to report the results of our model? Let's kind of standardize our metrics. And so you can go through that, and you can basically build an entire process. And at the end, you basically have a completely documented and thorough process for a model that just has heart rate in it. And the value there, again, is that you've kind of worked through your logistics, your mechanics, and now you make kind of an automated process. And then you can start layering in and get to your final model. And that helps you iterate and learn as you go along. Whereas if you put this huge amount of design in, and then you find out once you start coding that half your variables don't exist or they're too dirty and messy to do anything with, you've wasted a whole bunch of time. I'll move on from there because I know we're running short on time. So we do have another poll. Let's imagine you're a data analyst at a hospital. You're doing a study on ICU readmissions and using a database from the ICU clinical system. Your team wants to report what readmission rate was last year and if it differs by ICU. So you look in the database, and you find a field, binary field, called ICU readmission. It's just a checkbox, yes or no. Can you use this for your study? And what do you think the flag means? So one patient was discharged from an ICU to a lower acuity location, followed by going back to the ICU. They went from an ICU to lower acuity location, back to the ICU, but only if 24 hours passed. Basically the same thing, but only if less than 48 hours passed, which I think is kind of a clinical definition that a lot of us use for readmission. 
Or they were in the ICU and basically their level of care was downgraded to step-down status, or there's not enough information. So I'll give you a few more seconds here to quickly vote. All right, great. So most people say there's not enough information, which is the right answer here, because these things actually vary depending on which database you're using. In one dataset, A makes sense; that's what a readmission is. But in the MIMIC database, you actually don't get a new unit stay ID unless more than 24 hours pass, so it wouldn't have that flag inherently unless it was B. A lot of people will use 48 hours as a clinical definition, but that's not likely what's used in that flag. And interestingly, in the EICU database, simply changing status from ICU to step-down initiates a new unit stay in the database and can look like a readmission. So all these things that you never would have thought of, and if you just blindly use what's available, you're going to get misled and get wrong results. So you actually spend quite a lot of time designing these features to make sure you have them right. And it's not so much that your rules will be wrong; it's that maybe your logical criteria break down in the face of the real data. Occasionally there are just errors in the data, where somebody was discharged a couple of minutes before they arrived, which obviously can't happen, but there are some errors in the database, and then your rule for readmission would completely break down. All right, let's jump to the next slide.

All right, so quickly, some issues that you really need to think about. Survival bias: very common in observational datasets. You have to think about who made it into your cohort and whether they are representative of the outcome you're looking at. Confounding by indication: often in the ICU, if you just look at the raw numbers, more patients die on ventilators than without ventilators; obviously there's confounding there because there was an indication to get intubated versus not. Measurement error is very common, and that's really common everywhere. I think there's a bit of an overreaction that the data can't be used; I think in every study, you really need to think about measurement error and use your clinical judgment on how to account for it. Same thing with missing data. This is really tricky. How do you interpret gaps? If something is charted every two hours, is it missing? Every six hours? If you don't have a blood gas and you don't have a pH, what does that mean? There's no single answer to this, but I think anything you choose is going to require a lot of assumptions, and you're going to want to test those assumptions with sensitivity analyses and have your clinicians and your data scientists and everybody working very closely on how to interpret these things.

And lastly, so we'll get a couple of minutes in for questions: crowdsourcing and sharing code really accelerate innovation and research. There are two great papers that just came out as examples of this. This ricu package is really cool; they basically did this project to allow for common queries between the EICU and MIMIC databases. And this FIDDLE project basically preprocessed MIMIC and EICU for some outcomes like in-hospital mortality, respiratory failure, and shock. So you can leverage things other people have done so you don't have to spend all the time to do it. So I'm a really big advocate of sharing code and crowdsourcing. So with that, I'll pause for a couple of minutes of questions. Thank you, Dr. Badawi.
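As a rough illustration of the readmission discussion above, deriving the flag yourself rather than trusting an opaque checkbox, here is a minimal sketch, not from the webcast, that builds a readmission indicator from unit stays with an explicit time window and separates out impossible timestamps. The column names (hospital_stay_id, unit_admit_min, unit_discharge_min) are hypothetical; they stand in for whatever identifiers and offsets your database provides.

```python
# Sketch: deriving an ICU readmission flag with an explicit definition.
import pandas as pd

def flag_readmissions(stays: pd.DataFrame, window_hours: float = 48.0) -> pd.DataFrame:
    """Label each ICU stay that begins within `window_hours` of a prior ICU
    discharge in the same hospitalization; surface data errors separately."""
    stays = stays.sort_values(["hospital_stay_id", "unit_admit_min"]).copy()
    prev_discharge = stays.groupby("hospital_stay_id")["unit_discharge_min"].shift(1)
    gap_min = stays["unit_admit_min"] - prev_discharge

    stays["readmission"] = gap_min.between(0, window_hours * 60)
    # Impossible sequences (next admission before the previous discharge, or a
    # negative length of stay) should be reviewed, not silently counted.
    stays["data_error"] = (gap_min < 0) | (
        stays["unit_discharge_min"] < stays["unit_admit_min"]
    )
    return stays

# Toy example: the second stay starts 10 hours after the first discharge.
demo = pd.DataFrame({
    "hospital_stay_id":   [101, 101, 102],
    "unit_admit_min":     [0, 3000, 0],
    "unit_discharge_min": [2400, 5000, 1500],
})
print(flag_readmissions(demo)[["hospital_stay_id", "readmission", "data_error"]])
```

Making the window and the error handling explicit is what lets you run the sensitivity analyses the speaker recommends, for example re-running the count with a 24-hour versus 48-hour definition.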
Can you speak a little bit to the type and the quality of medication data that's available within these different databases? Yeah, that's a great question. So with the medication data, it varies a bit. In the EICU database, there are two main places to get medications. One is the pharmacy orders, and those are in a medications table. What they record there, for those of you who are international, is that in the US, every medication order has to be verified by a pharmacist before it's official. And so once that's verified, that electronic message is captured with a start and stop time of that medication, and you'll have that record in the system. Now, you would know, Dr. Owen, that just having an order for a medication doesn't mean it was given, and just because it says to give it at a certain time doesn't mean it was given at that time. So they do not have a medication administration record. You might be able to make an assumption that some meds were given on the routine they were scheduled to be given, but you certainly wouldn't be able to make that assumption for PRN or as-needed medications. Now, all the databases do have continuous infusion medications as well that you can track, and those would reflect the nurse charting at the bedside and the actual infusions and titrations. So those should be fairly accurate, with the caveat that in the EICU database, not every site interfaces or has that data directly documented very reliably. So there's a subset of sites where that's very reliable and others where it's not.

Thank you. We have a question that asks, are you aware of pediatric ICU databases? There are, and the name is escaping me. I do know there are some pediatric ICU databases. I have not worked with them directly myself, but I do know that there are some, and they may not be as clinically rich. They may be more of the registry type, but we can certainly follow up. I do have some colleagues who have worked with some of the pediatric databases.

Thank you. Is it difficult to query the databases for data points that are available? That depends. It might be difficult initially if you don't have any background in it, but I think you can start simple. Some of the tables are very easily organized and you can kind of work through them, like looking at admission diagnosis and demographics. Things that get more complicated might be the labs, where you have just this really overwhelming amount of data, or nurse charting or infusions; the organization of those tables can be quite complex. But if you're just scanning for a few things and trying to find medications or trying to find diagnoses, it's not too challenging. And like I said, there is a lot of shared code. So if you can get a little familiar with some of those tools, a lot of the questions you might want to ask in terms of screening through the database might have already been done, and you may be able to leverage that.

Fantastic. And then we have one final question. Is the EICU database the only one with nursing assessment data? I don't believe so. You should have nursing assessment data in the MIMIC database and the Amsterdam database. And in the EICU database, the nursing assessment, again, the reliability of that may vary by ICU, as it may not be thoroughly documented in each system. They kind of have different implementations that may affect the reliability. Fantastic. Well, that concludes our question-and-answer session. Thank you, Dr. Badawi. Thank you to our presenter and to our audience for attending.
Again, you will receive a follow-up email with a link to complete an evaluation. There is no CE associated with this educational program. However, your opinions and feedback are important to us as we plan and develop future educational offerings. Please take 5 to 10 minutes to complete the evaluation. This concludes our presentation today.
Video Summary
In this webcast, Dr. Omar Badawi provides an overview of research using critical care databases. He discusses three main databases: MIMIC, EICU, and Amsterdam UMC Database. MIMIC is a widely used database originating from Beth Israel Hospital in Boston and contains data from multiple ICUs. EICU is a collaborative research database that includes data from over 200 hospitals. Amsterdam UMC Database is a newer database that is GDPR compliant and covers a wide range of years. Dr. Badawi highlights the strengths and weaknesses of each database, such as data availability, patient population, and measurement error. He also emphasizes the importance of considering issues like survival bias, confounding by indication, and missing data when analyzing the data. Dr. Badawi recommends taking a broad and shallow approach initially to understand the data structure and then digging deeper for specific research questions. He also advocates for sharing code and collaborating with other researchers to accelerate research and innovation. Overall, the webcast provides valuable insights into using critical care databases for research purposes.
Asset Subtitle: Research, Quality and Patient Safety, 2021
Asset Caption: Large critical care databases are potentially powerful tools for generating real-world evidence to guide the care of critically ill patients. Leveraging these resources requires an understanding of the unique challenges and sources of bias that are commonly encountered. This webcast from Discovery, the Critical Care Research Network, will provide an overview of the available critical care databases and discuss key methodology concerns when using a database to answer your research question.
Content Type: Webcast
Knowledge Area: Research; Quality and Patient Safety
Knowledge Level: Intermediate; Advanced
Membership Level: Select; Professional; Associate
Tags: Research; Evidence Based Medicine
Year: 2021
Keywords: webcast; critical care databases; MIMIC; EICU; Amsterdam UMC Database; data availability; patient population; measurement error; survival bias; collaboration