The Edge Tool: A Practical Approach to Data Harmonization
Video Transcription
I have a couple of disclosures: I had a grant from Pfizer, and I'm a consultant for the Critical Path Institute and the Emory University Morningside Center for Drug Repurposing. I should give you a little bit of background about myself. I have been a data analyst, or, as you'd say now, a data scientist, since before they called us data scientists, when they just called us nerds: about 30 years. I made my living helping researchers and clinicians like yourselves put together your databases, analyze your data, do biostatistics, and so on. Then in my 40s, I had a child with a rare disease and decided I wanted to do something with real-world data. So I went back to school, and that's why I'm a very old assistant professor. But I still feel very passionate about supporting these projects where brilliant clinicians want to realize their vision but are using new tools. I play a very small and humble role in these projects, where I help to develop and teach the tools that we use.

One thing I wanted to mention is the Edge Tool from the title of my talk. That is really how we described a bundle of mostly OHDSI and OMOP tools. You may have heard of OHDSI and OMOP. OHDSI is the Observational Health Data Sciences and Informatics initiative, and OMOP is the name of the original project, the Observational Medical Outcomes Partnership, but the name has now come to refer to the common data model used by the OHDSI community.

So the pathway to real-world evidence is a squiggly one, as you saw. I've been a knitter for 45 years, and I want to give you an analogy to help you get your mind around what I'm asking you to do with your real-world data. In knitting, you knit with two sticks, and I can make anything with knitting.
I can make a sweater, I can make a hat, and in my community and my family, if somebody wants something made, they contact me and say, Danielle, I want this sweater. Can you make it for me? So I decided I was going to take up crochet last winter. You make sweaters and hats, but you use a hook. You still use yarn and make the same things. But it was very different and very difficult: different stitches, different language, different communities that I had to join in order to understand more about crochet. It was difficult and humbling. And this is a lot like how I think about the work we do now with real-world data and real-world evidence, with the software that I'm asking you to use. I come in with all of this jargon and these new tools, and you've been doing your research with SPSS and Stata and flat files. Now I'm asking you to use relational databases and GitHub and R Shiny apps, right? And it doesn't take away from the fact that you're a knitter. You know how to knit. You know how to make sweaters. And it's very, very frustrating sometimes. So my goal is to teach you crochet painlessly and then get you back to knitting.

This is a graphic that I borrowed from Patrick Ryan from the OHDSI community. It describes what we're doing here with our OHDSI tools and real-world evidence. Think of the sources at the top as databases from three different hospital systems: it could be Epic, Cerner, and then a different Epic implementation. Or it could be within your own hospital system: your Epic or Cerner data, your VA data, your claims data, and a REDCap database. But as you notice, these little boxes, these tables, and how they relate to one another are all different. In source one, you might have male recorded as M. In source two, male might sit in a completely differently named table and be coded as 01. And in source three, it might be the word male written out.
And so I made my living for many years going into these different sources, understanding them, and helping them talk to one another, writing proprietary code and software to pull them together just so you could analyze them together. What we're doing in this blue section is transforming the data into a common data model. That's the crochet part, and that's where my work lives. We use tools and code, and sometimes a combination of things, different software, different code, whatever works for your institution and your dataset, to transform. It's called ETL: extract, transform, and load your data from those sources into this blue area. You can see that now they're all the same, which allows us to analyze all of them with the same code and the same software. That's the brilliant thing about OMOP. Now you don't have to hire an analyst to decode three different databases, figure out how they fit together, make staging tables, and combine them all. All you need is somebody who knows how to use the OHDSI tools. Within moments, I can pull all of your data together, analyze it, and share the results with others. And you know you're comparing apples to apples, because everything has been standardized.

So this is the OMOP common data model. This is the scariest thing I'm going to show you today, I promise, but it's not nearly as scary as your source data schema, I guarantee you. The blue side is your clinical data tables; if you've worked in the OMOP common data model at all, that's probably where you've focused. OMOP is a person-centric model, and everything else collected about that person falls into one of these tables. Every single site that has an OMOP common data model instance is laid out exactly like this, so I can go into your site, figure out exactly what's going on, and analyze your data instantly. And we can all share code.
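To make the ETL idea concrete, here is a minimal, hypothetical sketch of the harmonization step just described: three source systems encode "male" differently, and the transform maps each site-specific code to the single OMOP standard concept for male sex (concept ID 8507 in the OMOP vocabulary; 8532 is female). The source code values and record layouts are invented for illustration, not taken from any real ETL.

```python
# Per-source mapping tables (the source code values here are invented).
# 8507 and 8532 are the OMOP standard concept IDs for male and female.
SOURCE_SEX_MAPS = {
    "source_1": {"M": 8507, "F": 8532},
    "source_2": {"01": 8507, "02": 8532},
    "source_3": {"male": 8507, "female": 8532},
}

def to_omop_person(source_name, source_row):
    """Transform one source record into a minimal OMOP PERSON-style dict."""
    sex_map = SOURCE_SEX_MAPS[source_name]
    return {
        "person_id": source_row["id"],
        # 0 is the conventional "no matching concept" value in OMOP.
        "gender_concept_id": sex_map.get(source_row["sex"], 0),
        "year_of_birth": source_row["birth_year"],
    }

# After loading, all three sources can be analyzed with the same code.
rows = [
    ("source_1", {"id": 1, "sex": "M", "birth_year": 1980}),
    ("source_2", {"id": 2, "sex": "01", "birth_year": 1975}),
    ("source_3", {"id": 3, "sex": "male", "birth_year": 1990}),
]
persons = [to_omop_person(src, row) for src, row in rows]
males = [p["person_id"] for p in persons if p["gender_concept_id"] == 8507]
```

Once every site has run its own version of this transform, a single query against `gender_concept_id` works everywhere, which is the point of the common data model.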
And we know that I'm analyzing the data on, let's say, diabetes the same way you are, using the same codes and references. We also collect and transform data on economics: the payer plan period table captures insurance information, there are drug costs, and so on. Not everybody ETLs all their data at once. Institutions might ETL their data incrementally based on certain use cases, or when they have funding. What we found in our COVID work was that some people who already had OMOP instances had never ETLed their ventilator data before; it just never came up. And it's a lift, I'm not going to lie. It's a lot of work. So sometimes they do their devices later, and that's okay. But what I want you to understand is that every single site with an OMOP instance looks exactly like this. I can come in, analyze their data, and provide them with code that I wrote for my own OMOP CDM, and it will work on theirs as well. I think this is really brilliant for lower-resource settings because, like I said, I'm almost always in short supply. Everybody wants Danielle to analyze their data, there's always a long list, and this solves that problem.

As I mentioned, here's the tool we use for the standardized concept vocabularies. Anyone can go to the Athena website and look up their favorite codes or favorite diseases. If you have an ICD code that you use all the time, you can go in there and see how it's represented in the OMOP CDM. This is also where you download the vocabularies that you use in your own ETL process; it's free, you just go to the website and download them. And this is the Perseus ETL mapping support tool, which bundles together a number of different OHDSI tools to make everything simpler and easier to use. Basically, on the left, you have your source data.
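The vocabularies downloaded from Athena are what let an ETL turn a familiar source code into its standard OMOP concept. Here is a toy sketch of that lookup: a non-standard source code (an ICD-10-CM diagnosis) is followed through the concept relationship table's "Maps to" link to a standard concept. The "Maps to" relationship is how the OMOP vocabulary expresses this, but the concept IDs and table contents below are invented placeholders, not real Athena entries.

```python
# Tiny stand-ins for the OMOP CONCEPT and CONCEPT_RELATIONSHIP tables.
# Concept IDs 101 and 202 are placeholders for illustration only.
concept = {
    101: {"code": "E11.9", "vocabulary": "ICD10CM", "standard": False},
    202: {"code": "44054006", "vocabulary": "SNOMED", "standard": True},
}
concept_relationship = [
    # (concept_id_1, relationship_id, concept_id_2)
    (101, "Maps to", 202),
]

def map_to_standard(concept_id):
    """Return the standard concept a source concept maps to, if any."""
    if concept[concept_id]["standard"]:
        return concept_id  # already a standard concept
    for c1, rel, c2 in concept_relationship:
        if c1 == concept_id and rel == "Maps to":
            return c2
    return None  # unmapped source code
```

In a real ETL this lookup runs over millions of rows, but the logic is exactly this: non-standard source codes in, standard concept IDs out.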
When we say source, we mean your REDCap database, your Epic or Clarity data, Cerner, and so on. On the right is the target, which is your OMOP CDM. These tools help you guess and work out which of your source data elements should map to which target elements. The point is, there are a lot of really easy, no-code ways of supporting sites that are doing this work. This is the classic Usagi manual mapping tool. I use this a lot with, say, a REDCap database that doesn't use ICD codes, LOINC, or HL7, and it helps me work out what a given question might actually represent in the OMOP CDM. Again, these are R Shiny apps you can download from GitHub. I know I'm making it sound really easy, but the point is that they're free, and your technical group should know how to install them.

This is my favorite OHDSI tool, the Atlas tool. When we talk about developing cohorts with inclusion and exclusion criteria, this is the tool that lets you do it. Atlas, again, is a free tool. Once your data are transformed, you point it at your own OMOP instance, and you can develop very complex inclusion and exclusion criteria. And at a glance, you can answer questions. For example, an industry partner might come to you and say, I work a lot in rare epilepsy; I want to know how many people you have with Dravet syndrome who also take this drug. I can do that within seconds. In the olden days, that request would sit on a long waiting list, and I wouldn't always be certain I was capturing everything accurately, because it hadn't been vetted through the ETL process like it is here for OMOP. This is just an example of a really complex cohort definition, something I developed to show you how we build these concept sets. As I mentioned, I'm interested in rare epilepsy.
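The kind of suggestion Usagi makes, ranking likely vocabulary matches for a free-text source term, can be sketched in a few lines. This is a deliberately simplified stand-in using plain string similarity from Python's standard library; Usagi's actual matching is more sophisticated, and the candidate target terms below are invented examples.

```python
from difflib import SequenceMatcher

# Hypothetical candidate terms from a target vocabulary.
target_terms = [
    "Infantile spasms",
    "Epilepsy",
    "Migraine",
    "Febrile seizure",
]

def suggest_mapping(source_term, candidates, top_n=2):
    """Rank candidate target terms by simple string similarity."""
    scored = [
        (SequenceMatcher(None, source_term.lower(), c.lower()).ratio(), c)
        for c in candidates
    ]
    scored.sort(reverse=True)  # best match first
    return [term for score, term in scored[:top_n]]

# A messy free-text field from a hypothetical REDCap export.
suggestions = suggest_mapping("infantile spasm dx", target_terms)
```

A human reviewer then confirms or rejects each suggestion, which is exactly the workflow Usagi supports.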
So I might develop a concept set called infantile spasms diagnosis, and there might be a few different codes for that. I'll make a concept set for that, and then a concept set for the different drugs that I want to capture or exclude. Then in my cohort definition, I say, just as you would if you were writing your protocol, I want to include everyone with an infantile spasms diagnosis, and Atlas bundles up all of the concepts I previously defined and puts them into my cohort definition. Then I can click the export tab and get a JSON file to send to a colleague at a different institution that has Atlas and an OMOP CDM instance, and in seconds they can tell me how many patients they have that match these criteria. That is the basis of a network study, and no patient data needs to leave the building, which makes IRB review really fast and efficient.

I was 16 years at Johns Hopkins, so I borrowed this slide from Paul Nagy, whom you may know. This is the Data Quality Dashboard, again free and downloadable from GitHub. Don't be scared. You point this at your OMOP CDM, and it runs the Kahn framework data quality checks: a variety of checks on plausibility, completeness, and conformance. It will automatically scan your whole CDM and tell you where some of the issues might be. This comes up a lot when we have discordant units, where the plausibility of the units of measurement doesn't make sense. You can see here the kind of report you get. You could send that to me, with no patient data involved; I can upload it into my app and give you advice on what you need to do to go back and improve your ETL. Because believe me, there can be a lot of mistakes. It's a human, multidisciplinary process, and things slip through the cracks. So you want to do this as just one of your many data quality checks.
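A toy version of the plausibility checking described above looks like this: flag any measurement whose value falls outside a plausible range for its unit. The ranges and records here are invented for illustration; the real Data Quality Dashboard ships with hundreds of pre-specified checks organized by the Kahn framework categories.

```python
# Invented plausible ranges, keyed by (measurement, unit).
PLAUSIBLE_RANGES = {
    ("body temperature", "degree Celsius"): (30.0, 45.0),
    ("heart rate", "beats/min"): (20, 300),
}

def check_plausibility(records):
    """Return the records whose value falls outside the plausible range."""
    failures = []
    for rec in records:
        low, high = PLAUSIBLE_RANGES[(rec["measurement"], rec["unit"])]
        if not (low <= rec["value"] <= high):
            failures.append(rec)
    return failures

# Toy measurement rows; the second is a Fahrenheit value that slipped
# through an ETL labeled as Celsius, the discordant-unit problem above.
records = [
    {"measurement": "body temperature", "unit": "degree Celsius", "value": 37.1},
    {"measurement": "body temperature", "unit": "degree Celsius", "value": 98.6},
    {"measurement": "heart rate", "unit": "beats/min", "value": 72},
]
bad = check_plausibility(records)
```

The dashboard's report is essentially an aggregate of failures like these, which is why it contains no patient-level data and can be shared freely.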
And finally, I just wanted to provide you with some resources. You're always free to contact me; I love talking about these tools, and I love helping people learn how to crochet. Here's The Book of OHDSI. The OHDSI website is very helpful. The EHDEN Academy, out of Europe, has a number of great classes where you can learn in depth how to do your own ETL or use Atlas. And this is the page specifically for the OHDSI software tools. Thank you.
Video Summary
The speaker, a seasoned data scientist and assistant professor, shares her journey and passion for data analysis in healthcare, emphasizing real-world data integration using OMOP (Observational Medical Outcomes Partnership) tools. She draws an analogy between knitting and crocheting to illustrate the challenges of transitioning between familiar and new methodologies in data analysis. The focus of her work is on transforming disparate healthcare data into the standardized OMOP common data model, enabling the same analysis code to run across different health systems. She highlights several OHDSI tools that streamline this process and emphasizes collaboration and standardization.
Asset Caption
Two-Hour Concurrent Session | Curating and Analyzing Real-World Data for Critical Care Research in COVID-19 and Beyond
Meta Tag
Content Type
Presentation
Membership Level
Professional
Year
2024
Keywords
data analysis
OMOP
healthcare
standardization
OHDSI tools