Development and Validation of the Phoenix Pediatric Sepsis Criteria - 1
Video Transcription
All right, good morning. Thank you all for being here. I'm going to talk about the development and validation of the criteria. My only disclosure is that we were funded by NICHD; thank you to them for their vision in supporting this work. As others have mentioned, there is a concurrent publication today, and these slides are available through SCCM. In the bottom right of this slide is a link to the GitHub repository where the code for much of this work will be made publicly available. I'm going to talk through the methods, Nelson will present the results, and then I'll talk through some of the early discussion issues.

We began with a conceptual framework that the task force decided in 2019 we would adopt: that sepsis in kids is, in fact, infection with life-threatening organ dysfunction. I've always been a fan of this posting from the JAMA Twitter feed after Sepsis-3, which shows the same thing. The way we operationalized this is as follows. Suspected infection in the first 24 hours: we were really trying to focus first on kids who present with sepsis at the beginning of an encounter (we'll come back to hospital-acquired sepsis later). Life-threatening: the primary outcome for everything we'll show you is in-hospital, or encounter, mortality. Organ dysfunction: I'm going to show you how we identified the best-performing organ dysfunction subcomponents from all of the pieces of the scores we evaluated. Importantly, those components needed to be applicable in both higher- and lower-resource settings.

This is a complex slide that shows the methods, so I'm going to break it apart. Step one: identify the best-performing organ dysfunction subcomponents of existing scores, where "best" means the subcomponents that best predicted mortality in infected versus non-infected patients.
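As an illustrative sketch of step one (not the study's actual pipeline, which will be in the GitHub repository), candidate subcomponents can be ranked by how well each one alone predicts mortality, scored with AUPRC; the candidate names and data here are entirely synthetic:

```python
# Hypothetical sketch of step 1: rank candidate organ-dysfunction
# subcomponents by how well each predicts in-hospital mortality,
# scored with average precision (AUPRC). Data and names are synthetic.
import numpy as np
from sklearn.metrics import average_precision_score

rng = np.random.default_rng(0)
n = 10_000
died = rng.random(n) < 0.03  # ~3% mortality: a highly imbalanced outcome

# Stand-ins for competing cardiovascular subcomponents from existing scores.
candidates = {
    "cv_score_a": died * rng.normal(2.0, 1.0, n) + rng.normal(0.0, 1.0, n),
    "cv_score_b": died * rng.normal(1.0, 1.0, n) + rng.normal(0.0, 1.0, n),
}

# Rank the candidates by AUPRC against mortality and keep the best one.
ranked = sorted(
    ((average_precision_score(died, s), name) for name, s in candidates.items()),
    reverse=True,
)
best_auprc, best_name = ranked[0]
print(f"best subcomponent: {best_name} (AUPRC = {best_auprc:.3f})")
```

The same ranking would be repeated within each organ system (cardiovascular, respiratory, and so on) to pick one winner per system.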
The table on the right side of the slide shows the complete space of subcomponents of existing scores that we evaluated in order to do that, and there are some familiar initials on that list. As an example of what a subcomponent means: what is the best cardiovascular subcomponent, considering all of the available cardiovascular subcomponents we might put into a sepsis model?

Step two was to take those best components and build a sepsis model. Having identified the best subcomponents, we used a form of machine learning called stacked regression. You may have heard of model averaging or ensemble learning; this is in that family of methods. We stacked those best available subcomponents using a top-level machine learning model, and that top-level model also predicted mortality, this time in kids with suspected infection.

In step three, using the best sepsis model, we translated it into something a human could use and not just a computer: an integer-based score. We did that using a grid search of the complete space of possible integer values for all the elements in those subcomponents, collapsing categories when doing so had no effect on overall performance. In step four, we selected binary thresholds from that integer score to be used for the sepsis and septic shock criteria; this was a modified Delphi process in partnership with the task force.

Importantly, in the first three steps we used the area under the precision-recall curve, or AUPRC, as our primary metric of performance. In clinical journals, you may be more familiar with the AUROC. The reason we chose the AUPRC is as follows. The y-axis is precision, or positive predictive value as you may have heard it referred to. The x-axis is recall, or sensitivity, as you may be more familiar with. The dotted red line across the bottom is the baseline event rate.
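A minimal sketch of the stacked-regression idea in step two, assuming one simple base model per subcomponent and a logistic top-level model; the published pipeline may well differ, and the data here are synthetic placeholders:

```python
# Hedged sketch of "stacked regression": level-0 models, one per
# subcomponent, produce out-of-fold predictions; a level-1 (top-level)
# logistic model then learns a weight for each subcomponent.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import average_precision_score

rng = np.random.default_rng(1)
n = 5_000
# Four hypothetical subcomponent scores (stand-ins for cv/resp/neuro/coag).
X = rng.normal(size=(n, 4))
logit = 1.5 * X[:, 0] + 1.0 * X[:, 1] + 0.5 * X[:, 2] - 4.0
y = rng.random(n) < 1.0 / (1.0 + np.exp(-logit))  # rare mortality outcome

# Level 0: out-of-fold predictions keep the meta-model from rewarding
# base models for overfitting their own training data.
level0 = np.column_stack([
    cross_val_predict(LogisticRegression(), X[:, [j]], y, cv=5,
                      method="predict_proba")[:, 1]
    for j in range(X.shape[1])
])

# Level 1: the top-level model assigns each subcomponent its own weight,
# which keeps the ensemble interpretable.
meta = LogisticRegression().fit(level0, y)
print("per-subcomponent weights:", meta.coef_.round(2))
print("stacked AUPRC:",
      round(average_precision_score(y, meta.predict_proba(level0)[:, 1]), 3))
```

The interpretability claim in the talk corresponds to inspecting `meta.coef_`: each coefficient is the contribution of one named subcomponent to the overall prediction.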
The AUPRC reflects the difference between the model and the baseline event rate. Classes of data are imbalanced when the event rate is far from 50%, closer to 0% or 100%, and thank goodness death in kids with sepsis is much closer to zero than to 50%, so there is imbalance here. AUROC can overestimate performance in that setting, and therefore we used AUPRC. AUPRC also has a natural translation to implementation, because it can be used to optimize positive predictive value and sensitivity when you're selecting thresholds; we'll show you how that works. The no-skill model, the model that doesn't add any value, sits down here at the baseline event rate, and the AUPRC is up here; if it's 10 times higher, the model offers 10 times as much predictive ability. At the last step, the individual binary thresholds that define the criteria are one point on that curve, so we used positive predictive value and sensitivity as the metrics for step four. These are better closer to the top right corner; keep that in mind as Nelson talks you through the results.

All right. I'm going to talk about some of the things we wrestled with during this work, hopefully to begin the formulation of good questions in your minds for the roundtable and the Q&A afterwards. First of all, why did we use existing organ dysfunction subcomponents in step one rather than starting from scratch? I think this was highly consistent with the overall pragmatic approach that Scott and Loren mentioned in the first set of talks. These scores have already been validated in children, and in many cases are already familiar to the community and in use in various settings. We wanted to develop criteria that people would use, and could use effectively, in a wide variety of environments; if they were already familiar with some of the pieces, we thought that was more likely. So this was really a pragmatic choice.
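The no-skill baseline and the one-point-per-threshold idea can be made concrete with a small synthetic sketch; the event rate and the PPV/sensitivity targets below are invented for illustration, not the study's values:

```python
# Sketch of why AUPRC suits imbalanced outcomes, and how a binary
# criteria threshold is a single point on the precision-recall curve.
import numpy as np
from sklearn.metrics import precision_recall_curve, average_precision_score

rng = np.random.default_rng(2)
n = 20_000
y = rng.random(n) < 0.02                      # ~2% baseline event rate
score = y * rng.normal(2.0, 1.0, n) + rng.normal(0.0, 1.0, n)

baseline = y.mean()                           # AUPRC of a no-skill model
auprc = average_precision_score(y, score)
print(f"no-skill AUPRC ~ {baseline:.3f}; model AUPRC = {auprc:.3f}")

# Every candidate cut-off is one (recall, precision) point on the curve,
# so choosing a criteria threshold trades PPV against sensitivity.
precision, recall, thresholds = precision_recall_curve(y, score)
ok = (precision[:-1] >= 0.20) & (recall[:-1] >= 0.50)  # hypothetical targets
t = thresholds[ok][0]
print(f"example threshold {t:.2f} meets PPV >= 0.20 at sensitivity >= 0.50")
```

Note that the no-skill AUPRC equals the event rate, whereas a no-skill AUROC is always 0.5 regardless of imbalance, which is exactly why AUROC can look flattering on rare outcomes.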
So we broke apart that table at the bottom, found the best options, and put it back together. Why did we use stacked regression in step two? It was a very natural methodologic approach given that we were using existing organ dysfunction subcomponents: each subcomponent gets its own weight in the sepsis model, and there are statistical guarantees (which I promise I will not go into) that the final model will be at least as accurate as the best individual subcomponent. Stacked regression is often seen as having some of the benefits of deep learning, in that each unit has a weight that is optimized, but it is still highly interpretable: we know exactly what is in each subcomponent, and we know exactly what the impact of each individual subcomponent is on the overall prediction.

We knew it would be controversial that renal and hepatic dysfunction are not present in the sepsis criteria; in the end, that decision was made based on the results. Does this mean that renal and hepatic dysfunction are not important at all? It does not. They are extremely important for management, stratification, and other purposes in the care of children with sepsis. What it means is that we were able to be efficient, and perhaps make it more likely that these criteria can be used in more austere environments, because we found that we could get equivalent sepsis diagnosis using only the four organ systems. We knew that the eight-organ-system model, based on the larger ridge machine learning model that Nelson mentioned, would potentially be useful for research and other purposes, so we made it available in the supplement along with a comprehensive analysis using the same metrics. For reference, the eight-organ-system score, the Phoenix-8 score, includes endocrine, immunologic, renal, and hepatic dysfunction in addition to the four organ systems we've mentioned.
Getting back to that question of remote organ dysfunction, put another way: can a child with single-organ respiratory or neurologic dysfunction have sepsis? I phrase it that way because those are the elements of the Phoenix Sepsis Score, in addition to cardiovascular, by which a child could achieve the two points required for a sepsis diagnosis. I think everyone would agree that cardiovascular dysfunction is remote from the site of infection in almost all cases. But what about the single-organ child with pneumonia, or a terrible viral infection of the respiratory system, requiring mechanical ventilation? The answer is yes, they would qualify under the Phoenix sepsis criteria. The way we looked at this with the task force was with these diagrams, affectionately referred to as the eggs, and what we learned is that, in fact, nearly all kids who qualify with Phoenix sepsis have remote organ dysfunction. The red in the middle of the egg represents the children with Phoenix sepsis who have organ dysfunction remote from the site of infection; the blue rim represents the children with Phoenix sepsis who do not. It is true that the kids in the blue rim have lower mortality, both at high-resource sites on the left side of the slide and at low-resource sites on the right side, and that is something we present more completely in the supplement.

What if a health care facility doesn't routinely collect all the variables in the Phoenix Sepsis Score? Some of the coagulation testing, like D-dimer, is a good example. According to the international survey, most of the elements in the Phoenix Sepsis Score are available in most settings around the world. However, the score is built with redundancy in mind, and this is part of the pragmatism: a sepsis diagnosis only takes two points.
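As a toy illustration of that redundancy (with invented point values, not the published Phoenix rubric), components that are never measured can simply contribute zero points, and a child can still cross the two-point threshold through other organ systems:

```python
# Hedged sketch: how a redundant multi-system score can still reach the
# 2-point sepsis threshold when some tests (e.g., coagulation) are not
# collected at a site. Point values here are illustrative only.
def phoenix_like_score(points: dict) -> int:
    """Sum sub-scores across systems; unmeasured systems contribute 0."""
    systems = ("cardiovascular", "respiratory", "neurologic", "coagulation")
    return sum(points.get(s, 0) for s in systems)

# A site without coagulation testing: points accrue from other systems.
child = {"respiratory": 1, "neurologic": 1}   # coagulation never measured
score = phoenix_like_score(child)
print(score, "-> meets sepsis threshold" if score >= 2 else "-> below threshold")
```

Because the full score ranges well above two, there are many distinct combinations of sub-scores that clear the diagnostic threshold, which is the behavior observed at sites with sparse laboratory data.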
The median score for children with sepsis was three, with an interquartile range of two to four, but the Phoenix Sepsis Score itself goes up into the teens. What that means is there are lots of ways to get two points, and we think that is why, even at a site that doesn't routinely collect coagulation tests and lactate levels, the score functioned really well: those kids with sepsis were achieving two points in other ways.

In comparison with the adult Sepsis-3 process and the way that team worked, the similarities include that we used the same conceptual framework, that sepsis is infection plus organ dysfunction, and that we used large EHR-based data sets to derive and validate the new criteria. Differences include, as Nelson mentioned, that the pediatric data set was larger, more diverse, more international, and included more higher- and lower-resource sites; that we used AUPRC, positive predictive value, and sensitivity as primary measures instead of AUROC; and that we used organ dysfunction subcomponents instead of complete existing scores like SOFA.

As for limitations: electronic health record data, of course, has missingness and errors. We mitigated this as best as possible using a robust and reproducible harmonization and data quality process; I mentioned the GitHub repository at the beginning of my first talk, and I'd encourage you to check that out if you're interested. In some sense this is also an advantage. This is real-world data, the data as it is represented at sites that may implement these criteria, and these are the source data those criteria may be computed on in the future, so there is a bit of pragmatism in that as well. Some organ dysfunction, as measured by things like GCS, is iatrogenic; GCS in intubated and sedated patients is an example. That is a limitation we acknowledge, and also a bit of a real-world piece of pragmatism.
Some lower-resource sites had important measures, as Nelson mentioned, that were not recorded even when they were performed for the patient; in some cases, kids received mechanical ventilation, but it wasn't in the data in their clinical information system that we could access. We did not distinguish acute from chronic organ dysfunction, the same decision the Sepsis-3 folks had to make. And our data are from 2010 to 2019 for most sites, so we will need to look forward and reassess and revalidate the criteria in the post-COVID world.

Next steps include the need for early identification and screening tools for possible sepsis, validation in hospital-acquired sepsis, as I mentioned earlier, and clinical decision support tools, which we are actively developing: tools appropriate for use in high-resource environments that run in your EHRs, as well as mobile tools for use in lower-resource environments. We want to offer an enormous thanks to SCCM, to our funders, to the many collaborators in this work and the members of the task force, and a special call-out to the core data science team for this work that I lead at the University of Colorado: Peter DeWitt, Seth Russell, and Meg Rebol. Thank you very much.
Video Summary
The presentation focused on developing and validating criteria for pediatric sepsis using existing organ dysfunction subcomponents and machine learning techniques, funded by NICHD. The approach used stacked regression and the area under the precision-recall curve (AUPRC) to construct a pragmatic model applicable in various settings, even with imbalanced data. Existing scores were used to ensure familiarity and prior validation, facilitating wider acceptance and utility, especially in resource-limited areas. Challenges such as real-world data limitations and iatrogenic organ dysfunction were acknowledged. The session concluded with plans for enhancing early identification, validating the criteria in hospital-acquired sepsis, and developing clinical decision support tools.
Asset Caption
Two-Hour Concurrent Session | Announcement of the Novel Phoenix Pediatric Sepsis Criteria
Meta Tag
Content Type
Presentation
Membership Level
Professional
Year
2024
Keywords
pediatric sepsis
machine learning
organ dysfunction
AUPRC
clinical support tools