Sepsis alerts work! Just not in the patients who fire the alerts
SCREEN trial tests mass-scale deployment of EMR sepsis alerts in Saudi Arabia
In the past decade, so-called “sepsis alerts” came out of nowhere to become a ubiquitous and resource-intensive component of inpatient medical care.
The trouble is, no one checked first to see if they work.
“Sepsis alerts” are automated notifications that flag patients whose electronic medical record data meet certain criteria compatible with severe infection. The models vary but are often tuned to fire on some combination of organ dysfunction, abnormal vital signs, and lab results.
Epic Systems launched its sepsis model in the mid-2010s, and many other competing products have come to market. Their spread has coincided with the Centers for Medicare & Medicaid Services’ (CMS) “sepsis bundle quality measure” (SEP-1) going live in 2015. With financial penalties looming for noncompliance with the new rule, hospital administrators rapidly deployed the alerts to fire on all eligible inpatients.
And have they fired. But for whom, and to what end? A 2021 analysis of Epic’s sepsis model revealed atrocious false positive and false negative rates. (Epic disputed the findings.) A follow-up study in 2023 showed the Epic model performed badly in complex patients. A 2024 paper concluded that Epic’s model “cheats” by relying on clinician judgments already recorded in the EMR to generate sepsis alerts, yet still performed poorly. Stripped of that clinician input, the authors concluded, Epic’s model misclassified sepsis cases more often than not, i.e., it was worse than a coin toss. (Epic disputed these findings, too.)
It’s true that many, many observational studies have been published claiming that after hospital systems launched systemwide EMR alerts for suspected sepsis, compliance rates with targeted therapies (antibiotics and IV fluids, primarily) increased. Mortality from “sepsis” is frequently reported to have fallen dramatically—but since there is no validated way to diagnose sepsis, and the models have such high false positive rates, many patients without sepsis are counted in the (misleading) post-implementation mortality figures. Comparing mortality rates before and after the implementation of such error-prone case-finding tools is comparing apples with oranges.
That’s not to say that sepsis mortality hasn’t decreased, or that quality improvement efforts have been unproductive, just that the data have been too messy to say either way with any clarity.
A cluster randomized trial using all-cause mortality as an outcome was always needed to determine whether EMR alerts improve outcomes for suspected severe infection. But absent a change to the regulatory regime, there was no foreseeable way U.S. health systems would conduct a trial randomizing patients not to receive EMR alerts.
In Saudi Arabia, though, there’s no CMS, no SEP-1, and (if it pleases the King) no barrier to testing EMR alerts in patients with organ dysfunction.
The SCREEN Trial
SCREEN was a stepped-wedge cluster randomized trial of an EMR alert for organ failure, conducted in 45 wards of 5 hospitals in Saudi Arabia from 2019 to 2021. The alerts ran in “silent mode” in all wards for about two months. In a randomized, stepwise fashion over ~16 months, clusters of about five wards began seeing the pop-up alerts every two months. The intervention gradually rolled out until alerts were visible in all wards at the end of the trial. The alerts remained invisible in the wards that had not yet gone live.
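To visualize the design, here is a minimal sketch (not the investigators’ actual randomization code) of how a stepped-wedge rollout schedule can be represented, assuming 45 wards divided into sequences of 5 that go live every two months after the initial silent phase. The sequence groupings, seed, and timing details are illustrative assumptions.

```python
# Illustrative stepped-wedge rollout schedule; groupings are hypothetical.
import random

WARDS = [f"ward_{i:02d}" for i in range(1, 46)]  # 45 wards across 5 hospitals
N_SEQUENCES = 9      # sequences of ~5 wards that go live together
STEP_MONTHS = 2      # a new sequence goes live every 2 months
SILENT_MONTHS = 2    # initial silent-mode period in all wards

random.seed(42)
shuffled = WARDS[:]
random.shuffle(shuffled)                               # randomize ward order
sequences = [shuffled[i::N_SEQUENCES] for i in range(N_SEQUENCES)]

# Month at which each ward's pop-up alerts become visible to clinicians.
go_live = {}
for seq_idx, wards in enumerate(sequences):
    for ward in wards:
        go_live[ward] = SILENT_MONTHS + seq_idx * STEP_MONTHS

def alerts_visible(ward: str, month: int) -> bool:
    """True if pop-up alerts are shown on this ward in this study month."""
    return month >= go_live[ward]
```

The essential feature is that every ward eventually goes live; the only randomized element is when.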
There were 29,442 patients admitted to wards when alerts were active (dubbed the “screening group”) and 30,613 when they were inactive (the “no screening group”). These were general wards and patients could have any diagnosis (not just infections).
The first thing to note is that these were not “sepsis alerts” (whatever that term might signify to you). They should have been called “organ failure alerts.” An alert fired to notify the nurse, charge nurse, and on-call physician if a patient had 2 or more qSOFA criteria (systolic blood pressure ≤100 mm Hg, respiratory rate ≥22 breaths per minute, or Glasgow Coma Scale score <15) within 12 hours. No EMR-derived suggestion of infection needed to be present for the alert to fire.
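To make the trigger concrete, here is a minimal sketch of the alert logic as described above. The data structure and function names are hypothetical, and whether the two criteria had to be met simultaneously or merely within the same 12-hour window is an assumption made here for illustration.

```python
from dataclasses import dataclass

@dataclass
class Vitals:
    """One set of observations pulled from the EMR (hypothetical structure)."""
    systolic_bp: float        # mm Hg
    respiratory_rate: float   # breaths per minute
    gcs: int                  # Glasgow Coma Scale, 3-15

def should_fire_alert(observations_last_12h: list[Vitals]) -> bool:
    """Fire if >= 2 distinct qSOFA criteria were met within the past 12 hours.
    Note there is no check for suspected infection: this is an organ failure
    alert, not a sepsis alert."""
    criteria_met = [
        any(v.systolic_bp <= 100 for v in observations_last_12h),      # SBP <= 100
        any(v.respiratory_rate >= 22 for v in observations_last_12h),  # RR >= 22
        any(v.gcs < 15 for v in observations_last_12h),                # GCS < 15
    ]
    return sum(criteria_met) >= 2
```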
As you would predict, the organ failure alerts fired a lot (in about 1 in 6 patients), and most (2 out of 3) were deemed to be false positives (i.e., not sepsis).
The alerts did create exciting flurries of activity! Patients receiving alerts often received new orders for intravenous fluids and lactate collection. Most (81%) were already on antibiotics or had been recently.
Yet at 90 days, there was no difference in crude mortality between the screening (3.2%) and no screening groups (3.1%).
But!
When the primary analysis adjusted for time period, clustering within wards and hospitals, and Covid-19 status (the trial ran during the Covid pandemic), the screening group had lower 90-day in-hospital mortality than the no screening group (aRR, 0.85; 95% CI, 0.77-0.93; P < .001). Adjustment for time period is essential in stepped-wedge trials, which roll out an intervention gradually, to minimize confounding by secular changes in care that occur over time, unrelated to the intervention itself.
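For readers who want to see what such an adjustment can look like, here is a hedged sketch of one common approach to estimating an adjusted risk ratio in a stepped-wedge design: a modified Poisson GEE with ward-level clustering, fixed effects for calendar period, and a Covid-19 covariate. This is not the SCREEN trial’s actual statistical analysis plan; the data frame and variable names are hypothetical.

```python
# Illustrative only: adjusted risk ratio via modified Poisson GEE.
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Hypothetical patient-level data: one row per admission.
rng = np.random.default_rng(0)
n = 2000
df = pd.DataFrame({
    "died_90d": rng.binomial(1, 0.03, n),   # 90-day in-hospital death (1/0)
    "screening": rng.binomial(1, 0.5, n),   # alerts visible on ward at admission
    "period": rng.integers(0, 8, n),        # 2-month calendar step
    "ward": rng.integers(0, 45, n),         # cluster identifier
    "covid": rng.binomial(1, 0.1, n),       # Covid-19 status
})

# Poisson family with log link plus GEE robust errors approximates risk ratios
# for a binary outcome; C(period) absorbs secular trends in care over time.
model = smf.gee(
    "died_90d ~ screening + C(period) + covid",
    groups="ward",
    data=df,
    family=sm.families.Poisson(),
    cov_struct=sm.cov_struct.Exchangeable(),
)
result = model.fit()
print(np.exp(result.params["screening"]))   # adjusted risk ratio for screening
```

The key point: without the C(period) term, wards observed late in the trial (when most alerts were live) would be compared against wards observed early, and any general drift in care over time would be misattributed to the alerts.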
So there you go: sepsis alerts work. Right?
Sepsis Alerts Work—On the Other Patients
Remember that in the SCREEN design, alerts fired for every patient with new organ dysfunction, but for patients admitted to wards that had not yet gone live (considered together as the no-screening group), the alerts were hidden from clinicians’ view.
Here comes the important part.
Compare patients who fired a (visible) organ failure alert in the “sepsis screening” group, which produced lots of new caregiver actions, with patients who fired a (hidden) alert on a ward that had not yet gone live, which generated fewer new actions: there was no difference in mortality (RR 1.04).
This was despite a higher intensity of care in the visible-alert patients, who had more rapid response activations, ICU transfers, code blues, intubations, and initiations of dialysis. None of it made a difference to all-cause mortality.
In other words, “sepsis screening” saved lives, just not the lives of the patients who fired the alerts.
This suggests that the mortality reduction among all patients assigned to the screening group was due to differences in care between the groups generally, not due to screening per se.
This makes sense: the organ failure alerts were firing often and indiscriminately, 66% of the time in patients deemed not to have sepsis, so the extra scrutiny fell on deteriorating patients of all kinds. That could plausibly produce lifesaving attention and care across the screening group as a whole.
There were likely also unmeasured general institution-level effects associated with the new program. Extensive training occurred before the trial launched; healthcare workers knew their actions would be tracked and scrutinized. Changing their behavior was the point, of course—but complex unexpected effects inevitably occur in projects on this scale.
For example: more (visible) alerts fired in the screening group, in total, than (invisible) alerts in the no-screening group. This should not have occurred with proper randomization and isolation of the intervention. It led the authors to speculate that qSOFA parameters were being documented more diligently on the wards randomized to screening.
Saudi Arabia is considered an authoritarian society, and internationally significant research projects have important political value to the monarchy. These factors may have influenced healthcare worker behavior in ways that are hard for outsiders to understand.
Conclusions
As the only cluster randomized trial to have tested the question, the SCREEN trial provides the strongest available evidence that so-called “sepsis alerts” do not directly improve outcomes for the patients they flag. Patients who fired organ failure alerts in the screening group received a higher intensity of care on average, yet had no improvement in mortality over control patients firing hidden alerts in the no-screening group, who received less intensive care.
The screening group overall (firing alerts or not) had significantly improved mortality compared with the no screening group after (appropriate) adjustment for multiple factors, although crude mortality was virtually identical in the two groups. The adjusted mortality reduction in the screening group seems likely to reflect better care generally; patients firing alerts experienced no mortality reduction attributable to the alerts.
These findings raise the higher-order question of whether systemwide campaigns to improve sepsis care can save lives despite the ineffectiveness of sepsis alerts in individual patients. The SCREEN trial strongly supports the intuitive idea that this is the case. That doesn’t make “sepsis alerts” a good idea, but it places them into a larger context.
Policy Implications
With the convincing signal that organ failure alerts do not directly improve outcomes for individual patients, but that systemwide efforts improve collective outcomes, the Saudi investigators have provided the gift of a ladder for U.S. policymakers to climb out of the hole they’ve dug by creating the burdensome and ineffective SEP-1 “quality” measure.
As we approach the tenth anniversary of SEP-1, CMS has an opportunity to start again by convening a new committee, free from the influence of supposed experts in long-refuted theories of “goal-directed therapy,” and free from the financial conflicts of interest and allegedly problematic research conduct that marred the development of the original measure.
Rather, CMS should seek counsel from leaders in systems-based processes of care, teamed with frontline emergency physicians and intensivists from academic centers and the community, who could call for briefing and expertise from other sources as appropriate.
The new iteration of the performance measure should abandon its misguided focus on scorekeeping the behavior of individual clinicians treating individual patients. It should acknowledge the potentially fatal problem of accurate case identification: there is no validated method to diagnose sepsis, so unlike for diabetes, stroke, or bypass surgery, there is no way to reliably quantify the denominator of the SEP-1 quality metric.
Instead of clinical micromanagement, the measure in its new form would incentivize continuous improvement in systemwide operational processes. Enhancing teamwork, communication, and a culture of timely, wise, and safe action toward deteriorating patients, while minimizing busywork and ineffective clinician activity, would be among the guiding principles. Administratively, CMS could convert SEP-1 to a “process” measure, an “intermediate outcome” measure, or another measure type.
Case identification, outcomes tracking, and quality improvement for sepsis are too important to allow the failing status quo to stand. If hospitals and policymakers build systems that support rather than hamper frontline clinicians, and then get out of their way, outcomes for all patients will improve, including those with sepsis, by whatever definition you choose.
References
Evaluation of Sepsis Prediction Models before Onset of Treatment. NEJM AI. February 2024.
CMS. Quality Measures: How They Are Developed, Used, & Maintained. Accessed January 5, 2025.