High-flow nasal oxygen vs. noninvasive ventilation for acute hypoxemic respiratory failure (RENOVATE)
A noninferiority Bayesian trial with five subgroups, oh my. Grab coffee.
Severe acute hypoxemic respiratory failure (AHRF) is commonly treated with high-flow nasal cannula oxygen (HFNO) or noninvasive ventilation (NIV), in an attempt to forestall the need for intubation and mechanical ventilation.
It’s uncertain which method of respiratory support is superior, or in which situations.
AHRF can result from many different illnesses: viral or bacterial pneumonia, cardiogenic or noncardiogenic pulmonary edema, exacerbations of chronic lung disease, and more.
This heterogeneity has limited confidence in extrapolating findings from trials enrolling patients with one illness to other patient populations. And in trials enrolling patients with AHRF of assorted causes, each subgroup has generally been too small to support firm conclusions about the relative benefits of either modality in that group.
The RENOVATE trial sought to settle these questions by enrolling a large number of patients in an adaptive Bayesian trial design — an innovation that purports to deliver more useful information, more efficiently than traditional randomized controlled trials.
To find out whether they succeeded, we first have to talk statistics. Hey, where are you going?—it’s going to be fun!
Traditional Randomized Trials: The Price of Alpha
In a conventional randomized trial, investigators calculate an enrollment target (sample size) based on the predicted baseline event rate (incidence) in the controls, the predicted effect size of the intervention, and the desired certainty parameters (usually an α of 0.05, implying a 5% rate of false positives or type I errors, and a power of 80% [power = 1 − β], capping the rate of false negatives or type II errors at 20%).
All of these factors impact the sample size—generally speaking, it takes more patients to find smaller relative differences, especially when baseline event rates are low.
For example (from ClinCalc):
If disease X’s mortality is 90%, testing a new drug that reduces death by a relative 10% (a 9% absolute difference, achieving an 81% mortality rate) would require enrolling 478 patients.
In the neighboring country where X’s mortality is 60%, finding an absolute 9% improvement (to 51%) would require 954 patients. Establishing a relative 10% reduction (from 60% to 54%) requires 2,136 patients.
At the top center for disease X, where mortality is 20%, finding a 10% relative reduction (to 18%) requires 12,078 subjects.
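If you'd like to check these figures yourself, here's a minimal Python sketch (the function name is ours, for illustration) of the standard normal-approximation sample size formula for comparing two proportions. ClinCalc applies a slightly different formula and rounding, so its totals come out a few patients higher than this sketch's:

```python
from scipy.stats import norm

def n_per_group(p1, p2, alpha=0.05, power=0.80):
    """Approximate per-group sample size for comparing two proportions
    (simple normal-approximation formula; ClinCalc's version differs
    slightly, so expect results a few patients below its figures)."""
    z_a = norm.ppf(1 - alpha / 2)   # two-sided alpha
    z_b = norm.ppf(power)           # power = 1 - beta
    var = p1 * (1 - p1) + p2 * (1 - p2)
    return (z_a + z_b) ** 2 * var / (p1 - p2) ** 2

# The four scenarios quoted above, control rate vs. treated rate
for p1, p2 in [(0.90, 0.81), (0.60, 0.51), (0.60, 0.54), (0.20, 0.18)]:
    print(f"{p1:.0%} -> {p2:.0%}: ~{2 * n_per_group(p1, p2):.0f} patients total")
```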
You can see how this can get expensive, in time and money.
Competing interventions in critical care like HFNO and NIV are in presumed clinical equipoise. That’s why they’re used interchangeably, and why testing them against each other is considered ethical. This also implies a small expected difference (if any) between the therapies.
That in turn implies very large sample sizes would be required to establish the putative small difference, and even larger sizes would be needed to confidently argue against any false negative type II error.
This would be extremely expensive and take many years, with no results until trial completion. The risk of an inconclusive result is high, because investigators commonly follow the incentive to overestimate their event rates or effect sizes (which reduces enrollment targets and the budget, making a green light from funders more likely).
All of these locked-in parameters have traditionally been considered vital features ensuring the integrity of randomized trials.
Is there another way to squeeze more information out of fewer patients faster, without losing confidence in the results?
What Are “Adaptive Bayesian Trials”?
In “adaptive” trials, the trial processes are established flexibly and modified as data come in, using advanced statistical analysis based on Bayes’ theorem.
You may know Bayes’ theorem—P(A|B) = [ P(A) * P(B|A) ] / P(B)—better than you think. Recall that in a very low prevalence situation (i.e., low prior probability, such as a rare disease present in 1 per 100,000 people), even a seemingly accurate test (e.g., 99% specific, with a 1% false-positive rate) will produce far more false positives than true positives. In other words, its positive predictive value—the likelihood that a positive test is a true positive, a.k.a. its posterior probability—is low. As the prevalence (prior probability) increases, the share of positive results that are false positives falls, and the test’s positive predictive value (its posterior probability) rises.
Examples for the results of a test with 90% sensitivity and specificity:
At 1% prevalence (0.01 prior probability), a positive test is only 8% likely (0.08 posterior probability) to be “truth” (a true positive).
At 30% prevalence (0.30 prior), a positive test is 79% likely (0.79 posterior) to be “truth”.
Play with this at epitools for a refresher.
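To sanity-check those numbers, here's a minimal Python sketch of the same Bayes' theorem arithmetic (the function name and defaults are our own, for illustration):

```python
def ppv(prevalence, sensitivity=0.90, specificity=0.90):
    """Positive predictive value: P(disease | positive test)."""
    true_pos = sensitivity * prevalence               # P(positive and diseased)
    false_pos = (1 - specificity) * (1 - prevalence)  # P(positive and healthy)
    return true_pos / (true_pos + false_pos)

for prev in (0.01, 0.30):
    print(f"prevalence {prev:.0%}: PPV = {ppv(prev):.2f}")
# prevalence 1%: PPV = 0.08
# prevalence 30%: PPV = 0.79
```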
In a Bayesian adaptive trial, the designers input their estimate of the prior probability of a result (e.g., an odds ratio <1.55 for intubation or death with HFNO compared to NIV in patients with AHRF). Then a posterior probability threshold is chosen that will be accepted as sufficient certainty (e.g., 0.992).
Unlike the hypothetical example, in which the true disease prevalence is known, “true” prior probabilities are unknown in most Bayesian adaptive trials. Like the predicted event rates in conventional randomized trials, the trial’s “priors” represent the investigators’ best guesses, based on existing data and expert opinion.
As patients in each AHRF subgroup are randomized to NIV or HFNO and are intubated or die (or not), the posterior probability of noninferiority in that subgroup is continuously recalculated using Bayes’ theorem, informed by the priors. When the posterior probability threshold is met at an interim analysis, that arm is considered complete.
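To make the updating step concrete, here's a toy Monte Carlo version in Python. The interim counts, the flat Beta priors, and the simple two-arm model are all invented for illustration (RENOVATE's actual model was more complex, with informative priors and hierarchical structure), but the logic is the same: draw from each arm's posterior, form the odds ratio, and ask how often it falls below the 1.55 noninferiority margin.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical interim counts (invented, not RENOVATE's data):
# events = intubation or death within 7 days
events_hfno, n_hfno = 78, 200
events_niv,  n_niv  = 76, 200

# Flat Beta(1, 1) priors on each arm's event rate; the conjugate
# update gives Beta posteriors we can sample from directly
p_hfno = rng.beta(1 + events_hfno, 1 + n_hfno - events_hfno, 100_000)
p_niv  = rng.beta(1 + events_niv,  1 + n_niv  - events_niv,  100_000)

# Posterior distribution of the odds ratio, HFNO vs. NIV
odds_ratio = (p_hfno / (1 - p_hfno)) / (p_niv / (1 - p_niv))

# Posterior probability of noninferiority at the OR < 1.55 margin;
# the trial would declare success if this exceeded its threshold (0.992)
print(f"P(OR < 1.55 | data) = {(odds_ratio < 1.55).mean():.3f}")
```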
Patients don’t need to be randomized in fixed allocations, and no fixed sample size is required—the posterior probabilities guide enrollment and allow for “adaptive” decision-making mid-trial. (For example, Covid-19 patients were added as a separate subgroup after trial commencement.)
There doesn’t even need to be a fixed start and end date of the trial; rather, “platforms” have been established that can continuously accept new patients and test new questions (e.g., REMAP-CAP was designed for community-acquired pneumonia, but adapted for Covid-19).
This produces information continuously from day one of the trial.
Ultimately, though—just like the positive/posterior predictive value of the hypothetical test above—the results of a Bayesian adaptive trial are highly influenced by the prior probabilities chosen.
The method also permits unfamiliar maneuvers: if an adaptive trial subgroup has low enrollment, data can be conveniently “borrowed” from another subgroup as the trial goes along, adding statistical heft to the numerically light subgroup and “counting” toward its statistical calculations—as we will see was done in RENOVATE.
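For a flavor of how borrowing can work, here's a deliberately simplified Python sketch using a power-prior-style discount, in which a neighboring subgroup's data enter a small subgroup's posterior at a fractional weight. All counts and weights are invented, and this is a stand-in for the hierarchical model RENOVATE actually used, not a reproduction of it:

```python
# Hypothetical event counts (invented): a small subgroup, plus a
# larger "neighbor" subgroup whose data we partially borrow
small_events, small_n = 8, 25
nbr_events, nbr_n = 90, 250

# Power-prior-style discounting: the neighbor's data enter the small
# subgroup's Beta(1, 1)-prior posterior at fractional weight w
# (w = 0: no borrowing; w = 1: full pooling)
for w in (0.0, 0.5, 1.0):
    a = 1 + small_events + w * nbr_events
    b = 1 + (small_n - small_events) + w * (nbr_n - nbr_events)
    print(f"w = {w:.1f}: posterior mean event rate = {a / (a + b):.3f}")
```

At w = 0 the small subgroup stands alone; as w rises, its posterior estimate is pulled toward the neighbor's event rate, with correspondingly narrower uncertainty.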
Adaptive Bayesian trials require specialized statistical expertise that today is often contracted out to Berry Consultants—a privately held, family-owned firm in Austin, Texas, employing more than 20 statistics Ph.D.s, according to its website—or one of its smaller competitors.
The RENOVATE Trial
RENOVATE was a noninferiority trial conducted among 1,800 patients at 33 hospitals in Brazil between 2019 and 2023 (including during the Covid pandemic). Patients with acute hypoxemic respiratory failure were analyzed in five subgroups:
Covid-19 (~890 patients, about half the total)
Nonimmunocompromised (~485)
Immunocompromised (only 50)
Chronic obstructive pulmonary disease exacerbation with respiratory acidosis (77)
Acute cardiogenic pulmonary edema (272)
The adaptive trial design dynamically randomized them to either noninvasive ventilation or high-flow nasal cannula oxygen. They were also analyzed in variable “clusters”—basically, statistical fishing trips to try to suss out differences or similarities in treatment responses across different patient subgroups (and to facilitate “dynamic borrowing”—see below).
They were followed for the composite outcome of death or intubation within 7 days; 1,766 (>98%) completed the study.
Overall, the raw numbers looked obviously similar: intubation or death at 7 days occurred in 39% of the HFNO group vs. 38% of the NIV group.
But the overall population was not actually tested statistically; only the subgroups were. The investigators used Bayesian adaptive methods to try to ascertain noninferiority of HFNO to NIV in each of the five subgroups.
That’s where things got weird.