How to calculate patient-specific estimates of benefit and harm from an RCT

One of the more challenging concepts for students is how to apply information from a study to an individual patient. Students have been taught how to calculate a number needed to treat (NNT), but that often isn't very useful for the patient they are currently seeing. Usually our patients are sicker or healthier than those in the study we are reading. Studies include a range of patients, so the effect we see in the results is the average effect across all patients in the study.

Imagine you are seeing Mr. Fick, a 70 yo M with ischemic cardiomyopathy (EF 20%) and refractory anemia (baseline Hgb 7-10 g/dl). He reports stable CHF symptoms of dyspnea after walking about 30 ft around the house. He reports other signs and symptoms of CHF are stable. Medications include lisinopril 20 mg bid, aspirin daily, furosemide 80 mg daily, and iron tablets daily. He is not taking a beta blocker due to bradycardia and can't take a statin due to myopathy. He has refused an ICD in the past. BP is 95/62 mm Hg, pulse is 50 bpm, weight is stable at 200 lbs. Labs done one week earlier show a stable Na of 125 mmol/l, K 3.8 mmol/l, Hgb 8 g/dl, platelets 162 k, WBC is normal with 22% lymphs on differential, cholesterol is 220 mg/dl, and uric acid is 6.2. Since he has severe CHF you are considering adding spironolactone to his regimen. He is concerned because he has a hard time tolerating medications. He wants to know how much it will help him. What do you tell him?

This figure is from the RALES trial, a study of spironolactone in patients with advanced CHF. Use the figure below to determine Mr. Fick's individual estimated risk of death if he agrees to take spironolactone.

RALES figure

There are 4 methods I will demonstrate to calculate a patient-specific estimate of effect from an RCT. First, think about what information you will need to estimate Mr. Fick's specific benefits from spironolactone. You will need the NNT from the RALES trial and Mr. Fick's estimated risk of death (we call this the PEER, or patient expected event rate). Where do we get the PEER of death for Mr. Fick? You use a validated prediction rule. I use Calculate by QxMD. Look in the Cardiology folder under heart failure and open the Seattle Heart Failure Model. Plug in Mr. Fick's data and you get his 1-year expected risk of death (56%).

Method 1: Calculate a patient-specific NNT using the PEER. The formula is 1 / (PEER x RRR), where RRR is the relative risk reduction from the RALES trial (30%; RRR = 1 - RR). Plugging that in, Mr. Fick's NNT is 1 / (0.56 x 0.3) = 6 (the NNT from the RALES trial itself is 9).

Method 2: Estimate a patient-specific NNT using f, what I call the fudge factor. It is your guesstimate of how much higher or lower Mr. Fick's risk of death is than that of the average patient in the study. If you say he is 2 times more likely to die, then f is 2. If you think he is half as likely, then f is 0.5. To use f, divide the study NNT by f; this gives an estimate of Mr. Fick's NNT. So let's say Mr. Fick is twice as likely to die as those in the study. The NNT of the study is 9, so 9/2 is 4.5, which I would round up to 5.
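The two NNT formulas above can be sketched in a few lines of code. This is just a minimal illustration; the numbers are the ones from the post (RALES RR of 0.70, trial NNT of 9, Mr. Fick's PEER of 0.56 from the Seattle Heart Failure Model, and a fudge factor of 2), and the rounding-up convention follows the text.

```python
import math

def nnt_from_peer(peer: float, rr: float) -> int:
    """Method 1: NNT = 1 / (PEER x RRR), where RRR = 1 - RR."""
    rrr = 1 - rr
    return math.ceil(1 / (peer * rrr))

def nnt_from_fudge(study_nnt: float, f: float) -> int:
    """Method 2: divide the study NNT by the fudge factor f."""
    return math.ceil(study_nnt / f)

print(nnt_from_peer(0.56, 0.70))  # 6
print(nnt_from_fudge(9, 2))       # 5
```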

NNTs are nice, but it's hard to use them directly with a patient. The next 2 calculations are more useful for patients.

Method 3: Use the RR to calculate Mr. Fick's actual risk of death. The RR of death in the RALES trial is 0.70. Multiply this by his estimated death rate and you get his expected risk of death if he were on spironolactone instead of nothing. His risk of death is 56%, so 0.70 x 0.56 = 39%. If Mr. Fick takes spironolactone, I expect his risk of death to go from 56% down to 39%. That's useful information to tell the patient.

Method 4: Use the RRR to calculate Mr. Fick's actual risk of death. This is similar to the concept above, except you have to remember that the RRR (relative risk reduction) is relative. First calculate how much risk is removed by the treatment. The RRR is 30% (RRR = 1 - RR). Multiply this by the patient's risk of death: 0.30 x 0.56 = 0.168. This 16.8% represents how much risk is removed from the baseline risk. Subtract it from the baseline risk and you get his final risk: 0.56 - 0.168 = 0.39, or 39%. This is the same number as Method 3, and it has to be, because it's just a different way of calculating the exact same thing.
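Methods 3 and 4 can also be sketched as code, using the same RALES RR of 0.70 and baseline risk of 56% from the post. Since the two are algebraically identical, the sketch asserts that they agree.

```python
def risk_via_rr(peer: float, rr: float) -> float:
    """Method 3: multiply the baseline risk by the relative risk."""
    return peer * rr

def risk_via_rrr(peer: float, rr: float) -> float:
    """Method 4: subtract the absolute risk removed (PEER x RRR) from baseline."""
    rrr = 1 - rr
    return peer - peer * rrr

on_treatment = risk_via_rr(0.56, 0.70)
print(f"{on_treatment:.0%}")  # 39%

# Both methods compute the same quantity, so they must agree:
assert abs(risk_via_rr(0.56, 0.70) - risk_via_rrr(0.56, 0.70)) < 1e-9
```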

I hope this is useful, and now you can give patients some real numbers instead of just saying "your risk is decreased by x%."

Remember, you need the patient's risk of the event without treatment (usually from a prediction rule, or perhaps the placebo event rate of the study or of a subgroup) and the event rates from the study. You can make all the calculations from there.

The devil is in the details: overstating the effects of corticosteroids in patients with pneumonia

This blog post will tie in nicely with what I blogged on earlier today about composite endpoints. Read that post first before reading this.

Today I received my e-table of contents from JAMA and read a study on the Effect of Corticosteroids on Treatment Failure Among Hospitalized Patients With Severe Community-Acquired Pneumonia and High Inflammatory Response. The primary outcome of the study was “treatment failure (composite outcome of early treatment failure defined as [1] clinical deterioration indicated by development of shock, [2] need for invasive mechanical ventilation not present at baseline, or [3] death within 72 hours of treatment; or composite outcome of late treatment failure defined as [1] radiographic progression, [2] persistence of severe respiratory failure, [3] development of shock, [4] need for invasive mechanical ventilation not present at baseline, or [5] death between 72 hours and 120 hours after treatment initiation; or both early and late treatment failure).”

The authors make a bold statement:

“The results demonstrated that the acute administration of methylprednisolone was associated with less treatment failure…”

I find this statement (the 1st sentence of the discussion section) to be a vast overstatement of what they in fact found in this study. Examine the table below (I trimmed out the per-protocol analysis results) and see just what was actually reduced by steroids.

From JAMA 2015;313(7):677-686

Steroids had no effect on “early treatment failure.” They significantly reduced “late treatment failure,” but this was driven entirely by one outcome: the only thing steroids did was reduce radiographic progression. They didn't help any other component of this large composite, yet the authors make this sweeping statement about steroids being associated with less treatment failure. This demonstrates the importance of looking at the individual components of the composite and not just focusing on the overall composite result.

It also demonstrates why I don't like to read the discussion section of a paper or the conclusions of an abstract: you will be misled. The reviewers and editors should have toned down these conclusions, as they are a gross overstatement of what was actually found.

How to decide when a composite endpoint should go into the compost

Composite endpoints are commonly used in studies. A composite endpoint is an endpoint composed of several other endpoints; if a patient experiences any one of them, they are considered to have experienced the endpoint of the trial. For example, a composite endpoint in a typical cardiovascular study includes nonfatal MI, nonfatal stroke, and cardiovascular death. A patient doesn't have to have all three, just one of them.

Why use composite endpoints? The main reason is to reduce the number of patients needed in the study. The chance of a patient having any one outcome is much less than the chance of having any one of three outcomes. They are also used to potentially reduce the length of follow-up needed in a study: a patient is likely to develop one of three outcomes more quickly than any single outcome, or one component of the composite may occur sooner than another (e.g., doubling of serum creatinine vs. initiation of hemodialysis).
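A rough sketch of the sample-size point above. All the event rates here are made-up illustrations (not from any trial), the three components are assumed to be independent (real trial components rarely are), and the formula is the standard two-proportion approximation at alpha = 0.05 and 80% power. The point is simply that a higher composite event rate means fewer patients are needed.

```python
import math

def n_per_arm(p_control: float, rrr: float,
              z_alpha: float = 1.96, z_beta: float = 0.84) -> int:
    """Approximate sample size per arm for comparing two proportions."""
    p_treat = p_control * (1 - rrr)
    p_bar = (p_control + p_treat) / 2
    numerator = (z_alpha + z_beta) ** 2 * 2 * p_bar * (1 - p_bar)
    return math.ceil(numerator / (p_control - p_treat) ** 2)

single = 0.05                       # one hypothetical outcome at 5%
composite = 1 - (1 - 0.05) ** 3     # any of three independent 5% outcomes (~14%)

print(n_per_arm(single, 0.25))
print(n_per_arm(composite, 0.25))   # substantially smaller trial
```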

Not all composites are created equal. Some are good and many are poorly developed. Examine the composite outcome below from the RENAAL trial published in the NEJM in 2001. The primary efficacy measure was the time to the first event of the composite end point of a doubling of the serum creatinine concentration, end-stage renal disease, or death. What do you think? Is this a good composite or a poor composite? (Note: I put a red mark next to the components of the composite)

From the RENAAL trial, NEJM 2001

I think this is a poorly designed composite. Why do I say that? A good composite should have the following characteristics:

  1. Each component should be valued equally by patients,
  2. Each component should occur with similar frequency, and
  3. The intervention should have the same relative effect on each component.

With this in mind, reevaluate the RENAAL composite endpoint. Hopefully you agree with me that it's not a good composite endpoint. Let's examine it more closely.

Issue #1: would patients consider each of the components to be of equal value? Patients would not consider death and doubling of serum creatinine to be equal; clearly they would value death as a much worse outcome. So this composite fails here.

Issue #2: does each component of the composite occur with equal frequency? Looking at the percentages of the components in the losartan group, they are pretty close to each other (21.6%, 19.6%, and 21%), so I would give the composite a pass on this criterion.

Issue #3: does the intervention (losartan) have an equal effect on each component of the composite? Look under the risk reduction column and the answer is no. Doubling of serum creatinine is reduced by 25% and end-stage renal disease by 28%, but death is actually increased by 2%. Thus, the composite fails on this criterion.

What should you do if the composite endpoint is a bad one? Just ignore the composite and look at the individual components. Even if a composite is a good one, you should always examine the individual components. So in this case, losartan reduces the risk of ESRD and doubling of serum creatinine but has no effect on mortality.
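The RENAAL appraisal above can be sketched as a quick check of criteria 2 and 3, using only the numbers quoted in the post: losartan-arm component rates of 21.6%, 19.6%, and 21%, and risk reductions of 25%, 28%, and -2% (death was increased). The "similarity" thresholds are arbitrary choices of mine, and criterion 1 (equal value to patients) is a judgment call that can't be computed.

```python
components = {
    "doubling of serum creatinine": {"rate": 0.216, "rrr": 0.25},
    "end-stage renal disease":      {"rate": 0.196, "rrr": 0.28},
    "death":                        {"rate": 0.210, "rrr": -0.02},
}

rates = [c["rate"] for c in components.values()]
rrrs = [c["rrr"] for c in components.values()]

# Criterion 2: components should occur with similar frequency.
similar_frequency = max(rates) - min(rates) < 0.05

# Criterion 3: the intervention should affect each component similarly,
# and in the same direction.
similar_effect = max(rrrs) - min(rrrs) < 0.10 and all(r > 0 for r in rrrs)

print(similar_frequency)  # True  -> passes criterion 2
print(similar_effect)     # False -> fails criterion 3 (death moved the other way)
```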

Overcoming Probability Inflation

Benjamin Roman, MD, MSPH wrote a wonderful piece in this week's New England Journal of Medicine. It might not get read much because it is listed way down in the table of contents, but I think it is more clinically important than any other piece in the journal this week. He tells the story of his own sudden sensorineural hearing loss and his agreeing to an MRI even though the probability of a serious cause was low, the cost of the test (MRI) was high, and the benefit of treatment was minimal (in fact, many don't need treatment). Furthermore, he is an ENT physician and knows all this, but he underwent testing anyway, mainly because his wife wanted him to!

He outlines an important problem in medicine for both physicians and patients: probability inflation.

This problem arises from the way we deal emotionally (emphasis added) with risk and uncertainty, which are givens in health care, and the way we make decisions in the face of low-probability outcomes.

Emotions are a large part of the problem: the affect heuristic. When we make decisions, we often consider them analytically but also from the standpoint of how we feel. If we have positive feelings about the situation, we magnify the probability of benefit or, conversely, reduce the magnitude of harm. Think about Dr. Roman's situation. He (or at least his wife) was worried about something bad happening (i.e., having an acoustic neuroma) but understood that was pretty unlikely to be the case. But what if he didn't do the MRI and he actually had a treatable one that would be missed? He had strong feelings (or at least his wife did) that he didn't want to miss the acoustic neuroma. Or maybe he would be relieved not to find one (that's a strong positive emotion, isn't it?) if the MRI was negative (assuming the sensitivity is good enough). Thus, the acoustic neuroma's probability becomes artificially inflated. He probably didn't even think about the downstream effects of finding one and the risks associated with surgery or radiation (which probably outweigh the benefits of finding it, if I had to guess).

Many of us fear the uncertainty almost more than the disease itself. We want to know, even if we can't act on the information we are given. We also like doing something; at least we will go down fighting. This affects both physicians and patients. We order things we shouldn't. Patients request things they shouldn't. Sometimes it's because of poor reasoning skills; the affect heuristic gets us. Sometimes it's more practical, as Dr. Roman notes:

“My doctor’s recommendation was based on a similar reaction. Besides wanting to reassure himself and his patients that there is no acoustic neuroma, he told me, another reason he suggests MRIs in situations like mine is that he fears being sued should he fail to order one and end up missing something. He noted that court malpractice awards for missed acoustic neuromas commonly reach into the millions of dollars and that until we agree to an acceptable miss rate and physicians are no longer liable for missing just a single such case, their practices will not change. I’m not sure how common such verdicts are, but this rationale also reflects risk aversion in the face of a low-probability bad event — it’s simply the doctor’s risk that’s at issue, rather than the patient’s.” (emphasis added)

That last statement is telling. It’s a shame so much of medicine revolves around covering our proverbial asses.

Dr. Roman offers some solutions:

  1. comparative effectiveness and outcomes research (this exists for many things but gets ignored)
  2. educating doctors about how to discuss uncertainty, risk, and probability (First, doctors need to be taught these principles before they can teach anyone else. I see firsthand on a daily basis how little of this is understood)
  3. addressing emotions and psychology of patients and physicians (good luck dealing with emotions….. anyone have a teenage child?)
  4. nudging each other to do the right thing
    • consumers sharing the cost of marginal things they want (good idea for sure)
    • government (either local or national) regulation (Hell no! More bureaucracy is not needed and will only raise costs even more)

As Dr. Roman points out, all of these need to be done, but the devil is in the details. How? I think the focus of these solutions is a society or community perspective, while physicians mainly feel a duty to one individual: the person sitting in front of them. That relationship is powerful and affects decision making.

My dad had advanced dementia and fell in his bathroom, suffering a tibial plateau fracture. The surgeon wanted to fix it surgically, as this would give my dad the best chance to walk (though he couldn't actually tell me the probability). The only other option was splinting and rehab. Thankfully, I know enough about dementia, and specifically my dad's dementia, to know he would never be able to participate in rehab, and I knew he would never be able to keep the wound clean and stay off his leg until it healed. I decided against the surgery and opted for rehab and splinting. My dad never walked again. He couldn't understand how to do rehab or use a walker. I made the right decision because I think the ultimate outcome would have been the same either way: not walking. I have no way of knowing; it was a decision under uncertainty. I saved his insurance and Medicare a lot of money. That wasn't my goal. My goal was to maximize outcomes in the most resource-sensitive way that would harm my dad the least. I felt surgery would be more harmful than not doing it. Should the surgeon have even offered surgery? Should he have just said that splinting was best for someone like my dad with advanced dementia? When he offered surgery, did he really think it would help, or was it because he is a surgeon and that's what surgeons do?

Like all complex problems the solutions are equally if not more complex. I will continue to do my small part of educating who I can on EBM principles and hopefully a few of my learners will make good decisions.


What is EBM?

With all the discussion of EBM in crisis and EBM on trial, it strikes me that maybe these other folks have a different definition or concept of EBM than I do. Any discussion needs to start from common ground on just what EBM is.

Evidence based medicine is the conscientious, explicit, and judicious use of current best evidence in making decisions about the care of individual patients.

This is the original definition of EBM, published in 1996. It urged us to strive to use the best available evidence in making clinical decisions. It also cautioned us not to be slaves to the evidence, as evidence is often not applicable to individual patients. This definition served us well until the patient-centered paradigm of care became popular and the definition of EBM evolved to its current form:

This definition is more explicit about the order of importance of the components of EBM: patient preferences and actions are foremost, followed by the clinical state and circumstances, and then the research evidence. All of this is tempered, or tied together, by our clinical expertise. The evidence tells us what could be done, while the rest tells us what should be done.

The other way to look at EBM is that it is just a set of skills:

  1. asking an answerable clinical question
  2. finding the best available evidence
  3. critically appraising the evidence
  4. applying the evidence to individual patients
  5. appraising how well you did on each step and, I think, appraising the impact on a patient

So from this background, I find it difficult to blame EBM for many of the problems with the evidence. I blogged on this previously and will refute these claims at EvidenceLive 2015 in April.

Will EBM be found guilty or not guilty?

Carl Heneghan recently wrote a blog for BMJ blogs entitled Evidence based medicine on trial, focusing mostly on the problem with the evidence part of EBM. While I mostly concur with his list of the problems (distortion of the research agenda, very poor quality research, and lack of transparency in published evidence), I wonder who is at fault. “EBM” seems to get the blame, as if there is an entity called EBM that controls all research. EBM is but a set of skills: question asking, searching, critical appraisal, and application to individual patients. It is nothing more. So why are people so critical, placing so much blame on a set of skills? There will be several sessions at EvidenceLive 2015 (at one of which I will be speaking in defense of EBM) on real vs. rubbish EBM.

I want to focus on the distortion of the research agenda. Professor Heneghan rightly points out that the research agenda is driven by industry. Is that good or bad? I think it's both, but mostly good. The only other major funders of research are governmental agencies like the NIH. Profit drives innovation. It is very expensive to bring a drug to market. The government could not afford to bring the current drugs we have and need to market; one failed drug alone would deplete the coffers. Failure is the biggest driver of cost. Fewer than 1 in 10 drugs tested makes it to market. Would we tolerate that poor a success rate, at such a big cost, from the government? No.

…adjusting that estimate for current failure rates results in an estimate of $4 billion in research dollars spent for every drug that is approved.

I agree that industry seems at times to make a drug and then find a “disease” for it. I think the example Professor Heneghan gives is spot on. I don't believe in adult ADHD, but we have drugs for it. Do we need them? No, and this video demonstrates why: Drug free treatment of ADHD. Those really at fault are the doctors who prescribe the drugs that Professor Heneghan feels aren't necessary, not the companies for making them.

On a serious note, what about all the devices we use regularly, like stents, defibrillators, etc.? Would government have independently brought these to market? Likely not. We had balloon angioplasty (without stenting) that worked just fine, albeit only short term. It would have been “good enough for government work,” as the saying goes. What about advancements in imaging modalities? Again, likely not. The old CT scanners worked just fine. Industry is largely responsible for innovation and improvement in all walks of life. Yes, for a profit, but profit is not a bad thing. Those who say otherwise, please return your iPhones.

2014 in review

The WordPress.com stats helper monkeys prepared a 2014 annual report for this blog.

Here’s an excerpt:

The concert hall at the Sydney Opera House holds 2,700 people. This blog was viewed about 8,400 times in 2014. If it were a concert at Sydney Opera House, it would take about 3 sold-out performances for that many people to see it.

Click here to see the complete report.