I am preparing for a talk on the controversy surrounding JNC-8 and came across a post on KevinMD.com by an author of a Cochrane systematic review. The review aimed to quantify the effects of antihypertensive drug therapy on mortality and morbidity in adults with mild hypertension (systolic blood pressure (BP) 140-159 mmHg and/or diastolic BP 90-99 mmHg) and without cardiovascular disease. This is an important endeavor because the majority of people we consider treating for mild hypertension have no underlying cardiovascular disease.
David Cundiff, MD, made this statement in his KevinMD.com post:
The JNC-8 authors simply ignored a systematic review that I co-authored in the Cochrane Database of Systematic Reviews that found no evidence supporting drug treatment for patients of any age with mild hypertension (SBP: 140-159 and/or DBP 90-99) and no previous cardiovascular disease, diabetes, or renal disease (i.e., low risk).
Let’s see if you agree with his assessment of the findings of his systematic review.
As is typical for a Cochrane review, the methods are impeccable, so we don't need to critically appraise the review and can move straight to the results. The following images are figures from the review. Examine them and then I will discuss my take on the results.
Coronary Heart Disease results
If you just look at the summary point estimates (black diamonds), you would conclude that treating mild hypertension in adults without cardiovascular disease has no effect on mortality, stroke, or coronary heart disease but greatly increases withdrawal from the study due to adverse effects. But you are a smarter audience than this. The real crux is in the studies listed and examination of the confidence intervals.
Let's examine stroke closely. Three studies were included that examined the effect of treating mild hypertension on stroke outcomes. Two of the studies reported no stroke outcomes at all, so the majority of the data came from one study. The point estimate was in fact a 49% reduction in stroke, but the confidence interval included 1.0, so the result was not statistically significant. The confidence interval ranged from 0.24 to 1.08: anywhere from a 76% reduction in stroke to an 8% increase. I would argue that a clinically important effect (stroke reduction) is very possible, and had the studies been better powered we might well have seen a statistically significant reduction. To suggest there is no effect on stroke is misleading. The same can be said for mortality.
Finally, what about withdrawals due to adverse effects? Only one study provided any data. It reported an impressive risk ratio of 4.80 (almost a 5-fold increased risk of stopping the drugs due to adverse effects). But the absolute risk increase is only 9% (NNH 11). We are not told what these adverse effects were, so we can't know whether they were clinically worrisome or just nuisances for patients.
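To see how a dramatic risk ratio can coexist with a modest absolute harm, we can back-calculate the approximate withdrawal rates from the two numbers quoted above (a sketch using only the review's RR of 4.80 and ARI of 9%; the actual event counts are in the review):

```python
# Given treated = rr * control and treated - control = ari,
# solve for the underlying withdrawal rates.
rr, ari = 4.80, 0.09

control = ari / (rr - 1)   # withdrawal rate on placebo, ~2.4%
treated = rr * control     # withdrawal rate on drug, ~11.4%
nnh = 1 / ari              # number needed to harm, ~11

print(f"control {control:.1%}, treated {treated:.1%}, NNH {nnh:.0f}")
```

A nearly 5-fold relative increase sits on a low baseline rate, which is why the NNH of 11 is the more useful number when counseling patients.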
So, I don’t agree with Dr. Cundiff’s assessment that there is no evidence supporting treatment. I think the evidence is weak but there is no strong evidence to say we shouldn’t treat mild hypertension. The confidence intervals include clinically important benefits to patients. More studies are needed but will not be forthcoming. Observational data supports treating this group of patients and may have to be relied upon in making clinical recommendations.
Dr. La Rochelle published an article in BMJ EBM this month with a very useful figure in it (see below). It is useful because it can help our learners (and ourselves) remember the relationship between the type of evidence and its believability/trustworthiness.
Let's work through this figure. The upright triangle should be familiar to EBM aficionados as it is the typical hierarchy of study designs, with lower quality evidence at the bottom and highest quality at the top (assuming, of course, that the studies were conducted properly). The “Risk of Bias” arrow next to this upright triangle reflects the quality statement I just made. Case reports and case series, because they have no comparator group and aren't systematically selected, are at very high risk of bias. A large RCT or a systematic review of RCTs is at the lowest risk of bias.
The inverted triangle on the left reflects possible study effects, with the width of the corresponding area of the triangle (as well as the “Frequency of Potential Clinically Relevant Observable Effect” arrow) representing the prevalence of that effect. Thus, very dramatic, treatment-altering effects are rare (bottom of triangle, very narrow). Conversely, small effects are fairly common (top of triangle, widest part).
One way to use this diagram in teaching is to consider the study design you would choose (or look for) based on the anticipated magnitude of effect. Thus, if you are trying to detect a small effect you will need a large study that is methodologically sound. Remember, bias is a systematic error in a study that makes its findings depart from the truth. Small effects seen in studies lower down the upright pyramid are potentially biased (i.e., not true). If you anticipate very large effects, then observational studies or small RCTs might be just fine.
An alternative way to use this diagram with learners is to temper the findings of a study. If a small effect is seen in a small, lower quality study, they should be taught to question that finding as likely departing from the truth. Don't change clinical practice based on it; await another study. A very large effect, even in a lower quality study, is likely true but maybe not as dramatic as it seems (i.e., reduce the effect by 20-30%).
I applaud Dr. La Rochelle for developing a figure which explains these relationships so well.
I have always suspected that one reason physicians don't critically appraise articles is that the criteria for critical appraisal are not readily available in a convenient, easy-to-use package. No more. I, with the help of some undergraduate computer science students, have created a critical appraisal app for Android devices. It's in the Google Play store and will be listed in the Amazon app store. Hopefully I will develop an iOS version if this version is successful.
I tried to take critical appraisal to the next step by “scoring” each study and giving an estimate of the bias in the study. I then make a recommendation of whether or not the user should trust the study or reject it and look for another study. I think one of the shortcomings of the Users’ Guides series is that no direction is given to the user about what to do with the article after you critically appraise it. EBM Rater will give a suggestion about the trustworthiness of the study.
EBM Rater contains criteria to critically appraise all the major study designs including noninferiority studies. It even contains criteria to evaluate surrogate endpoints, composite endpoints, and subgroup effects.
Finally, it contains standard EBM calculators like NNT, NNH, and posttest probability. I added 2 unique calculators that I have not seen in any other app: patient-specific NNT and NNH. Many of our patients are sicker or healthier than the patients included in a study. NNTs and NNHs are typically calculated with data from a study, so they apply to the study patients. With my calculator you can figure out your individual patient's NNT or NNH.
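The idea behind a patient-specific NNT can be sketched in a few lines (the function name and example numbers are mine, not taken from the app). One common approach divides the trial NNT by f, the patient's baseline risk relative to the average control patient in the trial, which assumes the relative risk reduction is roughly constant across baseline risk:

```python
def patient_specific_nnt(nnt_study, f):
    """Adjust a trial NNT for an individual patient.

    f = patient's expected baseline risk divided by the control event
    rate in the trial (f > 1 for sicker patients, f < 1 for healthier
    ones). Assumes a constant relative risk reduction.
    """
    return nnt_study / f

# A trial NNT of 50; our patient is twice as high-risk as the trial average
print(patient_specific_nnt(50, 2.0))   # 25.0 - more benefit, smaller NNT
# ...and a patient at half the trial's baseline risk
print(patient_specific_nnt(50, 0.5))   # 100.0 - less benefit, larger NNT
```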
I hope you will give it a try and give me some feedback.
During journal clubs on randomized controlled trials there is often confusion about allocation concealment. It is often confused with blinding. In a sense it is blinding, but not in the traditional sense. One way to think of allocation concealment is as blinding of the randomization schedule or scheme. Allocation concealment hides the randomization or allocation sequence (what's coming next) from patients and from those who enroll patients in a study. Blinding occurs after randomization and keeps patients, providers, researchers, etc. from knowing which arm of the study the patient is in (i.e., what treatment they are getting).
Why is allocation concealment important in a randomized controlled trial? Inadequate or unclear allocation concealment can lead to an overestimation (by up to 40%!) of treatment effect (JAMA 1995;273:408). First, consider why we randomize in the first place. We randomize to try to equally distribute confounding and prognostic factors between arms of a study so we can try to isolate the effect of the intervention. Consider a physician who wants to enroll a patient in a study and wants to make sure her patient receives the therapy she deems likely most effective. What if she figured out the randomization scheme and knows what therapy the next patient will be assigned to? Hopefully you can see that this physician could undermine the benefits of randomization if she preferentially funnels sicker (or healthier) patients into one arm of the study. There could be an imbalance in baseline characteristics. It could also lead to patients who are enrolled in the study being fundamentally different or not representative of the patient population.
From The Lancet
You will have to use your judgment to decide how likely it is that someone could figure out the randomization scheme. You can feel more comfortable that allocation concealment was adequate if the following were used in the RCT:
- sequentially numbered, opaque, sealed envelopes: these can't be seen through, even when held up to a light. They are sealed so that you can't peek inside and see what the assignment is. As each patient is enrolled you use the next numbered envelope.
- pharmacy controlled: the enrolling physician calls the pharmacy, which enrolls the patient and assigns therapy.
- centralized randomization: probably the most commonly used. The enrolling physician calls a central research site and the central site assigns the patient to therapy.
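To make the idea of a concealed sequence concrete, here is a sketch (hypothetical, not from any particular trial) of how a central site might generate a permuted-block allocation list. The key point is that this list lives only at the central site or pharmacy; the enrolling clinician never sees it and so can't predict the next assignment:

```python
import random

def permuted_block_sequence(n_patients, block_size=4, seed=None):
    """Generate an allocation sequence using permuted blocks.

    Each block contains equal numbers of each arm in random order,
    keeping the arms balanced as enrollment proceeds.
    """
    rng = random.Random(seed)
    seq = []
    while len(seq) < n_patients:
        block = ["treatment"] * (block_size // 2) + ["control"] * (block_size // 2)
        rng.shuffle(block)
        seq.extend(block)
    return seq[:n_patients]

print(permuted_block_sequence(8, block_size=4, seed=42))
```

One caution: a small fixed block size can itself be guessed by an unblinded clinician (if 3 of 4 assignments in a block are known, the 4th is predictable), which is one reason trials often vary the block size.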
Proper randomization is crucial to a therapy study, and concealed allocation is crucial to randomization. I hope this post helps readers of RCTs better understand what concealed allocation is and how to detect whether it was done adequately. Keep in mind that if allocation concealment is unclear or done poorly, the effect you see in the study needs to be tempered and possibly cut by 40%.
If a test to detect a disease whose prevalence is 1/1000 has a false positive rate of 5%, what is the chance that a person with a positive test result actually has the disease? Assume the test is 100% sensitive.
Everyone taking care of patients, especially in primary care, needs to be able to figure this out. This is a basic understanding of what to do with a positive screening test result. If you can’t figure this out how would you be able to discuss the results with a patient? Or better yet how would you be able to counsel a patient on the implications of a positive test result prior to ordering a screening test?
Unfortunately, a study released online on April 21st found that 77% of respondents answered the question incorrectly. These results are similar to those of a study in 1978, which used the same scenario. This is unfortunate, as interpreting diagnostic test results is a cornerstone of EBM teaching and almost all (if not all) medical schools and residency programs teach EBM principles. So what's the problem?
Here are some of my thoughts and observations:
- These principles are probably not actually being taught, because the teachers themselves don't understand them or, if they do, they don't teach them in the proper context. This needs to be taught in the clinic when residents and medical students discuss ordering screening tests, or on the wards when considering a stress test or cardiac catheterization, etc.
- The most common answer in the study was 95% (the wrong answer). This shows that doctors don't understand the influence of pretest probability (or prevalence) on posttest probability (or predictive value). They assume a positive test equals disease and a negative test equals no disease. Remember: where you end up (posttest probability) depends on where you start from (pretest probability).
- I commonly see a simple lack of thinking when ordering tests. How many of you stop to think: What is the pretest probability? Based on that do I want to rule in or rule out disease? Based on that do I need a sensitive or specific test? What are the test properties of the test I plan to order? (or do I just order the same test all the time for the same diagnosis?)
- I also see tests ordered for presumably defensive purposes. Does everyone need a CT in the ER? Does everyone need a d-dimer for every little twinge of chest pain? When you ask why a test was ordered I usually hear something like this: “Well I needed to make sure something bad wasn’t going on”. I think this mindset transfers to the housestaff and students who perpetuate it. I commonly see the results of the ER CT in the HPI for God’s sake!!!
- Laziness. There’s an app for that. Even if you can’t remember the formula or how to set up a 2×2 table your smartphone and Google are your friends. Information management is an important skill.
So what’s the answer to the question above? 1.96%. Remember PPV = true positives / (true positives + false positives). Per 1000 people tested there is 1 true positive and about 50 false positives (5% of the 999 without disease), so PPV = 1 / (1 + 50) ≈ 1.96%. If it’s easier, set up a 2×2 table.
This very sensitive (100%) and fairly specific (95%) test (positive LR of 20!) wasn’t very informative when positive. Probability only went from 0.1% to 2%. The patient is still unlikely to have disease even with a positive test. The test would have been more useful had the result been negative. Thus, in a low probability setting your goal is to rule out disease and you should choose the most sensitive test (remember SnNout).
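The arithmetic in this scenario is easy to check with a short sketch (the function name is mine): convert the pretest probability to odds, multiply by the likelihood ratio, and convert back.

```python
def posttest_probability(pretest_prob, lr):
    """Pretest probability -> posttest probability via a likelihood ratio."""
    pretest_odds = pretest_prob / (1 - pretest_prob)
    posttest_odds = pretest_odds * lr
    return posttest_odds / (1 + posttest_odds)

sens, spec = 1.00, 0.95
lr_pos = sens / (1 - spec)    # LR+ = sensitivity / (1 - specificity) = 20
prevalence = 1 / 1000         # pretest probability of 0.1%

print(f"{posttest_probability(prevalence, lr_pos):.1%}")  # prints "2.0%"
```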
The current issue of the New England Journal of Medicine contains an important trial: the PEITHO trial. It’s important because it tells us what not to do.
In the PEITHO trial patients with intermediate risk pulmonary embolism (right ventricular dysfunction and myocardial injury with no hemodynamic compromise) were randomized to a single weight-based bolus of tenecteplase or placebo. All patients were given unfractionated heparin. Patients were followed for 30 days for the primary outcome of death from any cause or hemodynamic decompensation within 7 days after randomization.
This table shows the efficacy outcomes. Looks promising, doesn’t it?
The primary outcome was significantly reduced by 56%. This composite outcome is not a good one, though. Patients would not consider death and hemodynamic decompensation equal, and the pathophysiology of the two outcomes can be quite different. The intervention should also have a similar effect on all components of a good composite, yet here there is a greater effect on hemodynamic decompensation than on death. Thus, don’t pay attention to the composite but look at its individual components. Only hemodynamic decompensation was significantly reduced (ARR 3.4%, NNT 30). Don’t get me wrong, this is a good thing to reduce.
But with all good can come some bad. This trial teaches that we must pay attention to adverse effects. The table below shows the safety outcomes of the PEITHO trial. Is the benefit worth the risk?
You can see from the table that major extracranial bleeding was increased 5-fold (ARI 5.1%, NNH 20), as was stroke, most of which was hemorrhagic (ARI 1.8%, NNH 55).
This trial teaches a few important EBM points (I will ignore the clinical points it makes):
- You must always weigh the risks and benefits of every intervention.
- Ignore relative measures of outcomes (in this case the odds ratios) and calculate the absolute effects followed by NNT and NNH. These are much easier to compare.
- Watch out for bad composite endpoints. Always look at individual components of a composite endpoint to see what was affected.
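Converting the absolute effects quoted above into NNT and NNH takes one line each (a sketch; rounding up is the conservative convention):

```python
import math

def number_needed(absolute_risk_change):
    """NNT (for a risk reduction) or NNH (for a risk increase):
    1 / |absolute risk change|, rounded up."""
    return math.ceil(1 / abs(absolute_risk_change))

print(number_needed(0.034))  # hemodynamic decompensation, ARR 3.4% -> NNT 30
print(number_needed(0.051))  # major extracranial bleeding, ARI 5.1% -> NNH 20
```

Treat roughly 30 patients to prevent one decompensation, and cause one major extracranial bleed for every 20 treated: numbers a patient can actually weigh.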
Two papers were published this week to further validate the pooled risk equations developed for the ACC/AHA Cholesterol Guidelines.
Muntner and colleagues used the REGARDS participants to assess the calibration and discrimination of the pooled risk equations. This study had potential, as it oversampled patients from the stroke belt. That matters because the pooled risk equations were developed to overcome the limitations of the Framingham tool (mainly its lack of minorities). I have a real problem with this study, though: the pooled risk equations estimate 10-year risk of CHD and stroke, and this study has only 5 years of follow-up for the REGARDS participants. I don't think their estimates of calibration and discrimination are valid. Risk of CHD and stroke should increase over time, so event rates could change with 5 more years of follow-up. The important thing this paper adds is the reminder that observational studies often lack active surveillance. Most observational studies rely on self-report of outcomes; obviously, silent events would be missed by the patient, as would events for which the patient didn't seek evaluation. Muntner and colleagues also used Medicare claims data to identify events not detected through routine cohort follow-up and found 24% more events. This is a useful lesson from this study.
In a more useful study, Kavousi and colleagues compared 3 risk prediction tools (the pooled risk equations, Framingham, and SCORE) using the Rotterdam Study, a prospective population-based cohort of persons aged 55 years and older. This cohort does have 10 years of follow-up.
This figure shows that at each level of risk the pooled risk equations overestimated risk, though less so in women.
This figure shows the proportion of patients for whom treatment is recommended (red bars), treatment should be considered (yellow bars), and no treatment is recommended (green bars). As you can see the new risk tool leads to the large majority of men “needing treatment” compared to previous guidelines (ATP III) and the current European guidelines (ESC).
Finally, this figure shows the calibration curves, and the calibration was not good: the blue dots should lie right on the red line. Furthermore, the c-statistic was 0.67. The c-statistic is a measure of discrimination, i.e., how well the tool differentiates diseased from nondiseased patients; a value above 0.7 is considered moderate to good, and the closer to 1 the better.
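For intuition, the c-statistic is just the probability that a randomly chosen diseased patient receives a higher predicted risk than a randomly chosen nondiseased patient. A toy sketch with made-up risk predictions (not Rotterdam data):

```python
def c_statistic(risks_diseased, risks_nondiseased):
    """All-pairs comparison: the fraction of diseased/nondiseased pairs
    in which the diseased patient has the higher predicted risk
    (ties count as half)."""
    wins = ties = 0
    for d in risks_diseased:
        for h in risks_nondiseased:
            if d > h:
                wins += 1
            elif d == h:
                ties += 1
    n_pairs = len(risks_diseased) * len(risks_nondiseased)
    return (wins + 0.5 * ties) / n_pairs

# A tool that mostly, but not always, ranks diseased patients higher
print(round(c_statistic([0.9, 0.8, 0.4], [0.5, 0.3, 0.2]), 2))  # 0.89
```

A tool that always ranks diseased patients higher scores 1.0; a coin flip scores 0.5, which is why 0.67 is underwhelming.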
Why might the pooled risk equations overestimate risk? Maybe they don't, if you believe the Muntner study; it could just be a problem with the lack of active surveillance in the cohort studies used to validate the tool. Or they really do overestimate risk because they aren't accurate; or maybe more contemporary patients receive better therapies that improve overall health; or maybe the baseline risk characteristics of the validation cohorts just differ too much from the development cohorts.
I am still not sold on the new pooled risk equations, though based on the Kavousi study they may be no worse than what we have been using (Framingham also overpredicted risk and had poor calibration). I think we either need more study and tweaking of the tool, or we use the tool as is and focus more on cardiovascular risk reduction (with exercise, diet, tobacco cessation, and diabetes and HTN control) rather than on starting a statin right away.
The Mayo Clinic has a nice patient decision aid that you can use to help patients decide if a statin is right for them: http://statindecisionaid.mayoclinic.org/index.php/site/index