3 Pronged Approach to Reading a Clinical Study

There has been an interesting discussion on how to critically appraise a study on the Evidence-Based Health listserv over the last day. It is interesting to see different opinions on the role of critical appraisal.

One important thing to remember as pointed out by Ben Djulbegovic is that critical appraisal relies on the quality of reporting as these 2 studies showed: http://www.ncbi.nlm.nih.gov/pubmed/22424985  and http://www.ncbi.nlm.nih.gov/pmc/articles/PMC313900/ . The implications are important but difficult for the busy clinician to deal with.

There are 3 questions you should ask yourself as you read a clinical study:

  1. Are the findings TRUE?
  2. Are the findings FREE OF THE INFLUENCE OF BIAS?
  3. Are the findings IMPORTANT?

The most difficult question for a clinician to answer initially is whether the findings are TRUE. This question gets at issues of fraud in a study. Thankfully, major fraud (ie totally fabricated data) is a rare occurrence. Totally fraudulent data usually gets exposed over time. Worry about fraudulent data when the findings seem too good to be true (ie not consistent with clinical experience). Usually other researchers in the area will try to replicate the findings and can’t. There are other elements of truth that are more subtle and occur more frequently. For example, did the authors go on a data dredging expedition to find something positive to report? This most commonly occurs with post hoc subgroup analyses, which should always be considered hypothesis generating and not definitive. Here’s a great example of a false subgroup finding:

The Second International Study of Infarct Survival (ISIS-2) investigators reported an apparent subgroup effect: patients presenting with myocardial infarction born under the zodiac signs of Gemini or Libra did not experience the same reduction in vascular mortality attributable to aspirin that patients with other zodiac signs had.

Classical critical appraisal, using tools like the Users’ Guides, is done to DETECT BIASES in the design and conduct of studies. If any are detected then you have to decide how much influence the bias(es) had on the study results. This is difficult, if not impossible, to determine with certainty but there are studies that estimate the influence of various biases (for example, lack of concealed allocation in an RCT) on study outcomes. Remember, most biases lead to overestimation of effects. There are 2 options if you detect biases in a study: 1) reduce the “benefit” seen in the study by the amounts demonstrated in the following table and then decide if the findings are still important enough to apply in patient care, or 2) discard the study and look for one that is not biased.

This table is synthesized from findings reported in the Cochrane Handbook (http://handbook.cochrane.org/chapter_8/8_assessing_risk_of_bias_in_included_studies.htm)

BIAS                               EXAGGERATION OF EFFECT OF BENEFIT
Lack of randomization              25% (-2 to 45%)
Lack of allocation concealment     18% (5 to 29%)
Lack of blinding                   9% (NR)

Finally, if you believe the findings are true and are free of significant bias, you have to decide if they are CLINICALLY IMPORTANT. This requires clinical judgment and an understanding of the patient’s baseline risk of the bad outcome the intervention is trying to impact. Some people like to calculate NNTs to make this decision. Don’t just look at the relative risk reduction and be impressed; you can be misled by this measure, as I discuss in this video: https://youtu.be/7K30MGvOs5s
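To make the point about baseline risk concrete, here is a minimal sketch of the NNT arithmetic. The function name and the example numbers are hypothetical, chosen only for illustration; the calculation itself is the standard one (absolute risk reduction = baseline risk x relative risk reduction, and NNT = 1 / absolute risk reduction).

```python
def number_needed_to_treat(baseline_risk, relative_risk_reduction):
    """Return the NNT for a patient with the given baseline risk.

    baseline_risk: risk of the bad outcome without treatment (0 to 1).
    relative_risk_reduction: RRR reported by the study (0 to 1).
    """
    absolute_risk_reduction = baseline_risk * relative_risk_reduction
    if absolute_risk_reduction <= 0:
        raise ValueError("NNT is undefined when there is no absolute benefit")
    return 1.0 / absolute_risk_reduction


if __name__ == "__main__":
    # Hypothetical numbers: the same 25% RRR applied to two different patients.
    for baseline in (0.20, 0.02):
        nnt = number_needed_to_treat(baseline, 0.25)
        print(f"baseline risk {baseline:.0%}: NNT about {round(nnt)}")
```

With these made-up inputs the same 25% relative risk reduction gives an NNT of 20 for the high-risk patient and 200 for the low-risk patient, which is why the relative risk reduction alone can mislead.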

What to do when evidence has validity issues?

I often wonder how different clinicians (and EBM gurus) approach the dilemma of critically appraising an article only to find that it has one or more flaws. For example, a common flaw is lack of concealed allocation in a randomized controlled trial. Empirical studies show that the effects of experimental interventions are exaggerated by about 21% [ratio of odds ratios (ROR): 0.79, 95% CI: 0.66–0.95] when allocation concealment is unclear or inadequate (JAMA 1995;273:408-12).


So what should I do if the randomized trial doesn’t adequately conceal the allocation scheme? I could discard the study completely and look for another study. What if there isn’t another study? Should I ignore the data from an otherwise perfectly good study? I could use the study and adjust the findings down by 21% (see above for why) and if the effect of the intervention still crosses my clinically important threshold then I would implement the therapy. I could use the study as is and assume the flaw wasn’t important because the reviewers and editors didn’t think it was. This is foolish as many of them probably didn’t even recognize the flaw nor would many of them understand the impact.

I don’t have the right answer but wonder what more learned people do. I personally adjust the findings down and determine if I still want to use the information. The problem with this approach is that it assumes the estimate of effect in the particular study I am reviewing is in fact biased…something I can’t really know.
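Here is one way the "adjust the findings down" arithmetic could be sketched. It assumes the trial reports an odds ratio and that the ROR of 0.79 is read as (odds ratio in poorly concealed trials) / (odds ratio in well concealed trials), so dividing the observed odds ratio by 0.79 moves it back toward no effect. The function name and the example odds ratio are hypothetical, and applying an average from many trials to a single study is exactly the assumption flagged above.

```python
def adjust_odds_ratio(observed_or, ratio_of_odds_ratios=0.79):
    """Shift an observed odds ratio toward the null using an ROR.

    observed_or: odds ratio reported by the possibly biased trial.
    ratio_of_odds_ratios: average exaggeration from empirical studies
        (0.79 is the figure quoted in the post for unclear or
        inadequate allocation concealment).
    """
    if observed_or <= 0 or ratio_of_odds_ratios <= 0:
        raise ValueError("odds ratios must be positive")
    return observed_or / ratio_of_odds_ratios


if __name__ == "__main__":
    observed = 0.60  # hypothetical trial result
    adjusted = adjust_odds_ratio(observed)
    print(f"observed OR {observed:.2f}, adjusted OR {adjusted:.2f}")
```

With this made-up example an observed odds ratio of 0.60 becomes roughly 0.76 after adjustment. You would then ask whether 0.76 still clears your own threshold for a clinically important effect.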

What do you do?

Publication Bias is Common in High Impact Journal Systematic Reviews

A very interesting study was published earlier this month in the Journal of Clinical Epidemiology assessing publication bias reporting in systematic reviews published in high impact factor journals.  Publication bias refers to the phenomenon that statistically significant positive results are more likely to be published than negative results. They also tend to be published more quickly and in more prominent journals. The issue of publication bias is an important one because the goal of a systematic review is to systematically search for and find all studies on a topic (both published and unpublished) so that an unbiased estimate of effect can be determined from including all studies (both positive and negative). If only positive studies, or a preponderance of positive studies, are published and only these are included in the review then a biased estimate of effect will result.

Onishi and Furukawa’s study is the first to examine the frequency of significant publication bias in systematic reviews published in high impact factor general medical journals. They identified 116 systematic reviews published in the top 10 general medical journals in 2011 and 2012: NEJM, Lancet, JAMA, Annals of Internal Medicine, PLOS Medicine, BMJ, Archives of Internal Medicine, CMAJ, BMC Medicine, and Mayo Clinic Proceedings. For each systematic review that did not report an assessment of publication bias, they assessed publication bias themselves using the Egger test of funnel plot asymmetry, contour-enhanced funnel plots, and tunnel effects.

RESULTS: The included systematic reviews were of moderate quality as shown in the graph below. About a third of “systematic reviews” didn’t even perform a comprehensive literature search while 20% didn’t assess study quality. Finally, 31% of systematic reviews didn’t assess for publication bias. How can you call your review a systematic review when you don’t perform a comprehensive literature search and you don’t determine if you missed studies?
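For readers curious what the Egger test is doing under the hood, here is a minimal sketch of the regression behind it: each study's standardized effect (effect divided by its standard error) is regressed on its precision (1 divided by the standard error), and an intercept far from zero suggests funnel plot asymmetry. This is an illustration only, not the analysis code from the paper. The function name and the data are hypothetical, and a real analysis would also test the intercept against a t distribution.

```python
import math


def egger_regression(effects, std_errors):
    """Return (intercept, slope, se_intercept) of Egger's regression.

    effects: study effect estimates (for example log odds ratios).
    std_errors: the standard error of each effect estimate.
    """
    n = len(effects)
    if n != len(std_errors) or n < 3:
        raise ValueError("need at least 3 studies with matching inputs")
    x = [1.0 / se for se in std_errors]                 # precision
    y = [e / se for e, se in zip(effects, std_errors)]  # standardized effect
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("standard errors must not all be identical")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Residual variance and the usual OLS standard error of the intercept.
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    se_intercept = math.sqrt((sse / (n - 2)) * (1.0 / n + x_bar ** 2 / sxx))
    return intercept, slope, se_intercept


if __name__ == "__main__":
    ses = [0.1, 0.2, 0.3, 0.4, 0.5]           # hypothetical standard errors
    symmetric = [0.3 for _ in ses]             # same effect in every study
    asymmetric = [0.2 + 1.0 * se for se in ses]  # smaller studies look better
    print("symmetric intercept:", egger_regression(symmetric, ses)[0])
    print("asymmetric intercept:", egger_regression(asymmetric, ses)[0])
```

In the made-up symmetric data every study estimates the same effect, so the intercept is essentially zero. In the asymmetric data the less precise (smaller) studies report larger effects, which is the pattern publication bias tends to produce, and the intercept moves away from zero.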

[Figure: Quality of included reviews. From J Clin Epi 2014;67:1320]

Of the 36 reviews that did not report an assessment of publication bias, 7 (19.4%) had significant publication bias. Saying this another way, if a systematic review didn’t report an assessment of publication bias there was about a 20% chance publication bias was present. The authors then assessed what impact publication bias had on the estimated pooled results and found that the estimated pooled result was OVERESTIMATED by a median of 50.9% because of publication bias. This makes sense as mostly positive studies are published and negative studies aren’t. Thus, you would expect the estimates to be overly optimistic.

The figure below reports the results for individual journals. JAMA had significant publication bias in 50% of the reviews that didn’t assess publication bias while the Annals had 25% and BMJ 10%. It is concerning that these high impact journals publish “systematic reviews” that are of moderate quality and have a significant number of reviews that don’t report any assessment of publication bias.

[Figure: Results by journal. From J Clin Epi 2014;67:1320]

Bottom Line: Always critically appraise systematic reviews published in high impact journals. Don’t trust that an editor, even of a prestigious journal, did their job….they likely didn’t.

Useful diagram to teach basic EBM concepts

Dr. La Rochelle published an article in BMJ EBM this month with a very useful figure in it (see below). It is useful because it can help our learners (and ourselves) remember the relationship between the type of evidence and its believability/trustworthiness.


Let’s work through this figure. The upright triangle should be familiar to EBM aficionados as it is the typical hierarchy triangle of study designs, with lower quality evidence at the bottom and highest quality at the top (assuming, of course, that the studies were conducted properly). The “Risk of Bias” arrow next to this upright triangle reflects the quality statement I just made. Case reports and case series, because they have no comparator group and aren’t systematically selected, are at very high risk of bias. A large RCT or systematic review of RCTs is at the lowest risk of bias.

The inverted triangle on the left reflects possible study effects, with the width of the corresponding area of the triangle (as well as the “Frequency of Potential Clinically relevant observable effect” arrow) representing the prevalence of that effect. Thus, very dramatic, treatment-altering effects are rare (bottom of triangle, very narrow). Conversely, small effects are fairly common (top of triangle, widest part).

One way to use this diagram in teaching is to consider the study design you would choose (or look for) based on the anticipated magnitude of effect. Thus, if you are trying to detect a small effect you will need a large study that is methodologically sound. Remember bias is a systematic error in a study that makes the findings of the study depart from the truth. Small effects seen in studies lower down the upright pyramid are potentially biased (ie not true). If you anticipate very large effects then observational studies or small RCTs might be just fine.

An alternative way to use this diagram with learners is to temper the findings of a study. If a small effect is seen in a small, lower quality study they should be taught to question that finding as likely departing from the truth. Don’t change clinical practice based on it, but await another study. A very large effect, even in a lower quality study, is likely true but maybe not as dramatic as it seems (ie reduce the effect by 20-30%).

I applaud Dr. La Rochelle for developing a figure which explains these relationships so well.

Allocation Concealment Is Often Confused With Blinding

During journal clubs on randomized controlled trials there is often confusion about allocation concealment. It is often confused with blinding. In a sense it is blinding but not in the traditional sense of blinding. One way to think of allocation concealment is blinding of the randomization schedule or scheme. Allocation concealment hides the randomization or allocation sequence (what’s coming next) from patients and those who would enroll patients in a study. Blinding occurs after randomization and keeps patients, providers, researchers, etc from knowing which arm of the study the patient is in (i.e. what treatment they are getting).

Why is allocation concealment important in a randomized controlled trial? Inadequate or unclear allocation concealment can lead to an overestimation (by up to 40%!) of treatment effect (JAMA 1995;273:408). First, consider why we randomize in the first place. We randomize to try to equally distribute confounding and prognostic factors between arms of a study so we can try to isolate the effect of the intervention. Consider a physician who wants to enroll a patient in a study and wants to make sure her patient receives the therapy she deems likely most effective. What if she figured out the randomization scheme and knows what therapy the next patient will be assigned to? Hopefully you can see that this physician could undermine the benefits of randomization if she preferentially funnels sicker (or healthier) patients into one arm of the study. There could be an imbalance in baseline characteristics. It could also lead to patients who are enrolled in the study being fundamentally different or not representative of the patient population.

From The Lancet

You will have to use your judgment to decide how likely it is that someone could figure out the randomization scheme. You can feel more comfortable that allocation concealment was adequate if the following were used in the RCT:
  • sequentially numbered, opaque, sealed envelopes: these are not able to be seen through even if held up to a light. They are sealed so that you can’t peek into them and see what the assignment is. As each patient is enrolled you use the next numbered envelope.
  • pharmacy controlled: enrolling physician calls the pharmacy and they enroll the patient and assign therapy.
  • centralized randomization: probably the most commonly used. The enrolling physician calls a central research site and the central site assigns the patient to therapy.

Proper randomization is crucial to a therapy study and concealed allocation is crucial to randomization. I hope this post helps readers of RCTs better understand what concealed allocation is and learn how to detect whether it was done adequately or not. Keep in mind that if allocation concealment is unclear or done poorly, the effect you see in the study needs to be tempered and possibly cut by up to 40%.

N-of-1 Trial for Statin-Related Myalgia: Consider Conducting These Studies in Your Practice

The March 4th edition of the Annals of Internal Medicine contains an article by Joy and colleagues in which they conducted an N-of-1 trial in patients who had previously not tolerated statins. This is important because patients often complain that they cannot tolerate statins despite needing them. I have wondered how much of this was a self-fulfilling prophecy because they hear a lot about this from friends and various media outlets. An N-of-1 trial is a great way to determine if the statin-related symptoms are real or imagined.

First, let’s discuss N-of-1 trials.

What is an N-of-1 trial? It’s an RCT of active treatment vs. placebo in an individual patient. The patient serves as his or her own control, thus perfectly controlling for a variety of biases.

When is an N-of-1 trial most useful? This design is not useful for self-limited illnesses, acute or rapidly evolving illnesses, surgical procedures or prevention of irreversible outcomes (like stroke or MI). It’s most useful for conditions that are chronic and for which therapy is prolonged. It’s best if the effect you are looking for occurs fairly quickly and goes away quickly when treatment is stopped. These trials are a good way to determine the optimal dose of a medication for a patient. They are also good for determining if an adverse effect is truly due to a medication. Finally, they are a good way to test a treatment’s effect when the clinician feels it will be useless but the patient insists on taking it.

How is an N-of-1 trial conducted? Get informed consent from the patient and make sure they understand that a placebo will be part of the study. Next, the patient undergoes pairs of treatment periods, one period of each pair on active treatment and one on placebo, in random order. A pharmacist will need to be involved to compound the placebo and to develop the randomization scheme (so as to keep clinician and patient blinded). Pairs of treatment periods are replicated a minimum of 3 times. There needs to be a washout period when moving from active to placebo and vice versa. The length of the treatment period needs to be long enough for the outcome of interest to occur. Use the rule of 3s here (if an event occurs on average once every x days, then observe 3x days to be 95% confident of observing at least 1 event).

What outcome should be measured? Most commonly these types of trials will be conducted to determine the effect of an intervention on quality of life type measures (eg pain, fatigue, etc). Ask the patient what is the most troubling symptom or problem they have experienced and measure that as your outcome. Have the patient keep a diary or ask them to rate their symptoms on some meaningful scale at certain follow-up intervals. Do this while on active and placebo treatments. You will have to determine how much of a difference is clinically meaningful.

How do I interpret N-of-1 trial data? This can be a little difficult for non-statistically oriented clinicians. You could do the eyeball test and just see if there are important trends in the data. More rigorously, you could calculate the difference in mean scores between the placebo and active treatment periods and compare them using a t test (calculators are freely available on the internet).
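For those who want to see the arithmetic, here is a minimal sketch of two calculations mentioned above. It assumes a paired t test (each active period compared with the placebo period from the same pair) and assumes events follow a Poisson process for the rule of 3s. The function names and the symptom scores are hypothetical, made up purely for illustration.

```python
import math
import statistics


def paired_t_statistic(active_scores, placebo_scores):
    """Return (mean_difference, t_statistic, degrees_of_freedom).

    Each position in the two lists is one pair of treatment periods.
    """
    if len(active_scores) != len(placebo_scores) or len(active_scores) < 2:
        raise ValueError("need at least 2 matched pairs of periods")
    diffs = [a - p for a, p in zip(active_scores, placebo_scores)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation
    if sd_diff == 0:
        raise ValueError("all differences are identical; t is undefined")
    t_stat = mean_diff / (sd_diff / math.sqrt(n))
    return mean_diff, t_stat, n - 1


def chance_of_at_least_one_event(mean_days_between_events, days_observed):
    """Probability of seeing 1 or more events, assuming a Poisson process."""
    expected_events = days_observed / mean_days_between_events
    return 1.0 - math.exp(-expected_events)


if __name__ == "__main__":
    # Hypothetical symptom scores (0-100) from 3 pairs of treatment periods.
    active = [30, 42, 35]
    placebo = [28, 37, 36]
    mean_diff, t_stat, df = paired_t_statistic(active, placebo)
    print(f"mean difference {mean_diff:.1f}, t = {t_stat:.2f}, df = {df}")

    # Rule of 3s: an event every 10 days on average, observed for 30 days.
    print(f"{chance_of_at_least_one_event(10, 30):.1%}")
```

With these made-up scores the mean difference is 2 points and t is about 1.15 on 2 degrees of freedom, well short of the roughly 4.3 needed for a two-sided p < 0.05 with only 3 pairs. Compare the mean difference with whatever threshold you decided was clinically meaningful. The second function shows where the rule of 3s comes from: observing for 3 times the average interval gives about a 95% chance of seeing at least one event.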

Back to Joy and colleagues’ N-of-1 trial on statins. They enrolled patients with prior statin-related myalgias. Participants were randomly assigned to get either the same statin and dose that they previously didn’t tolerate or placebo. They remained on “treatment” for 3-week periods with 3-week washout periods in between. Patients rated their symptoms weekly on visual analogue scales for myalgias and specific symptoms (0-100, with 0 being no symptoms and 100 being the worst symptoms). It was felt a difference of 13 was clinically significant. What did they find? There were no statistically or clinically significant differences between statins and placebo in the myalgia score (4.37) or the symptom-specific score (3.89). The neat thing the authors did was to determine if patients resumed taking statins after reviewing the results of their N-of-1 trial: 5 of the 8 patients resumed statins (one didn’t because a statin was no longer indicated).

So are statin related myalgias mostly in our patients’ heads? Maybe. This study is by no means definitive because it only enrolled 8 patients but it at least suggests a methodology you can use to truly test if a patient’s symptoms are statin related or not. This is important to consider because the most recent lipid treatment guidelines focus on using statins only and not substituting other agents like ezetimibe or cholestyramine. So give this methodology a try. You and your patients will likely be amazed at what you find.

Conflicts of Interest in Online Point of Care Websites

Full disclosure: I was a Society of General Medicine UpToDate reviewer several years ago and received a free subscription to UpToDate for my services. I also use UpToDate regularly (through my institution library).

Amber and colleagues published an unfortunate finding this week in the Journal of Medical Ethics. They found that UpToDate seems to have some issues with conflicts of interest by some of its authors and editors.

UpToDate makes this claim on its Editorial subpage: “UpToDate accepts no advertising or sponsorships, a policy that ensures the integrity of our content and keeps it free of commercial influences.” Amber and colleagues’ findings would likely dispute that the content is “free of commercial influences”.

Amber and colleagues reviewed the Dynamed and UpToDate websites for COI policies and disclosures. They only searched a limited number of conditions on each site (male sexual dysfunction, fibromyalgia, hypogonadism, psoriasis, rheumatoid arthritis, and Crohn’s disease) but their reasoning seems solid: treatments of these entities can be controversial (for the 1st 3) and primarily involve biologics (last 3).  It seems reasonable that expert opinion could dominate recommendations on male sexual dysfunction, fibromyalgia, hypogonadism and that those experts could be conflicted. (Editorial side note: Few doctors recommend stopping offending medications or offer vacuum erection devices instead of the “little blue pill”. Most patients don’t even realize there are other treatments for ED other than “the little blue pill” and its outdoor bathtub loving competitor!- but I digress). The biologics also make sense to me because this is an active area of research and experts writing these types of chapters could get monies from companies either for research or speaking.

What did they find? Both Dynamed and UpToDate mandate disclosure of COIs (a point I will discuss later). No Dynamed authors or editors reported any COIs while quite a few were found for UpToDate. Of the 31 different treatments mentioned across the 6 topic areas evaluated, for 14 (45%) the authors, editors, or both received grant monies from the company making the therapy. Similarly 45% of authors, editors, or both were consultants for companies making these therapies. For 5 of the 31 therapies, authors or editors were on the speakers bureaus for the companies making these therapies. What’s most worrisome is that both the authors and editors were conflicted for the psoriasis chapter. Thus there were no checks and balances for this topic at all!

From blackbeltbartending.com

While finding COIs is worrisome it doesn’t mean that the overall quality of these chapters was compromised nor that they were biased. We don’t know at this time what effect the COIs had. Further study is needed. Unfortunately, this is probably just  the tip of the iceberg. Many more chapters probably suffer from the same issues. Furthermore, traditional textbooks likely have the same problems.

Disclosing COIs is mostly useless. Disclosure doesn’t lessen their impact. I don’t understand why nonconflicted authors can’t be found for these chapters. Do we care so much about who writes a chapter that we potentially compromise ethics for name recognition? Those with COIs should not be allowed to write a chapter on topics for which they have a conflict. Period. If UpToDate is so intent on having them maybe they could serve as a consultant to the authors or a peer reviewer but even that is stretching it. What really bothers me is that the editors for some of these chapters were also conflicted, thus leaving no check on potential biases. As Amber and colleagues point out, even though these chapters underwent peer review what do we know about the peer reviewers? Did they have any COIs? Who knows.

So what should all you UpToDate users do? I suggest the following:

  1. Contact UpToDate demanding a change.  They have the lion’s share of the market and until they lose subscribers nothing will likely change.
  2. Check for COIs yourself before using the recommendations in any UpToDate chapter. You should be especially wary if the recommendations begin with “In our opinion….”.  (An all too common finding)
  3. Use Dynamed instead. It has a different layout format than UpToDate but is quicker to be updated and it seems less conflicted. And it’s cheaper!

Small Studies Can Lead To Big Results

An interesting article was published in the British Medical Journal in April. Researchers looked at the influence of sample size on treatment effect estimates. The bottom line is that they found that treatment effect estimates were significantly larger from smaller trials as compared to larger trials (up to 48% greater!). The figure below shows this relationship. The left graph compares sample sizes from each study broken into quartiles and on the right arbitrary divisions by raw numbers.

Comparison of treatment effect estimates between trial sample sizes

So what does this mean for the average reader of medical journals? Pay attention to sample size. Early studies on new technology (whether meds or procedures) are often carried out on a fairly small group of people. Realize that what you see is likely overestimated (compared to a large study). If benefits are marginal (barely clinically significant) realize they likely will go away with a larger trial. If benefits are too good to be true….they likely are too good to be true and you should temper your enthusiasm. I always like to see more than one trial on a topic before I jump in and prescribe new meds or recommend new procedures.

Why Can’t Guideline Developers Just Do Their Job Right????

I am reviewing a manuscript about the trustworthiness of guidelines for a prominent medical journal. I have written editorials on this topic in the past (http://jama.jamanetwork.com/article.aspx?articleid=183430 and http://archinte.jamanetwork.com/article.aspx?articleid=1384244). The authors of the paper I am reviewing reviewed the recommendations made by 3 separate medical societies on the use of a certain medication for patients with atrial fibrillation. The data on this drug can be summarized as follows: little benefit, much more harm. But as you would expect these specialists recommended its use in the same sentence as other safer and more proven therapies. They basically ignored the side effects and only focused on the minimal benefits.

Why do many guideline developers keep doing this? They just can’t seem to develop guidelines properly. Unfortunately their biased products have weight with insurers, the public, and the legal system. The reasons are complex but solvable. A main reason (in my opinion) is that they are stuck in their ways. Each society has its guideline machine and they churn them out the same way year after year. Why would they change? Who is holding them accountable? Certainly not journal editors. (As a side note: the journals that publish these guidelines are often owned by the same subspecialty societies that developed the guidelines. Hmmmm. No conflicts there.)

The biggest problem though is conflicts of interest. There is intellectual COI. Monetary COI. Converting data to recommendations requires judgment and judgment involves values. Single specialty medical society guideline development panels involve the same types of doctors that have shared values. But I always wonder how much did the authors of these guidelines get from the drug companies? Are they so married to this drug that they don’t believe the data? Is it ignorance? Are they so intellectually dishonest that they only see benefits and can’t understand harm? I don’t think we will ever truly understand this process without having a proverbial fly on the wall present during guideline deliberations.

Until someone demands a better job of guideline development I still consider them opinion pieces or at best consensus statements. We need to quit placing so much weight on them in quality assessment especially when some guidelines, like these, recommend harmful treatment.

Danish Osteoporosis Prevention Trial Doesn’t Prove Anything

The overstatement of the DOPS trial (http://www.bmj.com/content/345/bmj.e6409) results has bothered me this week. So much so that even though I am on vacation I wanted to write something about it. Thankfully comments linked to the article show that at least a few readers were smart enough to detect the limitations of this study. What has bothered me is the ridiculous headline on theheart.org about this trial (http://www.theheart.org/article/1458789.do):

HRT cuts CVD by 50%, latest “unique” data show

First off, all data is unique so that’s stupid…..CVD cut by 50%. When something seems too good to be true and goes against what we already know, take it with a grain of salt. Almost nothing in medicine is 50% effective, especially as primary prevention. But I digress.

The authors of the trial point out their study was different than the previous large HRT study – the Women’s Health Initiative (WHI). So why do these studies contradict each other?

Whenever you see a study finding, always consider 4 things that can explain what you see, and it’s your job to figure out which one it is: truth, chance, bias, and confounding. So let’s look at the DOPS study with this framework.

Truth: maybe DOPS is right and the Cochrane review with 24,283 total patients is wrong. Possible but unlikely. DOPS enrolled 1006 patients and has very low event rates (much lower than other studies in this area).

Chance: The composite outcome (which I’ll comment on in a minute) did have a p value <0.05 but none of its components were statistically significant. Each study we do can be a false positive study (or a false negative). So it’s possible the study is a false positive and if repeated would not give the same results. Small studies are more likely to have false positives and false negatives.

Bias: Biases are systematic errors made in a study. There are a couple in this study: no blinding (this leads to overestimation of effects) and poorly concealed allocation (again leads to overestimation of effects).

Confounding: Women in the control group were about 6 months older than treated patients but this was controlled for in the analysis phase. What else was different about these women that could have affected the outcome?

So far my summary of this study would be that it is small with potential for overestimation of effects due to lack of blinding and poorly concealed allocation.

But there’s more:

  • This study ended years ago and is now just getting published. Why? Were the authors playing with the data? The study was industry funded and the authors have industry ties. Hmmm.
  • The composite outcome they used is bizarre and not the typical composite used in cardiovascular trials. They used death or admission to the hospital for myocardial infarction or heart failure. This isn’t a good composite because patients wouldn’t consider each component equally important and the biology of each component is very different. Thus you must look at individual components and none are statistically significant by themselves.
  • The WHI is the largest HRT trial done to date. Women in the WHI were older and fatter than the DOPS participants and thus are at higher risk. So why would women at higher risk for an outcome gain less benefit than those at lower risk for the outcome? Things usually don’t work that way. A big difference though in these 2 trials is that DOPS women started HRT earlier than WHI women. So maybe timing is important.

Thus, I think this trial at best suggests a hypothesis to test: starting HRT within the first couple of years compared to starting later is more beneficial. DOPS doesn’t prove this. The body of evidence contradicting this trial is stronger than DOPS. Thus I don’t think I will change what I tell my female patients.