3 Pronged Approach to Reading a Clinical Study

There has been an interesting discussion on how to critically appraise a study on the Evidence-Based Health listserv over the last day. It is interesting to see different opinions on the role of critical appraisal.

One important thing to remember as pointed out by Ben Djulbegovic is that critical appraisal relies on the quality of reporting as these 2 studies showed: http://www.ncbi.nlm.nih.gov/pubmed/22424985  and http://www.ncbi.nlm.nih.gov/pmc/articles/PMC313900/ . The implications are important but difficult for the busy clinician to deal with.

There are 3 questions you should ask yourself as you read a clinical study:

  1. Are the findings TRUE?
  2. Are the findings FREE OF THE INFLUENCE OF BIAS?
  3. Are the findings IMPORTANT?

The most difficult question for a clinician to answer initially is if the findings are TRUE. This question gets at issues of fraud in a study. Thankfully major fraud(ie totally fabricated data) is a rare occurrence.  Totally fraudulent data usually gets exposed over time. Worry about fraudulent data when the findings seem too good to be true (ie not consistent with clinical experience). Usually other researchers in the area will try to replicate the findings and can’t. There are other elements of truth that are more subtle and occur more frequently. For example, did the authors go on a data dredging expedition to find something positive to report? This would most commonly occur with post hoc subgroup analyses. These should always be considered hypothesis generating and not definitive. Here’s a great example of a false subgroup finding:

The Second International Study of Infarct Survival (ISIS-2) investigators reported an apparent subgroup effect: patients presenting with myocardial infarction born under the zodiac signs of Gemini or Libra did not experience the same reduction in vascular mortality attributable to aspirin that patients with other zodiac signs had.

Classical critical appraisal, using tools like the Users’ Guides, is done to DETECT BIASES in the design and conduct of studies. If any are detected then you have to decide the degree of influence that the bias(es) has had on the study results. This is difficult, if not impossible, to determine with certainty but there are studies that estimate the influence of various biases (for example, lack of concealed allocation in a RCT) on study outcomes. Remember, most biases lead to overestimate of effects. There are 2 options if you detect biases in a study: 1) reduce the “benefit” seen in the study by the amounts demonstrated in the following table and then decide if the findings are still important enough to apply in patient care, or 2) discard the study and look for one that is not biased.

This table is synthesized from findings reported in the Cochrane Handbook (http://handbook.cochrane.org/chapter_8/8_assessing_risk_of_bias_in_included_studies.htm)

BIAS                                                                       EXAGGERATION  OF EFFECT OF BENEFIT

Lack of randomization                                                             25% (-2 to 45%)

Lack of allocation concealment                                              18% (5 to 29%)

Lack of blinding                                                                          9% (NR)

Finally, if you believe the findings are true and are free of significant bias,  you have to decide if they are CLINICALLY IMPORTANT. This requires clinical judgment and understanding the patient’s baseline risk of the bad outcome the intervention is trying to impact. Some people like to calculate NNTs to make this decision. Don’t just look at the relative risk reduction and be impressed because you can be misled by this measure as I discuss in this video: https://youtu.be/7K30MGvOs5s

Workshop on Developing Open Educational Resources

On Friday September 23, 2016 I am presenting a workshop on developing open educational resources (OERs) at the UAB Research and Innovations in Medical Education conference.

This Hyperdoc is a self-guided version of the workshop.

These are the Google slides I will use at the presentation.

I became very interested in openness during recent coursework for my Master in Educational Technology degree. I blog about my experiences in that course here.

If you want a good overview of openness download The Battle for Open ebook by Martin Weller.

Now it’s your turn: Tell me what you think of the materials or open resources/learning/publishing in general.

Tarnished Gold Chapter 4: Beating The Odds

Finally, a chapter I somewhat agree with.

This chapter discussed the difficulties in understanding probability. The examples they use aren’t good analogies for clinical probabilities but are interesting nonetheless.

Picture of a quote: its all relative

From QuoteAddicts.com

I’ll focus on what I agree with for this post. They discuss the misleading nature of reporting relative risks (and relative risk reductions also) in research reports. This is a real problem as clinicians often don’t understand that while the relative risk/benefit of an intervention is fairly constant across patient subgroups the absolute benefits aren’t. In general, if something is beneficial the sicker you are the more benefit you gain. For example, let’s say a treatment has a relative risk reduction for death in the next year of 75% (RR of 0.25) and we have 2 patients we are seeing. One has a risk (or probability) of death of 50% without the intervention and the other has a risk of death of 10%.  If patient one is given the treatment her risk is reduced from 50% to 12.5% (to see how I did this watch this video). If patient two is given the treatment his risk is reduced from 10% to 2.5%. So the absolute benefit is greater for patient one (37.5%) than for patient two (7.5%) even though the relative benefit is the same (75%). This is often a difficult concept for physicians to understand but once mastered is a useful way to discuss the benefits and harms of a proposed intervention with patients. Furthermore, it’s patient specific.  To get the probability of an outcome for an individual patient you could use a validated clinical prediction rule, the placebo rate from a trial, the results from studies of disease frequency (though these are rare) or, as a last ditch effort, guesstimation.

Tarnished Gold Chapter 3: Prove It

This chapter dealt with the issues of what constitutes evidence. Instead of focusing on their views I will focus on my views of evidence.


A common criticism of EBM is that it very strict in what it considers  acceptable evidence and it doesn’t consider clinical experience and pathophysiological rationale as important. Early EBM did focus too much on the RCT and Cochrane systematic reviews but this has changed. The current EBM paradigm focuses on multifactorial “evidence” including the patient’s clinical state and circumstances, clinical experience, and the best available evidence. Sometimes this will be a systematic review but often it will just be patient experience (what worked or didn’t work for them in the past) or pathophysiology. The early EBM paradigm cautioned us that we can be misled by our unsystematic observations and the pathophysiological rationale. For the latter, it’s because our understanding of pathophysiology changes and diseases are complex and multifactorial and interventions we study tend to be unifactorial. Nonetheless, clinical experience is evidence and is very important and no EBMer will say otherwise. Understanding pathophysiology is important and no EBMer will say otherwise. The key is to understand the limitations of any evidence source.

Evidence supports a belief and doesn’t have to be true. In clinical medicine we can never know the truth. We can only try to estimate the truth with a study because we can’t study every person with a given disease. We have to infer a lot. We generalize from a sample in a study to a whole population and back down to an individual patient. The authors of Tarnished Gold have a real problem with this paradigm but it’s what we do in clinical medicine. Bench research works differently. Rats can all be genetically and phenotypically the same. Bacteria can all be clones of each other. Bench scientists can study a whole population of something and declare an effect. We can’t do this in clinical medicine because we are all so heterogeneous and have free will.

EBM no longer worships only the RCT and the Cochrane review. Patient inputs are viewed as very important and slowly becoming equally important. Qualitative studies are gaining importance. Clinical experience will always be prominent in deciding what should be done from what could be done.

Tarnished Gold Chapter 2: Populations are not people

Populations are not people

First off the authors state that decisions sciences do not relate to EBM. They feel decisions are personal and statistical information is not important. They give the example of organ transplantation. Unfortunately, they skip an important step in their argument. Namely, that to know an organ transplant will be of benefit is based upon studies proving that they prolong life and these are based on statistical information.

They argue that EBM is based on a statistical blunder: the ecological fallacy.  There is some merit to this argument. The average finding applies to the average patient. What if your patient isn’t average. There are a couple of options. First, you could calculate your patient’s estimate of benefit (or risk) using the results from the study like I demonstrate in this video.  Almost every study report will include a confidence interval around the point estimate of benefit (or harm). The point estimate is the best guess about the findings of the study but there is uncertainty and the confidence interval helps quantify that uncertainty. You could use the upper and lower bounds of the confidence interval and decide if it includes a clinically important benefit. Finally, you could look for a subgroup analysis (yes I recognize the limitations of this) of a group of patients similar to yours. Despite all this, science is based on inference. We can never measure the effect of an intervention in all people. We often use inductive and deductive reasoning in science.

The authors spent several pages discussing pattern recognition in medicine and that EBM doesn’t help this. This is both true and false. It is true in that we are taught how certain things look and there will never be a study related to that. We have numerous studies though of how good elements of the history and PE are for diagnosing disease. Many of these are pattern recognition. We learn that peripheral edema, orthopnea, PND, and DOE are most likely congestive heart failure. That is pattern recognition but there is also a study that examines how good each of these components is to increasing or decreasing the probability of CHF. Thus, pattern recognition is informed by EBM.

There are more claims to be refuted in this chapter but these are the main ones worth refuting.


Tarnished Gold Chapter 1: Evidence-based Medicine

This is going to be a lot harder than I thought. I question why I am even wasting my time reading this tripe but I will plod forward so that there is a counterargument to this work. I also need to understand criticisms of this paradigm so that the paradigm can be improved.

Importantly, the authors focus on an outdated definition of EBM. This definition was the first iteration of the definition and is oft-quoted but it is out of date nonetheless.

Evidence based medicine (EBM) is the conscientious, explicit, judicious use of the current, best evidence in making decisions about the care of individual patients.

The current paradigm of EBM was published in 2002 well before this book was published and should have been included in this book. Hickey and Roberts claim it had its origins in the legal system which is total BS. If you read the early EBM papers there is no mention of the legal system driving this paradigm.  I also consider EBM to just be a set of skills (searching, critical appraisal, application) to use in the care of patients.



They focus and have problems with 2 words in the above definition: best and evidence. They are concerned that best leads to selection of evidence and that “one bit of evidence is better than another”. Of course some evidence is better than others. Empirical studies (not done by the evil drug companies) have demonstrated that certain design flaws, for example lack of blinding, lead to overestimation of effects. Studies have also demonstrated that observational study designs can overestimate effects and even give opposite effects to randomized trials (see the HRT saga). I’m sure they will argue later in the book that all these types of studies are rigged and randomized trials are rigged (probably because their holy grail, Vitamin C, failed in controlled trials to be useful). There are too many studies showing similar effects to discount the evidence that supports the fact that some studies are better than others.

They claim “EBM’s evidence does not mean scientific information or data, but refers to legal justification“. First off, EBM does not possess evidence so the ‘ is misplaced. Second, this statement doesn’t even possess any face validity. Journals are full of scientific information and data. What are they talking about?

They claim “EBM has little to offer the doctor treating a patient, beyond suggestions about what might be expected with an average patient“. Studies used to inform practice usually are based on a sampling of patients because we can’t study every single person with that problem. Sampling can be done to reflect a broad range of people with a given problem or it can be done to select for certain subpopulations of disease (for example, advanced disease or early disease). On average, most people are average. So their statement isn’t totally without merit. We can’t do studies on every type of patient. But, here is where the current paradigm helps us. We (as doctors) take into account the patient’s state and circumstances when applying the best available evidence to their case. We use our clinical training and experience to decide what we should do from what we could do. There are ways to adapt study data to an individual patient like I demonstrate in this video. N-of-1 trials can also be done on individual patients to see if a therapy is effective (more on this in another post).

Finally, (though there is a lot more I could comment on) they have problems with using statistics to analyze data. As I mention above, in medicine we can only sample a small percentage of those with disease. If we could study everyone we wouldn’t need statistics but since we can’t we use statistics on a sample to try to generalize it to the whole population. I don’t know of any other way to do this because we simply can’t study everyone. (I recognize this is a gross simplification of what statistics do and not totally accurate.)

The next chapter I’ll critique is entitled “Populations are not People”. Stay tuned…