EBM is just not a priority in medical education

When I reflect on what I do each day as a physician it occurs to me that I use EBM skills very commonly. Here is a sampling:

  • I think about and assess pretest probability a lot
  • I think about choosing appropriate tests a lot
  • I apply information from studies a lot. I weigh risks and benefits of therapies. I think about patient context. I try to incorporate patient values and desires as much as possible.
  • I search for information following Haynes’ 6S approach
  • I critically appraise primary studies and systematic reviews each week (not daily)
  • I make calculations because studies don’t always put information in the format I want
  • I have discussions with patients about the above issues

I am sure I am missing a lot of what I do that falls under “EBM”. I am revamping an introductory EBM course that I teach to 2nd-year medical students for the upcoming semester. It has been relegated to “just teach them enough to get a good score on Step 1”. Thankfully, I have a fuller online version that they will take during their scholarly time in the 3rd year, so all is not lost. To make myself feel better, I view the crash course I am teaching them this upcoming semester as scaffolding so that they can better understand my full online course. You can look at and use the materials I will use in the crash course in the tab above labeled “Online Teaching Resources” (I just realized I still have to add a few items that the students will use).

We spend so much time in the 1st 2 years of medical school teaching about things that I honestly never ever use but yet what I use daily gets short shrift. Why is that? Are EBM skills not important? Is it assumed they are easy to develop later in one’s career on one’s own (they aren’t)? Is it just kicking the can down the road assuming in residency these skills will be learned? Or during the clerkships?

I for one wish none of this material was on Step 1. I think it’s too early. Furthermore, I am so sick of my course evaluations including statements like “Taught too much stuff that wasn’t on step 1”. I think you need some clinical knowledge to really learn EBM, but more importantly, to understand its importance. EBM-type questions should get greater prominence on Step 2 and even more prominence on Step 3. Having only one or 2 questions reinforces the perceived lack of importance of EBM. EBM should have just as many questions as any of the specialties, and each successive exam should have more of them to reinforce that these skills are important and will be used. Maybe Santa will grant me that wish one of these years. (I am keeping my fingers crossed I get onto the NBME committee that writes the EBM questions. Maybe I can convince them of my plan.)



Tarnished Gold Chapter 4: Beating The Odds

Finally, a chapter I somewhat agree with.

This chapter discussed the difficulties in understanding probability. The examples they use aren’t good analogies for clinical probabilities but are interesting nonetheless.

Picture of a quote: its all relative

From QuoteAddicts.com

I’ll focus on what I agree with for this post. They discuss the misleading nature of reporting relative risks (and relative risk reductions) in research reports. This is a real problem because clinicians often don’t understand that while the relative risk/benefit of an intervention is fairly constant across patient subgroups, the absolute benefits aren’t. In general, if something is beneficial, the sicker you are the more benefit you gain. For example, let’s say a treatment has a relative risk reduction for death in the next year of 75% (RR of 0.25) and we are seeing 2 patients. One has a risk (or probability) of death of 50% without the intervention and the other has a risk of death of 10%. If patient one is given the treatment her risk is reduced from 50% to 12.5% (to see how I did this watch this video). If patient two is given the treatment his risk is reduced from 10% to 2.5%. So the absolute benefit is greater for patient one (37.5%) than for patient two (7.5%) even though the relative benefit is the same (75%). This is often a difficult concept for physicians to understand but once mastered it is a useful way to discuss the benefits and harms of a proposed intervention with patients. Furthermore, it’s patient specific. To get the probability of an outcome for an individual patient you could use a validated clinical prediction rule, the placebo rate from a trial, the results from studies of disease frequency (though these are rare) or, as a last-ditch effort, guesstimation.
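The arithmetic in this example can be sketched in a few lines of Python. This is only an illustration of the calculation described above; the function names are my own and not from any library.

```python
def treated_risk(baseline_risk, relative_risk):
    """Risk with treatment = baseline risk x relative risk."""
    return baseline_risk * relative_risk


def absolute_risk_reduction(baseline_risk, relative_risk):
    """Absolute benefit = baseline risk minus risk with treatment."""
    return baseline_risk - treated_risk(baseline_risk, relative_risk)


RELATIVE_RISK = 0.25  # a relative risk reduction of 75%, as in the example

# Baseline risks of death for the two patients in the example.
for label, baseline in [("patient one", 0.50), ("patient two", 0.10)]:
    risk = treated_risk(baseline, RELATIVE_RISK)
    benefit = absolute_risk_reduction(baseline, RELATIVE_RISK)
    print(f"{label}: {baseline:.1%} -> {risk:.1%} (absolute benefit {benefit:.1%})")
```

The same relative risk is applied to both patients; only the baseline risk changes, which is why the absolute benefit differs.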

Tarnished Gold Chapter 3: Prove It

This chapter dealt with the issues of what constitutes evidence. Instead of focusing on their views I will focus on my views of evidence.


A common criticism of EBM is that it is very strict in what it considers acceptable evidence and it doesn’t consider clinical experience and pathophysiological rationale as important. Early EBM did focus too much on the RCT and Cochrane systematic reviews but this has changed. The current EBM paradigm focuses on multifactorial “evidence” including the patient’s clinical state and circumstances, clinical experience, and the best available evidence. Sometimes this will be a systematic review but often it will just be patient experience (what worked or didn’t work for them in the past) or pathophysiology. The early EBM paradigm cautioned us that we can be misled by our unsystematic observations and the pathophysiological rationale. For the latter, it’s because our understanding of pathophysiology changes and diseases are complex and multifactorial and interventions we study tend to be unifactorial. Nonetheless, clinical experience is evidence and is very important and no EBMer will say otherwise. Understanding pathophysiology is important and no EBMer will say otherwise. The key is to understand the limitations of any evidence source.

Evidence supports a belief and doesn’t have to be true. In clinical medicine we can never know the truth. We can only try to estimate the truth with a study because we can’t study every person with a given disease. We have to infer a lot. We generalize from a sample in a study to a whole population and back down to an individual patient. The authors of Tarnished Gold have a real problem with this paradigm but it’s what we do in clinical medicine. Bench research works differently. Rats can all be genetically and phenotypically the same. Bacteria can all be clones of each other. Bench scientists can study a whole population of something and declare an effect. We can’t do this in clinical medicine because we are all so heterogeneous and have free will.

EBM no longer worships only the RCT and the Cochrane review. Patient inputs are viewed as very important and are slowly becoming equally important. Qualitative studies are gaining importance. Clinical experience will always be prominent in deciding what should be done from what could be done.

Tarnished Gold Chapter 2: Populations are not people

First off the authors state that decision sciences do not relate to EBM. They feel decisions are personal and statistical information is not important. They give the example of organ transplantation. Unfortunately, they skip an important step in their argument: knowing that an organ transplant will be of benefit depends on studies showing that transplants prolong life, and those studies are based on statistical information.

They argue that EBM is based on a statistical blunder: the ecological fallacy. There is some merit to this argument. The average finding applies to the average patient. What if your patient isn’t average? There are a few options. First, you could calculate your patient’s estimate of benefit (or risk) using the results from the study like I demonstrate in this video. Second, almost every study report will include a confidence interval around the point estimate of benefit (or harm). The point estimate is the best guess about the findings of the study but there is uncertainty and the confidence interval helps quantify that uncertainty. You could use the upper and lower bounds of the confidence interval and decide if it includes a clinically important benefit. Finally, you could look for a subgroup analysis (yes I recognize the limitations of this) of a group of patients similar to yours. Despite all this, science is based on inference. We can never measure the effect of an intervention in all people. We often use inductive and deductive reasoning in science.
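As a rough sketch of the confidence interval option, the Python below applies the point estimate and both bounds of a relative risk to one patient's baseline risk. All of the numbers (relative risk 0.75, 95% confidence interval 0.60 to 0.95, baseline risk 20%, a 2% threshold for a clinically important benefit) are hypothetical values chosen for illustration, and the function name is my own.

```python
def absolute_benefit_range(baseline_risk, rr_point, rr_lower, rr_upper):
    """Absolute risk reduction at the point estimate and at each CI bound.

    Assumes a relative risk below 1 means benefit, so the lower bound of
    the relative risk is the best case and the upper bound the worst case.
    """
    def benefit(relative_risk):
        return baseline_risk * (1 - relative_risk)

    return {
        "point": benefit(rr_point),
        "best_case": benefit(rr_lower),
        "worst_case": benefit(rr_upper),
    }


# Hypothetical patient and study results, for illustration only.
result = absolute_benefit_range(0.20, rr_point=0.75, rr_lower=0.60, rr_upper=0.95)
IMPORTANT_BENEFIT = 0.02  # hypothetical threshold; it depends on the patient and outcome

for name, value in result.items():
    verdict = "important" if value >= IMPORTANT_BENEFIT else "not important"
    print(f"{name}: {value:.1%} ({verdict})")
```

If even the worst case clears the threshold you and the patient care about, the uncertainty matters less; if only the best case does, the decision is less clear.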

The authors spent several pages discussing pattern recognition in medicine and arguing that EBM doesn’t help with it. This is both true and false. It is true in that we are taught how certain things look and there will never be a study related to that. We have numerous studies though of how good elements of the history and PE are for diagnosing disease. Many of these are pattern recognition. We learn that peripheral edema, orthopnea, PND, and DOE are most likely congestive heart failure. That is pattern recognition but there is also a study that examines how good each of these components is at increasing or decreasing the probability of CHF. Thus, pattern recognition is informed by EBM.
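One common way to express how much a finding raises or lowers the probability of disease is a likelihood ratio applied to the pretest odds. The sketch below assumes that approach; the pretest probability and the likelihood ratios are hypothetical numbers, not values from any study of CHF.

```python
def posttest_probability(pretest_probability, likelihood_ratio):
    """Convert probability to odds, apply the likelihood ratio, convert back."""
    pretest_odds = pretest_probability / (1 - pretest_probability)
    posttest_odds = pretest_odds * likelihood_ratio
    return posttest_odds / (1 + posttest_odds)


# Hypothetical values for illustration only.
pretest = 0.30
for finding, lr in [("finding present (LR 2.0)", 2.0), ("finding absent (LR 0.5)", 0.5)]:
    print(f"{finding}: {pretest:.0%} -> {posttest_probability(pretest, lr):.1%}")
```

A likelihood ratio above 1 raises the probability, one below 1 lowers it, and a ratio of 1 leaves it unchanged.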

There are more claims to be refuted in this chapter but these are the main ones worth refuting.


Tarnished Gold Chapter 1: Evidence-based Medicine

This is going to be a lot harder than I thought. I question why I am even wasting my time reading this tripe but I will plod forward so that there is a counterargument to this work. I also need to understand criticisms of this paradigm so that the paradigm can be improved.

Importantly, the authors focus on an outdated definition of EBM. This definition was the first iteration of the definition and is oft-quoted but it is out of date nonetheless.

Evidence based medicine (EBM) is the conscientious, explicit, judicious use of the current, best evidence in making decisions about the care of individual patients.

The current paradigm of EBM was published in 2002 well before this book was published and should have been included in this book. Hickey and Roberts claim it had its origins in the legal system which is total BS. If you read the early EBM papers there is no mention of the legal system driving this paradigm.  I also consider EBM to just be a set of skills (searching, critical appraisal, application) to use in the care of patients.



They focus on and have problems with 2 words in the above definition: best and evidence. They are concerned that best leads to selection of evidence and that “one bit of evidence is better than another”. Of course some evidence is better than others. Empirical studies (not done by the evil drug companies) have demonstrated that certain design flaws, for example lack of blinding, lead to overestimation of effects. Studies have also demonstrated that observational study designs can overestimate effects and even give opposite effects to randomized trials (see the HRT saga). I’m sure they will argue later in the book that all these types of studies are rigged and randomized trials are rigged (probably because their holy grail, Vitamin C, failed in controlled trials to be useful). There are too many studies showing similar effects to discount the evidence that supports the fact that some studies are better than others.

They claim “EBM’s evidence does not mean scientific information or data, but refers to legal justification”. First off, EBM does not possess evidence so the apostrophe is misplaced. Second, this statement doesn’t even possess any face validity. Journals are full of scientific information and data. What are they talking about?

They claim “EBM has little to offer the doctor treating a patient, beyond suggestions about what might be expected with an average patient”. Studies used to inform practice usually are based on a sampling of patients because we can’t study every single person with that problem. Sampling can be done to reflect a broad range of people with a given problem or it can be done to select for certain subpopulations of disease (for example, advanced disease or early disease). On average, most people are average. So their statement isn’t totally without merit. We can’t do studies on every type of patient. But, here is where the current paradigm helps us. We (as doctors) take into account the patient’s state and circumstances when applying the best available evidence to their case. We use our clinical training and experience to decide what we should do from what we could do. There are ways to adapt study data to an individual patient like I demonstrate in this video. N-of-1 trials can also be done on individual patients to see if a therapy is effective (more on this in another post).

Finally, (though there is a lot more I could comment on) they have problems with using statistics to analyze data. As I mention above, in medicine we can only sample a small percentage of those with disease. If we could study everyone we wouldn’t need statistics but since we can’t we use statistics on a sample to try to generalize it to the whole population. I don’t know of any other way to do this because we simply can’t study everyone. (I recognize this is a gross simplification of what statistics do and not totally accurate.)

The next chapter I’ll critique is entitled “Populations are not People”. Stay tuned…




A Cartoon About Blinding: Using New Tools Can Be Fun

I had to make a few slides about blinding and decided a cartoon might be fun to make and a graphic way to display the information. As I am getting a degree in educational technology I have a proclivity to try new tools. I found an article on free cartoon making tools and decided to give one a try. It was intuitive and had reasonable features. I had initially planned to draw my own characters and put masks on them but in the interest of time just used the characters already in the program.

Try using new tools when you can and the situation fits. It can be fun and interesting. Always remember that the tool you use should facilitate learning and not just be used because it’s cool. I felt a graphic would help learners understand blinding more than just a word description.

What do you think?

A comic strip with 3 panels showing single blind (patient only), double blind (patient and researchers) and triple blind (patients, researchers, and statisticians).

How to decide when a composite endpoint should go into the compost

Composite endpoints are commonly used in studies. A composite endpoint is an endpoint composed of several other endpoints. If a patient experiences any one of them they are considered to have experienced the endpoint of the trial. For example, a composite endpoint in a typical cardiovascular study includes nonfatal MI, nonfatal stroke and cardiovascular death. A patient doesn’t have to have all three, just one of them.

Why use composite endpoints? The main reason is to reduce the number of patients needed in the study. The chance of a patient having any single outcome is much lower than the chance of having at least one of three outcomes, so more events accumulate for the same number of patients. Composites are also used to potentially reduce the length of follow-up needed in a study. A patient is likely to develop one of three outcomes more quickly than any single outcome, and one component of the composite can occur sooner than another (e.g. doubling of serum creatinine vs. initiation of hemodialysis).
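To see why a composite yields more events, here is a minimal sketch. It assumes the component outcomes occur independently of one another, which is a simplification (in real patients they are usually correlated), and the event rates are hypothetical.

```python
import math


def composite_event_probability(component_probabilities):
    """Chance of at least one component event, assuming independence."""
    return 1 - math.prod(1 - p for p in component_probabilities)


# Hypothetical event rates for three components, for illustration only.
hypothetical_rates = [0.05, 0.04, 0.03]
print(f"Any single component: at most {max(hypothetical_rates):.1%}")
print(f"Composite of all three: {composite_event_probability(hypothetical_rates):.1%}")
```

With more patients reaching the endpoint, a trial needs fewer participants or less follow-up to accumulate the same number of events.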

Not all composites are created equal. Some are good and many are poorly developed. Examine the composite outcome below from the RENAAL trial published in the NEJM in 2001. The primary efficacy measure was the time to the first event of the composite end point of a doubling of the serum creatinine concentration, end-stage renal disease, or death. What do you think? Is this a good composite or a poor composite? (Note: I put a red mark next to the components of the composite)

From the RENAAL trial, NEJM 2001

I think this is a poorly designed composite. Why do I say that? A good composite should have the following characteristics:

  1. Each component should be valued equally by patients,
  2. Each component should occur with similar frequency, and
  3. The intervention should have the same relative effect on each component.

With this in mind, reevaluate the RENAAL composite endpoint. Hopefully you agree with me that it’s not a good composite endpoint. Let’s examine it more closely.

Issue #1: would patients consider each of the components to be of equal value? Patients would not consider death and doubling of serum creatinine as being equal. Clearly they would value death as a much worse outcome. So this composite fails here.

Issue #2: does each of the components of the composite occur with similar frequency? Looking at the percentages of the components in the losartan group, they are pretty close to each other (21.6%, 19.6%, and 21%) so I would give the composite a pass on this criterion.

Issue #3: does the intervention (losartan) have an equal effect on each of the components of the composite? Look under the risk reduction column and the answer is no. Doubling of serum creatinine is reduced by 25% and end stage renal disease by 28% but death is actually increased by 2%. Thus, the composite fails on this criterion.
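The two numeric checks (frequency and treatment effect) can be sketched in Python using the figures quoted above. Matching the three event rates to components by the order of the composite definition is my assumption, and the cutoffs (`max_ratio`, `max_spread`) are arbitrary illustrative values, not an established standard. The first criterion, how patients value each component, is a judgment that can't be computed.

```python
# Losartan-group figures quoted above (percentages). Pairing the event
# rates with components by the order of the composite is an assumption.
renaal = {
    "doubling of serum creatinine": {"event_rate": 21.6, "risk_reduction": 25.0},
    "end-stage renal disease": {"event_rate": 19.6, "risk_reduction": 28.0},
    "death": {"event_rate": 21.0, "risk_reduction": -2.0},  # risk increased by 2%
}


def similar_frequency(components, max_ratio=2.0):
    """Pass if the most common component is at most max_ratio times the rarest."""
    rates = [c["event_rate"] for c in components.values()]
    if min(rates) <= 0:
        return False
    return max(rates) / min(rates) <= max_ratio


def similar_effect(components, max_spread=10.0):
    """Pass if all effects point the same way and lie within max_spread points."""
    reductions = [c["risk_reduction"] for c in components.values()]
    same_direction = all(r > 0 for r in reductions) or all(r < 0 for r in reductions)
    return same_direction and (max(reductions) - min(reductions)) <= max_spread


print("Similar frequency:", similar_frequency(renaal))
print("Similar effect:", similar_effect(renaal))
```

Whatever cutoffs you pick, a component whose effect goes in the opposite direction from the others (here, death) is a clear sign the components should be read separately.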

What should you do if the composite endpoint is a bad composite? Just ignore the composite and look at the individual components. Even if a composite is a good one you should always examine the individual components. So in this case losartan reduces the risk of ESRD and doubling of serum creatinine but has no effect on mortality.