A response to criticisms of intention-to-treat analysis

I received an email about an introductory video I made for medical students about intention-to-treat (ITT) analysis. The author of the email also made a comment about the video on YouTube. The points he makes are valid and he expands on them in this manuscript. I will defend, if you will, the concept of ITT analysis.

First, ITT analysis refers to analyzing all patients based on the arm of a study they are randomized to, no matter what happens to them: “Once randomized, always analyzed.” Even if they don’t take the intervention or become non-compliant, their outcomes count against the group to which they were randomized. It is considered the primary analysis for superiority studies (where we want to prove an intervention is better than something else).

Important background disclaimer: ITT analysis is just one part of the methodology of a randomized controlled trial. (The other parts being concealed allocation, randomization, blinding, equal cointerventions, etc.) There is a tendency of critics of ITT analysis to think of it in isolation and I think that is a mistake. It is just part of an overall plan for conducting a study.

Randomization is considered the gold standard methodology for allocation in therapy studies. It is done for a couple of main reasons: 1) to equalize both known and unknown prognostic or confounding factors and 2) to prevent selection bias or cherry picking of patients to one arm of the study or another based on prognosis. In general, ITT analysis helps preserve the effect of randomization. If you start pulling out people differentially from each arm of a study, the remaining people (to be analyzed) are likely not balanced prognostically. Yes, you can adjust for disparities between groups, but only those that are measured. You cannot adjust for unmeasured confounders. Randomization balances even unmeasured confounders (assuming the sample size is large enough). So it’s important to preserve randomization.
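To make the point about unmeasured confounders concrete, here is a minimal simulation sketch. Everything in it is an assumption chosen for illustration: a hypothetical unmeasured factor called "frail" with 30% prevalence, 1:1 randomization, and frail patients in the intervention arm being removed from the analysis half the time. It only shows the mechanism, not data from any real trial.

```python
import random


def simulate_balance(n=100_000, frail_prevalence=0.3,
                     dropout_if_frail=0.5, seed=1):
    """Compare the share of 'frail' patients per arm before and after
    differentially excluding patients from the intervention arm.
    All parameter values are hypothetical."""
    rng = random.Random(seed)
    arms = {"intervention": [], "control": []}
    for _ in range(n):
        frail = rng.random() < frail_prevalence  # unmeasured prognostic factor
        arm = "intervention" if rng.random() < 0.5 else "control"
        arms[arm].append(frail)

    def share(flags):
        return sum(flags) / len(flags)

    # As randomized (what an ITT analysis keeps): arms are balanced.
    as_randomized = {arm: share(flags) for arm, flags in arms.items()}

    # Differential exclusion: frail patients in the intervention arm are
    # dropped from the analysis with probability dropout_if_frail.
    remaining = [f for f in arms["intervention"]
                 if not (f and rng.random() < dropout_if_frail)]
    after_exclusion = {"intervention": share(remaining),
                       "control": as_randomized["control"]}
    return as_randomized, after_exclusion


if __name__ == "__main__":
    before, after = simulate_balance()
    print("Share frail, as randomized:  ", before)
    print("Share frail, after exclusion:", after)
```

Under these assumptions the two arms start with nearly the same share of frail patients, and the intervention arm ends up with clearly fewer once patients are pulled out. Because the factor is unmeasured, no adjustment could repair that imbalance afterwards.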

Dr. Feinman points out in his comment that “In the experiment, people sensibly want to know what is the effect of surgery compared to aspirin. Instead, ITT answers instead what is the effect of TELLING PEOPLE to have surgery vs. taking aspirin. Is that really what you want to know?” This is true, but this is what we do in practice. I recommend a strategy to a patient and they will either follow it or they won’t. So maybe it’s not the ITT analysis that is the problem but the question of the study. Another piece to this is that a per-protocol analysis can (and probably should) be done after the ITT analysis to see the effect in those who actually got the therapy. The results of both analyses should be reported so the reader can better understand the results. Data on the demographics, comorbidities, etc. of patients who drop out or are noncompliant also need to be reported so we can better understand potential reasons for dropout or noncompliance.

I mention in the video that ITT analysis is more conservative in that it is less likely to lead you to falsely reject the null hypothesis and falsely conclude that the intervention is effective. This is because those in the intervention group who don’t get the intervention are effectively like the control group, which leads to more similar event rates and more difficulty in finding a significant difference. So using ITT analysis somewhat depends on the risk to the patient of falsely saying the intervention is effective (when it isn’t) or falsely saying it isn’t effective (when it is). Conversely, per-protocol analysis is less conservative in superiority studies.
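The dilution described above can be worked through with simple arithmetic. This is only a sketch with made-up numbers: a 20% event rate in the control arm, a 10% event rate in patients who actually take the intervention, and 70% adherence. It also assumes, as in the paragraph above, that non-adherent patients have the same event rate as controls.

```python
def itt_vs_adherent(control_rate, treated_rate, adherence):
    """Event rates under an ITT analysis when a fraction of the
    intervention arm never takes the intervention.
    Assumes non-adherent patients behave exactly like controls."""
    # The intervention arm is a mix of adherent and non-adherent patients.
    itt_intervention_rate = (adherence * treated_rate
                             + (1 - adherence) * control_rate)
    return {
        "itt_intervention_rate": itt_intervention_rate,
        "itt_risk_difference": control_rate - itt_intervention_rate,
        "effect_if_all_adhered": control_rate - treated_rate,
    }


# Hypothetical numbers, for illustration only.
result = itt_vs_adherent(control_rate=0.20, treated_rate=0.10, adherence=0.70)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

With these numbers the ITT risk difference is about 0.07, compared with 0.10 if everyone had adhered. The event rates in the two arms move closer together, so a significant difference is harder to find.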

I’m not sure what I’ve written here will change anyone’s mind. I think a compromise is to use both ITT analysis and per-protocol analysis and report both results. Hopefully, they are similar or at least qualitatively similar.

What do you say about this controversy?

Evidence-based Attire Part 1

I am going to write a 2-part series on what we should wear as physicians. I was inspired to do this when my resident said he had received an email declaring all male physicians must wear ties. I have mostly abandoned tie wearing as they are hot and uncomfortable, I just can’t find ties to match many of the shirts I want to wear, and I don’t want to commit a fashion faux pas. I also perceived them to be an infection risk (I will address this in part 2 of this series). When I don’t wear a tie I wear my white coat. When I wear a tie I often forgo the white coat. What do patients think about this?

I felt it important to try to review a smattering of the literature on the topic. I admit I didn’t do a full literature review because the importance of this is just not worth the time. I did randomly select from the studies I did find so as not to only support my biased opinion that patients don’t really care what we wear.

There is a fair amount of literature on this topic in both the medical and dental world. Many of the studies included pictures like the following for patients to critique and give their opinion of the physicians depicted. They were often asked about the professionalism of the attire and the desirability of certain aspects of dress.

I didn’t find a dress option I really had hoped to find information on: white coat but no tie.

What do patients think about how we dress?

Below are some selected figures from studies showing these results. This first table shows that wearing a tie was fairly desirable but not as desirable as a name tag and white coat.

This is from a study of anesthesiologists (https://goo.gl/kTKUoM)

The figure below is from a study of anesthesiologists. It included 110 patients. You can see that wearing a tie was felt to be important by about 25-30% of them. Wearing a name tag and white coat was more important.

Another from a study of anesthesiologists (https://goo.gl/5fMh7k)

The figure below suggests that wearing a tie led to greater patient comfort in dealing with the doctor. Interestingly, not wearing a tie has the same ranking as wearing a white coat in patients’ comfort in dealing with a doctor dressed in such a way.

Finally, the table below shows patients in this study mostly disagreed that ties were important and didn’t even feel white coats were all that important. They felt it was most important for doctors to act professionally.

So, there you have it. White coats and name tags seem to be the most important. They help patients identify us as doctors and we are perceived to be more professional in the white coat. Ties have mixed results. I am still undecided as to how I feel about this. I used to love wearing ties…now not so much. I think I will make my decision after I review the infection risk data. Stay tuned for that installment, which will be coming soon.

What do you think?

Academic promotion and salary structure is part of the problem with research

There has been an interesting string on a listserv I am a member of about EBM being hijacked, the poor quality of research, and how so much research goes unpublished. Below are two responses that got me thinking about something:

I am not impressed by the recent declaration in BMJ just because it is about the bad studies and the need of new EBM. What new? The authors of the declaration published calls for new EBM almost annually.  And most the time the calls are to go beyond evidence, and to do good, not bad.

Yes, too many research are poorly designed and executed. Is it new finding? Was ever EBM quiet about the bad evidence?

“today’s world …so financially driven” – was it ever not like that? Can one name the Golden Age?

If not kill your self, one need to live in the real world and try to make it better. EBM is specifically about it: critical appraisal and education (in wide sense). It is about use of the best available evidence for the good of the patients. Best available, sic. To influence research, to improve research, to clean the publication practice is a good thing, but it is beyond EBM in pure sense.


From the link, ” Too many research studies are poorly designed or executed. Too much of the resulting research evidence is withheld or disseminated piecemeal”. In short, medical research is not implemented correctly for a number of reasons. So my question is should we scrap medical research? OR RCT’s which are supposed to be the gold standard for trials should be dropped from research?

For me the most logical answer is better implementation. So the debate should how we can do this in today’s world which is so financially driven. In other words, the debate/discussions should not be about research or RCT’s are flawed and hence should be replaced.

Also, please don’t blame EBM for shoddy research practices. EBM has no control on what researchers like to do or don’t like to do. EBM can only make recommendations on how to do better research and interpret research.

My mini-epiphany while reading this string was that academic medicine is partly to blame for some of the problems in the evidence base. In academics you have to garner grants to pay for yourself (or at least part of your salary) and you have to publish to get promoted. So we have a system that rewards quantity over quality. Lots of research is done because we have to do it to get promoted. But how much of it is worthwhile research? Not much. Many academics aren’t trained in proper research technique so they perform lower quality studies.

So one fix to “EBM” could be to restructure how we finance academics. We need to quit focusing on grants (especially considering governmental funding of these goes down every year) for compensation. Teaching (and yes medical schools need to actually pay adequately for the teaching we do) and clinical activities should be the primary funders of academics. Then research could be done on important things and by properly trained researchers.


What can we do to reduce the amount of research that goes unpublished?

An excellent piece was published by Paul Glasziou and Iain Chalmers about the large percentage of research that goes unpublished. As they note, the estimate that 50% goes unpublished is likely an underestimate. Unfortunately they didn’t offer any significant solutions other than we need a “better understanding of the causes of, and cures for, non-publication.”

A “simple” solution is for the drug/device approval process to require all studies related to that product and/or conducted/funded by the requesting company be registered and published. This would miss studies done after drug/device approval or done by independent parties but a large number of nonpublished studies are conducted or funded by the companies that market the drug/device. This would also miss all the other studies not directly related to drug/devices (e.g. epidemiological studies).

Another significant challenge is where to publish this information. The web makes the most sense as this is the cheapest route of publication. Maybe the FDA (or some international commission) could have a page(s) on each drug that includes full text access to all studies done on that drug/device. Would these need peer and editorial review? Yes, but that is a daunting task, as we already struggle to find willing and competent peer reviewers. FDA budgets shrink repeatedly and this would be a significant financial burden.

What I really wanted to do in this post was to give my thoughts on a  question raised by Jon Brassey (Director of the TRIP Database):

  • What is better a large RCT or a SR based on a “biased subsample”?

Is a large RCT more desirable than a systematic review (SR) based on a biased subsample of studies? This has been a conundrum for some time. You can argue both sides of this. The reason he says biased subsample is that we know more positive studies get published than negative, larger effects get published more than small effects, etc. Is the answer to this question “it depends”? It depends on your goals: a more precise estimate of the biased effect (favors SR), more generalizability (favors SR), a potentially more methodologically sound result (favors RCT). What is interesting to consider is that the same study repeated over and over will result in a distribution of results (this is why it shouldn’t surprise us that when we do seemingly the same study we don’t get the exact same result). Should we repeat studies? When should we stop repeating the studies (i.e. when have we adequately defined the distribution of results)?
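The trade-off can be sketched with a small simulation. All numbers here are hypothetical assumptions, not estimates from the literature: a true standardized effect of 0.2, many small studies of the same question each with a standard error of 0.2, one large RCT with a standard error of 0.02, and a publication rule where significantly positive small studies are always published and the rest are published 30% of the time. Because every small study has the same standard error, the pooled estimate is just the simple mean.

```python
import random


def simulate_sr_vs_rct(true_effect=0.2, n_small=2000, small_se=0.2,
                       large_se=0.02, publish_prob_nonsig=0.3, seed=7):
    """Pooled estimate from a selectively published set of small studies
    versus a single large RCT. All parameter values are hypothetical."""
    rng = random.Random(seed)
    published = []
    for _ in range(n_small):
        # Repeating the same study gives a distribution of results.
        estimate = rng.gauss(true_effect, small_se)
        significant_positive = estimate / small_se > 1.96
        # Assumed publication rule: positive studies always get published.
        if significant_positive or rng.random() < publish_prob_nonsig:
            published.append(estimate)

    # Equal standard errors, so inverse-variance pooling is a simple mean.
    pooled = sum(published) / len(published)
    large_rct = rng.gauss(true_effect, large_se)
    return pooled, large_rct, len(published)


if __name__ == "__main__":
    pooled, large_rct, n_published = simulate_sr_vs_rct()
    print(f"Published small studies: {n_published}")
    print(f"SR pooled estimate:      {pooled:.3f}")
    print(f"Large RCT estimate:      {large_rct:.3f}")
```

Under these assumptions the pooled estimate is very precise but sits above the true effect, while the single large RCT is less precise but centered on it. A different publication rule would change the size of the bias, which is exactly what we cannot observe in practice.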

I don’t think we can really answer this question as both of these study types have limitations but if I had to pick one I would rather have a large RCT that is well done than a SR based on a limited subset of the data especially considering we don’t know what is missing and the effect seen in those missing studies.


Guidelines should not include opinions

The authors of this viewpoint have it wrong on a couple of fronts.

“The purpose of practice guidelines must be to develop the best possible recommendations from a body of evidence that may be contradictory or inadequate.”

While I agree that having recommendations come from an expert body is useful when there is inadequate or contradictory evidence I don’t think they should be labeled guidelines. A consensus statement is a more appropriate term. After all, if evidence is lacking or contradictory aren’t these experts just giving their opinion? Isn’t it possible that another group of experts would give a different opinion?

So don’t label it a guideline. That term has garnered reverence that was never intended. Guidelines become law almost. They are bastardized into punishing performance measures and become the cornerstone of legal argument. So, the term guideline should not be used lightly.

“…but those recommendations should always represent the best evidence and the best expert opinion currently available.”

NO! No expert opinion. Data is too open to interpretation. Humans filter information using prior knowledge, experience, and many heuristics (including, very importantly, the affect heuristic). A person’s specialty really influences how they interpret data. It’s one of the reasons it’s so important to have multidisciplinary panels so that conflicts and heuristics can be balanced. Unfortunately, most guideline panels are very homogeneous and conflicted.

I agree that we need unambiguous language in guidelines. They should only contain recommendations on things that have strong evidence that no one refutes. When they venture into the world of vagaries they become nothing more than opinion pieces.

What say you?

If no one can give an understandable definition of p-value why do we use it? Or quit saying statistically significant.

In this post I might come across as sounding dumb… but it won’t be the first or last time that will happen. I came across this article about how scientists can’t even explain p-values in an understandable way. I know the students and residents I teach only understand it as indicating statistical significance. It then occurred to me as I thought of the definition that maybe we should get rid of p-values altogether and let users decide if what they are seeing is clinically significant.

What is a p-value?

First, we have to understand one other concept: the null hypothesis. When doing a study, researchers have to construct a null hypothesis that they will attempt to prove or disprove (depending on how you look at it) with their study. For a superiority trial, the null hypothesis is that there is no difference between the intervention and control treatments. For an equivalence trial, the null hypothesis is that there is a difference between the intervention and control treatments. Data from each arm of the study are compared with an appropriate statistical test yielding a test statistic (e.g. t statistic if using a t-test). That test statistic is compared to a table of critical values for that test statistic to determine if the value (of your calculated test statistic) is greater than the critical value required to reject the null hypothesis at the chosen alpha level. If it is, you reject the null hypothesis and say it is statistically significant. If not, you fail to reject the null hypothesis and the finding is not statistically significant.
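As a rough sketch of that procedure, the code below computes a pooled two-sample t statistic and compares it to a critical value. The scores are invented for illustration, with 10 hypothetical participants per arm, so there are 18 degrees of freedom. The critical value of 2.101 is the standard table value for a two-sided test at alpha = 0.05 with 18 degrees of freedom, and the test assumes equal variances in the two groups.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t_statistic(group_a, group_b):
    """Two-sample t statistic assuming equal variances in both groups."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    standard_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / standard_error


# Table value for a two-sided test, alpha = 0.05, 18 degrees of freedom.
T_CRITICAL_DF18_ALPHA05 = 2.101

# Hypothetical scores, for illustration only.
intervention = [78, 82, 85, 88, 90, 76, 84, 91, 87, 83]
control = [75, 80, 79, 83, 85, 72, 78, 86, 81, 77]

t_stat = pooled_t_statistic(intervention, control)
reject_null = abs(t_stat) > T_CRITICAL_DF18_ALPHA05
print(f"t = {t_stat:.2f}, reject null hypothesis: {reject_null}")
```

With these invented scores the t statistic is about 2.32, which exceeds 2.101, so the null hypothesis of no difference would be rejected at the 0.05 level.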

P-values are simply probabilities. They are the probability of finding what was found in the study (or an even bigger finding) if the null hypothesis were true.

Here’s an example. Let’s say I conduct a study to examine the effect of reading for 30 min each day on medical certification examination scores. My null hypothesis is that reading will not improve exam scores. I randomize one group of participants to read for 30 min every day and the control group to no reading. Both groups take the certifying examination and I calculate mean scores for each group. I can compare these means with a t-test (assuming the assumptions of parametric tests are met) and I find the mean examination score is 5 points higher in those who read for 30 minutes compared to those who didn’t, with a p-value of 0.04. So, this p-value of 0.04 means that there is a 4% probability that you would see at least a 5 point higher mean score in the reading group given that there is no effect of reading on exam scores. What if the p-value was 0.2? Then there is a 20% probability that you would see at least a 5 point higher mean score in the reading group given there is no effect of reading.
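One way to see that definition in action is a permutation test, which I am swapping in here for the t-test because it mirrors the wording of the definition directly. If reading truly had no effect, the group labels would be arbitrary, so we can shuffle them many times and count how often the shuffled difference is at least as large as the one observed. The scores below are invented for illustration and are not the data behind the 5 point, p = 0.04 example; the p-value is one-sided, matching a null hypothesis that reading does not improve scores.

```python
import random


def permutation_p_value(treated, control, n_perm=10_000, seed=0):
    """One-sided permutation p-value: the share of label shuffles giving a
    mean difference at least as large as the observed difference."""
    rng = random.Random(seed)
    n_treated = len(treated)

    def mean_difference(values):
        first, second = values[:n_treated], values[n_treated:]
        return sum(first) / len(first) - sum(second) / len(second)

    pooled = list(treated) + list(control)
    observed = mean_difference(pooled)

    at_least_as_large = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # pretend the null hypothesis is true
        if mean_difference(pooled) >= observed:
            at_least_as_large += 1
    return observed, at_least_as_large / n_perm


# Hypothetical exam scores, for illustration only.
readers = [78, 82, 85, 88, 90, 76, 84, 91, 87, 83]
non_readers = [75, 80, 79, 83, 85, 72, 78, 86, 81, 77]

observed, p_value = permutation_p_value(readers, non_readers)
print(f"Observed difference in means: {observed:.1f} points")
print(f"One-sided permutation p-value: {p_value:.3f}")
```

The printed p-value is the proportion of shuffles that produced a difference at least as large as the observed one. That is the "probability of finding what was found, or an even bigger finding, if the null hypothesis were true" from the definition above.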

A common mistake pointed out by statisticians is for someone to interpret the p-value as the probability of what was seen in the study being due to chance alone. I think many people think of it this way because it’s easy to comprehend. But chance isn’t the only explanation for a false positive finding in a study.

All this is very confusing, right? Exactly. So, if no one really understands p-values is it time to abandon them and the concept of statistical significance? After all, the 0.05 cutoff is just a tradition. Is 0.07 all that different? Or 0.051?

I know if I am suggesting we get rid of the p-value I have to suggest an alternative. The only one I can think of is the confidence interval, but the statistical definition of that is confusing and not clinically useful. So, should we abandon confidence intervals too? Both the p-value and the confidence interval give useful information, but if that information is interpreted incorrectly can it really be useful information?

What should be the role of the p-value and the confidence interval? Maybe we just need to better educate users of the scientific literature on what these values can and cannot tell us. Then all of this would be moot.

What do you think?

Advanced EBM Elective for Medical Students

I teach an advanced EBM co-enrolled elective in the Spring semester to 3rd and 4th-year medical students. Below is a screen shot of the course home page.


This year I decided to revamp it completely and make it more philosophical. It was a challenge to decide what topics to include. You can see the topics I plan to cover along the right side of the image. I wanted them to be thought provoking and also useful at some level and not something taught in the usual EBM course. I also made some other big changes after being inspired by 2 classes (Introduction to Openness and Social Network Learning) I recently took as part of my Master of Educational Technology degree (thanks, Fred Baker and Jackie Gerstein). Here are the changes:

  • It’s open (all materials are free to use and anyone can take the course at any time). I hope to get some non-UAB students to take the course at the same time as my UAB students- a mini-MOOC so to speak.
  • Students will use social network tools extensively to enhance learning.
  • Students get a say in what they learn. I have designed the learning modules but I encourage students to develop their own learning module to replace one of the ones I developed.
  • Students will learn the value of social media for finding, creating, and sharing information.
  • Students will learn about personal learning networks and how to cultivate them.
  • Students will be exposed to learning activities they probably haven’t been exposed to in the past or used very much (e.g. jigsaw activity, curation,  concept mapping, blogging, tweeting) in medical education.

I hope these changes will enhance the ability of students to make meaning of this material. If nothing else I enjoyed creating a new class.

EBM is just not a priority in medical education

When I reflect on what I do each day as a physician it occurs to me that I use EBM skills very commonly. Here is a sampling:

  • I think about and assess pretest probability a lot
  • I think about choosing appropriate tests a lot
  • I apply information from studies a lot. I weigh risks and benefits of therapies. I think about patient context. I try to incorporate patient values and desires as much as possible.
  • I search for information following the Haynes’ 6S approach
  • I critically appraise primary studies and systematic reviews each week (not daily)
  • I make calculations because studies don’t always put information in the format I want
  • I have discussions with patients about the above issues

I am sure I am missing a lot of what I do that falls under “EBM”. I am revamping an introductory course in EBM I teach to 2nd-year medical students for the upcoming semester. It has been relegated to “just teach them enough to get a good score on Step 1”. Thankfully, I have a fuller online version that they will take during their scholarly time in the 3rd year so all is not lost. To make me feel better, I view the crash course I am teaching them this upcoming semester as scaffolding so that they can better understand my full online course. You can look at and use the materials I will use in the crash course in the tab above labeled “Online Teaching Resources” (I just realized I still have to add a few items that the students will use).

We spend so much time in the 1st 2 years of medical school teaching about things that I honestly never ever use but yet what I use daily gets short shrift. Why is that? Are EBM skills not important? Is it assumed they are easy to develop later in one’s career on one’s own (they aren’t)? Is it just kicking the can down the road assuming in residency these skills will be learned? Or during the clerkships?

I for one wish none of this material was on Step 1. I think it’s too early. Furthermore, I am so sick of my course evaluations including statements like “Taught too much stuff that wasn’t on step 1”. I think you need some clinical knowledge to really learn EBM, but more importantly, to understand its importance. EBM type questions should get greater prominence on Step 2 and even more prominence on Step 3 exams. Having only one or 2 questions reinforces the perceived lack of importance of EBM. EBM should have just as many questions as any of the specialties and each test should have more questions to reinforce that these skills are important and will be used. Maybe Santa will grant me that wish one of these years. (I am keeping my fingers crossed I get onto the NBME committee that writes the EBM questions. Maybe I can convince them of my plan.)



I wonder how much EBM is really practiced out there

WARNING: a lot of cynicism in this post.

I have been revamping my EBM course that I teach at the medical school. As I’ve been doing this I realize we (the collective EBM teachers of the world) teach knowledge and skills that I don’t think are used very often once our doctors are out of residency.

Who really develops a PICO question in the clinical setting (outside of an academic center)? Who is really doing database searches? (I think everyone just goes to Google, UpToDate or Dynamed and doesn’t care if studies are potentially missed.) How many critically appraise the primary literature? (Don’t most probably just read the conclusions from the abstract, or assume the study is good?) How many really understand how to “manipulate” findings of a study to adapt them to the patient they are seeing?

I know this seems like a negative post but practicing EBM is hard. It is a complex task that takes time and feedback to master. Once you leave training there is little feedback you will ever get on EBM skills. So they wane and all that can be done is to keep practicing like they have been by relying on experience, collective knowledge of consultants, and using Dr. Google. But how bad of a service have they provided their patients by doing this? Probably not all that bad.

As an educator I feel these skills are important and I think I have designed my course to provide the best chance for students to remember the material. But I don’t know how to convince practicing docs that they need to keep brushing up on EBM skills. I also don’t know what I would tell them if they asked “Well how do you want me to brush up on my EBM skills?” EBM skills should probably be a reasonably important part of the MOC process. Aren’t these skills key to actually keeping up?

Now it’s your turn. Tell me where I’m wrong. What should practicing docs do?