Should Traditional Intention To Treat Analysis Be Abandoned?

A commenter on my video about intention to treat analysis asked for my thoughts on a twist on intention to treat analysis in which an adjustment is made (via an instrumental variable) for “treatment contamination”. A disclaimer: I am not a statistician or epidemiologist.

First, let’s start with some definitions:
1) Intention to treat analysis: once randomized, always analyzed in the group to which the patient was assigned (even if you don’t get the intervention in the intervention arm or you do get it in the control arm)
2) Superiority trial: study designed to “prove” one intervention is better than the other. Null hypothesis is that there is no difference between the groups.
3) Noninferiority trial: study designed to “prove” that one intervention is not worse than another treatment by some prespecified amount. Null hypothesis is that there is a difference between the groups.
4) Instrumental variable: variable associated with the factor under study but not directly associated with the outcome variable or any potential confounders.


The authors of the paper “An IV for the RCT: using instrumental variables to adjust for treatment contamination in randomised controlled trials” state:

Intention to treat analysis estimates the effect of recommending a treatment to study participants, not the effect of the treatment on those study participants who actually received it. In this article, we describe a simple yet rarely used analytical technique, the “contamination adjusted intention to treat analysis,” which complements the intention to treat approach by producing a better estimate of the benefits and harms of receiving a treatment. This method uses the statistical technique of instrumental variable analysis to address contamination.

So what do I think about this?
1) A main role of intention to treat (ITT) analysis is to be conservative in a superiority trial. That means we don’t want to reject the null hypothesis falsely and claim treatment is better than the control. Another main role of ITT analysis is to preserve randomization (remember, once randomized always analyzed).

2) The authors of the BMJ paper point out that “Intention to treat analysis estimates the effect of recommending a treatment to study participants, not the effect of the treatment on those study participants who actually received it.” This is true, but isn’t that what real life is like? I recommend a treatment to my patients. Some take it, some don’t. Some who I tell not to use something wind up using it.

3) The authors of the BMJ paper further point out that ITT analysis “underestimates value of receiving the treatment.” That is possible, but it’s also the point (see #1 above).

4) The instrumental variable in this scheme is the randomized assignment itself: assignment strongly predicts whether a patient actually receives the treatment (no matter what group they were assigned to), but, thanks to randomization, it is not directly associated with the outcome or with confounders. ITT analysis would still be used but would be adjusted for treatment receipt. I worry that this could lead to overfitting the model: a situation where adding too many variables to a model starts to detect noise rather than real relationships.

5) I think it would be difficult in a trial to judge adherence: what is the cutoff? Is it 100%? What about 60%? 40%? How much use by the control group matters? I think there are real issues in judging what counts as contamination.
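To make the adjustment concrete, here is a minimal simulation sketch of the simplest version of this IV approach, the so-called Wald estimator: the ITT effect is divided by the difference in actual treatment receipt between arms. Every number here (adherence rates, contamination rate, effect size) is hypothetical, chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Randomized assignment (the instrument): 1 = assigned treatment, 0 = control
z = rng.integers(0, 2, n)

# Treatment actually received: assume 80% of the treatment arm adheres,
# and 15% of the control arm is "contaminated" (takes the treatment anyway)
d = np.where(z == 1, rng.random(n) < 0.80, rng.random(n) < 0.15).astype(int)

# Outcome: assume the true effect of *receiving* treatment is +2.0, plus noise
y = 2.0 * d + rng.normal(0, 1, n)

itt = y[z == 1].mean() - y[z == 0].mean()            # effect of *assignment*
compliance_gap = d[z == 1].mean() - d[z == 0].mean() # 0.80 - 0.15 = 0.65
iv = itt / compliance_gap                            # Wald/IV estimate

print(f"ITT estimate: {itt:.2f}")   # ~1.3: diluted by non-adherence/contamination
print(f"IV  estimate: {iv:.2f}")    # ~2.0: effect of actually receiving treatment
```

The sketch shows both sides of the argument above: the ITT estimate (~1.3) really is smaller than the effect of receipt (~2.0), which is exactly the conservatism described in #1, and the IV correction only works if randomization is a valid instrument.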

Time will tell if this technique should be used. We will have to study the treatment estimates from traditional ITT analysis and contamination adjusted ITT analysis. Until then I will stick with what is recommended…traditional ITT analysis.

Small Studies Can Lead To Big Results

An interesting article was published in the British Medical Journal in April. Researchers looked at the influence of sample size on treatment effect estimates. The bottom line: treatment effect estimates were significantly larger in smaller trials than in larger trials (up to 48% greater!). The figure below shows this relationship: the left panel compares trials grouped into quartiles of sample size, while the right panel uses arbitrary cutoffs by raw sample size.

Comparison of treatment effect estimates between trial sample sizes

So what does this mean for the average reader of medical journals? Pay attention to sample size. Early studies on new technology (whether meds or procedures) are often carried out on a fairly small group of people. Realize that what you see is likely overestimated (compared to a large study). If benefits are marginal (barely clinically significant) realize they likely will go away with a larger trial. If benefits are too good to be true…they likely are too good to be true and you should temper your enthusiasm. I always like to see more than one trial on a topic before I jump in and prescribe new meds or recommend new procedures.
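One reason small trials overestimate effects is simple selection: only estimates large enough to clear the significance bar get published and noticed, and in a small, noisy trial those estimates must land far above the truth. A quick simulation sketch, assuming a hypothetical true benefit of 0.2 standard deviations:

```python
import numpy as np

rng = np.random.default_rng(1)
TRUE_EFFECT = 0.2  # assumed true benefit, in standard deviation units

def mean_significant_estimate(n_per_arm, n_trials=20_000):
    """Average effect estimate among simulated trials reaching p < 0.05."""
    treat = rng.normal(TRUE_EFFECT, 1, (n_trials, n_per_arm)).mean(axis=1)
    ctrl = rng.normal(0.0, 1, (n_trials, n_per_arm)).mean(axis=1)
    est = treat - ctrl
    se = np.sqrt(2.0 / n_per_arm)          # standard error of the difference
    significant = np.abs(est / se) > 1.96  # two-sided p < 0.05
    return est[significant].mean()

print(mean_significant_estimate(25))   # small trials: well above the true 0.2
print(mean_significant_estimate(500))  # large trials: close to the true 0.2
```

Under these assumed numbers, the average “significant” estimate from the 25-per-arm trials is roughly triple the truth, while the 500-per-arm trials land near 0.2. That is the pattern the BMJ article found in real data.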

Danish Osteoporosis Prevention Trial Doesn’t Prove Anything

The overstatement of the DOPS trial results has bothered me this week. So much so that even though I am on vacation I wanted to write something about it. Thankfully, comments linked to the article show that at least a few readers were smart enough to detect the limitations of this study. What has bothered me is the ridiculous headline about this trial:

HRT cuts CVD by 50%, latest “unique” data show

First off, all data are unique, so that’s stupid. And CVD cut by 50%? When something seems too good to be true and goes against what we already know, take it with a grain of salt. Almost nothing in medicine is 50% effective, especially as primary prevention. But I digress.

The authors of the trial point out that their study was different from the previous large HRT study, the Women’s Health Initiative (WHI). So why do these studies contradict each other?

Whenever you see a study finding, always consider 4 things that can explain what you see; it’s your job to figure out which one it is: truth, chance, bias, and confounding. So let’s look at the DOPS study with this framework.

Truth: maybe DOPS is right and the Cochrane review with 24,283 total patients is wrong. Possible but unlikely. DOPS enrolled 1006 patients and had very low event rates (much lower than other studies in this area).

Chance: The composite outcome (which I’ll comment on in a minute) did have a p value <0.05 but none of its components were statistically significant. Any study can be a false positive (or a false negative). So it’s possible this study is a false positive and, if repeated, would not give the same results. Small studies are more likely to produce both false positives and false negatives.

Bias: Biases are systematic errors made in a study. There are a couple in this study: no blinding (which leads to overestimation of effects) and poorly concealed allocation (which again leads to overestimation of effects).

Confounding: Women in the control group were about 6 months older than treated patients but this was controlled for in the analysis phase. What else was different about these women that could have affected the outcome?

So far my summary of this study would be that it is small with potential for overestimation of effects due to lack of blinding and poorly concealed allocation.

But there’s more:

  • This study ended years ago and is only now being published. Why? Were the authors playing with the data? The study was industry funded and the authors have industry ties. Hmmm.
  • The composite outcome they used is bizarre and not the typical composite used in cardiovascular trials. They used death, or admission to the hospital for myocardial infarction or heart failure. This isn’t a good composite because patients wouldn’t consider each component equally important and the biology of each component is very different. Thus you must look at the individual components, and none are statistically significant by themselves.
  • The WHI is the largest HRT trial done to date. Women in the WHI were older and fatter than the DOPS participants and thus at higher risk. So why would women at higher risk for an outcome gain less benefit than those at lower risk for the outcome? Things usually don’t work that way. A big difference between these 2 trials, though, is that DOPS women started HRT earlier than WHI women. So maybe timing is important.

Thus, I think this trial at best suggests a hypothesis to test: starting HRT within the first couple of years compared to starting later is more beneficial. DOPS doesn’t prove this. The body of evidence contradicting this trial is stronger than DOPS. Thus I don’t think I will change what I tell my female patients.