If no one can give an understandable definition of a p-value, why do we use it? Or quit saying statistically significant.

In this post I might come across as sounding dumb… but it won’t be the first or last time that happens. I came across this article about how scientists can’t even explain p-values in an understandable way. I know the students and residents I teach understand the p-value only as indicating statistical significance. As I thought about the definition, it occurred to me that maybe we should get rid of p-values altogether and let users decide whether what they are seeing is clinically significant.

What is a p-value?

First, we have to understand one other concept: the null hypothesis. When doing a study, researchers have to construct a null hypothesis that they will attempt to prove or disprove (depending on how you look at it) with their study. For a superiority trial, the null hypothesis is that there is no difference between the intervention and control treatments. For an equivalence trial, the null hypothesis is that there is a difference (of at least some prespecified margin) between the intervention and control treatments. Data from each arm of the study are compared with an appropriate statistical test, yielding a test statistic (e.g., a t statistic if using a t-test). That test statistic is compared to a table of critical values to determine whether your calculated value is greater than the critical value required to reject the null hypothesis at the chosen alpha level. If it is, you reject the null hypothesis and call the finding statistically significant. If not, you fail to reject the null hypothesis and the finding is not statistically significant.
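To make that reject/fail-to-reject logic concrete, here is a minimal sketch in Python. The t statistic, degrees of freedom, and alpha below are made-up numbers purely for illustration, and scipy is assumed to be available:

```python
# Sketch of comparing a calculated test statistic to its critical value.
# All numbers here are hypothetical, not from any real study.
from scipy import stats

alpha = 0.05   # chosen significance level
t_stat = 2.3   # test statistic calculated from the study data (hypothetical)
df = 58        # degrees of freedom for the test (hypothetical)

# Critical value for a two-sided t-test at this alpha level
t_crit = stats.t.ppf(1 - alpha / 2, df)

if abs(t_stat) > t_crit:
    print(f"|t| = {abs(t_stat):.2f} > critical value {t_crit:.2f}: reject the null hypothesis")
else:
    print(f"|t| = {abs(t_stat):.2f} <= critical value {t_crit:.2f}: fail to reject the null hypothesis")
```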

P-values are simply probabilities. A p-value is the probability of finding what was found in the study (or an even bigger finding) if the null hypothesis were true.

Here’s an example. Let’s say I conduct a study to examine the effect of reading for 30 minutes each day on medical certification examination scores. My null hypothesis is that reading will not improve exam scores. I randomize one group of participants to read for 30 minutes every day and the control group to no reading. Both groups take the certifying examination and I calculate mean scores for each group. I can compare these means with a t-test (assuming the requirements of parametric tests are met), and I find the mean examination score is 5 points higher in those who read for 30 minutes compared to those who didn’t, with a p-value of 0.04. This p-value of 0.04 means that there is a 4% probability that you would see at least a 5-point higher mean score in the reading group given that there is no true effect of reading on exam scores. What if the p-value were 0.2? Then there is a 20% probability that you would see at least a 5-point higher mean score in the reading group given that there is no effect of reading.
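If it helps, here is a hedged sketch of what that analysis might look like in Python. The exam scores are simulated (invented) data, not results from any real study:

```python
# Hypothetical version of the reading study: simulate exam scores and run a t-test.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
reading = rng.normal(loc=505, scale=15, size=30)  # readers: mean ~5 points higher
control = rng.normal(loc=500, scale=15, size=30)  # non-readers

t_stat, p_value = stats.ttest_ind(reading, control)  # two-sided two-sample t-test

print(f"Mean difference: {reading.mean() - control.mean():.1f} points")
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
# A p-value of 0.04 would mean: if reading truly had no effect, a difference at least
# this large would be seen in about 4% of studies like this one.
```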

A common mistake pointed out by statisticians is interpreting the p-value as the probability that what was seen in the study was due to chance alone. I think many people think of it this way because it’s easy to comprehend, but chance isn’t the only explanation for a false positive finding in a study.

All this is very confusing, right? Exactly. So, if no one really understands p-values, is it time to abandon them and the concept of statistical significance? After all, the 0.05 cutoff is just a tradition. Is 0.07 all that different? Or 0.051?

I know that if I am suggesting we get rid of the p-value I have to suggest an alternative. The only one I can think of is the confidence interval, but its statistical definition is also confusing and not clinically useful. So, should we abandon confidence intervals too? Both the p-value and the confidence interval give useful information, but if that information is interpreted incorrectly, can it really be useful?

What should the role of the p-value and the confidence interval be? Maybe if we just educated users of the scientific literature better on what these values can and cannot tell us, all of this would be moot.

What do you think?

Tarnished Gold Chapter 1: Evidence-based Medicine

This is going to be a lot harder than I thought. I question why I am even wasting my time reading this tripe but I will plod forward so that there is a counterargument to this work. I also need to understand criticisms of this paradigm so that the paradigm can be improved.

Importantly, the authors focus on an outdated definition of EBM. This definition was the first iteration and is oft-quoted, but it is out of date nonetheless.

Evidence based medicine (EBM) is the conscientious, explicit, judicious use of the current, best evidence in making decisions about the care of individual patients.

The current paradigm of EBM was published in 2002, well before this book, and should have been included in it. Hickey and Roberts claim EBM had its origins in the legal system, which is total BS. If you read the early EBM papers there is no mention of the legal system driving this paradigm. I also consider EBM to be just a set of skills (searching, critical appraisal, application) to use in the care of patients.

They focus on, and have problems with, two words in the above definition: best and evidence. They are concerned that best leads to selection of evidence and the idea that “one bit of evidence is better than another”. Of course some evidence is better than other evidence. Empirical studies (not done by the evil drug companies) have demonstrated that certain design flaws, for example lack of blinding, lead to overestimation of effects. Studies have also demonstrated that observational study designs can overestimate effects and even give effects opposite to those of randomized trials (see the HRT saga). I’m sure they will argue later in the book that all of these studies, randomized trials included, are rigged (probably because their holy grail, vitamin C, failed to be useful in controlled trials). There are too many studies showing similar effects to discount the evidence that some studies are better than others.

They claim “EBM’s evidence does not mean scientific information or data, but refers to legal justification“. First off, EBM does not possess evidence, so the apostrophe is misplaced. Second, this statement doesn’t have any face validity. Journals are full of scientific information and data. What are they talking about?

They claim “EBM has little to offer the doctor treating a patient, beyond suggestions about what might be expected with an average patient“. Studies used to inform practice usually are based on a sampling of patients because we can’t study every single person with that problem. Sampling can be done to reflect a broad range of people with a given problem or it can be done to select for certain subpopulations of disease (for example, advanced disease or early disease). On average, most people are average. So their statement isn’t totally without merit. We can’t do studies on every type of patient. But, here is where the current paradigm helps us. We (as doctors) take into account the patient’s state and circumstances when applying the best available evidence to their case. We use our clinical training and experience to decide what we should do from what we could do. There are ways to adapt study data to an individual patient like I demonstrate in this video. N-of-1 trials can also be done on individual patients to see if a therapy is effective (more on this in another post).

Finally, (though there is a lot more I could comment on) they have problems with using statistics to analyze data. As I mention above, in medicine we can only sample a small percentage of those with disease. If we could study everyone we wouldn’t need statistics but since we can’t we use statistics on a sample to try to generalize it to the whole population. I don’t know of any other way to do this because we simply can’t study everyone. (I recognize this is a gross simplification of what statistics do and not totally accurate.)

The next chapter I’ll critique is entitled “Populations are not People”. Stay tuned…

What Does Statistically Significant Mean?

Hilda Bastian writes an important and well-written post on this topic on a recent Scientific American blog.

I don’t think I have much else to add other than: read her post. There are some great links inside it to help further understand this topic.

I think we are too focused on p < 0.05. What if the p-value is 0.051? Does that mean we should ignore the finding? Is it really any different from a p-value of 0.0499?

Confidence intervals give information on both statistical significance and clinical significance, but I worry about how they are interpreted as well. (Disclaimer: the interpretation and use of the confidence interval that follows is not statistically correct but is how we use them clinically.) Let’s say a treatment reduces a bad outcome with a relative risk (RR) of 0.94 and a 95% CI of 0.66-1.12. The treatment isn’t “statistically significant” (the CI includes 1.0), but there is potential for a clinically significant benefit: the lower bound of the CI suggests a potential 34% reduction in the bad outcome (1 - RR = relative risk reduction, so 1 - 0.66 = 0.34, or 34%). There is also potential for a clinically significant increase in risk of 12%. So which is more important? That somewhat depends on whether you believe in the treatment. If you believe in it, you focus on the potential 34% reduction in outcomes; if you don’t, you focus on the 12% increased risk. That’s the problem with confidence intervals, but they still give much more information than p-values do.
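For what it’s worth, the arithmetic in that example is simple enough to show in a few lines of Python (using the same made-up RR and CI from above):

```python
# Converting a relative risk and its 95% CI into relative risk reduction/increase.
rr = 0.94
ci_lower, ci_upper = 0.66, 1.12

point_estimate_rrr = 1 - rr    # 0.06 -> 6% reduction at the point estimate
rrr_best_case = 1 - ci_lower   # 0.34 -> potential 34% reduction in the bad outcome
rri_worst_case = ci_upper - 1  # 0.12 -> potential 12% increase in the bad outcome

print(f"Point estimate: {point_estimate_rrr:.0%} relative risk reduction")
print(f"Best case (lower CI bound): {rrr_best_case:.0%} reduction")
print(f"Worst case (upper CI bound): {rri_worst_case:.0%} increase")
```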