Jazzing Up Journal Club

I am preparing to write something about what I think works well for journal club. We’ve tried lots of different things here at UAB. I would like to hear from you about what you do for your journal club. What works well in your journal club? What doesn’t work so well?

We need a dialogue here so please enter some comments. I’ll collate what I get and then give my own thoughts about Jazzing Up Journal Club.

Enhancing Physicians’ Use of Guidelines

Dr. Peter Pronovost recently penned a viewpoint piece for JAMA about how guidelines can be better implemented. He is well respected in the patient safety realm and clearly feels guidelines are a major way to improve patient safety. I agree that they are a piece of the puzzle. What I want to do in this post is critique his thoughts on enhancing guideline use by physicians. I think some of what he proposes is unrealistic at best and most likely impossible.

Let’s take one step back to look at barriers to guideline implementation that were identified in an important review by Cabana and colleagues in 1999. They performed a broad systematic review of 120 different surveys investigating 293 potential barriers to physician guideline adherence. The figure below outlines what was found.

Barriers to Guideline Adherence

With this background, let’s analyze Dr. Pronovost’s 5 strategies to increase guideline adherence.

  1. Guidelines should include an “unambiguous checklist with interventions linked in time and space”. He recommends key evidence-based practices. I concur with this recommendation. Checklists are something physicians can do, and they have been shown to reduce harmful events. Also, this would be behaviorally based and specific (i.e., something measurable). An example might be a checklist item to make sure that each day an assessment is documented in the chart of the need for continued bladder catheterization. What I worry about is that too many recommendations are made in many guidelines. Recommendations need to be prioritized and limited to those things that really make a difference. Checklists could become so burdensome that they impair patient care, as too much time would be spent checking off the checklist and not enough time actually caring for the patient.
  2. Guideline developers should “help clinicians identify and mitigate barriers to guideline use and share successful implementation strategies”. Here is the impossible one. While I agree in principle with this recommendation, it can’t be implemented. Barriers are a local phenomenon, often a hyperlocal phenomenon. My hospital has 3 separate primary care practices (1 a resident practice and 2 full-time provider practices) with very different types of docs and practice patterns. What will work in one of these clinics won’t work in the others. What works for one practitioner might not work for any other. Large centralized guideline developers just can’t be expected to develop solutions to local barriers.
  3. Guideline developers could “collaborate to integrate guidelines for conditions that commonly coexist“. Most guidelines are single disease guidelines developed by single specialty groups. Patients are multimorbid. This is a great recommendation that will be tough to implement but that I agree with 100%. Diseases and their treatments interact with each other. Guideline developers often ignore these interactions. At best they will discuss some exceptions to the guidelines for other comorbidities but this isn’t enough. There are enough diabetics with hypertension and coronary artery disease to warrant a guideline on them. What about hypertension with renal disease? The combinations would have to be carefully thought out and the panels multidisciplinary with primary care physicians playing the prominent leadership role.
  4. Rely on systems rather than the actions of individual clinicians. Bravo. Many things are not totally under the physician’s control, and nowadays there are often too many things for 1 person to think about. Systems need to be engineered to deal with the mundane things we physicians don’t like to deal with (like elevating the head of the bed in a ventilated patient; we would rather manage the ventilator). Multidisciplinary teams at each care site would need to be put together to design the processes of care.
  5. Create transdisciplinary teams to develop scholarly guidelines with practice strategies. Not much detail is given in the manuscript about this, but what I think he means could be twofold: 1) teams of clinicians, epidemiologists, implementation scientists, and systems engineers would develop the guideline, and 2) these same types of teams would study best practices for implementing them. Currently many guidelines don’t include implementation scientists or systems engineers. It’s no wonder we have a hard time implementing guidelines when implementation isn’t really built into them from the start.

Much of this is already known but important to keep saying. Someday guideline developers and policy wonks will listen. Just shoving a guideline in our face isn’t the way to go. Currently reminders and “performance measures” are the main ways guidelines are being implemented. We will see if medical systems develop smart ways to use electronic health records to better implement guidelines.

What To Do When Guidelines Conflict

It is common for multiple guidelines to be developed by different groups on the same topic. Problems arise, though, when different guidelines make differing recommendations. Which one should you believe?

The guideline development process is complex. The Institute of Medicine published a framework for trustworthy guideline development, as have other groups. Guidelines could differ by performing any of the steps along the development pathway differently from each other. A lot of judgment and decision-making goes into developing guidelines, which explains why each of the steps could be performed differently.

What are some of the main reasons why different guidelines might have different recommendations?

  1. They attempt to provide guidance on different clinical questions. The guidelines just address different things even though they seem similar. Obviously you would choose the one that best fits the clinical scenario.
  2. Different evidence bases were used to make recommendations. It could be that one guideline is newer than another and contains more updated evidence. What is more problematic is when guidelines are released at about the same time but have differing evidence bases. As I mentioned earlier, a lot of judgments and decisions are made during the guideline development process. An important one is which studies to include in or exclude from the evidence review. This is more subjective than you might realize. It’s easy to develop study selection criteria that include only the evidence supporting your point of view while excluding that which doesn’t. Pick the guideline with the most comprehensive literature search. Also make sure the exclusion criteria make sense and don’t just help to support a biased point of view.
  3. Different outcomes were considered. Recommendations are made to improve care with the hope of improving some outcome. It could be that one guideline focused on a surrogate marker (e.g., LDL levels) while another focused on hard clinical outcomes (MI and stroke event rates). Go with the one focused on hard clinical outcomes.
  4. Values, biases, and conflicts of interest of the guideline developers are probably the main reason for disparate recommendations. Almost every guideline panel is biased in some way. The best you can hope for is that multiple biases and conflicts of interest are represented and essentially cancel each other out (by assembling a multidisciplinary panel to develop the guideline). Just disclosing conflicts of interest does nothing to lessen their impact. Moving from evidence to recommendations involves value judgments, and the recommendations in the guideline are shaped by these values. Different groups will weigh the benefits and harms differently even if the same exact evidence base is reviewed. One need look no further than breast cancer screening guidelines to find very different value structures between cancer organizations and the USPSTF. Unfortunately, the value structure of the panel is usually not explicitly stated and one must infer it from the recommendations that are made. You should choose the guideline that best matches the values of the patient.

I tried to give a very basic overview of why guidelines can differ. There are other reasons but these are the main ones. Clinicians should look for and use guidelines that are trustworthy. They should not follow recommendations uncritically but seek to understand what values and judgments shaped the recommendations.

Review of the 2013 ACC/AHA Cholesterol Treatment Guidelines

In noon conference today I reviewed the good, the bad, and the ugly of the recently released ACC/AHA cholesterol treatment guidelines. Below is a YouTube video review of the guidelines. It will be interesting to see how cholesterol management evolves over the next few years. There are groups like the National Lipid Association who feel that removing the LDL goals from the new guideline was a mistake. Likewise, the European Society of Cardiology lipid guidelines recommend titrating statins to LDL targets. Conflicting guidelines are always a problem. In my next post I will address conflicting guidelines and what to think about when you see conflicting recommendations on seemingly the same topic.

Reading Journal Article Abstracts Isn’t As Bad As I Thought

Physicians mainly read the abstract of a journal article (JAMA 1999;281:1129). I must admit I am guilty of this also. Furthermore, I would bet that the most often read section of the entire article is the conclusions of the abstract. We are such a soundbite society.


I had always thought the literature showed how bad abstracts were…that they were often misleading compared to the body of the article. But I was wrong. A recent study published in BMJ EBM found that 53.3% of abstracts had a discrepancy compared with information in the body of the article. That sounds bad, doesn’t it? But only 1 of the discrepancies was clinically significant. Thus most of the discrepancies were not important enough to potentially cause patient harm or alter a clinical decision.

This is good news as effectively practicing EBM requires information at the point of care. Doctors don’t have time to read an entire article at the point of care for every question they have but they do have time to read an abstract. It’s good to know that structured abstracts (at least from the major journals that were reviewed in this study) can be relied upon for information. I especially like reading abstracts in evidence based journals like BMJ EBM or ACP Journal Club as even their titles give the clinical information you need.

Interactive Video- The Future of Video for Education and Beyond

I recently discovered a cool tool for teaching- TouchCast. A TouchCast is an interactive video, meaning there is a background video with things that pop up and can be touched and opened.


I made a TouchCast on case-control studies. Check it out and see what I mean. Make sure you touch one of the YouTube videos or the website that I put on the screen to see how it works.

I find this very exciting. I can make a background video that gives a 30,000-foot view of a topic and embed further materials (other videos, websites, etc.) for those who want a deeper understanding.

So what are the limitations? For now, the interactivity only works when viewing a TouchCast through their app or their website. The videos can be uploaded to YouTube, but the interactivity is lost. The length of the video is also limited to 5 or 6 minutes. This isn’t a killer for me because educational videos should be short, and in this case I can embed hours of other videos if I want to. Finally, the other limitation (for now, to be changed soon) is that it’s an iPad tool. A desktop version is coming soon. Hopefully an Android app also.

TouchCast has really broken ground here. This should open up more advancements that will do even more. The future is exciting for us flipped classroom types.

I Am Not Using The New Risk Predictor In The Recently Released Cholesterol Guidelines

Last week the hotly anticipated cholesterol treatment guidelines were released, and they are an improvement over the previous ATPIII guidelines. The new guidelines abandon LDL targets, focus on statins rather than add-on therapies (which don’t help), and emphasize stroke prevention in addition to heart disease prevention.

The problem with the new guidelines is that they developed a new risk prediction tool that frankly stinks. And the developers knew it stunk but promoted it anyway!

Let’s take a step back and discuss clinical prediction rules (CPRs). CPRs are mathematical models that quantify the individual contributions of elements of the history, physical exam, and basic laboratory tests into a score that aids diagnosis or prognosis estimation. They can accommodate more factors than the human brain can take into account, and they always give the same result, whereas human judgment is inconsistent (especially in the less clinically experienced). To develop a CPR you 1) construct a list of potential predictors of the outcome of interest, 2) examine a group of patients for the presence of the candidate predictors and their status on the outcome of interest, 3) determine statistically which predictors are powerfully and significantly associated with the outcome, and 4) validate the rule [ideally this involves applying the rule prospectively in a new population (with a different spectrum of disease) by a variety of clinicians in a variety of institutions].
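
To make those 4 steps concrete, here is a minimal sketch in Python using simulated data and scikit-learn. None of it comes from the guideline or any real cohort; the predictors, coefficients, and cohorts are made up purely to show derivation in one group of patients and validation in a second, shifted group.

```python
# Minimal sketch of deriving and validating a clinical prediction rule.
# All data are simulated for illustration; no real cohort is used.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 2000

# Steps 1-2: candidate predictors measured in a derivation cohort,
# along with each patient's outcome status.
age = rng.normal(60, 10, n)
sbp = rng.normal(130, 15, n)
smoker = rng.integers(0, 2, n)
noise = rng.normal(0, 1, n)                 # a candidate predictor with no true effect
X = np.column_stack([age, sbp, smoker, noise])
true_logit = -12 + 0.08 * age + 0.03 * sbp + 0.7 * smoker   # unknown in real life
y = rng.binomial(1, 1 / (1 + np.exp(-true_logit)))

# Step 3: fit a model to see which predictors carry the signal.
model = LogisticRegression(max_iter=1000).fit(X, y)
print("coefficients (age, sbp, smoker, noise):", model.coef_.round(3))

# Step 4: validation should use a different population; here a second cohort
# with a shifted case mix stands in for external testing.
age2, sbp2 = rng.normal(55, 12, n), rng.normal(140, 20, n)
smoker2, noise2 = rng.integers(0, 2, n), rng.normal(0, 1, n)
X2 = np.column_stack([age2, sbp2, smoker2, noise2])
y2 = rng.binomial(1, 1 / (1 + np.exp(-(-12 + 0.08 * age2 + 0.03 * sbp2 + 0.7 * smoker2))))

print("derivation C-statistic:", round(roc_auc_score(y, model.predict_proba(X)[:, 1]), 3))
print("validation C-statistic:", round(roc_auc_score(y2, model.predict_proba(X2)[:, 1]), 3))
```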

Back to the new risk tool. They decided to develop a new tool because the Framingham Score (previously used in the ATPIII guidelines) was insufficient (it was developed in an exclusively white population). How was the new tool developed? It was developed using “community-based cohorts of adults, with adjudicated endpoints for CHD death, nonfatal myocardial infarction, and fatal or nonfatal stroke. Cohorts that included African-American or White participants with at least 12 years of follow-up were included. Data from other race/ethnic groups were insufficient, precluding their inclusion in the final analyses”. The data they used were from “several large, racially and geographically diverse, modern NHLBI-sponsored cohort studies, including the ARIC study, Cardiovascular Health Study, and the CARDIA study, combined with applicable data from the Framingham Original and Offspring Study cohorts”. I think these were reasonable derivation cohorts to use. How did they validate the tool? Importantly, validation must use external testing because most models perform well in the cohort from which they were derived. They used “external cohorts consisting of Whites and African Americans from the Multi-Ethnic Study of Atherosclerosis (MESA) and the REasons for Geographic And Racial Differences in Stroke study (REGARDS). The MESA and REGARDS studies were approached for external validation due to their large size, contemporary nature, and comparability of end points. Both studies have less than 10 years of follow up. Validation using “most contemporary cohort” data also was conducted using ARIC visit 4, Framingham original cohort (cycle 22 or 23), and Framingham offspring cohort (cycles 5 or 6) data”. The results of their validity testing showed C statistics ranging from a low of 0.5564 (African-American men) to a high of 0.8182 (African-American women). The C statistic is a measure of discrimination (differentiating those with the outcome of interest from those without the outcome) and ranges from 0.5 (no discrimination, essentially as good as a coin flip) to 1.0 (perfect discrimination). The authors also found that the tool overpredicted events. See the graph below.

Graph showing overprediction of events by the new risk tool

So why don’t I want to use the new prediction tool? 3 main reasons:
1) It clearly overpredicts outcomes. This would lead to more people being prescribed statins than likely need to be on them (if you use only the tool to make this decision). One could argue that’s a good thing, as statins are fairly low risk and lots of people die from heart disease, so overtreating might be the way to err.
2) No study of statins used any prediction rule to enroll patients. Patients were enrolled based on LDL levels or comorbid diseases. Thus I don’t even need the rule to decide whether or not to initiate a statin.
3) Its discrimination is not good (see the C-statistic results above). For African-American men it’s no better than a coin flip. A toy illustration of what the C statistic and overprediction actually mean follows below.
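
Here is that toy illustration, in Python. The predicted risks and outcomes below are invented numbers, not data from the pooled cohorts; they only show that the C statistic is the chance that a patient who has the event gets a higher predicted risk than one who doesn’t, and that a tool can discriminate reasonably well yet still overpredict (its average predicted risk sits well above the event rate actually observed).

```python
# Toy illustration of discrimination (C statistic) vs overprediction.
# All numbers are invented; they are not from the guideline's cohorts.
import itertools
import numpy as np

predicted = np.array([0.30, 0.45, 0.20, 0.50, 0.35, 0.25, 0.60, 0.40])  # predicted risks
observed = np.array([0, 1, 0, 0, 0, 0, 1, 0])                           # 1 = had the event

# C statistic: probability that a randomly chosen patient WITH the event
# gets a higher predicted risk than a randomly chosen patient without it
# (ties count as half).
cases, controls = predicted[observed == 1], predicted[observed == 0]
pairs = list(itertools.product(cases, controls))
c_stat = np.mean([1.0 if c > nc else 0.5 if c == nc else 0.0 for c, nc in pairs])
print("C statistic:", round(float(c_stat), 3))        # 0.5 would be a coin flip

# Overprediction (poor calibration): mean predicted risk far exceeds the
# observed event rate, even though discrimination above looks fine.
print("mean predicted risk:", float(predicted.mean()))
print("observed event rate:", float(observed.mean()))
```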

What Does Statistically Significant Mean?

Hilda Bastian recently wrote an important and well-written post on this topic for a Scientific American blog.

I don’t think I have much else to add other than: read her post. There are some great links inside it to help you further understand this topic.

I think we are too focused on p < 0.05. What if the p value is 0.051? Does that mean we should ignore the finding? Is it really any different from a p value of 0.0499?


Confidence intervals give information on both statistical significance and clinical significance, but I worry about how they are interpreted also. (Disclaimer: the interpretation and use of the confidence interval that follows is not statistically correct but is how we use them clinically.) Let’s say a treatment improves a bad outcome with a relative risk (RR) of 0.94 and a 95% CI of 0.66-1.12. The treatment isn’t “statistically significant” (the CI includes 1.0), but there is potential for a relatively significant clinical benefit [the lower bound of the CI suggests a potential 34% reduction in the bad outcome (1 - RR = relative risk reduction, so 1 - 0.66 = 0.34, or 34%)]. There is also potential for a clinically significant increase in risk of 12% (the upper bound of 1.12). So which is more important? That somewhat depends on whether you believe in this treatment or not. If you believe in it, you focus on the potential 34% reduction in outcomes. If you don’t believe in the treatment, you focus on the 12% increased risk. So that’s the problem with confidence intervals, but they give much more information than p values do.
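
As a worked example, here is a short Python sketch with hypothetical trial counts (not from any real study). It computes a relative risk and its 95% CI from a 2x2 table using the standard large-sample formula for the log relative risk, and then translates the two CI bounds into the “potential reduction” versus “potential increase” framing used above.

```python
# Relative risk and 95% CI from a hypothetical 2x2 table.
# The counts below are made up for illustration only.
import math

events_tx, n_tx = 94, 1000        # events / patients in the treatment arm
events_ctl, n_ctl = 100, 1000     # events / patients in the control arm

risk_tx = events_tx / n_tx
risk_ctl = events_ctl / n_ctl
rr = risk_tx / risk_ctl

# 95% CI on the log scale (standard large-sample approximation)
se_log_rr = math.sqrt(1/events_tx - 1/n_tx + 1/events_ctl - 1/n_ctl)
lo = math.exp(math.log(rr) - 1.96 * se_log_rr)
hi = math.exp(math.log(rr) + 1.96 * se_log_rr)

print(f"RR = {rr:.2f}, 95% CI {lo:.2f} to {hi:.2f}")
print(f"point estimate: a {1 - rr:.0%} relative risk reduction")
print(f"CI spans a {1 - lo:.0%} reduction to a {hi - 1:.0%} increase in risk")
```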

EBM Is In Jeopardy- Gamify A Lecture To Make It More Interesting

This week I did an EBM “lecture” based around the game show Jeopardy!. Now I know this isn’t anything new; lots of teachers have used a Jeopardy format to teach. The point is that it took the content of “EBM Potpourri” (a group of topics that don’t fit well in other lectures that I give) and made it more interesting than a traditional 1-hour lecture (which is how I have given this lecture in the past).

The challenge when doing this is to figure out your main teaching points and only include them, since you don’t have a lot of extra space for less important topics (but shouldn’t we be doing this anyway?). The next challenge was to make gradually harder questions within each topic. I made some of the questions limited to certain learner levels only (I teach internal medicine residents that are organized into interns, 2nd years, and 3rd years) to make sure everyone participated independently at least somewhat. The residents only got about 40% of the questions right…but that wasn’t the point. The point was to convey my teaching points and to engage the learners. They worked in their teams (each team consisted of an intern, a 2nd year, and a 3rd year) to solve problems. The competition between teams for “great prizes” (a certificate of appreciation for the 3rd place team, Rice-a-Roni for the 2nd place team, and lunch with me for the winners) made them take it a little more seriously.

If you would like the original PowerPoint file to use in your teaching I’ll be happy to email it to you. Contact me at UABEBM@gmail.com

What unique ways have you taught EBM topics?

Should Traditional Intention To Treat Analysis Be Abandoned?

A commenter on my video about intention to treat analysis asked about my thoughts on a twist on it, in which an adjustment is made (via an instrumental variable) for “treatment contamination”. A disclaimer: I am not a statistician or epidemiologist.

First, let’s start with some definitions:
1) intention to treat analysis: once randomized always analyzed in the group to which the patient was assigned (even if you don’t get the intervention in the intervention arm or you do get it in the control arm)
2) Superiority trial: study designed to “prove” one intervention is better than the other. Null hypothesis is that there is no difference between the groups.
3) Noninferiority trial: study designed to “prove” that one intervention is not worse than another treatment by some prespecified amount. Null hypothesis is that there is a difference between the groups (the new treatment is worse by at least that amount).
4) Instrumental variable: variable associated with the factor under study but not directly associated with the outcome variable or any potential confounders.


The authors of this paper, An IV for the RCT: using instrumental variables to adjust for treatment contamination in randomised controlled trials, state:

Intention to treat analysis estimates the effect of recommending a treatment to study participants, not the effect of the treatment on those study participants who actually received it. In this article, we describe a simple yet rarely used analytical technique, the “contamination adjusted intention to treat analysis,” which complements the intention to treat approach by producing a better estimate of the benefits and harms of receiving a treatment. This method uses the statistical technique of instrumental variable analysis to address contamination

So what do I think about this?
1) A main role of intention to treat (ITT) analysis is to be conservative in a superiority trial. That means we don’t want to falsely reject the null hypothesis and claim the treatment is better than the control. Another main role of ITT analysis is to preserve randomization (remember, once randomized always analyzed).

2) The authors of the BMJ paper point out that “Intention to treat analysis estimates the effect of recommending a treatment to study participants, not the effect of the treatment on those study participants who actually received it.” This is true, but isn’t that what real life is like? I recommend a treatment to my patients. Some take it, some don’t. Some who I tell not to use something wind up using it.

3) The authors of the BMJ paper further point out that ITT analysis “underestimates value of receiving the treatment.” That is possible also, but it’s also the point (see #1 above).

4) In this scheme the randomization assignment serves as the instrumental variable, and a variable indicating whether or not a patient actually received the treatment (no matter which group they were assigned to) is brought into the analysis. ITT analysis would still be used but would be adjusted for treatment receipt. I worry that this could lead to overfitting the model: a situation where adding too many variables causes the model to start detecting noise rather than real relationships. A toy simulation contrasting traditional ITT with a contamination-adjusted estimate follows this list.

5) I think it would be difficult in a trial to judge adherence: what is the cutoff? Is it 100%? What about 60%? 40%? How much use by the control group is important? I think there are issues in judging what counts as contamination and what doesn’t.
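
To see what a contamination-adjusted analysis does, here is a minimal simulation sketch in Python. It is not the BMJ authors’ exact method; it uses the simplest form of the instrumental variable idea (a Wald-type estimator with randomization as the instrument) on a made-up trial with a continuous outcome, 85% adherence in the treatment arm, and 15% contamination in the control arm. The traditional ITT estimate is diluted toward the null, while the adjusted estimate recovers the effect of actually receiving treatment, under the (strong) assumption that assignment affects the outcome only through treatment receipt.

```python
# Toy simulation: traditional ITT vs contamination-adjusted (IV/Wald) estimate.
# All numbers are invented; this is not the BMJ paper's analysis.
import numpy as np

rng = np.random.default_rng(1)
n = 20000
assigned = rng.integers(0, 2, n)                    # randomization (the instrument)

# Treatment receipt: 85% of the treatment arm adheres; 15% of the control arm
# is "contaminated" and takes the treatment anyway.
received = np.where(assigned == 1,
                    rng.binomial(1, 0.85, n),
                    rng.binomial(1, 0.15, n))

true_effect = 2.0                                   # benefit of actually receiving treatment
outcome = 5.0 + true_effect * received + rng.normal(0, 3, n)

# Traditional ITT: compare groups by assignment, regardless of what was received.
itt = outcome[assigned == 1].mean() - outcome[assigned == 0].mean()

# Contamination-adjusted (Wald/IV) estimate: rescale the ITT effect by how much
# assignment actually changed treatment receipt.
uptake_diff = received[assigned == 1].mean() - received[assigned == 0].mean()
iv_estimate = itt / uptake_diff

print(f"traditional ITT estimate:            {itt:.2f}   (diluted toward zero)")
print(f"contamination-adjusted IV estimate:  {iv_estimate:.2f}   (near the true {true_effect})")
```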

Time will tell if this technique should be used. We will have to study the treatment estimates from traditional ITT analysis and contamination adjusted ITT analysis. Until then I will stick with what is recommended…traditional ITT analysis.