“The purpose of practice guidelines must be to develop the best possible recommendations from a body of evidence that may be contradictory or inadequate.”
While I agree that having recommendations come from an expert body is useful when evidence is inadequate or contradictory, I don’t think they should be labeled guidelines. A consensus statement is a more appropriate term. After all, if evidence is lacking or contradictory, aren’t these experts just giving their opinion? Isn’t it possible that another group of experts would give a different opinion?
So don’t label it a guideline. That term has garnered a reverence that was never intended. Guidelines almost become law. They are bastardized into punitive performance measures and become the cornerstone of legal arguments. So the term guideline should not be used lightly.
“…but those recommendations should always represent the best evidence and the best expert opinion currently available.”
NO! No expert opinion. Data is too open to interpretation. Humans filter information using prior knowledge, experience, and many heuristics (including, very importantly, the affect heuristic). A person’s specialty strongly influences how they interpret data. That’s one of the reasons it’s so important to have multidisciplinary panels, so that conflicts and heuristics can balance each other out. Unfortunately, most guideline panels are very homogeneous and conflicted.
I agree that we need unambiguous language in guidelines. They should only contain recommendations on things that have strong evidence that no one refutes. When they venture into the world of vagaries they become nothing more than opinion pieces.
First they use the IMPROVE-IT study, in which patients hospitalized for ACS were randomized to a combination of simvastatin (40 mg) and ezetimibe (10 mg) or to simvastatin (40 mg) and placebo (simvastatin monotherapy). LDL levels were already fairly low in this group: baseline LDL cholesterol had to be between 50 and 100 mg per deciliter for patients receiving lipid-lowering therapy, or between 50 and 125 mg per deciliter for those who were not (average baseline LDL was 93.8 mg/dl). The results showed minimal benefits.
Current guidelines would recommend a high-potency statin in this patient population. Adding ezetimibe to a moderate-dose statin is probably equivalent to a high-potency statin (from an LDL-lowering perspective). This study (and all ezetimibe studies) should have tested the difference between simvastatin 40 mg plus ezetimibe 10 mg and simvastatin 80 mg, or atorvastatin 40 or 80 mg. So to me IMPROVE-IT doesn’t PROVE anything other than that a more potent statin leads to fewer cardiovascular events…something we already know.
Now on to the second argument. They argue that alirocumab (Praluent), the first in a new class, the proprotein convertase subtilisin/kexin type 9 (PCSK-9) inhibitors, should lead to LDL-guided therapy again. Why? “Early results suggest these drugs have a powerful effect on levels of low-density lipoprotein cholesterol (LDL-C), likely more potent than statins”. A systematic review of studies of this drug shows a mortality reduction, but the comparators in these studies were placebo or ezetimibe 10 mg. Why? We have proven therapy for LDL, and this drug should have been compared to high-potency statins. That study will likely never be done (unless the FDA demands it) because the companies making this drug can’t risk finding that it works only as well as a high-potency statin, or possibly worse. Also, does this class of drugs have anti-inflammatory effects like statins do? Are they safer? This is an injectable drug that has to be warmed to room temperature prior to use and is very costly compared to generic atorvastatin.
In my opinion, no guideline should be changed without appropriately designed outcomes studies for the drugs being recommended. In this case, the risk-benefit margin needs to be impressive to justify the cost as we have dirt cheap potent statins already.
The authors of this viewpoint make no compelling rational argument for guideline change other than that there is a new drug on the market and it might work. Let’s see if it does, and at what cost (both monetary and physiological).
This week JAMA Internal Medicine published a research letter reporting data on the underrepresentation of women, elderly patients, and racial minorities in RCTs used to inform cardiovascular guidelines. The authors state that RCTs are considered the highest level of evidence for informing guideline development. I would argue that systematic reviews would be even better, but I understand that the questions addressed in guidelines often need individual RCTs to answer them. They then state that “RCTs can have limited external validity”. What do you think?
The authors evaluated all references and then focused on RCTs that were cited in the ACC/AHA guidelines on atrial fibrillation, heart failure, and acute coronary syndromes. They extracted data on age, gender, ethnicity, and continents from which subjects were recruited. What did they find?
Female representation was highest in RCTs in atrial fibrillation (33%), followed by ACS (29%) and heart failure (29%). The next question you should ask is how this compares to the actual gender representation of people affected by these diseases. In US registries, women make up 55% of atrial fibrillation patients, 42% of ACS patients, and 47% of heart failure patients. Thus women are underrepresented by up to 22 percentage points in these studies, but does this affect guideline recommendations? Another way to think about this: would more data change recommendations for women? Hard to know for sure, but I suspect not. If enrollment is properly conducted, those enrolled should be a sample of all women with atrial fibrillation, ACS, and heart failure. Even though the sampling fraction is smaller, as long as the sample is representative of all women with those problems there should be no bias. Statistical inference could be affected by the smaller sample sizes, but the overall qualitative findings (i.e., benefit or harm) should not be.
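The sampling point above is easy to demonstrate with a quick simulation. The event rates and trial sizes below are made up for illustration; they are not the registry or RCT numbers. The idea: a trial that enrolls a smaller but representative sample of women still gets an unbiased estimate of the treatment effect, just a noisier one.

```python
import random
import statistics

random.seed(42)

# Hypothetical event rates (illustrative only): 20% under treatment,
# 30% under control, so the true risk difference is 0.10.
def simulate_trial(n_per_arm, p_treat=0.20, p_control=0.30):
    treat = sum(random.random() < p_treat for _ in range(n_per_arm))
    control = sum(random.random() < p_control for _ in range(n_per_arm))
    return control / n_per_arm - treat / n_per_arm  # observed risk difference

# Run many hypothetical trials at two enrollment sizes.
small = [simulate_trial(200) for _ in range(2000)]
large = [simulate_trial(2000) for _ in range(2000)]

# Both sizes center on the true effect (0.10) -> representative sampling is unbiased...
print(round(statistics.mean(small), 3), round(statistics.mean(large), 3))
# ...but the small trials are noisier -> wider confidence intervals.
print(round(statistics.stdev(small), 3), round(statistics.stdev(large), 3))
```

In other words, underenrollment widens the error bars around the estimate for women; it doesn’t, by itself, bias the direction of the finding.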
As expected, the majority of patients enrolled in these studies were white. Black patients constituted 19% of heart failure RCT patients and 6% of both afib and ACS patients. In US registries of heart failure, afib, and ACS, black patients make up 6%, 21%, and 11%, respectively. Again, I don’t have a problem with this if sampling was done properly.
The elderly (defined as those >75 yrs of age) are very underrepresented, constituting only 2% of patients in all the RCTs combined. In this case guideline developers will have to rely on observational data or expert opinion to inform recommendations.
Finally, the authors point out that 94% of enrolled patients came from North America or Europe. Is this a problem? I don’t think so for the US as ACC/AHA guidelines are developed to guide treatment of American patients. Patients from other underrepresented continents will have less direct evidence informing recommendations on their care. Consequently, those recommendations will be based more on expert opinion.
In this installment I want to jump ahead in Greenhalgh’s paper to address her last cause of the EBM crisis: “Poor fit for multimorbidity“. Not to worry, I will come back in a future post to cover the remaining “problems” of EBM.
I concur with Greenhalgh that individual studies, taken by themselves in a vacuum, have limited applicability to patients with multimorbidity. Guidelines don’t help, as they also tend to be single-disease focused and developed by single-disease -ologists. So is EBM at fault here again? Of course not. EBM skills to the rescue.
The current model of EBM, demonstrated below, contains two important elements: clinical state and circumstances, and clinical expertise.
Clinical state and circumstances largely refers to the patient’s comorbidities, the various other treatments they are receiving, and the clinical setting in which the patient is being seen. Thus, the EBM paradigm is specifically designed to deal with multimorbidity. Clinical expertise is used to discern what impact other comorbidities have on the clinical question under consideration and, along with the clinical state and circumstances, helps us decide how to apply a narrowly focused study or guideline to a multimorbid patient. Is this ideal? No. It would be nice if we had studies that enrolled patients with multiple common diseases, but we have to treat patients with the best available evidence we have.
Greenhalgh and colleagues report that the “second aspect of evidence based medicine’s crisis… is the sheer volume of evidence available”. EBM is not the purveyor of what is studied and published. EBM is a set of skills to effectively locate, evaluate, and apply the best available evidence. For much of what we do there is actually a paucity of research data answering clinically relevant questions (despite there being a lot of studies, which gets back to her first complaint about distortion of the evidence brand; see part 1 of this series). I teach my students and housestaff to follow Haynes’ 6S hierarchy when trying to answer clinical questions. Because much of the hierarchy is preappraised literature, someone else has already dealt with the “sheer volume of evidence”. Many clinical questions can be answered at the top of the pyramid.
I concur with Greenhalgh that guidelines are out of control. I have written on this previously. We don’t need multiple guidelines on the same topic, often with conflicting recommendations. I believe that we would be better off with central control of guideline development under the auspices of an agency like AHRQ or the Institute of Medicine. It would be much easier to produce trustworthy guidelines and guidelines on topics for which we truly need guidance. (Really American Academy of Otolaryngology….do we need a guideline on ear wax removal?) It can be done. AHCPR previously made great guidelines on important topics. Unfortunately we will probably never go back to the good ole days. Guidelines are big business now, with specialty societies staking out their territory and government and companies bastardizing them into myriad performance measures.
Two papers were published this week further validating the pooled risk equations developed for the ACC/AHA cholesterol guidelines. Muntner and colleagues used the REGARDS participants to assess the calibration and discrimination of the pooled risk equations. This study had potential, as it oversampled patients from the stroke belt. This is important because the pooled risk equations were developed to overcome the limitations of the Framingham tool (mainly its lack of minorities). I have a real problem with this study because the pooled risk equations estimate 10-year risk of CHD and stroke, and this study has only 5 years of follow-up for the REGARDS participants. I don’t think their estimates of calibration and discrimination are valid. Risk of CHD and stroke should increase over time, so event rates could change with 5 more years of follow-up. The important thing this paper adds is the reminder that observational studies often lack active surveillance. Most observational studies rely on self-report of outcomes; obviously, silent events would be missed by the patient, as would events for which the patient didn’t seek evaluation. Muntner and colleagues also used Medicare claims data to identify events not detected through routine cohort follow-up and found 24% more events. That is a useful lesson from this study.
In a more useful study Kavousi and colleagues compared 3 risk prediction tools (pooled risk equations, Framingham, and SCORE) using the Rotterdam Study, a prospective population-based cohort of persons aged 55 yrs and older. This cohort does have 10 yrs of follow-up.
This figure shows that at each level of risk the pooled risk equations overestimated risk, though less so in women.
This figure shows the proportion of patients for whom treatment is recommended (red bars), treatment should be considered (yellow bars), and no treatment is recommended (green bars). As you can see the new risk tool leads to the large majority of men “needing treatment” compared to previous guidelines (ATP III) and the current European guidelines (ESC).
Finally, this figure shows the calibration curves, and the calibration was not good: the blue dots should lie right on the red line for good calibration. Furthermore, the c-statistic was 0.67. (The c-statistic is a measure of discrimination, i.e., how well the tool differentiates diseased from nondiseased patients. A c-statistic above 0.7 is considered moderate to good; the closer to 1, the better.)
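For readers who want to see what a calibration curve actually computes, here is a minimal sketch with made-up predictions and outcomes (not the Rotterdam data): group patients by predicted risk, then compare the average predicted risk to the observed event rate within each group. The simulated “model” below overestimates true risk by a factor of two, so the deciles come out miscalibrated.

```python
import random

random.seed(1)

# Made-up data: predicted risks, and outcomes generated so that the
# true risk is only half of what the model predicts (illustrative only).
predicted = [random.uniform(0.02, 0.40) for _ in range(5000)]
observed = [random.random() < p / 2 for p in predicted]

# Sort patients by predicted risk and split into deciles.
pairs = sorted(zip(predicted, observed))
decile_size = len(pairs) // 10
for i in range(10):
    chunk = pairs[i * decile_size:(i + 1) * decile_size]
    mean_pred = sum(p for p, _ in chunk) / len(chunk)
    event_rate = sum(o for _, o in chunk) / len(chunk)
    # A well-calibrated model would have mean_pred ~= event_rate in every decile.
    print(f"decile {i + 1}: predicted {mean_pred:.3f}, observed {event_rate:.3f}")
```

Plot those decile points against the diagonal and you have the calibration figure: dots sitting above the line mean overprediction, exactly the pattern described for the pooled risk equations.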
Why might the pooled risk equations overestimate risk? Maybe they don’t, if you believe the Muntner study: it could just be a problem with the lack of active surveillance in the cohort studies used to validate the tool. Or maybe they really do overestimate risk because they aren’t accurate; or more contemporary patients receive better therapies that improve overall health; or the baseline risk characteristics of the validation cohorts just differ too much from those of the development cohorts.
I am still not sold on the new pooled risk equations, but they might not be much better than what we have been using, based on the Kavousi study (Framingham also overpredicted risk and had poor calibration). I think we either need more study and tweaking of the tool, or we use the tool as is and focus more on cardiovascular risk reduction (with exercise, diet, tobacco cessation, and diabetes and HTN control) rather than on starting a statin right away.
I gave a CME seminar this week on treating hypertension in the elderly and after my presentation a clinical pharmacist asked me an interesting question: “What do you follow? JNC 7 or JNC 8?”.
I thought this was an interesting question, and one I hadn’t thought about at all. After all, shouldn’t an updated guideline trump the previous one? I like JNC 8 because its methodology is more explicit and consistent with IOM principles than JNC 7’s. One can argue with some of the decisions made about the evidence review (i.e., that they only included RCTs and ignored systematic reviews and observational data) and be concerned about the degree of conflicts of interest of the panel members. But what JNC 8 did was make life simpler in that the BP goals are easily remembered: <150/90 for those over 60 yrs of age and <140/90 for everyone else, including those with diabetes or CKD (regardless of age). So for these reasons I prefer JNC 8. Is it perfect? No, but I suspect they will address many of the concerns critics have expressed, as well as further questions that need to be answered, in future updates (which they promise will come in a timely fashion).
It is common for multiple guidelines to be made by different developers on the same topic. Problems arise though when different guidelines make differing recommendations. Which one should you believe?
The guideline development process is complex. The Institute of Medicine has published a framework for trustworthy guideline development, as have other groups. Guidelines can differ because any of the steps along the development pathway was performed differently. A lot of judgment and decision-making goes into developing guidelines, which explains why each of the steps can be performed differently.
What are some of the main reasons why different guidelines might have different recommendations?
They attempt to provide guidance on different clinical questions. The guidelines just address different things even though they seem similar. Obviously you would choose the one that best fits the clinical scenario.
Different evidence bases were used to make recommendations. It could be that one guideline is newer than another and contains more up-to-date evidence. More problematic is when guidelines released at about the same time have differing evidence bases. As I mentioned earlier, a lot of judgments and decisions are made during the guideline development process. An important one is which studies to include in or exclude from the evidence review. This is more subjective than you might realize: it’s easy to develop study selection criteria that include only the evidence supporting your point of view while excluding that which doesn’t. Pick the guideline with the most comprehensive literature search. Also make sure the exclusion criteria make sense and don’t just help to support a biased point of view.
Different outcomes were considered. Recommendations are made to improve care with the hope of improving some outcome. It could be that one guideline focused on a surrogate marker (e.g. LDL levels) while another focused on hard clinical outcomes (MI and stroke event rates). Go with the one focused on hard clinical outcomes.
Values, biases, and conflicts of interest of the guideline developers are probably the main reason for disparate recommendations. Almost every guideline panel is biased in some way. The best you can hope for is that multiple biases and conflicts of interest are represented and essentially cancel each other out (hence the value of a multidisciplinary panel). Merely disclosing conflicts of interest does nothing to lessen their impact. Moving from evidence to recommendations involves value judgments, and the recommendations in the guideline are shaped by those values. Different groups will weigh the benefits and harms differently even when the exact same evidence base is reviewed. One need look no further than the breast cancer screening guidelines to find very different value structures between the cancer organizations and the USPSTF. Unfortunately, the value structure of a panel is usually not explicitly stated, and one must infer it from the recommendations that are made. You should choose the guideline that best matches the values of the patient.
I have tried to give a very basic overview of why guidelines can differ. There are other reasons, but these are the main ones. Clinicians should look for and use guidelines that are trustworthy. They should not follow recommendations uncritically but should seek to understand what values and judgments shaped them.
In noon conference today I reviewed the good, the bad, and the ugly of the recently released ACC/AHA cholesterol treatment guidelines. Below is a YouTube video review of the guidelines. It will be interesting to see how cholesterol management evolves over the next few years. Groups like the National Lipid Association feel that removing the LDL goals from the new guideline was a mistake. Likewise, the European Society of Cardiology lipid guidelines recommend titrating statins to LDL targets. Conflicting guidelines are always a problem. In my next post I will address conflicting guidelines and what to think about when you see conflicting recommendations on seemingly the same topic.
Last week the hotly anticipated cholesterol treatment guidelines were released, and they are an improvement over the previous ATP III guidelines. The new guidelines abandon LDL targets, focus on statins rather than on add-on therapies that don’t help, and emphasize stroke prevention in addition to heart disease prevention.
The problem with the new guidelines is that they developed a new risk prediction tool which frankly stinks. And the developers knew it stunk but promoted it anyway!
Let’s take a step back and discuss clinical prediction rules (CPRs). CPRs are mathematical models that quantify the individual contributions of elements of the history, physical exam, and basic laboratory tests into a score that aids diagnosis or estimation of prognosis. They can accommodate more factors than the human brain can take into account, and they always give the same result, whereas human judgment is inconsistent (especially in the less clinically experienced). To develop a CPR you 1) construct a list of potential predictors of the outcome of interest, 2) examine a group of patients for the presence of the candidate predictors and their status on the outcome of interest, 3) determine statistically which predictors are powerfully and significantly associated with the outcome, and 4) validate the rule [ideally by applying it prospectively in a new population (with a different spectrum of disease) by a variety of clinicians in a variety of institutions].
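The four steps above can be sketched end to end in code. Everything here is a toy: the predictors (age, smoking, systolic BP), the effect sizes, and the cohorts are all made up purely to illustrate the derive-then-validate workflow, using a simple logistic model fit by gradient ascent.

```python
import math
import random

random.seed(0)

def make_patient():
    # Step 1: candidate predictors (hypothetical) -- age, smoking, systolic BP.
    age = random.uniform(40, 80)
    smoker = float(random.random() < 0.3)
    sbp = random.uniform(110, 180)
    # Hidden "true" model, used only to simulate who has the outcome.
    logit = -8 + 0.08 * age + 0.9 * smoker + 0.01 * sbp
    event = random.random() < 1 / (1 + math.exp(-logit))
    # Center/scale predictors; the leading 1.0 is the intercept term.
    return [1.0, (age - 60) / 10, smoker, (sbp - 145) / 35], event

# Step 2: examine a derivation cohort for predictors and outcomes.
derivation = [make_patient() for _ in range(2000)]

# Step 3: estimate each predictor's association with the outcome by
# fitting a logistic model (full-batch gradient ascent on the log-likelihood).
w = [0.0] * 4
for _ in range(300):
    grad = [0.0] * 4
    for x, y in derivation:
        p = 1 / (1 + math.exp(-sum(wi * xi for wi, xi in zip(w, x))))
        for j in range(4):
            grad[j] += (y - p) * x[j]
    w = [wi + 0.5 * g / len(derivation) for wi, g in zip(w, grad)]

# Step 4: validate the rule on a separate cohort using the c-statistic
# (fraction of event/non-event pairs the score ranks correctly).
validation = [make_patient() for _ in range(2000)]
scores = [(sum(wi * xi for wi, xi in zip(w, x)), y) for x, y in validation]
event_scores = [s for s, y in scores if y]
nonevent_scores = [s for s, y in scores if not y]
concordant = sum(se > sn for se in event_scores for sn in nonevent_scores)
c_stat = concordant / (len(event_scores) * len(nonevent_scores))
print(f"c-statistic in the validation cohort: {c_stat:.2f}")
```

A real CPR derivation adds much more (candidate screening, shrinkage, truly external cohorts), but the skeleton is the same: fit on one group of patients, then judge discrimination on a different one.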
Back to the new risk tool. The panel decided to develop a new tool because the Framingham score (used in the previous ATP III guidelines) was insufficient (it was developed in an exclusively white population). How was it developed? The tool was developed using “community-based cohorts of adults, with adjudicated endpoints for CHD death, nonfatal myocardial infarction, and fatal or nonfatal stroke. Cohorts that included African-American or White participants with at least 12 years of follow-up were included. Data from other race/ethnic groups were insufficient, precluding their inclusion in the final analyses”. The data came from “several large, racially and geographically diverse, modern NHLBI-sponsored cohort studies, including the ARIC study, Cardiovascular Health Study, and the CARDIA study, combined with applicable data from the Framingham Original and Offspring Study cohorts”. I think these were reasonable derivation cohorts to use. How did they validate the tool? Importantly, validation must be done in external cohorts, because most models work in the cohort from which they were derived. They used “external cohorts consisting of Whites and African Americans from the Multi-Ethnic Study of Atherosclerosis (MESA) and the REasons for Geographic And Racial Differences in Stroke study (REGARDS). The MESA and REGARDS studies were approached for external validation due to their large size, contemporary nature, and comparability of end points”. Both studies have less than 10 years of follow-up. Validation using “most contemporary cohort” data was also conducted using ARIC visit 4, Framingham original cohort (cycle 22 or 23), and Framingham offspring cohort (cycles 5 or 6) data. The results of their validity testing showed C statistics ranging from a low of 0.5564 (African-American men) to a high of 0.8182 (African-American women).
The C statistic is a measure of discrimination (differentiating those with the outcome of interest from those without it) and ranges from 0.5 (no discrimination, essentially as good as a coin flip) to 1.0 (perfect discrimination). The authors also found that the tool overpredicted events. See the graph below.
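Concretely, the C statistic is the probability that a randomly chosen patient with the outcome gets a higher risk score than a randomly chosen patient without it (ties counted as half). A tiny sketch with made-up scores shows why 0.5 really is a coin flip:

```python
import itertools

def c_statistic(event_scores, nonevent_scores):
    # Compare every event patient's score against every non-event patient's.
    pairs = list(itertools.product(event_scores, nonevent_scores))
    concordant = sum(e > n for e, n in pairs)
    ties = sum(e == n for e, n in pairs)
    return (concordant + 0.5 * ties) / len(pairs)

# A useless model: events and non-events get identical score distributions.
print(c_statistic([1, 2, 3], [1, 2, 3]))  # -> 0.5, a coin flip

# A strong model: events almost always score higher than non-events.
print(c_statistic([7, 8, 9], [1, 2, 8]))
```

Under this reading, a C statistic of 0.5564 means the tool ranks an African-American man who will have an event above one who won’t barely more than half the time.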
So why don’t I want to use the new prediction tool? 3 main reasons:
1) It clearly overpredicts outcomes. This would lead to more people being prescribed statins than likely need to be on them (if you use only the tool to make this decision). One could argue that’s a good thing, as statins are fairly low risk and lots of people die from heart disease, so overtreating might be the way to err.
2) No study of statins used any prediction rule to enroll patients. Patients were enrolled based on LDL levels or comorbid diseases. Thus I don’t even need the rule to decide whether or not to initiate a statin.
3) Its discrimination is not good; see the C-statistic results. For Black men it’s no better than a coin flip.