We read with great interest the recent article by Dr Cahan and colleagues on the inability of residents to determine probability accurately.1 Despite the recent surge in the teaching of evidence-based medicine (EBM) in medical schools, the effectiveness of current teaching strategies remains unclear. We sought to determine how readily medical students and physicians identify the diagnostic terms often stressed in EBM.
Relevant articles were identified by searching several databases, including Medline (1980–2003), Embase (1988–2003), PsycINFO (1984–2003), and Web of Science (1993–2003), as well as educational websites and the bibliographies of relevant articles. Study design, study quality, and study limitations were abstracted by two independent reviewers. Review articles, letters to the editor, and editorials on innumeracy and diagnostic tests were excluded.
We identified eight articles (5 case scenarios, 2 questionnaires, and 1 telephone survey) that met the inclusion criteria (Table 1).2–9 The number of participants ranged from 31 to 300, and the studies were considerably heterogeneous. The most common physician error was overestimating the positive predictive value (PPV) (78–95%). In one study, only 3%, 1%, and 0.66% of physicians reported using Bayesian calculations, receiver operating characteristic (ROC) curves, and likelihood ratios (LR), respectively. Medical students could not rule out disease in low and intermediate probability case scenarios when applying Bayesian estimates. In one study from Australia, 13 of 50 (26%) physicians stated that they could describe PPV, although on direct interview only one could actually illustrate it with an example. In another study, presenting the data in natural frequency format increased the accuracy of PPV determination from 10% to 46%.
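To illustrate why inattention to pre-test probability leads to overestimating the PPV, the following is a minimal sketch of the Bayesian calculation. The function name and the test characteristics (sensitivity and specificity of 0.90) are hypothetical and are not taken from any of the reviewed studies.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Return P(disease | positive test) using Bayes' theorem."""
    # Probability of a positive test in a person who has the disease
    true_pos = sensitivity * prevalence
    # Probability of a positive test in a person who does not have the disease
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Hypothetical test characteristics, chosen only for illustration.
for prevalence in (0.01, 0.10, 0.50):
    ppv = positive_predictive_value(0.90, 0.90, prevalence)
    print(f"pre-test probability {prevalence:.2f} -> PPV {ppv:.2f}")
```

Under these assumed values, the same test yields a much lower PPV when the pre-test probability is low, which is the situation in which the reviewed studies report the largest overestimates.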
Table 1

Participants and methods: Survey of 300 physicians on how frequently they used quantitative diagnostic methods (sensitivity, specificity, ROC, LR, Bayesian logic)
Results: Eight (3%) used Bayesian methods, three (1%) used ROC, and two (0.66%) used LR. Unfamiliarity with LR and ROC was 97%, and with Bayesian methods 76%

Steurer J (2002)3
Participants and methods: Swiss GPs (n = 263) were asked to define sensitivity and PPV and to calculate PPV. In a clinical vignette, test information was presented as the test result only, the result plus sensitivity and specificity, or the result plus a plain-language description of the LR
Results: Sensitivity was defined correctly by 76% and PPV by 61%. Only 22% of GPs calculated PPV accurately. PPV was best estimated when the LR of the test was presented in plain language

Questionnaire and survey

Young JM (2002)4
Participants and methods: Australian GPs (n = 50) were asked to describe the terms PPV, sensitivity, and specificity, followed by a direct interview with the study author
Results: 13/50 said they knew about PPV, but only one met the criteria for identifying it correctly

Hoffrage U, Gigerenzer G (1998)5
Participants and methods: German physicians (n = 48) were asked to calculate the PPV of four diagnostic tests. Data were presented as probabilities or as natural frequencies
Results: Overall correct answers: Bayesian (probability) format 10%; natural frequency format 46%

Generic case scenarios

Lyman GH (1994)6
Participants and methods: Physicians (n = 31) and health care workers (n = 19) were presented with cases in which the sensitivity, specificity, and pre-test probability were varied, and were asked to calculate PPV
Results: PPV was overestimated in scenarios with lower pre-test probability. Non-physicians' estimates of PPV in cases with negative tests were inconsistent

Lyman GH (1993)7
Participants and methods: Physicians (n = 31) and health care workers (n = 19) were presented with two hypothetical cases, a 30-year-old and a 70-year-old woman with a breast lump, and asked to estimate pre-test probability, post-test probability, sensitivity, and specificity
Results: Physicians and non-physicians both overestimated the PPV

Noguchi Y (2002)8
Participants and methods: Japanese medical students (n = 234) were given three case scenarios with low, intermediate, and high probability of CAD. Pre-test and post-test estimates and stress test characteristics were elicited from students (intuitive estimates) and obtained from the literature (reference estimates)
Results: Medical students could not rule out disease in low and intermediate probability situations because of errors in estimating the pre-test probability and in applying Bayesian estimates in clinical practice. This may result in unnecessary testing

Eddy DM (1982)9
Participants and methods: Physicians (n = 100) were asked to calculate PPV given a positive mammogram
Results: 95/100 estimated an incorrect probability of 75%, which was 10 times the correct value
Despite the heterogeneity of the studies, the results appear generalizable, as the studies were carried out on four continents and yielded similar results. Physician innumeracy remains an impediment to popularizing EBM. Inattention to pre-test probability and inability to assess the PPV accurately could increase patient anxiety by generating unnecessary tests and consultations. Increased attention to EBM instruction and presentation of data in alternative formats (e.g. natural frequencies) may be indicated. The limitations of our analysis include the small number of studies, their sometimes small number of subjects, and the variation in study design.
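As an illustration of the natural frequency format mentioned above, the following minimal sketch restates test characteristics as whole-number counts in a hypothetical population. The function names, the population size of 1000, and the input values are assumptions chosen for illustration, not data from the reviewed studies.

```python
def natural_frequencies(sensitivity, specificity, prevalence, population=1000):
    """Convert probabilities into whole-number counts for a hypothetical population."""
    diseased = round(population * prevalence)
    healthy = population - diseased
    true_pos = round(diseased * sensitivity)            # diseased people who test positive
    false_pos = round(healthy * (1.0 - specificity))    # healthy people who test positive
    return diseased, true_pos, healthy, false_pos


def describe(sensitivity, specificity, prevalence, population=1000):
    """Phrase the counts as a natural frequency statement."""
    diseased, true_pos, healthy, false_pos = natural_frequencies(
        sensitivity, specificity, prevalence, population
    )
    positives = true_pos + false_pos
    return (
        f"Of {population} people, {diseased} have the disease and {true_pos} of them test positive. "
        f"Of the {healthy} people without the disease, {false_pos} also test positive. "
        f"So {true_pos} of the {positives} people with a positive test actually have the disease."
    )


# Hypothetical values: sensitivity 0.90, specificity 0.90, pre-test probability 0.01.
print(describe(0.90, 0.90, 0.01))
```

Expressed this way, the PPV can be read directly as a ratio of two counts rather than derived from conditional probabilities.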
Eddy DM. Probabilistic reasoning in clinical medicine: problems and opportunities. In: Kahneman D, Slovic P, Tversky A, eds. Judgment under Uncertainty: Heuristics and Biases. Cambridge, UK: Cambridge University Press, 1982:249–67.