Q J Med 2002; 95: 247-249
© 2002 Association of Physicians


Commentary

The nature of truth: Simpson's Paradox and the limits of statistical data

M. Heydtmann

From the Liver Research Laboratories, Queen Elizabeth Hospital, Birmingham, UK


    Introduction
 
‘Give me a fruitful error any time, full of seeds, bursting with its own corrections. You can keep your sterile truth for yourself.’ Vilfredo Pareto

We usually think in terms of true and false, and often believe that we know which is which. Nonetheless, sometimes information which appears to be true is in fact false. Although we try to base our medical knowledge on objective evidence—research and statistics—rather than our personal opinions, Simpson's Paradox reminds us of the limitations of statistical evidence. In this phenomenon, an apparent paradox arises because aggregated data can support a conclusion which is opposite from that suggested by the same data before aggregation.


    An example
 
One would generally conclude from the data in Table 1 that treatment A is the treatment of choice for the condition studied (given that side-effects are equal). Suppose, however, that these patients consisted of two subgroups: those with a high serum level of substance X, and those with a low level. Table 2 shows the data for the patients with high serum X. For this subgroup of patients, treatment B seems to be better than treatment A. Since A is the preferable treatment in the group as a whole, one might intuitively expect the other patients to be better off with treatment A. But this is not the case (Table 3). Even in patients with low serum X, treatment B is still better (although fewer of these patients benefit from either treatment).


Table 1  Number of patients responding to treatment A vs. treatment B: A is better than B

 

Table 2  Number of patients with high serum X responding to treatment A vs. treatment B: in this subgroup, B is better than A

 

Table 3  Number of patients with low serum X responding to treatment A vs. treatment B: in this subgroup too, B is better than A

 
Thus, if the patient's serum X level is unknown, treatment A seems to be better, but if serum X is known, treatment B is preferable (and one can better predict the response rate of a patient). This phenomenon is a result of the aggregation of two (or more) subgroups.1 The numbers in the example are kept simple to demonstrate this phenomenon of severe confounding, but there are a number of real examples in the literature, including the medical literature.2–4 This aggregation effect can occur when a ‘latent variable’ (in this case the serum X level) is unevenly distributed among the groups studied.
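
The mechanism can be made concrete with a small calculation. The following is a minimal sketch in Python using hypothetical counts chosen only to reproduce the pattern described above; they are not the figures from Tables 1 to 3, and the names `counts`, `response_rate` and `aggregate` are illustrative.

```python
# Hypothetical (responders, patients) counts, chosen only to illustrate the
# effect. They are NOT the figures from Tables 1-3 of the article.
counts = {
    "high serum X": {"A": (24, 30), "B": (9, 10)},
    "low serum X": {"A": (2, 10), "B": (9, 30)},
}


def response_rate(responders, patients):
    """Fraction of patients who responded."""
    return responders / patients


def aggregate(table, treatment):
    """Pool responders and patients for one treatment across all subgroups."""
    responders = sum(group[treatment][0] for group in table.values())
    patients = sum(group[treatment][1] for group in table.values())
    return responders, patients


# Within each subgroup, treatment B has the higher response rate.
for name, group in counts.items():
    rate_a = response_rate(*group["A"])
    rate_b = response_rate(*group["B"])
    print(f"{name}: A {rate_a:.0%}, B {rate_b:.0%}")

# After pooling, treatment A appears better, because most patients on A
# belong to the subgroup that responds well to either treatment.
pooled_a = response_rate(*aggregate(counts, "A"))
pooled_b = response_rate(*aggregate(counts, "B"))
print(f"pooled: A {pooled_a:.0%}, B {pooled_b:.0%}")
```

Each pooled rate is a weighted average of the subgroup rates. In this sketch the weights differ between the treatments (30 of the 40 patients on A have high serum X, against 10 of the 40 on B), and that difference in weights is what reverses the comparison.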

Clearly, if available, one should consider the data for the subgroups, because they give the most relevant information for a given patient, and in a trial one would have to report the data for the subgroups. The danger lies in cases where only the aggregated data are available, but a more detailed analysis would support a different conclusion. One could call this a ‘type S error’, after Simpson, its discoverer.


    Is Simpson's Paradox dependent on the absolute numbers, and does statistical significance protect from the effect?
 
In the example, the benefit of treatment B over treatment A in the subgroups is not statistically significant: although the p value for the aggregated data is 0.04, the p value for the benefit of treatment B over treatment A is 0.3 in both subgroups. But if a zero is added to the numbers of patients in all three tables, so that every count is multiplied by ten (Tables 4–6), all three comparisons become statistically significant: the aggregated data favour treatment A (p<0.005), while both subgroups favour treatment B (p<0.05).


Table 4  Table 1 with increased numbers (p<0.005)

 

Table 5  Table 2 with increased numbers (p<0.05)

 

Table 6  Table 3 with increased numbers (p<0.05)

 
Thus the aggregation effect is not dependent on absolute numbers, and Simpson's Paradox can occur in cases with statistical significance. When studies with low numbers of patients in different subgroups are combined and data become aggregated, Simpson's Paradox can arise: an important issue in meta-analyses.
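
The effect of multiplying every count by ten can be checked numerically. The sketch below is an illustration under stated assumptions, not the analysis behind the tables: the counts are hypothetical, and Pearson's chi-square test without continuity correction is an assumed choice, since the text does not say which test produced the quoted p values. For a table as small as the first one, an exact test would normally be preferred.

```python
import math


def chi_square_2x2(resp_a, n_a, resp_b, n_b):
    """Pearson chi-square (no continuity correction) for a 2x2 table.

    Returns the statistic and its p value for 1 degree of freedom.
    """
    a, b = resp_a, n_a - resp_a  # treatment A: responders, non-responders
    c, d = resp_b, n_b - resp_b  # treatment B: responders, non-responders
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper tail of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical subgroup: 24/30 respond to A, 9/10 respond to B.
small = (24, 30, 9, 10)
# The same subgroup with a zero appended to every count.
large = tuple(10 * x for x in small)

chi2_small, p_small = chi_square_2x2(*small)
chi2_large, p_large = chi_square_2x2(*large)

print(f"original counts: chi2 = {chi2_small:.2f}, p = {p_small:.3f}")
print(f"counts x 10:     chi2 = {chi2_large:.2f}, p = {p_large:.3f}")
```

The response rates are identical in both tables, so the direction of the difference is unchanged; only the test statistic grows, in proportion to the sample size, and with it the apparent strength of the evidence.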

In clinical practice, the aggregated data become irrelevant as soon as one performs, or is even aware of, the more detailed analysis. One would then always favour treatment B in the example. This is true even if one could not measure the serum X level in a patient, because he or she would always fall into one of the two subgroups. Although in Tables 1 to 3 the statistical basis for preferring treatment B is weak, it would be wrong to favour treatment A simply because the benefit in Table 1 was statistically significant.


    Does a properly designed trial prevent Simpson's Paradox?
 
The aggregation effect shown above is dependent on the uneven distribution of subgroups of patients into the two treatment groups. Naturally, one tries to avoid an uneven distribution of variables. An investigator controls for the known variables, and minimizes the unknown by randomization. In the case above, randomization of 30 out of 40 patients with a latent variable to treatment A and only 10 to treatment B (Tables 1–3) does not seem very likely. Increasing the numbers of patients helps: the randomization of 300 out of 400 patients to one group and only 100 to the other (Tables 4–6) is even more unlikely. But high numbers do not absolutely prevent such an uneven distribution and the possibility of ‘type S error’. Considering the astronomical numbers of latent variables that are not and will never be controlled for, there is a good chance that one of them is unevenly distributed. Simpson's aggregation effect could then lead to a false conclusion.
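
How unlikely such an imbalance is can be estimated with a simple model. The sketch below assumes that each patient carrying the latent variable is allocated to treatment A or B independently with probability 1/2 (a coin-flip model, which is a simplification of real trial allocation), and that different latent variables are independent of one another. The figure of 1000 latent variables is arbitrary and purely illustrative.

```python
from fractions import Fraction
from math import comb


def prob_at_least(k, n):
    """Exact P(X >= k) for X ~ Binomial(n, 1/2)."""
    favourable = sum(comb(n, i) for i in range(k, n + 1))
    return Fraction(favourable, 2 ** n)


def prob_any_imbalanced(p_single, n_variables):
    """P(at least one of n independent variables shows the imbalance)."""
    return 1 - (1 - p_single) ** n_variables


# Chance that at least 30 of 40 carriers end up on treatment A.
p_40 = prob_at_least(30, 40)
# Chance that at least 300 of 400 carriers end up on treatment A.
p_400 = prob_at_least(300, 400)

print(f"30 or more of 40:   {float(p_40):.2e}")
print(f"300 or more of 400: {float(p_400):.2e}")

# With many uncontrolled latent variables, an imbalance somewhere is likely.
# The number 1000 is an arbitrary, hypothetical choice.
print(f"any of 1000 variables: {prob_any_imbalanced(float(p_40), 1000):.2f}")
```

Under these assumptions a single imbalance of this size is rare, and rarer still with larger numbers, yet the chance that at least one of many latent variables is affected is far from negligible.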


    Is Simpson's Paradox common?
 
The paradox and its associated ‘type S error’ have been described in both medical and non-medical studies.2–4 In a paper by Charig et al. comparing success rates of kidney stone removal with different techniques, percutaneous nephrolithotomy had a better overall outcome than open surgery (83% success rate vs. 78%). But when the patients were divided into a group with a single stone <2 cm and a group with one larger stone or multiple stones, success rates were better with open surgery in both groups: open surgery had a 93% success rate vs. 87% for percutaneous nephrolithotomy in the group with a single small stone. In the second group, there was 73% success for open surgery vs. 69% for percutaneous nephrolithotomy. Here the aggregation effect occurs because the two techniques were applied to very different groups of patients: most of the successful percutaneous procedures (234/289) were in patients with a single small stone, whereas the majority of the successful open procedures (192/273) were in patients with multiple or large stones.2
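
The reversal in the kidney stone data can be reproduced from the underlying counts. The sketch below uses the per-group counts commonly quoted from Charig et al. (reference 2); only the percentages and the figures 234/289 and 192/273 appear in the text above, so the individual counts should be read as an assumption and checked against the original paper.

```python
# (successes, patients) per stone group and technique. These counts are the
# ones commonly quoted from Charig et al.; they are an assumption here, since
# the text above reports only percentages and two of the totals.
stones = {
    "single stone <2 cm": {"open": (81, 87), "percutaneous": (234, 270)},
    "larger or multiple stones": {"open": (192, 263), "percutaneous": (55, 80)},
}


def rate(successes, patients):
    """Success rate as a fraction."""
    return successes / patients


# Success rate within each stone group: open surgery is ahead in both.
rates = {
    group: {tech: rate(*cell) for tech, cell in techs.items()}
    for group, techs in stones.items()
}

# Success rate after pooling the two stone groups: percutaneous is ahead.
pooled = {}
for tech in ("open", "percutaneous"):
    successes = sum(techs[tech][0] for techs in stones.values())
    patients = sum(techs[tech][1] for techs in stones.values())
    pooled[tech] = rate(successes, patients)

for group, by_tech in rates.items():
    print(f"{group}: open {by_tech['open']:.0%}, "
          f"percutaneous {by_tech['percutaneous']:.0%}")
print(f"pooled: open {pooled['open']:.0%}, "
      f"percutaneous {pooled['percutaneous']:.0%}")
```

With these counts, 270 of the 350 percutaneous procedures were performed for a single small stone, against 87 of the 350 open procedures, which is the uneven distribution that drives the reversal.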

Similarly, Early and Nicholas demonstrated a fall in the percentage of male patients in a psychiatric hospital between 1970 and 1975, but when the results were broken down according to the patients' age (age >65 and age <65), there was an increase in male patients in both age groups. In this study the effect was caused by a predominance of younger males and older females in the hospital, and a marked decrease in the overall number of hospitalized patients during the time studied.3 Reintjes et al. describe another recent medical example in a multi-centre study on nosocomial infections.4

Even though these represent only a minority of published statistics, the error can well occur without us noticing. The more one analysed studies in detail for latent variables, the more likely one would be to find more examples. Further, the more we question our currently accepted knowledge in this way, the more uncertain our evidence becomes. If we analyse the data and find a Simpson's Paradox, this does not protect us from a second ‘type S error’ resulting from a further aggregation effect. Returning to Table 6, where treatment B was preferable to treatment A, it could still be the other way round if we analysed the data according to further latent variables (Tables 7 and 8). Since finding severe confounding does not protect from the ‘type S error’, can we believe in the results of statistics at all?


Table 7  Subgroup of patients from Table 6 (low serum X) who have high serum levels of Y, and their responses to treatment A vs. treatment B

 

Table 8  Subgroup of patients from Table 6 (low serum X) who have low serum levels of Y, and their responses to treatment A vs. treatment B

 
All results need to be viewed in their scientific context. Where the results of one study diverge markedly from those of other studies, care is needed, and latent variables which might have been overlooked have to be considered. The possibility arises that the results were confounded by factors that were not controlled for or perhaps not even measured. Statistical analysis of data can lead to ‘statistical illusions’, which, like optical illusions, cause misinterpretation. Where the results of several studies are similar and make sense in the context of all the other available evidence, they can probably be relied on. But are they true? And what is truth? The answer of a statistician would be: a statement is generally regarded as ‘true’ if it is true with a probability close to 1, even if this probability will never reach 1. Can we avoid believing something to be true when it is actually false? No, only someone with very limited knowledge would be able to state that everything he or she knows is true. The more knowledge a person accumulates, the more likely it is that some of his or her knowledge is actually false.


    Acknowledgments
 
I would like to thank Professor James Neuberger and Dr Carl Rasmussen for their helpful discussions and review of the manuscript.


    Notes
 
Address correspondence to Dr M. Heydtmann, Liver Research Laboratories, Queen Elizabeth Hospital, Edgbaston B15 2TH. e-mail: m.heydtmann{at}bham.ac.uk


    References
 
1. Simpson EH. The interpretation of interaction in contingency tables. J R Statist Soc B 1951; 2:238–41.

2. Charig CR, Webb DR, Payne SR, Wickham OE. Comparison of treatment of renal calculi by operative surgery, percutaneous nephrolithotomy, and extracorporeal shock wave lithotripsy. Br Med J 1986; 292:879–82.

3. Early DF, Nicholas M. ‘Dissolution of the mental hospital’: fifteen years on. Br J Psychiat 1977; 130:117–22.

4. Reintjes R, de Boer A, van Pelt W, Mintjes-de Groot J. Simpson's paradox: an example from hospital epidemiology. Epidemiology 2000; 11:81–3.

