Ann Thorac Surg 2008;86:348. doi:10.1016/j.athoracsur.2007.10.028
© 2008 The Society of Thoracic Surgeons
Correspondence
In the Context of Performance Monitoring, the Caterpillar Plot Should Be Mothballed in Favor of the Funnel Plot
Mohammed A. Mohammed, PhD,
Jonathan J. Deeks, PhD
Department of Public Health & Epidemiology, University of Birmingham, Edgbaston, Birmingham B15 2TT, United Kingdom
(Email: m.a.mohammed{at}bham.ac.uk).
To the Editor:
Grunkemeier & Wu's [1] recent contribution to the Statistician's page included a plot of risk-adjusted mortality in six hospitals after percutaneous coronary intervention. Although the authors did not explicitly focus on the plot, we believe the plot merits comment (Fig 1).

Fig 1. For each hospital the ratio of observed/expected deaths is shown as a dot, with vertical bars representing exact 95% confidence intervals based on the Poisson distribution. (Data from Reference 1.)
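As a rough illustration of how intervals of the kind shown in Fig 1 might be computed, the following sketch finds exact Poisson limits for an observed count by bisection and divides them by the expected count. It is not taken from Reference 1: the function names, the bisection approach, and the example counts are assumptions made for illustration only.

```python
import math

def poisson_cdf(k, mu):
    """P(X <= k) for X ~ Poisson(mu), summed term by term."""
    if k < 0:
        return 0.0
    term = math.exp(-mu)
    total = term
    for i in range(1, k + 1):
        term *= mu / i
        total += term
    return min(total, 1.0)

def _root_of_decreasing(f, lo, hi, steps=200):
    """Bisection for the root of a function that decreases on [lo, hi]."""
    for _ in range(steps):
        mid = (lo + hi) / 2.0
        if f(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def exact_poisson_limits(observed, level=0.95):
    """Exact two-sided limits for the Poisson mean, given an observed count."""
    alpha = 1.0 - level
    hi_bound = observed + 10.0 * math.sqrt(observed + 1.0) + 20.0
    if observed == 0:
        lower = 0.0
    else:
        # Lower limit: the mean at which P(X >= observed) equals alpha/2.
        lower = _root_of_decreasing(
            lambda mu: poisson_cdf(observed - 1, mu) - (1.0 - alpha / 2.0),
            0.0, hi_bound)
    # Upper limit: the mean at which P(X <= observed) equals alpha/2.
    upper = _root_of_decreasing(
        lambda mu: poisson_cdf(observed, mu) - alpha / 2.0,
        0.0, hi_bound)
    return lower, upper

def oe_interval(observed, expected, level=0.95):
    """Observed/expected ratio with its exact Poisson interval."""
    lower, upper = exact_poisson_limits(observed, level)
    return observed / expected, lower / expected, upper / expected

# Hypothetical hospital: 10 observed deaths against 8.0 expected.
print(oe_interval(10, 8.0))
```

Each hospital receives its own interval in this construction, which is the feature of the caterpillar plot discussed in this letter.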
The purpose of this plot is to identify where differences in performance between hospitals may be explicable by chance. Although this type of plot, colloquially known as the caterpillar plot, is widely used in performance monitoring, there are some important conceptual issues that make the use of such plots problematic:
- 1 A separate interval of uncertainty is drawn for each hospital. If any interval excludes the null (ie, risk ratio = 1), then this suggests evidence against the null hypothesis: that all differences between the units are explicable by chance. Therefore, six significance tests are undertaken to address a single underlying hypothesis. Thus, there is misalignment between the plot and the hypothesis of interest.
- 2 The plot does not address multiple testing; rather, it encourages six comparisons of hospitals with the expected rate (and 15 comparisons between hospitals). As more hospitals are included, it becomes increasingly likely that one or more units will have a 95% confidence interval that does not include the null simply due to chance.
- 3 The plot encourages an erroneous interpretation that if the confidence intervals of one unit overlap with those of another, then those units are similar to each other [2].
- 4 The sample sizes are not directly shown (although the relative sizes of the confidence intervals give some indication of difference in sample size).
- 5 Ordering the plot from low to high mortality encourages spurious ranking of hospitals, which has been shown to be unreliable, as it mainly reflects the play of chance and not real differences [3].
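The multiple-testing arithmetic in point 2 can be checked with a short calculation. The sketch below assumes, purely for illustration, that the hospitals are independent and that every one of them is truly performing at the expected rate; the function name is hypothetical.

```python
from math import comb

def chance_of_any_false_flag(n_units, level=0.95):
    """Probability that at least one of n independent intervals excludes the
    null when every unit is in fact performing as expected."""
    return 1.0 - level ** n_units

n_hospitals = 6
pairwise = comb(n_hospitals, 2)                    # comparisons between hospitals
p_any = chance_of_any_false_flag(n_hospitals)      # about 0.26 for six units

print(pairwise, round(p_any, 3))
print(round(chance_of_any_false_flag(50), 3))      # grows quickly with more units
```

Under these assumptions, six intervals already carry roughly a one-in-four chance of at least one spurious signal, and the chance rises as more hospitals are added.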
Now consider the plot, known as the funnel plot [4], shown as follows. Rather than indicating uncertainty for each observed hospital rate, in this plot a single envelope of uncertainty is drawn from the expected line. Each hospital's performance is plotted against its effective sample size (ie, for a Poisson distribution, this is the number of observed events). Although this plot does not directly adjust for multiple comparisons, nor facilitate hypothesis testing, it is easier to judge whether the distribution of points within and outside different percentile lines is as expected, or whether they indicate unexpectedly high or low performance.
Such an approach to intervals is long standing in statistical process control, based on Shewhart's theory of variation [3], and has several additional advantages, which include: (1) avoiding spurious rankings, (2) displaying the volume-outcome relationship, (3) demonstrating the increased variability associated with smaller samples (smaller hospitals), albeit via the expected mortality, and (4) showing more than one interval of uncertainty (95% and 99% limits), and yet still being consistent with the null hypothesis. This plot can produce materially different interpretations from those previously described (Fig 2) [5].

Fig 2. Inner dotted lines are 95% intervals and outer dotted lines are 99.9% intervals derived using the Poisson distribution. We would prefer to plot the y-axis on the log_e scale but opted for the natural scale to provide consistency with Grunkemeier & Wu [1]. The letters A–F represent the values for each hospital A–F respectively.
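A minimal sketch of how control limits like those in Fig 2 might be derived is given below. It is an illustration under stated assumptions, not the method used to draw the figure: limits are taken as plain Poisson quantiles of the count divided by the expected number of deaths, without the interpolation described in Reference 4, and the function names and hospital values are hypothetical.

```python
import math

def poisson_quantile(p, mu):
    """Smallest count k with P(X <= k) >= p for X ~ Poisson(mu).
    Intended for modest expected counts (exp(-mu) must not underflow)."""
    term = math.exp(-mu)
    total = term
    k = 0
    while total < p and term > 0.0:
        k += 1
        term *= mu / k
        total += term
    return k

def funnel_limits(expected, level=0.95):
    """Lower and upper limits for observed/expected under the null (ratio = 1)."""
    alpha = 1.0 - level
    lower = poisson_quantile(alpha / 2.0, expected) / expected
    upper = poisson_quantile(1.0 - alpha / 2.0, expected) / expected
    return lower, upper

def classify(observed, expected, level=0.95):
    """Report whether a unit falls inside or outside the funnel at this level."""
    lower, upper = funnel_limits(expected, level)
    ratio = observed / expected
    if ratio < lower:
        return "below"
    if ratio > upper:
        return "above"
    return "inside"

# Hypothetical units: (observed deaths, expected deaths); not the data of Fig 2.
units = {"X": (12, 10.0), "Y": (30, 12.0), "Z": (95, 100.0)}
for name, (obs, exp) in units.items():
    print(name, funnel_limits(exp, 0.95), funnel_limits(exp, 0.999),
          classify(obs, exp, 0.95))
```

Because the limits depend only on the expected count and the null value, a single envelope serves every hospital, and it narrows as the expected count grows.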
The plot has been fully described elsewhere [4]. We therefore urge that in the context of performance monitoring, the caterpillar plot should be mothballed in favor of the funnel plot.
References
- Grunkemeier GL, Wu Y. What are the odds? The statistician's page. Ann Thorac Surg 2007;83:1934-1939.
- Wolfe R, Hanley J. If we're so different, why do we keep overlapping? When 1 plus 1 doesn't make 2. CMAJ 2002;166:65-66.
- Mohammed MA, Cheng KK, Rouse A, Marshall T. Bristol, Shipman and Clinical Governance: Shewhart's forgotten lessons. Lancet 2001;357:463-467.
- Spiegelhalter D. Funnel plots for comparing institutional performance. Statist Med 2005;24:1185-1202.
- Adab P, Rouse AM, Mohammed MA, Marshall T. Performance league tables: the NHS deserves better. BMJ 2002;324:95-98.
Related Article
- Reply. Gary L. Grunkemeier and YingXing Wu. Ann Thorac Surg 2008;86:349.