Ann Thorac Surg 2003;76:663-667
© 2003 The Society of Thoracic Surgeons
a Providence Health System, Portland, Oregon, USA
* Address reprint requests to Dr Grunkemeier, 9205 SW Barnes, #33, Portland, OR 97225, USA.
e-mail: gary.grunkemeier{at}providence.org
The report by Novick and colleagues [1] in this issue of The Annals of Thoracic Surgery uses a cumulative sum (CUSUM) technique to assess the learning curve in telerobotic surgery. Novick and associates [2] have previously used CUSUMs to describe the learning curve of an academic surgeon, the change from on-pump to off-pump coronary bypass surgery [3], and the learning curve for off-pump surgery [4]. From these studies, they found that CUSUM provided " ... a more sensitive indicator of a cluster of surgical failures than standard statistical techniques" [1].
Background
CUSUM analysis was introduced 50 years ago in the United Kingdom (UK) using the terminology of industrial quality control [5], and was first used to monitor surgical performance 10 years ago [6]. Since then, several authors from the UK have extended the theory to accommodate the varying risk of cardiac surgery mortality [7–13].
Constant risk of failure
The original idea, as used by Novick and coworkers [1], is to plot the cumulative sum of "adjusted" failures by patient number, where the adjustment consists of subtracting a fraction of a failure for each patient, representing the expected or acceptable failure rate. The units on the vertical axis are then "excess failures." If the "process" is performing as expected, the resulting cumulative sum will be a jagged line hovering around the horizontal axis. For example, if the expected failure rate is 10%, then 0.1 (10% of a failure) is subtracted from the cumulative sum for each patient. When a patient fails, 1.0 (100% of a failure) is added, resulting in a net rise of 0.9 (1.0 - 0.1) at that point. When a patient does not fail, nothing is added, so the curve simply drops by 0.1. Thus, if exactly 1 of the first 10 patients fails (as expected), then the cumulative sum at the tenth patient will be zero (1.0 - 10 x 0.1 = 0.0). If the process is experiencing more failures than expected, the CUSUM curve will rise above the horizontal axis. This is what happened in Fig 1 of Novick and associates [1], but most of the rise occurred in the first 20 patients. After that the curve remained relatively flat, indicating a period of acceptable performance. If the process is performing better than expected, the CUSUM will drop below the horizontal axis, indicating fewer failures than expected.
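The constant-risk calculation described above can be sketched in a few lines of Python. This is only an illustration: the function name `cusum_constant_risk` and the ten-patient outcome list are hypothetical and are not data from any of the cited studies.

```python
from itertools import accumulate


def cusum_constant_risk(failures, expected_rate):
    """Return the running sum of (observed failure - expected rate).

    failures: sequence of 0 (success) or 1 (failure), in operation order.
    expected_rate: the acceptable failure rate, e.g. 0.1 for 10%.
    """
    # Each patient contributes -expected_rate, plus 1.0 if the patient failed.
    steps = [outcome - expected_rate for outcome in failures]
    return list(accumulate(steps))


# Hypothetical series: ten patients, exactly one failure (the fourth patient).
outcomes = [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
curve = cusum_constant_risk(outcomes, expected_rate=0.1)

# The curve drops by 0.1 per success, rises by 0.9 at the failure,
# and returns to (approximately) zero at the tenth patient.
print([round(value, 2) for value in curve])
```

With one failure in ten patients and a 10% expected rate, the final value is zero up to floating-point rounding, matching the arithmetic in the text.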
Example: coronary artery bypass graft operative mortality
We illustrate the risk-adjusted Cusum using cardiac surgery mortality data from ten Providence Health System (PHS) hospitals in four western states that have contributed to a collaborative cardiac database. A logistic regression risk model for operative death after coronary bypass surgery was developed from 12 risk factors using 12,641 patients operated on from January 1997 through June 2002.
Logistic regression produces a score, called a logit (the logarithm of the odds of death), for each patient; the higher the logit, the higher the risk of death [20]. Figure 1 illustrates the results of the PHS risk model applied to four hospitals from 1998 to 2000. The darker symbols indicate patients who died. It is difficult to determine from these raw results how the hospitals are performing. Hospital C has the most deaths, but it also has the most patients. Hospital D had only one death in 1998, but not many patients. These raw results are transformed into a Cusum plot in two steps.
First, for more direct interpretation, the logits are transformed into probabilities [20]. The top halves of the panels in Fig 2 show the risk of death for each survivor on the probability scale, between 0 (0% risk of dying) and 1 (100% risk). For each patient who dies, one is subtracted from the expected probability of death, so these points fall into the lower half of each panel (between -1 and 0). Thus Fig 2 is a plot of the expected (E) minus observed (O) outcome for each patient, where E is the probability of death from the risk model and O is 0 for survivors and 1 for deaths. If a patient had a 48% risk of death and did not die, his point would lie at +0.48 (arrow in hospital A); conversely, if a patient had a 44% risk of death and did die, his point on this graph would lie at -0.56 (arrow in hospital C).
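As a minimal sketch of this first step, the following Python converts a logit to a probability with the standard inverse-logit formula and computes E - O for a single patient. The function names are hypothetical; the two worked values are the 48% and 44% examples from the text.

```python
import math


def logit_to_probability(logit):
    """Inverse logit: convert log-odds of death into a probability of death."""
    return 1.0 / (1.0 + math.exp(-logit))


def expected_minus_observed(prob_death, died):
    """E - O for one patient: E is the predicted risk, O is 1 for a death, else 0."""
    observed = 1.0 if died else 0.0
    return prob_death - observed


# A survivor with a 48% predicted risk plots in the upper half of the panel.
survivor_point = expected_minus_observed(0.48, died=False)

# A death with a 44% predicted risk plots in the lower half of the panel.
death_point = expected_minus_observed(0.44, died=True)

print(round(survivor_point, 2), round(death_point, 2))
```

A logit of 0 corresponds to even odds, so `logit_to_probability(0.0)` gives 0.5.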
Comment
Cumulative sum techniques are an informative, visually helpful tool for presenting data and studying trends. CUSUM is a 50-year-old method that has been used on cardiac surgery mortality for the past 10 years. Treasure and associates [21]22 recently wrote an overview that covers much of the material in this study, and Grigg and colleagues [13] provide a thorough review on a more technical level.
Labeling the vertical axis
Plotting cumulative expected minus observed (E-O) mortality on the vertical axis (Fig 3) means that if the line goes up, the actual deaths are fewer than expected; the vertical axis could be labeled "lives saved." This was used in several reports [7–9] and we used it for the ease of explaining the transition from the individual (Fig 2) to the cumulative values (Fig 3). Plotting cumulative observed minus expected (O-E) on the vertical axis, as Novick and colleagues [1] and others [5, 6, 12, 13] have done, means that if the line goes up, the actual deaths are more than expected; the vertical axis could be labeled "excess deaths." Both styles of labeling are correct; which is used is a matter of taste, or emphasis.
Labeling the horizontal axis
Most reports used operation number on the horizontal axis, but some used date of surgery. The latter is advantageous when comparing trends within or between providers because the unit of reporting for quality assessment is usually calendar time (year or quarter of year). We used date of surgery (Fig 3), but similar curves would result from using number of cases if the caseload is constant across time. That this was the case can be inferred by the smoothness of the prediction curves in Fig 3. They increase as the numbers of patients increase; that they do so rather smoothly indicates that the number of patients is fairly constant over time.
Prediction limits
Prediction limits aid in deciding when a deviation from the horizontal axis is more than would be expected due to random variation. In Fig 3, when the Cusum curve goes outside of the prediction bands, it gives a suggestion of a statistically significant difference. There is a caveat with this, however, because it does not account for the implied multiple comparisons. A recent study, discussed in the Appendix, provides cumulative sum curves (Fig 4) that do provide correct significance tests, and that "naturally complement intuitively attractive plots of cumulative observed-expected mortality" [12] (Fig 3).
Acknowledgments
The authors are grateful to Jeanne Zerr for guidance on content, to Ling Zhang for clerical support and to the following PHS hospitals for sharing their coronary bypass surgery data: Alaska: Providence Anchorage Medical Center; Washington: Providence Everett Medical Center, Providence Campus, Swedish Medical Center (Seattle), Providence St. Peter Hospital (Olympia), Providence Yakima Medical Center (Yakima); Oregon: Providence Portland Medical Center, Providence St. Vincent Medical Center (Portland); and California: Providence St. Joseph Medical Center (Burbank), Providence Holy Cross Medical Center (Mission Hills), Little Company of Mary Hospital (Torrance).
Appendix
Prediction and control limits
The risk-adjusted Cusum at time t is the cumulative sum of the expected (0 < E < 1) minus the observed (O = 0 or 1) mortality for all patients operated on from the start of the analysis period up to time t (instead of E-O, many authors use O-E; see the Comment section). If a process is operating at the expected risk, the Cusum will hover around the horizontal axis. Some departures from this line are to be expected due to random variation. It is important to have an indication of where the limits of random variation stop and divergent performance, good or bad, begins. For the prediction limits in Fig 3, we used the recommendation of Sherlaw-Johnson and coworkers [9], but computed it for each point along the horizontal axis, rather than at just the end of the curve, as they did. The standard error (SE) of the risk-adjusted Cusum at time t is the square root of the cumulative sum of E(1-E) for all patients operated on from the start of the analysis period up to time t. Then 90%, 95%, or 99% two-sided prediction limits are obtained by multiplying the SE by 1.64, 1.96, or 2.58, respectively (usual quantiles of the standard normal distribution). This same formula has been used as pointwise confidence limits for the Cusum.
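The Cusum and its pointwise prediction limits, as defined in this paragraph, might be computed along the following lines. This is a sketch, not the code used for Fig 3: the function name and the four-patient input are hypothetical, and the limits are taken as symmetric around zero (plus and minus z times the SE).

```python
import math
from itertools import accumulate


def risk_adjusted_cusum_with_limits(expected, observed, z=1.96):
    """Return (cusum, half_width), one value per patient in operation order.

    expected: predicted probabilities of death, each strictly between 0 and 1.
    observed: 0 for survivors, 1 for deaths.
    z: normal quantile (1.64, 1.96, or 2.58 for 90%, 95%, or 99% limits).
    """
    # Cumulative expected minus observed mortality (E - O).
    cusum = list(accumulate(e - o for e, o in zip(expected, observed)))
    # SE at each point is the square root of the cumulative sum of E(1 - E).
    cumulative_variance = accumulate(e * (1.0 - e) for e in expected)
    half_width = [z * math.sqrt(v) for v in cumulative_variance]
    return cusum, half_width


# Hypothetical patients, each with a 50% predicted risk; the third one dies.
risks = [0.5, 0.5, 0.5, 0.5]
deaths = [0, 0, 1, 0]
cusum, half_width = risk_adjusted_cusum_with_limits(risks, deaths)

# Flag points where the curve lies outside the pointwise prediction band.
outside = [abs(c) > h for c, h in zip(cusum, half_width)]
print(cusum, [round(h, 3) for h in half_width], outside)
```

Because the variance accumulates with each patient, the band widens as the series lengthens, which is why the prediction curves grow with the number of patients.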
For an ongoing process, confidence or prediction intervals constructed at each point on the horizontal axis do not maintain their nominal size because of the implied multiple comparisons. For correct significance tests they should be augmented by a method based on formal hypothesis testing. This method uses a cumulative sum, not of intuitive units like "lives saved" or "excess deaths," but rather of units of "logarithm of the likelihood ratio" of the alternative to the null hypothesis [10, 11]. The larger this is, the more the evidence favors the alternative hypothesis over the null hypothesis; the resulting cumulative sums can be used to provide correct p values. Spiegelhalter and colleagues [12] recently used this approach to establish threshold values indicating rejection of the null or alternative hypothesis (Fig 4). Following others [10–12], we chose as the null hypothesis that the odds ratio (OR) equals 1, that is, that the observed mortality is as expected, and two different alternative hypotheses: OR = 2 to test for worse results than expected, and OR = 1/2 to test for better results than expected. The control limits are determined by a function of alpha and beta, the Type I and II error rates of the hypothesis test, respectively. We used alpha = 0.05, and for comparison with confidence intervals, beta = 0.50 [22]. The results in Fig 4 agree qualitatively with the simpler and more intuitive confidence interval method depicted in Fig 3, but appear more conservative, as expected. Significance derived from these two figures should not be expected to compare directly because the curves in Fig 4 are based on specific alternative hypotheses.
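To make the log-likelihood-ratio units concrete, the sketch below computes each patient's contribution when the alternative hypothesis multiplies the odds of death by a fixed odds ratio. The weight formula follows directly from that definition. The thresholds use Wald's classical approximations for a sequential probability ratio test; whether Fig 4 was drawn with exactly these expressions is an assumption, since the text says only that the limits are a function of alpha and beta. All function names are hypothetical.

```python
import math
from itertools import accumulate


def llr_weight(expected_risk, died, odds_ratio):
    """Log-likelihood ratio (alternative vs null) contributed by one patient.

    Null: risk of death equals expected_risk.
    Alternative: odds of death are multiplied by odds_ratio.
    """
    y = 1.0 if died else 0.0
    denominator = 1.0 - expected_risk + odds_ratio * expected_risk
    return y * math.log(odds_ratio) - math.log(denominator)


def llr_cusum(expected, observed, odds_ratio):
    """Running sum of the per-patient log-likelihood ratios."""
    weights = (llr_weight(e, o, odds_ratio) for e, o in zip(expected, observed))
    return list(accumulate(weights))


def wald_thresholds(alpha, beta):
    """Wald's approximate limits (assumed here, not confirmed by the source)."""
    upper = math.log((1.0 - beta) / alpha)  # crossing favors the alternative
    lower = math.log(beta / (1.0 - alpha))  # crossing favors the null
    return lower, upper


# Hypothetical two-patient series tested against OR = 2 (worse than expected).
curve = llr_cusum([0.1, 0.1], [1, 0], odds_ratio=2.0)
lower, upper = wald_thresholds(alpha=0.05, beta=0.50)
print([round(v, 3) for v in curve], round(lower, 3), round(upper, 3))
```

A death raises the sum and a survival lowers it slightly when OR = 2; with OR = 1/2 the directions reverse, so the two alternatives are tracked as separate curves.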