Ann Thorac Surg 2005;80:471-479
© 2005 The Society of Thoracic Surgeons
Providence Health System, Portland, Oregon
Accepted for publication February 23, 2005.
* Address reprint requests to Dr Jin, Providence St. Vincent Hospital and Medical Center, 9205 SW Barnes Rd, LL 33, Portland, OR 97225 (Email: ruyun.jin@providence.org).
Abstract
METHODS: From 1997 to 2004, 3,324 patients aged 30 to 95 years underwent aortic valve replacement (AVR), and 1,596 underwent mitral valve replacement or repair (MVRR) at one of nine PHS medical centers. We used the area under the receiver operating characteristic curve (c-index) to measure model discrimination and the Hosmer-Lemeshow statistic (H-L) to measure calibration. We modified the NNE models by ungrouping continuous variables, seeking optimal transformations of continuous variables, and imputing missing values by multiple regression.
RESULTS: The prevalence and the lethality of risk factors were similar in PHS and NNE patients. The NNE models fit PHS patients well: c-index (95% confidence interval) = 0.75 (0.70 to 0.80) for AVR and 0.81 (0.76 to 0.86) for MVRR; H-L = 3.95 (p = 0.861) for AVR and 7.10 (p = 0.526) for MVRR. A single PHS model performed slightly better for both positions: c-index = 0.79 (0.75 to 0.83) for AVR and 0.84 (0.80 to 0.88) for MVRR; H-L = 2.75 (p = 0.949) for AVR and 12.21 (p = 0.142) for MVRR.
CONCLUSIONS: The NNE models for aortic and mitral valve surgery were successfully validated using PHS patients. Using somewhat different statistical approaches to modeling, we produced a new, unified model for both positions.
Introduction
The Providence Health System Cardiovascular Study Group (PHS) comprises nine medical centers in four Western states. PHS collected data on 4,920 single aortic or mitral valve operations during 1997 to 2004, providing an opportunity to evaluate the NNE risk models with external data. Specifically, we addressed the following questions: (1) Are NNE and PHS patients similar with regard to the distributions of risk factors? (2) Are the risk factor effects on mortality similar between the two patient groups? (3) Do the NNE models perform well at predicting mortality for the PHS patients? (4) Can we refine the NNE models by applying some different statistical approaches?
Material and Methods
The PHS data definitions matched all but two of the NNE risk factor definitions, and we used somewhat expanded versions of those two variables. We used history of cerebrovascular disease (includes permanent, reversible, and transient events) as a substitute for prior stroke and atrial arrhythmia (includes flutter and others) as a substitute for atrial fibrillation.
Statistical Methods
Evaluating risk models
We used traditional methods to evaluate the performance of the risk models, which were developed by logistic regression [2]. We assessed discrimination by the c-index (area under the receiver operating characteristic curve) [3], and calibration by the Hosmer-Lemeshow statistic [4] and cumulative sum (Cusum) analysis [5].
Discrimination is more critical than calibration, because one can always recalibrate a model without changing its discrimination [6]. We used a simple recalibration to make overall expected mortality equal to observed mortality [7, 8]. This is done by performing a new logistic regression with a single risk factor (the risk score from the NNE model) and using the resulting mortality predictions.
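The recalibration described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the NNE risk score is available as a predicted probability and enters the new logistic regression on the logit scale, and it fits the two coefficients by Newton-Raphson. The function names and the toy data are hypothetical.

```python
import math


def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def fit_recalibration(scores, deaths, iterations=25):
    """Fit logit(P(death)) = a + b * logit(score) by Newton-Raphson.

    scores: original model's predicted mortality (assumed 0 < score < 1).
    deaths: 0/1 hospital-death indicators.
    """
    xs = [logit(s) for s in scores]
    a, b = 0.0, 1.0  # start at the original, uncalibrated model
    for _ in range(iterations):
        g_a = g_b = h_aa = h_ab = h_bb = 0.0
        for x, y in zip(xs, deaths):
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            w = p * (1.0 - p)
            g_a += y - p          # gradient of the log-likelihood, intercept
            g_b += (y - p) * x    # gradient of the log-likelihood, slope
            h_aa += w             # information matrix entries
            h_ab += w * x
            h_bb += w * x * x
        det = h_aa * h_bb - h_ab * h_ab
        # Newton step: add (information matrix)^-1 times the gradient
        a += (h_bb * g_a - h_ab * g_b) / det
        b += (h_aa * g_b - h_ab * g_a) / det
    return a, b


def recalibrate(scores, a, b):
    """Recalibrated mortality predictions from the fitted a and b."""
    return [1.0 / (1.0 + math.exp(-(a + b * logit(s)))) for s in scores]


# Toy data: the original model predicts 1.40 deaths, but 4 are observed.
scores = [0.02, 0.05, 0.10, 0.03, 0.20, 0.08, 0.15, 0.04, 0.30, 0.06, 0.12, 0.25]
deaths = [0, 1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0]
a, b = fit_recalibration(scores, deaths)
new_scores = recalibrate(scores, a, b)
print(sum(scores), sum(new_scores), sum(deaths))
```

Because the refitted model contains an intercept, its predictions sum to the observed number of deaths at convergence. With a positive slope, as in this toy example, the ranking of patients is unchanged, so discrimination (the c-index) is unaffected.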
Developing risk models
Many decisions are made in producing a risk model [9]. We revisited some of the decisions that NNE made, and applied some additional statistical methods.
Results
First, are NNE and PHS patients similar with regard to the distributions of risk factors? The NNE had 5,793 AVR and 3,150 MVRR operations from 1991 to 2001. The PHS had 3,324 AVR and 1,596 MVRR operations from 1997 to 2004. The distributions of the patient risk factors were similar for AVR (Fig 1) and MVRR (Fig 2), except that NNE had more patients in New York Heart Association classes 3 and 4, and more urgent or emergent operations, whereas PHS had more patients with elevated preoperative serum creatinine.
Continuous variables
The mortality rate by age group is shown in Figure 5, with the univariate logistic curve providing a smooth, continuous fit. Figure 6 shows a multivariate depiction of this age/mortality relationship, using Cusum plots. The cumulative sums of observed minus expected deaths from the final PHS model, with and without age included, are superimposed. When age is removed from the model (gray line), the plot drops rather uniformly until about age 72, where there is an excess of about 42 expected deaths (compared with observed), after which it rises. That means that for younger patients there are fewer deaths than expected (from the model that does not have age in it), and for the older patients there are more deaths than expected from the (age-less) model. The Cusum test for a linear trend with age is significant (p < 0.001) [5]. When age is put back into the model (black line), the Cusum plot does not show a linear trend with age (p = 0.972), only random variation about the line of equality (observed = expected), with a maximum excursion of about 7 deaths.
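A minimal sketch of the Cusum construction described above, assuming per-patient observed deaths (0/1) and expected probabilities from a model fitted without age. The helper name and the five-patient example are hypothetical, and the significance test for a linear trend [5] is not reproduced here.

```python
def cusum_by_covariate(covariate, observed, expected):
    """Running sum of observed minus expected deaths, patients ordered by a covariate.

    Returns a list of (covariate value, cumulative O - E) pairs.
    """
    order = sorted(range(len(covariate)), key=lambda i: covariate[i])
    path = []
    running = 0.0
    for i in order:
        running += observed[i] - expected[i]
        path.append((covariate[i], running))
    return path


# Hypothetical patients: expected risks come from a model without age.
ages = [80, 45, 60, 72, 90]
observed = [1, 0, 0, 0, 1]
expected = [0.1, 0.2, 0.1, 0.2, 0.2]

path = cusum_by_covariate(ages, observed, expected)
lowest_age, lowest_value = min(path, key=lambda point: point[1])
print(path)
print("largest deficit of observed deaths at age", lowest_age)
```

The toy path falls, then rises: fewer deaths than expected among the younger patients and more among the older ones, the same shape the text describes for the age-less model.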
Other risk factors
Combining AVR and MVRR models
Final PHS model: Comparison with calibrated NNE model
Comment
Risk-adjusted mortality models provide an individualized probability of hospital death for each patient, to give the patient and the surgeon a reasonable expectation of the true risk. These expected risks can be aggregated across providers, or across time, and compared with the observed mortality to yield performance measures. The NNE documented 74 fewer CABG deaths than expected in the first 6 years of their quality improvement program [24] and 811 fewer deaths than expected in the first 15 years [25].
The NNE has been a model and inspiration for the Providence Health System Cardiovascular Study Group, which has been collecting similar data since 1997. One PHS initiative was to acquire a risk model for hospital mortality after heart valve surgery, and the NNE risk models [1] presented an ideal opportunity.
A recent paper on validating prognostic models argues for "the need to evaluate performance of a model on a new series of patients, ideally in a different location," to provide an "empirical demonstration of transportability" [26]. Ivanov and colleagues, studying a CABG mortality risk model, likewise concluded that "Any existing index used for risk assessment in cardiac surgery should be episodically recalibrated or compared with new models derived from local subjects to ensure that its performance remains optimal" [27]. The PHS data, collected 3,000 miles away during an overlapping time frame, provide an excellent opportunity to perform such validation on the NNE risk models.
Thus, our intention was to validate the NNE models as a prelude to possibly adopting them for PHS use.
Derivation of the PHS Model
Although the NNE models worked well on the PHS data, we proceeded to modify them by applying somewhat different statistical strategies (see Statistical Methods section) and by considering additional risk factors used in other heart valve risk models. Table 1 lists the risk factors used in valve surgery models with more than 1,000 patients and five or more risk factors [1, 13–21]. These minimum values were not arbitrary: in 1,000 patients, a mortality rate of 5% to 10% implies 50 to 100 deaths, and about 10 deaths are required to support each risk factor considered in a risk model [6], so 50 deaths support about five risk factors. A notable omission from this table is the EuroSCORE [28], but that model is not specifically for valve surgery.
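The arithmetic behind these minimum values can be written out as a small helper. The 10-deaths-per-risk-factor rule is taken from the text [6]; the function name is hypothetical.

```python
def max_risk_factors(n_patients, mortality_rate, deaths_per_factor=10):
    """Largest number of risk factors supported by the expected number of deaths."""
    expected_deaths = round(n_patients * mortality_rate)
    return expected_deaths // deaths_per_factor


for rate in (0.05, 0.10):
    print(rate, max_risk_factors(1000, rate))
```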
The PHS model uses age and (the logarithm of) creatinine as continuous variables. For BMI, however, we could not find a transformation that performed as well as a dichotomy in which the middle range carries lower risk. This agrees with our previous finding (Jin R, Grunkemeier GL, Furnary AP, Handy JR Jr. Is obesity a risk factor for mortality in coronary artery bypass surgery? Circulation. Accepted 2005) that BMI from 23 to 30 seems to be the most protective range for death in CABG surgery patients.
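A sketch of how these terms might be coded for a logistic model. The cutoffs 23 and 30 are borrowed from the CABG finding quoted above and are an assumption, as are the function and argument names; the text does not state the exact BMI cutpoints used in the PHS model.

```python
import math


def valve_model_terms(age, creatinine, bmi, bmi_low=23.0, bmi_high=30.0):
    """Return model terms: age (linear), log creatinine, and a BMI indicator.

    The BMI indicator is 1 outside the assumed lower-risk range
    [bmi_low, bmi_high] and 0 inside it.
    """
    if creatinine <= 0:
        raise ValueError("creatinine must be positive to take its logarithm")
    outside_range = 0 if bmi_low <= bmi <= bmi_high else 1
    return age, math.log(creatinine), outside_range


print(valve_model_terms(70, 1.0, 25.0))
print(valve_model_terms(70, 2.0, 34.0))
```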
Imputing Missing Values
There are many ways to deal with missing values [12]. Some studies make a heroic effort to complete the missing information in order to include all cases in the model [18]. This is the best way, but it is impractical for many facilities. A common method is to discard patient records with incomplete information on any of the risk factors needed for the model; but this wastes the information those patients provide on variables that are not missing, and it may bias the results if the excluded patients are not representative of the others. Other methods include eliminating variables with a large amount of missing data [29]; imputing the missing values with the median for continuous variables and the most prevalent value for categorical variables [13, 15]; creating an indicator variable to code missing values [1, 17]; and assuming that missing information is equivalent to absence of the risk factor [30]. In this study, we imputed missing preoperative creatinine values with predictions from a linear regression on the other risk factors (Fig 7). A drawback of this method is that the resulting confidence intervals are too narrow (anticonservative), because the imputed values are used as if they were real, and the imprecision in their estimation is not reflected in the model. (A more complex procedure, multiple imputation, would overcome this limitation [12].)
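A minimal sketch of single imputation by linear regression, as described above: coefficients are estimated from patients with a recorded value and used to predict the value for patients without one. The predictors (age and a diabetes flag), the use of `None` for a missing value, and the toy numbers are all hypothetical; a real analysis might instead model log creatinine on the full set of risk factors.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        known = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - known) / aug[r][r]
    return x


def impute_by_regression(predictors, target):
    """Replace None entries of target with least-squares predictions.

    predictors: one tuple of numeric risk factors per patient (none missing).
    target: the variable to impute, with None where it is missing.
    """
    rows = [[1.0] + list(p) for p in predictors]  # leading 1.0 = intercept
    complete = [(row, y) for row, y in zip(rows, target) if y is not None]
    k = len(rows[0])
    # Normal equations (X'X) beta = X'y, using complete cases only
    xtx = [[sum(row[i] * row[j] for row, _ in complete) for j in range(k)]
           for i in range(k)]
    xty = [sum(row[i] * y for row, y in complete) for i in range(k)]
    beta = solve_linear_system(xtx, xty)
    return [y if y is not None else sum(b * v for b, v in zip(beta, row))
            for row, y in zip(rows, target)]


# Hypothetical patients: (age, diabetes) and creatinine with one missing value.
predictors = [(60, 1), (70, 0), (50, 1), (80, 1), (65, 0)]
creatinine = [1.5, 1.2, 1.4, 1.7, None]
print(impute_by_regression(predictors, creatinine))
```

The imputed value is then used as if it had been measured, which is exactly why confidence intervals from a model fitted afterward come out too narrow.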
Combined Model for AVR and MVRR
Considering a single valve surgery model, not separated by position, is a major step. A first reaction might be that the patients and their outcomes are too dissimilar to combine. In fact, however, published risk models for aortic and mitral valve mortality have similar risk factors (Table 1), and the most recent version of The Society of Thoracic Surgeons risk models for heart valve replacement combined positions [13].
Validation of the PHS Model
The PHS model fit our data slightly better than the calibrated NNE models (Table 2). This is not surprising, because the PHS model was derived from the same data on which it was evaluated, which inflates the goodness of its fit [6]. We evaluated the models both calibrated and uncalibrated (Table 2). If we were simply adopting the NNE model as a standard, it would best be used uncalibrated, so that any overall difference between observed and expected mortality would measure some true difference, such as unmeasured patient variables or variation in quality of care. But our goal was to compare the models in the fairest possible way, so for this comparison we used the calibrated NNE model. The best test of the PHS model would be performed "...on a new series of patients, ideally in a different location" [26]. (Perhaps someone will do so in the future, as we have done for the NNE models in this paper.)
Limitations
As with any risk model, there are no doubt some important risk factors that are not included in the analysis. In general, the more data (deaths), the more risk factors can enter a model. However, there is a practical advantage to parsimony in a model, especially with regard to completeness of the data in practical use. The important paper by Jones and coworkers [31] showed that seven "core" risk factors accounted for 45% of the variability and another 13 "level 1" variables accounted for another 38%. One factor that is usually not included in a risk model is the calendar time of operation. Conditions surrounding the performance of CABG are constantly changing over time, mostly for the better, and the time frame of surgery (in our case 1997 to 2004) can be a surrogate for many small factors not in the risk model. This should be borne in mind when applying any risk model to future patients.
Conclusion
Appendix
c-index [3]. For binary outcomes, the c-index is identical to the area under the receiver operating characteristic (ROC) curve. It measures discrimination, that is, the ability of the model to distinguish patients who have the event from those who do not. Its value is the probability, from zero to one, that a randomly selected patient who died had a higher predicted risk than a randomly selected survivor.
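The pairwise definition above translates directly into code. This O(n²) sketch, with a hypothetical function name, counts tied predictions as one half, which is the usual convention; it illustrates the definition rather than aiming for efficiency.

```python
def c_index(predicted_risk, died):
    """Fraction of (death, survivor) pairs in which the death had the higher predicted risk."""
    death_risks = [r for r, d in zip(predicted_risk, died) if d]
    survivor_risks = [r for r, d in zip(predicted_risk, died) if not d]
    concordant = 0.0
    for rd in death_risks:
        for rs in survivor_risks:
            if rd > rs:
                concordant += 1.0
            elif rd == rs:
                concordant += 0.5  # ties count as half-concordant
    return concordant / (len(death_risks) * len(survivor_risks))


print(c_index([0.10, 0.40, 0.35, 0.80], [0, 0, 1, 1]))  # 3 of 4 pairs concordant: 0.75
```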
Cumulative sum (Cusum) analysis [5]. The Cusum plot shows the cumulative sum of the differences of observed minus expected deaths. If the model for expected deaths has a good fit, the value of the Cusum line should vary randomly around zero. It can be used to assess the effect of continuous variables and test the calibration of a model.
Fractional polynomials [11]. This method is used to look for transformations of a continuous independent variable (risk factor). In an organized way, many combinations of two transformations (logarithm and several different powers) are tested and then an optimal combination is chosen.
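A sketch of the candidate transformations searched by a degree-2 fractional polynomial. The power set below is the conventional one from the fractional-polynomial literature, an assumption beyond the text, which mentions only the logarithm and "several different powers." Each candidate pair would then be fitted in the logistic model and the best-fitting pair kept; that fitting step is omitted. Names are hypothetical, and x must be positive.

```python
import math
from itertools import combinations_with_replacement

# Conventional FP power set; by convention, power 0 means log(x).
POWERS = (-2, -1, -0.5, 0, 0.5, 1, 2, 3)


def fp_term(x, power):
    """Single fractional-polynomial transform of a positive x."""
    return math.log(x) if power == 0 else x ** power


def fp2_terms(x, p1, p2):
    """Two FP terms; a repeated power p gives x^p and x^p * log(x)."""
    first = fp_term(x, p1)
    second = first * math.log(x) if p1 == p2 else fp_term(x, p2)
    return first, second


# Every degree-2 candidate, including repeated powers.
FP2_CANDIDATES = list(combinations_with_replacement(POWERS, 2))
print(len(FP2_CANDIDATES))
print(fp2_terms(2.0, 1, 2))
```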
Hosmer-Lemeshow (H-L) statistic [4]. The Hosmer-Lemeshow statistic is used to test the calibration of a model. It groups the ordered predictions from the logistic regression model into deciles and produces a Pearson-like statistic that approximately follows a chi-square distribution with 8 degrees of freedom (the number of groups minus 2).
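A minimal sketch of the Hosmer-Lemeshow computation: sort patients by predicted risk, split them into ten groups of nearly equal size, and sum (O - E)^2 / [n * p * (1 - p)] over groups, where p is the group's mean predicted risk. Grouping conventions vary between software packages, so equal-size groups are an assumption here, as are the function names. For even degrees of freedom the chi-square tail probability has a closed form; applied to the H-L values in the abstract, it gives p values within 0.001 of those reported.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; this closed form needs an even df."""
    if df <= 0 or df % 2:
        raise ValueError("closed form requires a positive, even df")
    half = x / 2.0
    term = total = 1.0
    for k in range(1, df // 2):
        term *= half / k          # (x/2)^k / k!
        total += term
    return math.exp(-half) * total


def hosmer_lemeshow(predicted, died, groups=10):
    """H-L statistic, degrees of freedom (groups - 2), and p value.

    Patients are sorted by predicted risk and split into `groups` groups of
    nearly equal size. Requires at least `groups` patients and no group whose
    mean predicted risk is exactly 0 or 1.
    """
    pairs = sorted(zip(predicted, died))
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        n_g = len(chunk)
        expected = sum(p for p, _ in chunk)
        observed = sum(d for _, d in chunk)
        mean_p = expected / n_g
        stat += (observed - expected) ** 2 / (n_g * mean_p * (1.0 - mean_p))
    df = groups - 2
    return stat, df, chi2_sf_even_df(stat, df)


# Tail probabilities for the H-L values quoted in the abstract (8 df).
for value in (3.95, 7.10, 2.75, 12.21):
    print(value, round(chi2_sf_even_df(value, 8), 3))
```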
Imputation of missing values [12]. There are several ways of imputing missing values for a given variable, including creating a multiple regression model from the other variables to estimate the missing variable (which we used) and multiple imputation, which repeats this process several times to insert the appropriate variability into the estimates (which is considered the optimal method).
Interaction term [2]. If one risk factor has a synergistic or potentiating effect on another risk factor, the two factors involved are said to interact, and a term representing this interaction (the product of the two first-level factors) may be needed in the model.
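With a product term in the model, the odds ratio for one factor depends on the level of the other. The sketch below uses made-up coefficients and hypothetical names purely to show that arithmetic.

```python
import math


def linear_predictor(b0, b_a, b_b, b_ab, a, b):
    """Logit of risk with main effects for factors a and b plus their product term."""
    return b0 + b_a * a + b_b * b + b_ab * a * b


def odds_ratio_for_a(b0, b_a, b_b, b_ab, b):
    """Odds ratio for factor a (present vs absent) at a fixed level of factor b."""
    return math.exp(linear_predictor(b0, b_a, b_b, b_ab, 1, b)
                    - linear_predictor(b0, b_a, b_b, b_ab, 0, b))


coefs = (-3.0, 0.5, 0.4, 0.3)  # hypothetical intercept, main effects, interaction
print(odds_ratio_for_a(*coefs, b=0))  # exp(0.5): factor b absent
print(odds_ratio_for_a(*coefs, b=1))  # exp(0.5 + 0.3): factor b present
```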
Likelihood ratio [2]. The likelihood ratio test compares the fit of two nested models. Adding risk factors never decreases the likelihood, so the question is whether the increase is larger than chance alone would produce. Twice the difference in log-likelihood is compared with a chi-square distribution whose degrees of freedom equal the number of added parameters; this determines whether the difference between the two models is statistically significant, that is, whether the additional parameters improve the model.
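A minimal sketch of the likelihood ratio test for nested models. The log-likelihood values are hypothetical, and the chi-square tail probability is computed from the standard power series for the regularized incomplete gamma function, a self-contained stand-in for a statistics library.

```python
import math


def chi2_sf(x, df, tol=1e-15, max_terms=10000):
    """Upper-tail chi-square probability for any df > 0.

    Uses the power series for the regularized lower incomplete gamma
    function P(df/2, x/2) and returns 1 - P. Adequate for moderate x.
    """
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    for n in range(1, max_terms):
        term *= z / (a + n)
        total += term
        if term < tol * total:
            break
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return min(1.0, max(0.0, 1.0 - lower))


def likelihood_ratio_test(loglik_reduced, loglik_full, added_parameters):
    """LR statistic 2 * (logL_full - logL_reduced) and its chi-square p value."""
    stat = 2.0 * (loglik_full - loglik_reduced)
    return stat, chi2_sf(stat, added_parameters)


# Hypothetical log-likelihoods: two risk factors added to a nested model.
stat, p = likelihood_ratio_test(-250.0, -245.0, 2)
print(stat, p)
```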
Logistic regression [2]. Logistic regression is a standard method of data analysis concerned with describing the relationship between a binomial response variable and one or more explanatory variables. The model can predict the probability of occurrence of a binary outcome, in our case, the probability of hospital death.
Odds ratio [2]. The odds (O) of an event (eg, death) is related to the probability (P) of the event by O = P/(1 − P). When P is small, the two are approximately equal. Logistic regression produces an odds ratio (OR) as a measure of the strength of a risk factor. The OR is the odds of the event with the risk factor present divided by the odds of the event with the factor absent.
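The relations above in code, with hypothetical names: odds from a probability, and a check that a logistic coefficient beta corresponds to an odds ratio of exp(beta) whatever the baseline risk. The coefficient value is made up.

```python
import math


def odds(p):
    """Odds O = P / (1 - P) for a probability 0 <= P < 1."""
    return p / (1.0 - p)


def odds_ratio(p_present, p_absent):
    """Odds with the risk factor present divided by odds with it absent."""
    return odds(p_present) / odds(p_absent)


def inverse_logit(z):
    """Probability corresponding to log-odds z."""
    return 1.0 / (1.0 + math.exp(-z))


beta = 0.7  # hypothetical logistic coefficient for a risk factor
for baseline_logit in (-4.0, -2.0, 0.0):
    p_absent = inverse_logit(baseline_logit)
    p_present = inverse_logit(baseline_logit + beta)
    print(round(odds_ratio(p_present, p_absent), 6), round(math.exp(beta), 6))
print(odds(0.02))  # close to 0.02 because P is small
```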
Recalibration [8]. Recalibration of a predictive model is done by creating a new model based on the prediction of the original model, to ensure that the number of predicted (by the new model) and observed deaths are equal. This improves the calibration of the model but does not change its discrimination.
Acknowledgments
References