Ann Thorac Surg 2010;89:677-682. doi:10.1016/j.athoracsur.2009.10.078
© 2010 The Society of Thoracic Surgeons
The Statistician's Page
Using Society of Thoracic Surgeons Risk Models for Risk-Adjusting Cardiac Surgery Results
Ruyun Jin, MDa,*,
Anthony P. Furnary, MDa,
Stephanie C. Fine, MAa,
Eugene H. Blackstone, MDb,
Gary L. Grunkemeier, PhDa
a Medical Data Research Center, Providence Health & Services, Portland, Oregon
b Department of Thoracic and Cardiovascular Surgery and Department of Quantitative Health Sciences, Cleveland Clinic, Cleveland, Ohio
* Address correspondence to Dr Jin, 9205 SW Barnes Rd, Ste 33, Portland, OR 97225 (Email: ruyun.jin{at}providence.org).
Abstract
The Society of Thoracic Surgeons National Adult Cardiac Surgery Database (STS NCD) has become the national benchmark for cardiac surgery reporting. Several important aspects of its risk-adjustment reporting are discussed, with special emphasis on using the reported individual STS risk scores for analysis and evaluation: (1) Different risk models are used in different STS NCD versions. (2) STS calibrates risk scores annually to make the annual predicted rates equal the observed rates. (3) The risk scores given by the STS, whether in the approved STS data collection software programs, published risk models, or online calculator, are not calibrated. (4) The end-user is required to calibrate the STS risk scores before using them. (5) After calibration, the STS predicted risk for any given patient is usually smaller, sometimes less than half of the uncalibrated value. (6) STS uses an observed/expected ratio method to calibrate the risk scores; for technical reasons, it is preferable to use an odds ratio method.
Introduction
The Society of Thoracic Surgeons (STS) initially developed the STS National Adult Cardiac Surgery Database (STS NCD) in the late 1980s [1]. After many years of development and improvement, it has become one of the largest clinical databases in the world. At the end of 2008, 892 sites participated and submitted data. The annual number of procedures has reached 270,000 [2].
STS has developed risk models for death and major morbidities. The models are updated approximately every 3 years and are calibrated annually to make the expected (predicted) occurrences of each outcome equal to the national observed events for that year. Figure 1 shows the process that STS uses to build the models and calibrate the risk scores for their use in the STS reports issued to participant institutions. However, the individual risk scores provided to these institutions are not calibrated, and cannot be used as given.
Clinical Material
The Providence Health & Services Cardiovascular Disease Study Group (PHS-CDSG) consists of 12 open heart surgery programs that participate in the STS NCD. From 2006 to 2008, we collectively submitted data for 11,578 open heart procedures to STS. Among them, 5956 were isolated coronary artery bypass grafting (CABG), 1448 were isolated valve procedures, and 1074 were isolated valve plus CABG, the three procedure categories that have STS risk models. Mitral repair procedures for 2006 to 2007 are not included because their associated STS risk model was not available until 2008. These data will be used to demonstrate the use of the STS calibration factors and the serious error of using the raw, uncalibrated risk scores provided by STS.
STS Risk Models
STS began building risk models in 1994 [3], starting with isolated CABG mortality. A series of additional models have been developed since then to keep the predictive models timely, concurrent, and thus more reliable. Risk models now also include risk of death for isolated valve or valve concomitant with CABG operations as well as several morbidity risk models. Some of the models have been published [4–12] and others are used only for STS annual reports. STS NCD has completely redeveloped its risk models approximately every 3 years. The latest models, the "2008 Models," were developed using data collected during 2002 to 2006 [9–12] and comprise 27 models, including nine end points for each of three cardiac procedure groups:
Procedures
- 1 Isolated CABG
- 2 Isolated valve procedure (aortic valve replacement; mitral valve replacement or repair)
- 3 Isolated valve procedure plus CABG
End points
- 1 Operative mortality
- 2 Reoperation for any reason
- 3 Permanent stroke
- 4 Renal failure
- 5 Deep sternal wound infection
- 6 Prolonged ventilation time (>24 hours)
- 7 Operative death or major morbidity (of the five types above)
- 8 Short postoperative length of stay (<6 days and discharged alive)
- 9 Long postoperative length of stay (>14 days)
The models just before the 2008 Models are called the "2004 Models." The mortality model for CABG was developed in 2004 using 2001 to 2002 procedures; the mortality models for isolated valve or valve concomitant with CABG were developed in 1999 using 1994 to 1997 procedures [11].
STS Risk Scores
STS has developed algorithms to calculate the predicted risk scores, which are embedded in its approved vendors' software. Currently, there are 16 software vendors with STS Certified Software or an STS Harvest Compliant Software product in STS NCD [13]. When the local user creates an STS NCD patient record or data submission file, the risk scores (uncalibrated) are automatically calculated and inserted into each patient record.
There are three ways to obtain an uncalibrated STS risk score: (1) from the STS-approved software, as just described; (2) by using the online STS calculator for 1 patient at a time [14]; and (3) by applying the published regression coefficients for 1 or more patients at a time [9–12].
Different Risks in Different STS NCD Versions
From 1999 until now, the STS has released four versions of the STS NCD, and the built-in models were changed accordingly. The most current STS 2008 models are embedded in the version 2.61 data collection tools (except for the morbidity models for isolated valve and valve concomitant with CABG, which are not yet included) and are applied to patients with operation dates after January 1, 2008. The 2004 Models were embedded in the version 2.52 data collection tools and applied to patients with operation dates from January 1, 2005 through December 31, 2007.
Thus, patients with exactly the same preoperative risk factors could receive different predicted risk values if they underwent operation in different time periods. For example, if a particular patient underwent isolated CABG in December 2007, the predicted risk of death from version 2.52 would be calculated using the 2004 Models as 0.4%. If the same patient were operated on in January 2008, the predicted risk of mortality from version 2.61 would be calculated using the 2008 Models as 0.8% because of the differences in risk factors and coefficients between the two models.
Calibration of STS Risk Scores is Required
The important message of this article, and what may come as a surprise to many users—even though it is mentioned in every STS report—is that the individual patient risk scores that we have been discussing require calibration, using the STS-provided calibration factors (CF), before they can be used to, say, compute observed (O)/expected (E) ratios or compare provider performance over time. (Note: The STS uses the term "recalibration," but we prefer the less redundant and simpler term "calibration.")
The observed death and complication rates have been declining for most open heart surgical complications, so a risk model produced from data collected in previous years will assign a risk to a current patient that is higher than what that patient would expect in the current year. Thus, three steps are required when the STS wants to determine what the correct risk should be in the current year:
- 1 estimate the predicted risk from the old, prevailing risk models for each patient operated on so far, and sum them to provide an estimate of total number of expected events (= E);
- 2 count how many patients actually experienced that event so far in the current year (= O); and
- 3 compute a CF as the ratio E/O, which is the ratio the STS says should be used to divide into the predicted probability (risk) given by the STS-approved software to obtain the calibrated (true) risk score.
If, for example, only 80% of the deaths expected by the risk model were observed for a particular operation in a particular year, then the CF = 1/.80 = 1.25, so the STS risks should be divided by 1.25. Thus, for example, a patient with an STS risk of 2.5% would have a calibrated risk of only 2.0%.
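The three steps and the 80% example above can be sketched in Python. This is a minimal illustration, not STS code: the function names `calibration_factor` and `calibrate` are hypothetical, and the cohort (1000 patients, each with a 2.5% predicted risk, 20 observed deaths) is invented so that 80% of the expected deaths are observed.

```python
def calibration_factor(predicted_risks, observed_events):
    """Return CF = E / O, where E is the sum of the uncalibrated predicted risks."""
    expected = sum(predicted_risks)
    if observed_events <= 0:
        raise ValueError("CF is undefined when no events were observed")
    return expected / observed_events


def calibrate(risk, cf):
    """O/E method: divide an uncalibrated risk by the calibration factor."""
    return risk / cf


# Invented cohort: E = 1000 * 0.025 = 25 expected deaths, O = 20 observed deaths.
risks = [0.025] * 1000
cf = calibration_factor(risks, observed_events=20)
calibrated = calibrate(0.025, cf)
print(f"CF = {cf:.2f}, calibrated risk = {calibrated:.1%}")
```

A CF above 1 therefore shrinks every individual risk by the same proportion, whatever its size.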
For almost all predicted risk values, the raw STS scores are too high (sometimes by more than double) and need to be decreased. This requires the end user to divide them by the CF that is given in the STS report, usually in a table called "O/E Ratio Multiplier Table for Calibration." (See Table 1 in the Sample National Report on the STS web site [15] as an example.) The vendor-supplied STS software does not incorporate the CF correction, nor do the published models or the online calculator. The CFs are given by the year of operation, type of operation, and end point. Of note, the CF is more procedure-specific than the model itself. For example, the STS has only one mortality model for an isolated valve operation but gives separate CFs for aortic valve replacement, mitral valve replacement, and mitral valve repair.
Table 1 Society of Thoracic Surgeons (STS) Calibration Factors of In-hospital Mortality for Isolated Coronary Artery Bypass Grafting, Summarized From 10 STS Reports
Calibration of any STS risk score is easy. First, find the appropriate CF by operation year, procedure, and end point. Then, divide the predicted risk score by the CF to get the calibrated predicted risk. For the previously mentioned isolated CABG procedure performed in January 2008, the predicted mortality is 0.8%, the CF for in-hospital mortality for 2008 isolated CABG is 1.26, and the calibrated predicted risk of in-hospital mortality is thus 0.8%/1.26, or about 0.6%. Figure 2 shows the difference between calibrated and uncalibrated results for the PHS-CDSG data, using cumulative sum plots [16]. The differences are striking, especially when the sample size is large.
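To see why uncalibrated scores distort a monitoring chart, the sketch below computes a simple cumulative sum of observed minus expected deaths, once with raw scores and once with scores divided by the CF. It rests on several assumptions: the outcomes and risks are invented, the function name `cusum_o_minus_e` is hypothetical, and the plain O − E form is one common CUSUM variant that may differ from the construction and prediction limits described in [16]. Only the CF of 1.26 comes from the text.

```python
from itertools import accumulate

CF_2008_CABG_MORTALITY = 1.26  # in-hospital mortality, 2008 isolated CABG (from the text)


def cusum_o_minus_e(outcomes, risks):
    """Cumulative sum of (observed - expected) over consecutive cases."""
    return list(accumulate(o - e for o, e in zip(outcomes, risks)))


# Invented data: 1 = in-hospital death, 0 = survived; raw risks are uncalibrated.
outcomes = [0, 0, 1, 0, 0, 0, 0, 0, 0, 0]
raw_risks = [0.008, 0.020, 0.150, 0.010, 0.030, 0.008, 0.050, 0.012, 0.025, 0.040]
calibrated_risks = [r / CF_2008_CABG_MORTALITY for r in raw_risks]

raw_curve = cusum_o_minus_e(outcomes, raw_risks)
cal_curve = cusum_o_minus_e(outcomes, calibrated_risks)

# The raw curve ends lower because inflated expectations flatter the results.
print(f"final O - E: raw {raw_curve[-1]:.3f}, calibrated {cal_curve[-1]:.3f}")
```

Because every raw expectation is too large by the same factor, the gap between the two curves grows with each additional case, which is consistent with the differences being most visible in large samples.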

Fig 2. Cumulative Sum (CUSUM) graphs of Providence Health & Services Cardiovascular Disease Study Group in-hospital mortality, with and without calibration. The thick black CUSUM lines are based on calibrated risk scores and the thick grey CUSUM lines are based on uncalibrated risk scores. The smooth bullet-shaped curves are the 95% prediction limits, with the same color code (black and grey, respectively). These prediction limits are slightly different because their computation is based on the risk scores. The vertical dashed lines indicate 250 cases.
Calibration Tables Are Dynamic, not Static
Another important aspect of the CFs is that they are updated after every new data harvest, which has occurred every 3 months in recent years. The CFs from a series of reports are listed in Table 1 to show the continual changes over time. The reason for these frequent changes is that the STS uses all data available up to the point of harvest to recalculate the CFs for a given year. The first released CF for a certain year is calculated using only the data from the first quarter of that year, the second released CF is calculated using the data from the first 2 quarters, and so on. Because STS allows resubmissions for data corrections, the CFs can even change slightly in the periods after the data submission (Table 1). Thus, to calibrate the predicted STS risk, one must always find the latest report and use the most recent CFs. Table 2 lists the most recent published CFs as of July 18, 2009, summarized from several reports, including the latest one (STS 2009 Harvest 2 report).
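The cumulative recomputation can be illustrated with a short sketch. The quarterly totals are invented purely for illustration and `cumulative_cfs` is a hypothetical helper; the only point is that each successive CF for a year pools all quarters harvested so far, so the published value for the same year keeps moving.

```python
def cumulative_cfs(quarterly_expected, quarterly_observed):
    """Return the CF (E/O) after each harvest, pooling all quarters to date."""
    cfs = []
    total_expected = 0.0
    total_observed = 0
    for expected, observed in zip(quarterly_expected, quarterly_observed):
        total_expected += expected
        total_observed += observed
        cfs.append(total_expected / total_observed)
    return cfs


# Invented quarterly sums of predicted risks (E) and observed event counts (O).
expected = [300.0, 310.0, 290.0, 305.0]
observed = [250, 240, 245, 235]

cfs = cumulative_cfs(expected, observed)
for quarter, cf in enumerate(cfs, start=1):
    print(f"after Q{quarter}: CF = {cf:.3f}")
```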
Table 2 The Most Current Published Calibration Factors, Summarized From Five Society of Thoracic Surgeons (STS) Reports After STS 2009 Harvest 2 Report
A Technical Concern with the STS Calibration Method
The STS calibration method is based on the O/E ratio. The "E" is calibrated to force the national O/E ratio to equal 1, but there is a fundamental technical problem with this method of calibration. If in-hospital death is the event of interest, we should reach the same conclusion if its complement event, in-hospital survival, is selected instead for calibration; however, this would not usually happen. For example, if the CF for death is 2.0 (say, if E = 10% and O = 5%), then the CF for survival should be 0.5 (if you are twice as likely to die, then you are half as likely to survive). In this case, however, the CF for survival would equal .90/.95, because for survival, E = 90% and O = 95%. This difficulty can also be appreciated by considering that for a patient with a CF that is less than his or her STS risk, the calibrated risk will be greater than 1 (illustrated in Fig 3), an impossibility for a true probability. This happened infrequently in our data, but it is symptomatic of the general problem with this calibration approach.
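Both difficulties can be reproduced with a few lines of arithmetic. The sketch below uses the E = 10%, O = 5% example from this paragraph; the 20% patient risk is an arbitrary illustrative value, the function name is hypothetical, and the CF of .87 is the short length of stay value reported later in the article.

```python
def cf_o_over_e(expected_rate, observed_rate):
    """Calibration factor under the O/E method: CF = E / O."""
    return expected_rate / observed_rate


e_death, o_death = 0.10, 0.05
cf_death = cf_o_over_e(e_death, o_death)             # 2.0
cf_survival = cf_o_over_e(1 - e_death, 1 - o_death)  # .90/.95, not 0.5

# Calibrate an (arbitrary) 20% death risk directly, and via its survival complement.
risk_death = 0.20
direct = risk_death / cf_death
via_survival = 1 - (1 - risk_death) / cf_survival

# A risk larger than its CF calibrates to a value above 1.
impossible = 0.90 / 0.87

print(f"CF death = {cf_death:.2f}, CF survival = {cf_survival:.3f}")
print(f"death risk: direct {direct:.3f}, via survival {via_survival:.3f}")
print(f"0.90 / 0.87 = {impossible:.3f}")
```

The two routes to the same patient's death risk disagree, which is the inconsistency described above.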

Fig 3. Comparison of the uncalibrated risks of short length of stay (SLOS) and in-hospital mortality with the corresponding risks as calibrated by two different methods. The two lines below the diagonal line are predicted risk of in-hospital mortality; the two lines above the diagonal are predicted risk of short length of stay. The black lines show the calibrated risk by the observed/expected (O/E) method; the grey lines show the calibrated risk by the odds ratio (OR) method. Both methods force the observed death equal to the predicted death. Each short vertical bar represents a single patient. Most of the predicted risks for mortality are less than 15%. The predicted risks of SLOS are more widespread. Note that the calibrated risks of SLOS by the O/E method are greater than 100% in 4 patients, which is impossible for a true probability. This impossibility cannot occur with the OR method.
Steyerberg and colleagues [17] describe several methods of calibration based on an odds ratio (OR). The simplest one (method 2 in their article) consists of estimating the intercept in a logistic regression that uses the logarithm of the predicted odds (called the "logit") from the original model as an offset term, which means that its coefficient is not estimated in the regression but is set equal to 1, so that the only parameter to estimate is the intercept. Exponentiating this intercept gives a CF that is an OR, CF_OR, the ratio of the observed odds to the odds as predicted by the original model (see Appendix). When each patient's original odds are multiplied by CF_OR and converted back to the probability domain, the sum of the calibrated probabilities (the expected events) equals the number of observed events. Using this method, the calibrated risk probabilities will always be less than 1.
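A minimal sketch of this intercept-only fit follows. Rather than calling a statistics package, it solves the maximum-likelihood score equation for the intercept directly with Newton's method, which for this one-parameter model gives the same estimate as a logistic regression with the original logit as an offset. The function names and the eight-patient data set are hypothetical.

```python
import math


def logit(p):
    """Log odds of a probability."""
    return math.log(p / (1.0 - p))


def expit(x):
    """Inverse of the logit: convert log odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def fit_offset_intercept(risks, outcomes, tol=1e-12, max_iter=100):
    """Estimate the intercept a in logit(p2) = a + logit(p1) by Newton's method."""
    offsets = [logit(p) for p in risks]
    observed = sum(outcomes)
    a = 0.0
    for _ in range(max_iter):
        fitted = [expit(a + z) for z in offsets]
        score = observed - sum(fitted)                    # first derivative of the log-likelihood
        information = sum(p * (1.0 - p) for p in fitted)  # minus the second derivative
        step = score / information
        a += step
        if abs(step) < tol:
            break
    return a


# Invented data: uncalibrated risks and 0/1 outcomes.
risks = [0.02, 0.05, 0.10, 0.20, 0.30, 0.40, 0.60, 0.80]
outcomes = [0, 0, 0, 0, 1, 0, 0, 1]

a = fit_offset_intercept(risks, outcomes)
cf_or = math.exp(a)
calibrated = [expit(a + logit(p)) for p in risks]
print(f"OR-based CF = {cf_or:.3f}")
print(f"sum of calibrated risks = {sum(calibrated):.6f}, observed events = {sum(outcomes)}")
```

At convergence the score is zero, which is exactly the statement that the calibrated probabilities sum to the observed number of events.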
Also, the CF_OR for survival will always equal the inverse of the CF_OR for death. In the example just given, if E = 10% and O = 5%, then the expected odds for death = 10%/(1 – 10%) = 1/9 and the expected odds for survival = 9/1. The observed odds for death = 1/19, and the observed odds for survival = 19/1. Thus, the CF_OR for death = (1/19)/(1/9) = 9/19, and the CF_OR for survival = 19/9, the inverse of the CF_OR for death. (This is not how the logistic regression method computes the CF_OR, but is just a simplified example to demonstrate the statistical concept.)
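The same simplified example can be checked with exact rational arithmetic. The helper name `odds` is illustrative; the rates are those given above.

```python
from fractions import Fraction


def odds(p):
    """Convert a probability to odds."""
    return p / (1 - p)


expected_death = Fraction(10, 100)
observed_death = Fraction(5, 100)

cf_or_death = odds(observed_death) / odds(expected_death)
cf_or_survival = odds(1 - observed_death) / odds(1 - expected_death)

print(cf_or_death, cf_or_survival)  # 9/19 19/9
```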
Because we do not have access to the raw STS data to compare the results of these two calibration methods, we used PHS-CDSG isolated CABG data for in-hospital mortality and short length of stay (SLOS) to demonstrate the differences. In-hospital mortality has small risk percentages (mean, 2.4%) and SLOS involves larger risks (mean, 51.9%).
In-hospital Mortality
Among the 5956 patients undergoing isolated CABG, there were 121 observed in-hospital deaths (2.0%), and the sum of the predicted STS mortality probabilities is 144.7, for an overall predicted risk of 144.7/5956 = 2.4%. Using the STS O/E method, CF_O/E = 144.7/121 = 1.20. Note that this is the value by which we must divide the original predicted risk to calibrate it (ie, to make it equal the observed rate: 2.4%/1.20 = 2.0%), so the calibration achieved its goal. If we calibrate by the OR method instead, the logistic regression gives CF_OR = .82. Note that this is the value by which we must multiply the original odds to get the calibrated odds. (The CF_OR for survival equaled 1.22, which is the reciprocal of .82.)
Short Length of Stay
The observed rate of SLOS is 3536/5956 = 59.4%. The sum of the predicted STS risks is 3090.9. Thus, the STS CF_O/E = 3090.9/3536 = .87, and, using the OR method, CF_OR = 1.42. (The CF_OR for not SLOS was .70, which is the reciprocal of 1.42.)
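The aggregate O/E figures quoted in these two subsections can be reproduced from the counts in the text. The variable names are illustrative; only the counts and sums come from the article. The OR-based factors are not recomputed here because they require patient-level data.

```python
n_patients = 5956

# In-hospital mortality
deaths_observed = 121
deaths_expected = 144.7   # sum of uncalibrated predicted probabilities
cf_oe_mortality = deaths_expected / deaths_observed

# Short length of stay (SLOS)
slos_observed = 3536
slos_expected = 3090.9
cf_oe_slos = slos_expected / slos_observed

print(f"mortality: observed {deaths_observed / n_patients:.1%}, "
      f"predicted {deaths_expected / n_patients:.1%}, CF = {cf_oe_mortality:.2f}")
print(f"SLOS: observed {slos_observed / n_patients:.1%}, "
      f"predicted {slos_expected / n_patients:.1%}, CF = {cf_oe_slos:.2f}")
```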
Figure 3 shows the differences in the calibrated risks by these two calibration methods. For in-hospital mortality, the calibrated risks are close when the predicted risk is less than 15%, which includes 98% of patients. As the predicted risk gets larger, the discrepancies increase. For SLOS, the differences between the results of these two methods are up to 10% of the risk at some points. Thus, in practice, the STS O/E method of calibration is a satisfactory approximation to the technically more appropriate OR method in low-rate events. For higher-rate events (eg, any major complications, prolonged ventilation, or SLOS), the differences between the two methods are greater.
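The pattern described for Figure 3 can be approximated with the four calibration factors reported above. This is a sketch under stated assumptions: the function names are hypothetical and the raw risk values are arbitrary grid points, not patient data.

```python
def calibrate_oe(p, cf_oe):
    """O/E method: divide the uncalibrated probability by the O/E-based CF."""
    return p / cf_oe


def calibrate_or(p, cf_or):
    """OR method: multiply the uncalibrated odds by the OR-based CF, then convert back."""
    new_odds = cf_or * p / (1.0 - p)
    return new_odds / (1.0 + new_odds)


# (O/E-based CF, OR-based CF) reported in the text for PHS-CDSG isolated CABG.
FACTORS = {
    "in-hospital mortality": (1.20, 0.82),
    "short length of stay": (0.87, 1.42),
}

for end_point, (cf_oe, cf_or) in FACTORS.items():
    print(end_point)
    for p in (0.02, 0.15, 0.50, 0.90):
        print(f"  raw {p:.2f}  O/E {calibrate_oe(p, cf_oe):.3f}  OR {calibrate_or(p, cf_or):.3f}")
```

For small raw risks the two columns nearly coincide, while for a raw SLOS risk of 0.90 the O/E result exceeds 1 and the OR result does not.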
Take-Home Messages
- Different risk models are used in different STS NCD versions.
- Calibration is required after obtaining raw STS risk scores.
- STS calibrates its risk scores annually (in quarterly increments) to make the annual predicted rates of any event equal that year's observed rates of the same event.
- Calibration factors are dynamic, updated quarterly after each data harvest.
- If used without calibration, the risk scores are almost always higher than they should be, thereby overstating risk and understating the O/E ratio.
- The current STS calibration method is not technically optimal; the OR method is preferable from a technical viewpoint.
Participating Facilities
The following facilities participate in the Providence Health & Services Cardiovascular Disease Study Group:
- Alaska: Providence Alaska Medical Center, Anchorage.
- Washington: Providence Regional Medical Center, Everett; Providence St. Peter Hospital, Olympia; Providence Sacred Heart Medical Center & Children's Hospital, Spokane.
- Oregon: Providence Portland Medical Center; Providence St. Vincent Medical Center, Portland; Providence Medford Medical Center.
- California: Providence St. Joseph Medical Center, Burbank; Providence Holy Cross Medical Center, Mission Hills; Providence Little Company of Mary Hospital, Torrance; Providence Tarzana Medical Center.
- Montana: St. Patrick Hospital and Health Sciences Center, Missoula.
Appendix
Details of the Odds Ratio (OR) Method of Calibration
The original STS logistic regression model produces a predicted probability (PROB) for each patient. These can be converted to odds (ODDS):
$$\mathrm{ODDS} = \frac{\mathrm{PROB}}{1 - \mathrm{PROB}}$$
And, equivalently, this conversion can be reversed:
$$\mathrm{PROB} = \frac{\mathrm{ODDS}}{1 + \mathrm{ODDS}}$$
In a logistic regression, the logarithm of the odds, written log(odds), and also called the logit (LOGIT), is modeled by a linear predictor:
$$\mathrm{LOGIT} = \log(\mathrm{ODDS})$$
$$\mathrm{ODDS} = \exp(\mathrm{LOGIT})$$
The estimated logits are then converted back into a probability, for each patient's risk, using the second and fourth equations, above.
In the calibration method using the OR approach, a logistic regression is performed using the original logit as the only independent variable. However, in this simple calibration step, we do not estimate a coefficient for this logit. Instead, we force its coefficient to be 1 (the original logit becomes an offset term).
The new linear predictor is thus simply:
$$\mathrm{LOGIT}_2 = a + \mathrm{LOGIT}_1$$
where LOGIT1 is the original, uncalibrated value and LOGIT2 is the calibrated value.
This new logistic regression will give an estimate of the intercept a, and exponentiating both sides of this equation gives:
$$\mathrm{ODDS}_2 = \exp(a) \times \mathrm{ODDS}_1$$
or, if A = exp(a), then A = ODDS2/ODDS1; that is, A is an OR, namely, the ratio of the calibrated odds (ODDS2) to the original odds (ODDS1).
We can use this (A) and the original (uncalibrated) probability (PROB1) for any patient to obtain the calibrated probability (PROB2), using the above formulas, as follows:
$$\mathrm{PROB}_2 = \frac{\mathrm{ODDS}_2}{1 + \mathrm{ODDS}_2} = \frac{A \times \mathrm{ODDS}_1}{1 + A \times \mathrm{ODDS}_1} = \frac{A \times \mathrm{PROB}_1}{1 - \mathrm{PROB}_1 + A \times \mathrm{PROB}_1}$$
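The conversion from PROB1 to PROB2 can be checked numerically. The sketch below assumes the closed form PROB2 = A × PROB1/(1 − PROB1 + A × PROB1), which follows from multiplying the original odds by A and converting back to a probability. It compares that closed form with the step-by-step route through the odds, and confirms that calibrating the complementary event with 1/A yields the complementary probability. The function names are hypothetical; A = .82 is the mortality value reported in the main text.

```python
import math


def prob_to_odds(prob):
    return prob / (1.0 - prob)


def odds_to_prob(odds):
    return odds / (1.0 + odds)


def calibrated_prob(prob1, a_ratio):
    """Closed form: PROB2 = A*PROB1 / (1 - PROB1 + A*PROB1)."""
    return a_ratio * prob1 / (1.0 - prob1 + a_ratio * prob1)


A = 0.82  # OR-based CF for in-hospital mortality reported in the text
for prob1 in (0.01, 0.10, 0.50, 0.99):
    stepwise = odds_to_prob(A * prob_to_odds(prob1))
    closed = calibrated_prob(prob1, A)
    # Calibrating survival with 1/A gives the complement of the calibrated death risk.
    survival = calibrated_prob(1.0 - prob1, 1.0 / A)
    assert math.isclose(stepwise, closed)
    assert math.isclose(closed + survival, 1.0)
    print(f"PROB1 = {prob1:.2f} -> PROB2 = {closed:.4f}")
```

Because the denominator always exceeds the numerator for any PROB1 below 1 and any positive A, the calibrated value stays strictly between 0 and 1.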
References
- 1 Clark RE. It is time for a national cardiothoracic surgical data base. Ann Thorac Surg 1989;48:755-756.
- 2 Data analyses of the Society of Thoracic Surgeons National Adult Cardiac Surgery Database. Duke Clinical Research Institute (on behalf of The Society of Thoracic Surgeons); 2009 Harvest 1 (for the time period ending 12/31/2008):1–238.
- 3 Edwards FH, Clark RE, Schwartz M. Coronary artery bypass grafting: the Society of Thoracic Surgeons National Database experience. Ann Thorac Surg 1994;57:12-19.
- 4 Edwards FH, Grover FL, Shroyer AL, Schwartz M, Bero J, Clark RE. The Society of Thoracic Surgeons National Cardiac Surgery Database: current risk assessment. Ann Thorac Surg 1997;63:903-908.
- 5 Shroyer AL, Grover FL, Edwards FH. 1995 coronary artery bypass risk model: the Society of Thoracic Surgeons Adult Cardiac National Database. Ann Thorac Surg 1998;65:879-884.
- 6 Shroyer AL, Plomondon ME, Grover FL, Edwards FH, Schwartz M, Bero J. The 1996 coronary artery bypass risk model: the Society of Thoracic Surgeons Adult Cardiac National Database. Ann Thorac Surg 1999;67:1205-1208.
- 7 Grover FL, Shroyer AL, Hammermeister K, et al. A decade's experience with quality improvement in cardiac surgery using the Veterans Affairs and Society of Thoracic Surgeons national databases. Ann Surg 2001;234:464-474.
- 8 Ferguson Jr TB, Hammill BG, Peterson ED, DeLong ER, Grover FL, STS National Database Committee. A decade of change—risk profiles and outcomes for isolated coronary artery bypass grafting procedures, 1990–1999: a report from the STS National Database Committee and the Duke Clinical Research Institute. Society of Thoracic Surgeons. Ann Thorac Surg 2002;73:480-490.
- 9 Shahian DM, Edwards FH. The Society of Thoracic Surgeons 2008 cardiac surgery risk models: introduction. Ann Thorac Surg 2009;88:S1.
- 10 Shahian DM, O'Brien SM, Filardo G, et al. The Society of Thoracic Surgeons 2008 cardiac surgery risk models: part 1—coronary artery bypass grafting surgery. Ann Thorac Surg 2009;88:S2-S22.
- 11 O'Brien SM, Shahian DM, Filardo G, et al. The Society of Thoracic Surgeons 2008 cardiac surgery risk models: part 2—isolated valve surgery. Ann Thorac Surg 2009;88:S23-S42.
- 12 Shahian DM, O'Brien SM, Filardo G, et al. The Society of Thoracic Surgeons 2008 cardiac surgery risk models: part 3—valve plus coronary artery bypass grafting surgery. Ann Thorac Surg 2009;88:S43-S62.
- 13 STS National Database Software Vendors. http://www.sts.org/sections/stsnationaldatabase/vendors/ 2009. Accessed Oct 14, 2009.
- 14 STS National Database Risk Calculator. http://www.sts.org/sections/stsnationaldatabase/riskcalculator/ 2009. Accessed Oct 14, 2009.
- 15 STS National Database Software Vendors. http://www.sts.org/documents/pdf/Report_Overview_-_Risk_Adjustment.pdf 2009. Accessed Oct 14, 2009.
- 16 Grunkemeier GL, Jin R, Wu Y. Cumulative sum curves and their prediction limits. Ann Thorac Surg 2009;87:361-364.
- 17 Steyerberg EW, Borsboom GJ, van Houwelingen HC, Eijkemans MJ, Habbema JD. Validation and updating of predictive logistic regression models: a study on sample size and shrinkage. Stat Med 2004;23:2567-2586.