Ann Thorac Surg 2009;88:S2-S22. doi:10.1016/j.athoracsur.2009.05.053
© 2009 The Society of Thoracic Surgeons



Report of STS Quality Measurement Task Force

The Society of Thoracic Surgeons 2008 Cardiac Surgery Risk Models: Part 1—Coronary Artery Bypass Grafting Surgery

David M. Shahian, MD (a, *), Sean M. O'Brien, PhD (b), Giovanni Filardo, PhD, MPH (c), Victor A. Ferraris, MD (d), Constance K. Haan, MD (e), Jeffrey B. Rich, MD (f), Sharon-Lise T. Normand, PhD (g), Elizabeth R. DeLong, PhD (b), Cynthia M. Shewan, PhD (h), Rachel S. Dokholyan, MPH (b), Eric D. Peterson, MD, MPH (b), Fred H. Edwards, MD (e), Richard P. Anderson, MD (i, †)

a Massachusetts General Hospital, Boston, Massachusetts
b Duke Clinical Research Institute, Durham, North Carolina
c Institute for Health Care Research and Improvement, Baylor Health Care System, Dallas, Texas
d University of Kentucky Chandler Medical Center, Division of Cardiovascular and Thoracic Surgery, Lexington, Kentucky
e University of Florida, Division of Cardiothoracic Surgery, Jacksonville, Florida
f Sentara Cardiovascular Research Institute, Norfolk, Virginia
g Department of Health Care Policy, Harvard Medical School, and Department of Biostatistics, Harvard School of Public Health, Boston, Massachusetts
h The Society of Thoracic Surgeons, Chicago, Illinois
i Seattle, Washington


Abbreviations and Acronyms BSA = body surface area; CABG = coronary artery bypass graft surgery; CHF = congestive heart failure; EF = ejection fraction; GFR = glomerular filtration rate; HCFA = Health Care Financing Administration; IABP = intra-aortic balloon pump; NYHA = New York Heart Association; NCD = National Adult Cardiac Surgery Database; O/E = observed to expected ratio; QMTF = Quality Measurement Task Force; STS = The Society of Thoracic Surgeons


* Address correspondence to Dr Shahian, Massachusetts General Hospital, 55 Fruit St, Boston, MA 02114 (Email: dshahian@partners.org).


Drs Shahian, O'Brien, Filardo, Ferraris, Haan, Rich, Normand, DeLong, Shewan, Peterson, Edwards, Anderson, and Ms Dokholyan have no conflicts of interest to declare regarding this work.

 

    Abstract
 Top
 Abstract
 Introduction
 Study Purpose
 Risk Model Development and...
 Study Population and Endpoints
 Selection of Candidate Predictor...
 Missing Data
 Preliminary Analyses for Ordinal...
 Specific Coding Decisions
 Final Variable Selection...
 Results
 Limitations
 Conclusions
 Footnotes
 References
 
Background: The first version of The Society of Thoracic Surgeons National Adult Cardiac Surgery Database (STS NCD) was developed nearly 2 decades ago. Since its inception, the number of participants has grown dramatically, patient acuity has increased, and overall outcomes have consistently improved. To adjust for these and other changes, all STS risk models have undergone periodic revisions. This report provides a detailed description of the 2008 STS risk model for coronary artery bypass grafting surgery (CABG).

Methods: The study population consisted of 774,881 isolated CABG procedures performed on adult patients aged 20 to 100 years between January 1, 2002, and December 31, 2006, at 819 STS NCD participating centers. This cohort was randomly divided into a 60% training (development) sample and a 40% test (validation) sample. The development sample was used to identify predictor variables and estimate model coefficients. The validation sample was used to assess model calibration and discrimination. Model outcomes included operative mortality, renal failure, stroke, reoperation for any cause, prolonged ventilation, deep sternal wound infection, composite major morbidity or mortality, prolonged length of stay (> 14 days), and short length of stay (< 6 days and alive). Candidate predictor variables were selected based on their availability in versions 2.35, 2.41, and 2.52.1 of the STS NCD and their presence in (or ability to be mapped to) version 2.61. Potential predictor variables were screened for overall prevalence in the study population, missing data frequency, coding concerns, bivariate relationships with outcomes, and their presence in previous STS or other CABG risk models. Supervised backward selection was then performed with input from an expert panel of cardiac surgeons and biostatisticians. After the fit of the models had been validated, the development and validation samples were combined, and the final regression coefficients were estimated using the overall combined (development plus validation) sample.

Results: The c-index for the mortality model was 0.812, and the c-indices for other endpoints ranged from 0.653 for reoperation to 0.793 for renal failure in the validation sample. Plots of observed versus predicted event rates revealed acceptable calibration in the overall population and in numerous subgroups. When patients were grouped into categories of predicted risk, the absolute difference between the observed and expected event rates was less than 1.5% for each endpoint. The final model intercept and coefficients are provided.

Conclusions: New STS risk models have been developed for CABG mortality and eight other endpoints. Detailed descriptions of model development and testing are provided, together with the final algorithm. Overall model performance is excellent.


    Introduction

In 1986, The Society of Thoracic Surgeons (STS) convened an Ad Hoc Committee on Risk Factors for Coronary Artery Bypass Graft Surgery (CABG) [1] and an Ad Hoc Committee to Develop a National Database for Cardiothoracic Surgery [2]. These committees were prompted by the release earlier that year of inadequately risk-adjusted hospital mortality data by the Health Care Financing Administration (HCFA), now the Centers for Medicare and Medicaid Services. Although the HCFA analytical methodology was widely criticized, STS leadership recognized that the underlying principle of collecting and analyzing data to improve patient outcomes was valid, particularly for complex and costly procedures such as CABG. They believed that professional organizations were responsible for developing credible clinical data registries for their own specialties, and that risk models derived from such registries would avoid many of the concerns raised by the use of unadjusted administrative data. Such clinical registries could then serve as credible data sources for quality assessment and improvement activities as well as for research.

These early activities ultimately led to the development of the STS National Adult Cardiac Surgery Database (NCD) [3, 4]. Since its release to members in 1990, the STS NCD has evolved to become one of the largest specialty-specific clinical data registries in the world. It currently has more than 950 participants enrolled, representing just under 90% of the cardiac surgery providers in the United States, with data on more than 3.6 million procedures. Similar STS data registries have now been developed for congenital heart surgery and general thoracic surgery, and future plans include the development of specialty modules (eg, quality metrics, atrial fibrillation surgery, thoracic aortic surgery). Recent enhancements, including the addition of unique physician and patient identifiers, will facilitate linkages with other registries and greatly expand the potential of the STS NCD for longitudinal follow-up, comparative effectiveness, and cost efficiency studies.

In addition to developing the STS NCD as a comprehensive, nationally representative data registry, the second major goal of the STS was to ensure that analyses derived from this registry would be appropriately adjusted for preoperative patient severity, the absence of which was a major deficiency of the HCFA reports initially published in 1986. This was accomplished by first identifying risk factors for specific procedures and outcomes, beginning with isolated CABG, and then using these predictor variables to develop risk models. A statistical risk model, most often based on logistic regression, yields the expected outcome for a patient with a given set of risk factors, which can then be compared with the observed outcome. The observed (O) and expected (E) outcomes are summed over all patients of a particular surgeon or hospital to yield the risk-standardized mortality ratio (O/E), which can then be multiplied by the average rate in the reference population to calculate the risk-standardized mortality rate [5–7].
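As a minimal sketch of the O/E arithmetic just described, the Python function below assumes that a fitted risk model has already produced a predicted probability for each patient. The function name, its inputs, and the example numbers are hypothetical and are not taken from STS software.

```python
def risk_standardized_rate(observed, expected, reference_rate):
    """Return (O/E ratio, risk-standardized rate) for one surgeon or hospital.

    observed: 0/1 outcome for each of the provider's patients
    expected: model-predicted probability of the outcome for the same patients
    reference_rate: average event rate in the reference population
    """
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    o = sum(observed)  # observed number of events
    e = sum(expected)  # expected number of events under the risk model
    oe_ratio = o / e
    return oe_ratio, oe_ratio * reference_rate


# Hypothetical provider with 5 patients; the numbers are illustrative only.
oe, rate = risk_standardized_rate(
    observed=[0, 1, 0, 0, 0],
    expected=[0.25, 0.5, 0.25, 0.5, 0.5],
    reference_rate=0.02,
)
print(oe, rate)  # 0.5 0.01
```

Because E is a sum of model predictions, one observed death among patients whose predicted risks sum to 2.0 gives O/E = 0.5, and the provider's risk-standardized rate is half the reference rate.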

STS CABG risk models have undergone periodic updates and revisions, the most recent of which was based upon 2000 to 2002 STS NCD data. In 2007, the STS Database Modernization Task Force completed a major specification upgrade of the STS NCD data collection instrument from version 2.52.1 to version 2.61. This included refinement, modification, consolidation, or elimination of some data elements, as well as an attempt to harmonize definitions with those of the American College of Cardiology National Cardiovascular Data Registry whenever possible. Given these changes, as well as the number of years since the last risk model update, the STS Quality Measurement Task Force (QMTF) was asked to develop new risk models for isolated CABG, isolated valve repair or replacement, and combined CABG plus valve procedures. The authors of this report include the QMTF members who participated in this initiative.

Implementation of these new models in January 2008 coincided with the release of STS NCD version 2.61. This report, Part 1 of 3, describes the development of the new mortality and morbidity models for isolated CABG surgery.


    Study Purpose
 
The primary goal of this study was to develop risk-prediction algorithms for patients undergoing isolated CABG surgery. As the major intended use of these algorithms was to compare participant outcomes to the overall STS national experience, risk factors were generally restricted to patient and clinical characteristics present preoperatively.


    Risk Model Development and Transparency
 
The availability of user-friendly statistical software programs and the exponential increase in computing speed have greatly facilitated statistical analyses such as logistic regression, the basis for many risk models. However, despite these technological advances, clinical judgment, experience, intuition, and practicality still play a critical role in risk model development. There are many points in model development at which legitimate differences in approach may lead to substantial differences in the resulting statistical models and the inferences derived from them [8].

We believe the degree of transparency provided in this report regarding the development of the STS CABG risk models is essential in today's health care environment. In an era when society demands full transparency regarding health care performance, the methodologies used to evaluate that performance should be just as transparent [9, 10]. This fundamental principle is among the standards established by the American Heart Association and American College of Cardiology for statistical models used for public reporting [11].


    Study Population and Endpoints
 
All isolated CABG procedures performed on adult patients aged 20 to 100 years between January 1, 2000, and December 31, 2006, were initially considered for inclusion, although the final development and validation samples were derived from 2002 to 2006 data. Patients missing data on sex (n = 195) were excluded, as these patients are not included in STS performance feedback reports to database participants. That left a study population of 774,881 surgical procedures from 819 database participants. Patients on dialysis preoperatively (n = 12,415) were excluded when developing the risk model for postoperative renal failure.

Training and Validation Samples
The study population was randomly divided into a 60% training (development) sample and a 40% test (validation) sample. The development sample was used to identify predictor variables and estimate model coefficients. Data from the validation sample were used to assess model fit, discrimination, and calibration. After choosing variables and assessing model fit, the development and validation samples were combined, and the final model coefficients were estimated using the combined (development plus validation) data.
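The split-sample design can be sketched as follows, assuming each record represents one procedure. The seed value and function names are hypothetical. The c-index function shows one standard definition of discrimination for a binary outcome: the probability that a randomly chosen patient who had the event received a higher predicted risk than a randomly chosen patient who did not, with ties counted as one half.

```python
import random


def split_development_validation(records, dev_fraction=0.6, seed=2008):
    """Randomly split records into development and validation samples."""
    rng = random.Random(seed)  # fixed seed only so this sketch is reproducible
    shuffled = list(records)
    rng.shuffle(shuffled)
    cut = round(dev_fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def c_index(outcomes, predicted):
    """Concordance index for a binary outcome (ties in predicted risk count 1/2)."""
    events = [p for y, p in zip(outcomes, predicted) if y == 1]
    nonevents = [p for y, p in zip(outcomes, predicted) if y == 0]
    if not events or not nonevents:
        raise ValueError("need at least one event and one non-event")
    concordant = 0.0
    for pe in events:          # every (event, non-event) pair is compared
        for pn in nonevents:
            if pe > pn:
                concordant += 1.0
            elif pe == pn:
                concordant += 0.5
    return concordant / (len(events) * len(nonevents))


dev, val = split_development_validation(range(10))
print(len(dev), len(val))                              # 6 4
print(c_index([1, 0, 1, 0], [0.9, 0.1, 0.4, 0.4]))     # 0.875
```

The pairwise loop is quadratic in sample size and is meant only to make the definition explicit; with validation samples of several hundred thousand procedures, a rank-based computation would be the practical choice.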

Endpoints
Risk models were developed for the nine endpoints listed below. Only mortality was recorded beyond the index hospitalization. Morbidity data included only in-hospital complications, although beginning in STS NCD version 2.61, sternal infections will be recorded for up to 30 days postoperatively. The nine endpoints are as follows: (1) operative mortality: death during the same hospitalization as surgery, regardless of timing, or within 30 days of surgery regardless of venue; (2) permanent stroke (cerebrovascular accident): a central neurologic deficit persisting longer than 72 hours; (3) renal failure: a new requirement for dialysis or an increase of the serum creatinine to more than 2.0 mg/dL and double the most recent preoperative creatinine level; (4) prolonged ventilation (longer than 24 hours); (5) deep sternal wound infection; (6) reoperation for any reason; (7) major morbidity or mortality: a composite defined as the occurrence of any of the above endpoints; (8) prolonged postoperative length of stay (PLOS): length of stay (LOS) more than 14 days (alive or dead); and (9) short postoperative LOS (SLOS): LOS less than 6 days and patient alive at discharge (this SLOS definition differs from the previous STS risk models, which did not exclude patients who died in-hospital; patients who died within 5 days of surgery are included in the new models but are treated as not having a short stay).
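The endpoint definitions above can be expressed as a small classification function. This is a sketch only: the field names are hypothetical, and reading "double the most recent preoperative creatinine level" as "at least twice that level" is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Patient:
    died_in_hospital: bool
    days_to_death: Optional[float]  # days from surgery to death; None if alive
    los_days: float                 # postoperative length of stay
    permanent_stroke: bool          # central neurologic deficit persisting > 72 h
    new_dialysis: bool
    preop_creatinine: float         # most recent preoperative value, mg/dL
    peak_postop_creatinine: float   # mg/dL (hypothetical field)
    ventilation_hours: float
    deep_sternal_infection: bool
    reoperation: bool


def endpoints(p: Patient) -> dict:
    # Operative mortality: in-hospital death at any time, or death within 30 days anywhere.
    mortality = p.died_in_hospital or (p.days_to_death is not None and p.days_to_death <= 30)
    # Renal failure: new dialysis, or creatinine > 2.0 mg/dL and at least double baseline.
    renal_failure = p.new_dialysis or (
        p.peak_postop_creatinine > 2.0
        and p.peak_postop_creatinine >= 2 * p.preop_creatinine
    )
    prolonged_vent = p.ventilation_hours > 24
    composite = any([mortality, p.permanent_stroke, renal_failure, prolonged_vent,
                     p.deep_sternal_infection, p.reoperation])
    return {
        "mortality": mortality,
        "stroke": p.permanent_stroke,
        "renal_failure": renal_failure,
        "prolonged_ventilation": prolonged_vent,
        "deep_sternal_wound_infection": p.deep_sternal_infection,
        "reoperation": p.reoperation,
        "major_morbidity_or_mortality": composite,
        "prolonged_los": p.los_days > 14,                            # alive or dead
        "short_los": p.los_days < 6 and not p.died_in_hospital,      # alive at discharge
    }
```

Under these definitions, a patient who dies in the hospital on day 3 counts toward mortality but not toward short length of stay, matching the revised SLOS definition.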

Table 1 summarizes the frequencies of these endpoints in the study population for each predictor variable category (ie, the bivariate relationships).



 
Table 1 Distribution of Risk Factors and Frequency of Adverse Outcomes in Overall Study Population, Isolated Coronary Artery Bypass Graft Surgery (2002–2006)
 

    Selection of Candidate Predictor Variables
 
Initial Data Screening of Candidate Predictor Variables
We began by considering all possible candidate variables from the development set (Table 2). Because the primary goal of the STS risk models is to adjust surgical outcomes, in general only preoperative patient variables are included. However, because these models are also used for other purposes, such as individual patient prediction and counseling, this general principle was modified in a few instances, which are discussed in the relevant sections.



 
Table 2 Initial List of Potential Candidate Variables
 
As there were a large number of procedures and endpoints available, we were not statistically constrained to highly parsimonious models, nor is such an approach generally favored in regression modeling [12–14]. Discarding valid data elements can waste valuable information that has been collected at substantial effort and cost. Furthermore, although much of the discrimination of a predictive model may be contained in a relatively small number of variables [15, 16], some predictor variables that add only modestly to discrimination may still be important predictors of outcomes at the patient level [17, 18].

Expert Panel Review for Clinical Relevance and Face Validity
All candidate variables available in version 2.61 were individually discussed by a panel of cardiac surgeons and health policy experts to ensure that clinical relevance as well as multiple aspects of validity (face, construct, and content) had been considered.

Data Version for Model Development
Although these new risk models were to be introduced in conjunction with the release of STS NCD version 2.61, they were developed with data collected under the three previous data versions (2.35, 2.41, and 2.52.1) because no 2.61 data were yet available. The QMTF therefore applied two requirements when selecting predictors. First, any candidate variable had to be collected consistently across the three previous data versions. Second, it also had to be available in version 2.61 or be mappable to this new version. For example, history of smoking and renal failure were not candidate variables because they were either not included in, or could not be mapped to, version 2.61. Renal function is now assessed by the last preoperative serum creatinine value, which is collected in all data versions. Because the definition of hypercholesterolemia has changed substantially over successive STS data versions, and because counterintuitive results have been observed in some previous analyses of hypercholesterolemia, a decision was made not to include this variable in the new models.

Predictor Frequency
For each variable, the QMTF explored the overall prevalence and missing data frequency per year. Predictor variables that are rarely present in the development sample are difficult to model. For this reason, mitral stenosis (0.35%), tricuspid stenosis (0.08%), pulmonic stenosis (0.06%), pulmonic insufficiency (0.10%), and endocarditis (0.09%) were not considered as variables in the new isolated CABG models.
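The two availability requirements and the frequency screen can be combined into a single filter. The paper does not state a numeric prevalence cutoff, so `min_prevalence` below is a hypothetical parameter, as are the variable names. The five rare conditions use the prevalences quoted above; the remaining entries are placeholders.

```python
REQUIRED_VERSIONS = {"2.35", "2.41", "2.52.1"}


def screen_candidates(prevalence, versions_collected, mappable_to_v261, min_prevalence):
    """Keep variables collected in all three prior versions, available in (or
    mappable to) version 2.61, and not too rare to model."""
    kept = []
    for name, prev in prevalence.items():
        consistent = REQUIRED_VERSIONS <= versions_collected.get(name, set())
        if consistent and name in mappable_to_v261 and prev >= min_prevalence:
            kept.append(name)
    return sorted(kept)


prevalence = {
    "mitral_stenosis": 0.0035,       # prevalences from the text
    "tricuspid_stenosis": 0.0008,
    "pulmonic_stenosis": 0.0006,
    "pulmonic_insufficiency": 0.0010,
    "endocarditis": 0.0009,
    "smoking": 0.5,                  # hypothetical value; not given in the text
    "hypothetical_common_var": 0.30, # placeholder variable
}
all_versions = {name: set(REQUIRED_VERSIONS) for name in prevalence}
mappable = set(prevalence) - {"smoking"}  # smoking could not be mapped to 2.61
print(screen_candidates(prevalence, all_versions, mappable, min_prevalence=0.005))
# ['hypothetical_common_var']
```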

Inconsistently Coded Variables
A few variables have been collected inconsistently or with questionable reliability, often for clinically unavoidable reasons. For example, pulmonary artery mean pressure data were missing for 70% of patients during 2002 to 2006. Furthermore, the value of this continuous variable may vary substantially depending on the clinical state and volume-loading status of the patient when the measurement is obtained. Because of these concerns, pulmonary artery pressure was not included in the models.

Derived or Redundant Variables
Several derived variables were considered for inclusion in the models. For example, body mass index (BMI) is a useful measure of overall body habitus. However, because BMI is highly correlated with body surface area (BSA), the anthropometric measure more commonly used in most previous STS models, BSA was retained in the new models. Similarly, glomerular filtration rate (GFR) is theoretically superior to serum creatinine as a measure of renal function. However, the Modification of Diet in Renal Disease formula for estimating GFR is a complex function of creatinine, race, sex, and age, and not all laboratories perform this calculation automatically. Furthermore, because age, sex, and race are already model covariates, using GFR would complicate the interpretation of their regression coefficients; some of the prognostic value of GFR comes from these variables, which are already in the model. Finally, previous studies suggest that various measures of renal function used in CABG mortality risk models have similar performance [19]. For all these reasons, serum creatinine was retained as the measure of renal function.
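To illustrate why GFR overlaps with existing covariates, the sketch below implements the widely used four-variable MDRD study equation. The STS models did not use this calculation; it is shown only because its inputs (creatinine, age, sex, and race) are visible in the code. The default coefficient of 175 assumes IDMS-standardized creatinine; the original equation used 186.

```python
def mdrd_egfr(creatinine_mg_dl, age_years, female, black, coefficient=175.0):
    """Four-variable MDRD estimate of GFR in mL/min/1.73 m^2."""
    gfr = coefficient * creatinine_mg_dl ** -1.154 * age_years ** -0.203
    if female:
        gfr *= 0.742  # sex adjustment: sex is already a model covariate
    if black:
        gfr *= 1.212  # race adjustment: race is already a model covariate
    return gfr


print(round(mdrd_egfr(1.0, 60, female=False, black=False), 1))  # 76.2
```

Because the sex and race factors are multiplicative, entering eGFR alongside sex and race would split their effects between two terms, which is the interpretive problem the paragraph describes.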

Controversial Variables
Race
Several variables raised particular clinical, statistical, or health policy issues. For example, race was an obvious candidate variable because it was a significant predictor (p < 0.001) of each endpoint except mortality and because the proportion of nonwhite patients varied substantially across institutions. In exploratory analyses, the association between race and outcomes persisted after adjusting for hospital identity, suggesting that this association is not explained by differences in hospital quality.

However, general principles of risk model development complicated the decision as to whether or not to include race in the models. When the dominant purpose of a risk model is adjustment of provider results, it is advisable to include only biological and clinical patient variables that are present before a patient's first contact with the provider. In this context, race is clearly a fixed biological characteristic, but its impact on patient outcomes may be mediated through other mechanisms. It is possible that certain racial and ethnic groups have worse outcomes not because of inherent biological characteristics but because of differences in the quality of care delivered to them. In this case, including race and ethnicity in a risk model could essentially select out or obscure the very disparity issues that society wishes to identify and correct. Inclusion of race and ethnicity in a risk model would say, in effect, that we expect nonwhites to have inferior results and would make an allowance for providers who care for such patients, just as we would for providers who care for patients in cardiogenic shock.

After deliberation regarding the pros and cons, the QMTF ultimately elected to retain race and ethnicity in the new models because of their impact on outcomes, while recognizing the potential limitations of this decision.

Preoperative intra-aortic balloon pump
Preoperative intra-aortic balloon pump (IABP) use is a proxy for a more serious preoperative status of the patient (eg, unstable angina, ventricular dysfunction). It captures information that may not be present in other data elements, and it is associated with a higher risk of postoperative morbidity and mortality. For these reasons, most CABG risk models include preoperative IABP as a risk predictor. However, IABP placement is also a highly discretionary care process whose frequency varies widely among participating institutions. Indications are subjective and are often dictated by the cardiologist before the patient is even referred for cardiac surgery. Under a CABG risk model that includes IABP, an institution that uses IABPs liberally will have a higher expected risk of morbidity and mortality than another institution with a similar case mix but a more restrictive IABP policy. Because the expected event count is the denominator of the O/E ratio, the first institution's O/E ratio and risk-adjusted outcome rates would be lower than the second's, even if their observed outcomes were identical.

Despite its discretionary nature (and the potential for gaming), the QMTF decided to retain IABP use in the models because it is such an important predictor. Ultimately, the QMTF elected to model preoperative IABP jointly with preoperative inotrope use as an overall measure of preoperative acuity/severity.

Review of External Sources
The QMTF also reviewed multiple external resources to aid in the selection of potential candidate variables [15, 16, 20]. First, all previous versions of the STS CABG risk models were reviewed. The QMTF also examined other CABG risk models including the European System for Cardiac Operative Risk Evaluation (EuroSCORE) [21], the New York Cardiac Surgery Reporting System [22], the Veterans Affairs Administration cardiac surgery models [23, 24], and the Northern New England Cardiovascular Disease Study Group model [25, 26]. We particularly wanted to identify variables that were found in some form across all the risk models. Subject to the constraints of version 2.61 data specifications, we made a special effort to include such variables in the new STS risk models, in some instances requiring us to "force" them into the models, as described in the section on the final variable selection procedure.


    Missing Data
 
Missing data in the STS NCD are rare, having a frequency of less than 1% for most variables. Candidate predictor variables missing most commonly were ejection fraction (5.5%), New York Heart Association (NYHA) class (4.7%), tricuspid insufficiency (3.9%), aortic insufficiency (3.7%), mitral insufficiency (3.1%), aortic stenosis (1.7%), and creatinine/dialysis (1.5%).

Missing predictor values in the STS NCD were managed using imputation. Multiple imputation is generally the preferred statistical method [27], but single imputation was also considered for the following practical reasons: (a) the fraction of missing data in the STS NCD was small, so single and multiple imputation would likely give similar point estimates; (b) a slight adjustment to the standard errors would not affect the study conclusions or the published risk algorithms; and (c) the large sample size would make multiple imputation less practical to implement because of long computation times.

Before selecting an imputation strategy, exploratory analyses were performed using CABG data from 2002 to 2003 to compare single and multiple imputation results for predicting mortality. These analyses confirmed that the choice between single and multiple imputation would have only a slight impact on regression coefficients. For example, the estimated odds ratio for a 5-unit increase in ejection fraction was 0.90 (95% confidence interval, 0.83 to 0.97) under single imputation and 0.92 (confidence interval, 0.85 to 0.99) under multiple imputation. Other variables were missing less frequently than ejection fraction and were even less sensitive to the choice between single and multiple imputation. Additional analyses of missing data consisted of reestimating the final model coefficients using single and multiple imputation and comparing the results. A summary of these investigations, together with the model coefficients and covariance matrices, is available at www.sts.org/riskmodels. For most patients, calculating risk with the multiple imputation model instead of the single imputation model would change their risk estimate by only 1% to 2% in relative terms (eg, a change from 5% to 5.1% is a 2% relative change).
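The odds ratios and the relative-change example above follow from simple logistic-model arithmetic, sketched below. The sketch assumes that ejection fraction enters the model linearly over the range of interest, which may not match its final coding in Table 3; the function names are hypothetical.

```python
import math


def per_unit_log_odds(or_per_k, k):
    """Log-odds coefficient per 1 unit, given an odds ratio for a k-unit increase."""
    return math.log(or_per_k) / k


def odds_ratio_for_change(beta, delta):
    """Odds ratio implied by a change of `delta` units in a linear logistic term."""
    return math.exp(beta * delta)


def relative_change(old_risk, new_risk):
    return (new_risk - old_risk) / old_risk


beta_single = per_unit_log_odds(0.90, 5)    # single imputation estimate from the text
beta_multiple = per_unit_log_odds(0.92, 5)  # multiple imputation estimate from the text
print(round(odds_ratio_for_change(beta_single, 10), 3))  # 0.81
print(round(relative_change(0.05, 0.051), 3))            # 0.02
```

Under the linearity assumption, a 10-unit increase in ejection fraction corresponds to the square of the 5-unit odds ratio (0.90 squared is 0.81), and the quoted 5% to 5.1% example is a 2% relative change.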

Based on the considerations described above, single imputation was used with the following specific rules: (1) binary (yes/no) risk factors were modeled as yes versus no or missing. Missing data for such variables usually imply their absence, and for most binary variables the composite event rates were similar for the "no" and "missing" categories; (2) missing data on categorical predictor variables were imputed to the lowest-risk value, which was usually the mode. Composite event rates for patients with missing data were generally among the lowest. It is the policy of the STS Data Warehouse and Analysis Center to discourage missing data through this default coding practice; and (3) missing data on continuous predictor variables were imputed to the conditional median. For ejection fraction, we conditioned on congestive heart failure (CHF) and sex. For BSA, we conditioned on sex. For serum creatinine, we conditioned on renal failure (although this approach will be modified when the model is ultimately applied to version 2.61 data, as renal failure has been removed).
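The three single-imputation rules can be sketched in plain Python as below. Field names and example values are hypothetical, and falling back to the overall median when a stratum has no observed values is an assumption that the text does not describe.

```python
from statistics import median


def impute_binary(value):
    """Rule 1: binary risk factors are coded yes (1) versus no or missing (0)."""
    return 1 if value is True else 0


def impute_categorical(value, lowest_risk_level):
    """Rule 2: a missing category is set to the lowest-risk level (usually the mode)."""
    return lowest_risk_level if value is None else value


def impute_conditional_median(rows, target, by):
    """Rule 3: fill missing `target` values with the median of observed values
    among patients who share the same values of the `by` fields."""
    groups, observed_all = {}, []
    for row in rows:
        if row[target] is not None:
            groups.setdefault(tuple(row[b] for b in by), []).append(row[target])
            observed_all.append(row[target])
    medians = {key: median(vals) for key, vals in groups.items()}
    overall = median(observed_all)  # fallback for empty strata (an assumption)
    filled = []
    for row in rows:
        row = dict(row)  # copy so the input rows are not modified
        if row[target] is None:
            row[target] = medians.get(tuple(row[b] for b in by), overall)
        filled.append(row)
    return filled


# Ejection fraction conditioned on CHF and sex (hypothetical patients).
rows = [
    {"chf": True, "sex": "F", "ef": 35},
    {"chf": True, "sex": "F", "ef": 45},
    {"chf": True, "sex": "F", "ef": None},
    {"chf": False, "sex": "M", "ef": 55},
    {"chf": False, "sex": "M", "ef": None},
]
filled = impute_conditional_median(rows, "ef", by=("chf", "sex"))
print([r["ef"] for r in filled])  # [35, 45, 40.0, 55, 55]
```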

For model endpoints (eg, mortality), missing data were handled by modeling yes versus no or missing. Thus, cases with missing data for an endpoint were analyzed as if the endpoint did not occur. Complete case analysis was not used because "missing" was not considered to be consistently coded for these variables. For example, some STS data managers have reported that they set complications to "no" unless there is explicit documentation in the medical record that the complication occurred. Other data managers may leave the field missing unless there is explicit documentation that the complication did not occur. Thus, missing data may reflect differences in coding practices rather than truly unknown or missing data.


    Preliminary Analyses for Ordinal Categorical Variables and Continuous Variables
 
The QMTF conducted preliminary analyses to determine how best to model ordinal categorical variables and continuous variables. Categorical variables were entered into a logistic regression model by including a separate parameter for each category. Continuous variables were entered as piecewise linear functions (splines) with several changes of slope (knots). Terms were then removed one at a time using backward selection based on the Wald statistic. At each iteration, either two adjacent categories were collapsed into a single category or else two adjacent line segments were collapsed into a single line with no change of slope. The backward selection terminated when all adjacent categories and slopes were statistically different from one another at p < 0.001. This variable selection routine was performed separately for each endpoint. An expert panel determined the final coding based on the results of the backwards selection algorithm, supplemented by their clinical judgment and practical considerations. Table 3 summarizes these coding decisions.
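The piecewise-linear coding of continuous variables can be illustrated with a linear spline basis. In this parameterization (an assumption, since the exact basis is not described), the coefficient on each hinge term is the change in slope at that knot, so collapsing two adjacent line segments corresponds to dropping one hinge term. Fitting the logistic models and computing the Wald statistics are not shown, and the knots and coefficients below are hypothetical.

```python
def linear_spline_basis(x, knots):
    """Basis [x, (x - k1)+, (x - k2)+, ...] for a piecewise-linear function of x."""
    return [x] + [max(0.0, x - k) for k in knots]


def segment_slopes(coefs):
    """Slope of each line segment: coefs[0] is the slope below the first knot,
    and coefs[j] (j >= 1) is the change in slope at knot j."""
    slopes, total = [], 0.0
    for c in coefs:
        total += c
        slopes.append(total)
    return slopes


print(linear_spline_basis(50, [30, 40, 60]))
# A zero hinge coefficient at the second knot means segments 2 and 3 share a slope,
# i.e. the two adjacent line segments have been collapsed into one.
print(segment_slopes([0.1, -0.05, 0.0, 0.02]))
```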



 
Table 3 Final List of Candidate Variables and Coding For STS Risk Models
 

    Specific Coding Decisions
 
Race and ethnicity
In versions 2.35, 2.41, and 2.52.1, race was collected by choosing one of the following mutually exclusive response categories: Caucasian, black, Hispanic, Asian, Native American, and other. In version 2.61, the data collection form was modified to conform to standards adopted by the US Census Bureau. It allows for selecting one or more races per patient (ie, select all that apply), and treats ethnicity (Hispanic versus non-Hispanic) as a separate variable. Because of these differences, the mapping of race among data versions is not straightforward.

Ultimately, the QMTF decided to model race as black, Asian, Hispanic, and Caucasian/other (collapsed). Initially, these categories will be mapped to version 2.61 as follows: (1) black will include all black patients, regardless of ethnicity or additional races; (2) Hispanic will include all nonblack Hispanic patients; (3) Asian will include all Asian patients who are not also identified as black or Hispanic; and (4) all remaining patients will be placed in the Caucasian/other category. The validity of this mapping will be assessed once 2.61 data become available and future versions could employ race "bridging" methodologies.
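The version 2.61 mapping can be written as an ordered rule in which the first matching condition wins. This is a minimal sketch; the field names (a set of selected race labels and a Hispanic ethnicity flag) and the category labels are hypothetical, not the actual STS data specification.

```python
def map_race_v261(races, hispanic):
    """Map version 2.61 race/ethnicity fields to the four model categories.

    races: set of selected race labels (select all that apply), e.g. {"black", "white"}.
    hispanic: True if ethnicity is recorded as Hispanic.
    """
    if "black" in races:      # (1) all black patients, regardless of ethnicity or other races
        return "black"
    if hispanic:              # (2) all nonblack Hispanic patients
        return "hispanic"
    if "asian" in races:      # (3) Asian patients who are neither black nor Hispanic
        return "asian"
    return "caucasian_other"  # (4) all remaining patients
```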

Body surface area
Height and weight were replaced by BSA, which was modeled as a quadratic trend to allow for a possible U-shaped relationship with outcomes (eg, extreme obesity and cachexia). This quadratic polynomial was modeled separately for males and females. BSA values below 1.4 m² or above 2.6 m² were set to 1.4 m² and 2.6 m², respectively; these cutoffs represent the approximate 1st and 99th percentiles of the empirical distribution.
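A sketch of the BSA coding, assuming BSA in m² and a female indicator as the sex variable (the actual sex coding and any centering used by STS are not stated here; the term names are hypothetical). The sex-by-BSA and sex-by-BSA-squared products correspond to the preselected interactions listed under Interaction Terms, and they are what allow the quadratic to differ between males and females.

```python
def bsa_terms(bsa, female):
    """Design terms for a sex-specific quadratic in BSA (m^2), truncated to [1.4, 2.6]."""
    b = min(max(bsa, 1.4), 2.6)  # values outside [1.4, 2.6] are set to the nearer bound
    f = 1.0 if female else 0.0
    return {
        "bsa": b,
        "bsa_sq": b * b,
        "female": f,
        "female_x_bsa": f * b,         # lets the linear BSA term differ by sex
        "female_x_bsa_sq": f * b * b,  # lets the curvature differ by sex
    }
```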

Angina
Version 2.61 of the data collection form eliminates angina and substitutes a new variable called "cardiac presentation on admission," within which unstable angina is one of the possible response categories. The QMTF believed that unstable angina would be coded more consistently than any other angina class, and also that it was the most important type of angina presentation to include in the models. Angina coding in the new risk models was therefore restricted to "unstable angina without MI < 7 days (yes/no)." It was necessary to exclude patients with a myocardial infarction within the previous 7 days because version 2.61 does not permit simultaneous coding of angina and acute myocardial infarction.

Reoperative status
The most important consideration with regard to reoperative status is the number of prior sternotomies, irrespective of the specific type of procedure performed. The revised models replaced prior CABG, prior valve, and prior "other" cardiac surgery with simply the number of previous cardiovascular surgeries.

Acuity status
The new models combine resuscitation with salvage status. By definition, all salvage patients should have resuscitation coded "yes."

Number of diseased coronary vessels
Outcomes are modeled using the number of diseased vessels, grouped as 0 or 1 versus 2 versus 3 and entered as a linear effect across the three categories. This approach is consistent with the previous STS CABG models and was supported by the data.

NYHA class
Version 2.61 records NYHA class as a subfield of CHF. Grouping NYHA class IV versus classes I–III is consistent with all existing STS models. The final categories were no CHF, CHF not NYHA IV, and CHF plus NYHA IV.

Age
Age was modeled as a linear spline with knots at ages 50 and 60 years.

Ejection fraction
Ejection fraction (EF) was modeled linearly, and EFs below 10% or above 50% were set to 10% and 50%, respectively. Only 0.03% of patients have EFs lower than 10%; such values are considered invalid and are treated like missing data. The coding decision regarding EF values above 50% was based on preliminary analyses in which the data were used to suggest the functional form of continuous variables.

Creatinine
Creatinine was modeled as a linear spline with knots at 1.0 and 1.5 mg/dL. Creatinine levels less than 0.5 or greater than 5.0 mg/dL were set to 0.5 and 5.0 mg/dL, respectively; these cutoffs represent the approximate 1st and 99th percentiles of the empirical distribution.
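Both the age and creatinine codings are linear splines, which can be represented with a truncated-power (hinge) basis: the variable itself plus one term max(x - k, 0) per knot, whose coefficient is the change in slope at knot k. The sketch below uses hypothetical function names; creatinine is assumed to be in mg/dL and is truncated to [0.5, 5.0] before the hinge terms are formed.

```python
def linear_spline(x, knots):
    """Return [x, max(x - k1, 0), max(x - k2, 0), ...] for a piecewise linear spline."""
    return [x] + [max(x - k, 0.0) for k in knots]

def age_terms(age):
    # Knots at 50 and 60 years, as described for age.
    return linear_spline(float(age), (50.0, 60.0))

def creatinine_terms(creat):
    # Truncate first, so extreme values receive the boundary coding.
    c = min(max(float(creat), 0.5), 5.0)
    return linear_spline(c, (1.0, 1.5))

def spline_slopes(coefs):
    """Slope within each segment: running sums of [b_x, b_knot1, b_knot2, ...]."""
    slopes, total = [], 0.0
    for b in coefs:
        total += b
        slopes.append(total)
    return slopes
```

With this basis, the slope of age above 60 years is the sum of all three age coefficients, which is why a "change of slope" can be removed simply by dropping the corresponding hinge term.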

Mortality and length of stay
The QMTF changed the previous STS definition of the "short postoperative length of stay (SLOS)" endpoint. The original definition did not specifically exclude early postoperative deaths, and such patients could have been inappropriately included with the remaining SLOS patients who had a particularly short and uncomplicated postoperative course. In the new models, patients who die within 5 days of surgery are included in the analysis but are not counted as a short stay.


    Final Variable Selection Procedure
Backward Selection
Using the remaining candidate variables and the coding schemes described previously, a supervised backward selection procedure was then performed. Initial variable selection used the Wald χ² statistic with a significance criterion of 0.001. This stringent criterion was chosen because the very large sample size produced very small p values. An expert panel of cardiothoracic surgeons and biostatisticians then reviewed the selected variables and made several modifications. Measures of model performance (discrimination and calibration) were similar when all variables were retained in the models regardless of statistical significance or expert panel review.

Forced Variables
Several variables were included in the models regardless of statistical significance. These included all of the continuous variables (age, BSA, date of surgery [in 6-month intervals], creatinine, ejection fraction), plus sex and dialysis. In addition, atrial fibrillation was included a priori in the model for permanent stroke.

The rationale for including surgery date, a nonmodifiable variable of no intrinsic interest, was to adjust for changes in the frequency of adverse outcomes over the 5-year study period. We adjusted for surgery date to reduce potential confounding by time trends when estimating regression coefficients for variables that are of primary interest, such as preoperative clinical characteristics. For example, temporal changes in the frequency of coding for dyslipidemia, if they occur coincidentally with a secular decline in mortality rates, may lead to unwarranted causal inferences unless there is adjustment for surgery date.

Date of surgery was categorized by 6-month intervals (corresponding to STS data harvests) and modeled as a linear trend across the ordinal categories. Surgery date is not included in the final risk algorithm and a patient's predicted risk is not dependent upon it. The intercept parameter published in the Appendix has been adjusted to incorporate the time trend, and it reflects the baseline risk for a reference period of July to December 2006.
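A sketch of the time coding and of how a linear time trend can be folded into the intercept. With the trend coded as t = 0, 1, 2, ... across consecutive 6-month intervals, a model with intercept β0 and trend coefficient βt gives the same linear predictor for a patient operated on in the reference period as a model with intercept β0 + βt·t_ref and no time term. The origin of the time index (January 2002 here) and the example coefficient values are assumptions for illustration only.

```python
def half_year_index(year, month):
    """Ordinal index of 6-month intervals: January-June and July-December."""
    return 2 * year + (1 if month >= 7 else 0)

def time_code(year, month, origin=(2002, 1)):
    # Hypothetical origin: the first 6-month interval of the study is coded 0.
    return half_year_index(year, month) - half_year_index(*origin)

def adjusted_intercept(b0, b_time, ref=(2006, 7), origin=(2002, 1)):
    """Fold the linear time trend into the intercept at the reference period."""
    return b0 + b_time * time_code(*ref, origin=origin)
```

Under these assumptions, July to December 2006 is interval 9, and a risk calculator can use the adjusted intercept with no date input at all.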

Interaction Terms
These models focused on main effects, and the final models included only four preselected sets of interactions: (1) sex by BSA; (2) sex by BSA squared; (3) age by reoperation; and (4) age by emergent status. A more extensive search for interactions, including nonlinear machine-learning approaches, was considered. However, the incremental value of such approaches remains uncertain [28], and interpretability also becomes more problematic with numerous interaction terms.

Although multiple terms were allotted for modeling the main effects of age and reoperation, only a single degree of freedom was allotted for their interaction: a single interaction variable equal to the patient's age minus 50 if the patient was at least 50 years old and had a previous CV surgery, and zero otherwise. This term represents the difference in the change of slope of age at age 50 between patients who have had at least one previous CV surgery and patients who have not. Similarly, only one degree of freedom was allotted for the interaction between age and status; it represents the difference in the change of slope of age at age 50 between patients with emergent or salvage status and patients with elective or urgent status. Although these interaction terms complicate the interpretation of other model variables, this was considered acceptable because the main focus of the analysis was prediction, not effect estimation.
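The two single-degree-of-freedom interaction variables defined above can be computed as follows. This is a minimal sketch; the argument names and status labels are hypothetical.

```python
def interaction_terms(age, prior_cv_surgeries, status):
    """Single-df age interactions with reoperation and with emergent/salvage status.

    status: one of "elective", "urgent", "emergent", "salvage" (hypothetical labels).
    """
    hinge = max(age - 50.0, 0.0)  # age minus 50 if age >= 50, else 0
    reop = prior_cv_surgeries >= 1
    emergent = status in ("emergent", "salvage")
    return {
        "age_x_reop": hinge if reop else 0.0,
        "age_x_emergent": hinge if emergent else 0.0,
    }
```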


    Results
Model Performance
Table 4 presents the discrimination of each of the isolated CABG models as well as a comparison with the previous STS CABG risk models. For the new CABG models, discrimination ranged from 0.657 to 0.810 in the development sample and from 0.653 to 0.812 in the validation sample. The close agreement between c-indices from the development and validation samples reflects the large sample size and suggests that the models did not overfit the data. When the discrimination of the new and previous STS models was compared using the validation sample, the c-index of the new model was larger for each endpoint.
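For a binary endpoint, the c-index equals the proportion of all (event, non-event) patient pairs in which the patient with the event has the higher predicted risk, with tied predictions counted as one half. The brute-force sketch below implements that definition directly; the function name is hypothetical, and because it is quadratic in sample size, a rank-based (Mann-Whitney) computation is more practical for samples of this size.

```python
def c_index(risks, outcomes):
    """Concordance index for a binary outcome (1 = event, 0 = no event)."""
    events = [r for r, y in zip(risks, outcomes) if y == 1]
    nonevents = [r for r, y in zip(risks, outcomes) if y == 0]
    if not events or not nonevents:
        raise ValueError("need at least one event and one non-event")
    score = 0.0
    for e in events:
        for n in nonevents:
            if e > n:
                score += 1.0   # concordant pair
            elif e == n:
                score += 0.5   # tied prediction
    return score / (len(events) * len(nonevents))
```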


 
Table 4 Discrimination of Models (C-Index)
 
The Hosmer-Lemeshow test is not reported as an overall measure of calibration for these models because of its sensitivity to sample size. With samples as large as those used to develop these models, the null hypothesis of perfect fit will almost inevitably be rejected, because all such models are only approximations [29]. As an alternative to such global measures of calibration, Figure 1 shows plots of observed versus expected event proportions within deciles of predicted risk for a variety of endpoints. For each endpoint, the absolute difference between the observed and expected proportions was less than 1.5% in each decile. Additional analyses of model fit and discrimination are available online at www.sts.org/riskmodels.
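The decile table underlying such a plot can be sketched as follows: sort patients by predicted risk, split them into ten groups of near-equal size, and compare the mean observed outcome with the mean predicted risk in each group. The function name and grouping rule are assumptions; the exact grouping used for Figure 1 is not described here.

```python
def decile_calibration(risks, outcomes, groups=10):
    """Observed vs expected event proportions within groups of predicted risk.

    Returns a list of (observed, expected, abs_diff) tuples, lowest risk first.
    """
    pairs = sorted(zip(risks, outcomes))
    n = len(pairs)
    rows = []
    for g in range(groups):
        lo, hi = g * n // groups, (g + 1) * n // groups
        chunk = pairs[lo:hi]
        if not chunk:
            continue
        observed = sum(y for _, y in chunk) / len(chunk)
        expected = sum(r for r, _ in chunk) / len(chunk)
        rows.append((observed, expected, abs(observed - expected)))
    return rows
```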


Figure 1
 
Fig 1. Plots of observed (O) versus expected (E) event proportions within deciles of predicted risk in the validation sample

 
Final Models
After these measures of model performance were calculated, the final regression coefficients were estimated from the combined development and validation samples. Odds ratios for each predictor variable and model endpoint are summarized in Table 5. "Not applicable" indicates that the specific predictor was not included in a particular risk model. These final models were estimated using generalized estimating equations with empirical (sandwich) standard error estimates to account for clustering of patients within institutions [30], using an independence working correlation matrix. With this approach, the estimated regression coefficients were identical to those obtained using ordinary logistic regression, but the standard errors were adjusted to account for correlated observations within hospitals.
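The distinction between model-based and sandwich standard errors can be seen in the simplest possible case, an intercept-only logistic model fitted with an independence working correlation. The sketch below is a hypothetical illustration, not the STS estimation code: the point estimate equals the ordinary logistic estimate, the model-based variance is the inverse Fisher information, and the sandwich variance replaces the middle term with the sum of squared per-hospital score totals. No small-sample correction is applied, and the overall event rate must lie strictly between 0 and 1.

```python
import math

def intercept_only_gee(clusters):
    """Intercept-only logistic model with an independence working correlation.

    clusters: list of lists of 0/1 outcomes, one inner list per hospital.
    Returns (beta0, model_based_se, robust_se).
    """
    ys = [y for c in clusters for y in c]
    n = len(ys)
    p = sum(ys) / n
    beta0 = math.log(p / (1.0 - p))   # same estimate as ordinary logistic regression
    info = n * p * (1.0 - p)          # "bread": Fisher information for the intercept
    # "meat": squared sums of score contributions (y - p) within each hospital
    meat = sum(sum(y - p for y in c) ** 2 for c in clusters)
    model_se = math.sqrt(1.0 / info)
    robust_se = math.sqrt(meat / info ** 2)  # bread^-1 * meat * bread^-1
    return beta0, model_se, robust_se
```

When every hospital contributes a single patient, the two standard errors coincide; when outcomes are concentrated within hospitals, the sandwich standard error is larger, while the coefficient itself is unchanged.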


 
Table 5 Estimated Odds Ratios for CABG Mortality, Morbidity, and Length of Stay Models
 
Final Model Intercept and Coefficients
The Appendix contains the algorithm, intercept and coefficients for the final STS 2008 CABG risk models. The variance/covariance matrix is available on the web at www.sts.org/riskmodels. An on-line risk calculator is available at http://209.220.160.181/STSWebRiskCalc261/.

Previously, the STS risk models were completely upgraded every 3 years, with annual recalibration in the interim to ensure that the benchmark O/E ratio is always 1. In the near future, annual upgrades of the models are planned.
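The recalibration method is not specified here. One common approach, shown as an assumption, shifts every patient's linear predictor by a constant so that the sum of predicted risks equals the observed number of events, which makes the overall O/E ratio exactly 1. The function names are hypothetical; bisection works because the total expected number of events increases monotonically with the shift, and a solution exists only when there is at least one event and at least one non-event.

```python
import math

def expit(z):
    return 1.0 / (1.0 + math.exp(-z))

def recalibrate_intercept(logits, outcomes, lo=-10.0, hi=10.0, tol=1e-10):
    """Find a shift d so that sum(expit(logit + d)) equals the observed event count."""
    observed = sum(outcomes)

    def excess(d):
        # Expected minus observed events after shifting every linear predictor by d.
        return sum(expit(z + d) for z in logits) - observed

    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if excess(mid) > 0:
            hi = mid
        else:
            lo = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)
```

For example, if ten patients each have a predicted risk of 10% but two events occur, the shift is log(2.25), which raises each predicted risk to 20%.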


    Limitations
Regardless of sample size or degree of statistical sophistication, all risk models are imperfect representations of reality. Although the STS risk models are based upon excellent clinical data and large sample sizes, some risk factors are rare in the overall population but, when present, may be important predictors of outcome for specific patients. Some of these variables, such as liver disease, are not included in the risk models, and the mortality risk for patients with these risk factors may be underestimated. Addition of a number of such variables will be considered at the next major specification upgrade.

There are other variables whose specifications undergo small but important changes over time, often in response to comments from STS database participants. These refinements are discussed on biweekly conference calls open to database participants, and suggested changes are regularly communicated to participants through a variety of means, including FAQs. With each major specification upgrade, they are incorporated into the new software specifications.

Audit is extremely important to assure the accuracy of any data registry. For the STS database and the risk models derived from it, robust audit is particularly critical as this registry is increasingly used for public reporting of outcomes and pay for performance. Studies suggest that the accuracy of the STS database is high for most important variables [31–35], although these audits are currently restricted to a limited number of sites annually because of budgetary constraints. In these audits, one of the most problematic variables has been 30-day mortality status (as opposed to in-hospital mortality). This is often a difficult endpoint to ascertain and may require more substantial investment of time and effort by participants, particularly for patients referred from outside their own institutions. Analysis of STS data suggests that approximately 90% of 30-day deaths occur in-hospital. Thus, if some patients recorded as being alive at 30 days have actually had their status ascertained only during the index hospitalization, the impact of this misclassification on the risk models should be negligible. This hypothesis was confirmed by comparing the odds ratios of all model variables for in-hospital versus 30-day mortality. Differences between the two were quite small, and these data are available on the web at www.sts.org/riskmodels. A new risk model for in-hospital mortality has been developed and placed on the same STS website. Furthermore, an aggressive program is in place to further enhance the accuracy of 30-day follow-up. In 2009, STS instituted a requirement that participants maintain documentation of the method by which they ascertained 30-day status, and that has become part of our routine audit. Linkage of the STS database with external death registries, such as the Social Security Death Master File, will further support this capability. 
Finally, plans are being developed to expand the audit of certain key variables such as 30-day mortality to a significantly greater number of sites annually.


    Conclusions
Risk-adjustment models account for the effect of patient comorbidities on outcomes. STS risk models are based upon clinical data from the STS NCD, one of the oldest and largest of all specialty registries. The value of such clinical registries is particularly evident in today's health care environment, where accreditation, regulatory compliance, reimbursement, and referrals are increasingly based upon objective data. Organizations such as the AQA and the National Quality Forum that evaluate and endorse performance measures strongly advocate the use of risk-adjusted outcomes measures.

STS believes that clinical data are superior to those derived from administrative sources. Furthermore, given the substantial implications of risk-adjusted outcomes, we believe that all risk models used for profiling quality of care should be transparent to permit comprehensive peer review and to foster credibility among stakeholders.

We present a detailed exposition of the development and validation of the 2008 STS CABG risk models. It describes not only the statistical considerations but, just as importantly, the many clinical and pragmatic judgments that are always necessary in risk model development.


    Appendix
 
Regression Coefficients and Variable Definitions for STS 2008 CABG Models
For each endpoint, the formula for calculating a patient's predicted risk of the endpoint has the form:


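$$\text{Predicted risk} = \frac{e^{\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n}}{1 + e^{\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n}}$$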

where $x_1, x_2, \ldots, x_n$ denote patient preoperative risk factors (eg, quantitative variables such as age, and comorbidities coded as 1 = present, 0 = absent), and $\beta_0, \beta_1, \ldots, \beta_n$ denote regression coefficients (numerical constants). Regression coefficients for each endpoint are presented in Appendix Table 1. The variables $x_1, x_2, \ldots, x_n$ are the same for each endpoint and are defined in Appendix Table 2. The regression coefficient for the time trend is not presented; instead, the intercept has been adjusted to incorporate the time trend. This adjusted intercept reflects the baseline risk for a reference period of July to December 2006.
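Because each model is a logistic regression, a patient's predicted risk is the inverse logit of the linear predictor β0 + β1x1 + ... + βnxn. A minimal sketch, with hypothetical variable names and coefficient values (not the published STS coefficients, which are given in Appendix Table 1, and not the actual spline coding of age):

```python
import math

def predicted_risk(intercept, coefs, x):
    """exp(lp) / (1 + exp(lp)), where lp = intercept + sum(coef * value).

    coefs, x: dicts keyed by variable name; a variable missing from x is treated
    as 0 (for binary risk factors, 0 = absent).
    """
    lp = intercept + sum(b * x.get(name, 0.0) for name, b in coefs.items())
    return math.exp(lp) / (1.0 + math.exp(lp))

# Hypothetical values for illustration only.
risk = predicted_risk(-5.0, {"age": 0.03, "dialysis": 1.0}, {"age": 70.0, "dialysis": 1})
```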


 
Appendix Table 1 Regression Coefficients
 

 
Appendix Table 2 Definition of Variables Appearing in STS 2008 CABG Models
 


    Footnotes
† This author is deceased. Former Chair, Quality, Research and Patient Safety Council, The Society of Thoracic Surgeons, Chicago, IL.


    References

  1. Kouchoukos NT, Ebert PA, Grover FL, Lindesmith GG. Report of the Ad Hoc Committee on Risk Factors for Coronary Artery Bypass Surgery. Ann Thorac Surg 1988;45:348-349.
  2. Clark RE. It is time for a national cardiothoracic surgical data base. Ann Thorac Surg 1989;48:755-756.
  3. Edwards FH. Evolution of the Society of Thoracic Surgeons National Cardiac Surgery Database. J Invasive Cardiol 1998;10:485-488.
  4. Grover FL, Shroyer AL, Hammermeister K, et al. A decade's experience with quality improvement in cardiac surgery using the Veterans Affairs and Society of Thoracic Surgeons national databases. Ann Surg 2001;234:464-472.
  5. Shahian DM, Blackstone EH, Edwards FH, et al. Cardiac surgery risk models: a position article. Ann Thorac Surg 2004;78:1868-1877.
  6. Shahian DM, Normand SL, Torchiana DF, et al. Cardiac surgery report cards: comprehensive review and statistical critique. Ann Thorac Surg 2001;72:2155-2168.
  7. Normand S-LT, Shahian DM. Statistical and clinical aspects of hospital outcomes profiling. Stat Sci 2007;22:206-226.
  8. Naftel DC. Do different investigators sometimes produce different multivariable equations from the same data? J Thorac Cardiovasc Surg 1994;107:1528-1529.
  9. Iezzoni LI. "Black box" medical information systems. A technology needing assessment. JAMA 1991;265:3006-3007.
  10. Shahian DM, Hutter MM, Torchiana DF, Iezzoni LI. Transparency: a mandatory requirement for risk models. J Am Coll Surg 2008;206:1240-1242.
  11. Krumholz HM, Brindis RG, Brush JE, et al. Standards for statistical models used for public reporting of health outcomes: an American Heart Association scientific statement from the Quality of Care and Outcomes Research Interdisciplinary Writing Group: cosponsored by the Council on Epidemiology and Prevention and the Stroke Council. Endorsed by the American College of Cardiology Foundation. Circulation 2006;113:456-462.
  12. Breiman L. Statistical modeling: the two cultures. Stat Sci 2001;16:199-231.
  13. Harrell Jr FE. Regression modeling strategies with applications to linear models, logistic regression, and survival analysis. New York: Springer-Verlag; 2001.
  14. Vittinghoff E, Glidden DV, Shiboski SC, McCulloch CE. Regression methods in biostatistics: linear, logistic, survival, and repeated measures models. New York: Springer-Verlag; 2005.
  15. Jones RH, Hannan EL, Hammermeister KE, et al. Identification of preoperative variables needed for risk adjustment of short-term mortality after coronary artery bypass graft surgery. The Working Group Panel on the Cooperative CABG Database Project. J Am Coll Cardiol 1996;28:1478-1487.
  16. Tu JV, Sykora K, Naylor CD. Assessing the outcomes of coronary artery bypass graft surgery: how many risk factors are enough? Steering Committee of the Cardiac Care Network of Ontario. J Am Coll Cardiol 1997;30:1317-1323.
  17. Cook NR. Use and misuse of the receiver operating characteristic curve in risk prediction. Circulation 2007;115:928-935.
  18. Pencina MJ, D'Agostino Sr RB, D'Agostino Jr RB, Vasan RS. Evaluating the added predictive ability of a new marker: from area under the ROC curve to reclassification and beyond. Stat Med 2008;27:157-172.
  19. Cooper WA, O'Brien SM, Thourani VH, et al. Impact of renal dysfunction on outcomes of coronary artery bypass surgery: results from the Society of Thoracic Surgeons National Adult Cardiac Database. Circulation 2006;113:1063-1070.
  20. Grunkemeier GL, Zerr KJ, Jin R. Cardiac surgery report cards: making the grade. Ann Thorac Surg 2001;72:1845-1848.
  21. Nashef SA, Roques F, Michel P, Gauducheau E, Lemeshow S, Salamon R. European system for cardiac operative risk evaluation (EuroSCORE). Eur J Cardiothorac Surg 1999;16:9-13.
  22. Hannan EL, Kilburn Jr H, O'Donnell JF, Lukacik G, Shields EP. Adult open heart surgery in New York State. An analysis of risk factors and hospital mortality rates. JAMA 1990;264:2768-2774.
  23. Grover FL, Johnson RR, Marshall G, Hammermeister KE. Factors predictive of operative mortality among coronary artery bypass subsets. Ann Thorac Surg 1993;56:1296-1306.
  24. Grover FL, Shroyer AL, Hammermeister KE. Calculating risk and outcome: the Veterans Affairs database. Ann Thorac Surg 1996;62(Suppl):6-11.
  25. O'Connor GT, Plume SK, Olmstead EM, et al. A regional prospective study of in-hospital mortality associated with coronary artery bypass grafting. The Northern New England Cardiovascular Disease Study Group. JAMA 1991;266:803-809.
  26. O'Connor GT, Plume SK, Olmstead EM, et al. Multivariate prediction of in-hospital mortality associated with coronary artery bypass graft surgery. Northern New England Cardiovascular Disease Study Group. Circulation 1992;85:2110-2118.
  27. Little RJA, Rubin DB. Statistical analysis with missing data. 2nd ed. Hoboken: Wiley-Interscience; 2002.
  28. Austin PC. A comparison of regression trees, logistic regression, generalized additive models, and multivariate adaptive regression splines for predicting AMI mortality. Stat Med 2007;26:2937-2957.
  29. Marcin JP, Romano PS. Size matters to a model's fit. Crit Care Med 2007;35:2212-2213.
  30. Liang KY, Zeger SL. Longitudinal data analysis using generalized linear models. Biometrika 1986;73:13-22.
  31. Grover FL, Shroyer AL, Edwards FH, et al. Data quality review program: the Society of Thoracic Surgeons Adult Cardiac National Database. Ann Thorac Surg 1996;62:1229-1231.
  32. Shroyer AL, Edwards FH, Grover FL. Updates to the Data Quality Review Program: the Society of Thoracic Surgeons Adult Cardiac National Database. Ann Thorac Surg 1998;65:1494-1497.
  33. Herbert MA, Prince SL, Williams JL, Magee MJ, Mack MJ. Are unaudited records from an outcomes registry database accurate? Ann Thorac Surg 2004;77:1960-1964.
  34. Welke KF, Peterson ED, Vaughan-Sarrazin MS, et al. Comparison of cardiac surgery volumes and mortality rates between the Society of Thoracic Surgeons and Medicare databases from 1993 through 2001. Ann Thorac Surg 2007;84:1538-1546.
  35. Welke KF, Ferguson Jr TB, Coombs LP, et al. Validity of the Society of Thoracic Surgeons National Adult Cardiac Surgery Database. Ann Thorac Surg 2004;77:1137-1139.

Related Article

The Society of Thoracic Surgeons 2008 cardiac surgery risk models: introduction. Ann Thorac Surg 2009;88:S1.





