a Duke University Medical Center, Durham, North Carolina
b The Congenital Heart Institute of Florida (CHIF) and Cardiac Surgical Associates (CSA), All Children's Hospital and Children's Hospital of Tampa, University of South Florida, Saint Petersburg and Tampa, Florida
c Children's Hospital, Denver, Colorado
d Memorial Hospital Child's Health Centre, Warsaw, Poland
e St. Christopher's Hospital for Children, Philadelphia, Pennsylvania
f Wayne State University School of Medicine, Detroit, Michigan
g Montreal Children's Hospital, Montreal, Quebec, Canada
h Division of Cardiothoracic Surgery, Oregon Health and Science University, Portland, Oregon
i Children's Memorial Health Institute, Warsaw, Poland
j Policlinico Universita, Padova, Italy
k Children's Memorial Hospital, Chicago, Illinois
l Freeman Hospital, Newcastle Upon Tyne, United Kingdom
m The Children's Hospital of Philadelphia, Philadelphia, Pennsylvania
n Royal Liverpool Children's Hospital Alder Hey, Liverpool, United Kingdom
o Children's Hospital Heart Institute, Denver, Colorado
Accepted for publication June 8, 2007.
* Address correspondence to Dr O'Brien, Box 17969, Duke Clinical Research Institute, Durham, NC 27715 (Email: obrie027{at}mc.duke.edu).
Presented at the Forty-third Annual Meeting of The Society of Thoracic Surgeons, San Diego, CA, Jan 29–31, 2007.
Dr Jeffrey P. Jacobs discloses that he has a financial relationship with CardioAccess.
| Abstract |
|---|
Methods: Data from the European Association of Cardiothoracic Surgery (EACTS) congenital database (17,838 operations, 56 centers) and the Society of Thoracic Surgeons (STS) congenital database (18,024 operations, 32 centers) were analyzed. Discrimination of the ABC score for predicting in-hospital mortality and postoperative length of stay (PLOS) of more than 21 days was quantified by the C statistic. Procedure-specific rates of mortality and prolonged PLOS were compared with predictions from a logistic regression model, and an exact binomial test was used to identify procedures that were mortality and morbidity outliers.
Results: There was a significant positive correlation between the ABC score of a procedure and its observed procedure-specific risk of mortality (C = 0.70) and prolonged PLOS (C = 0.67). Several individual procedures were identified as mortality and morbidity outliers.
Conclusions: The ABC score generally discriminates between low-risk and high-risk congenital procedures, making it a potentially useful covariate for case-mix adjustment in congenital heart surgery outcomes analysis. Planned revisions of the ABC score will incorporate empirical data and will benefit from the large sample sizes of the STS and EACTS databases.
The evaluation of the quality of care delivered to patients with congenital heart disease relies heavily on the analysis of outcomes. Yet, the analysis of congenital heart surgery outcomes is challenging owing to the large number of surgical procedures that vary in complexity. One method that has been proposed for complexity-adjusted outcomes analysis is known as the Aristotle Basic Complexity Score (ABC score) [1–3].
The ABC score expresses the case complexity of congenital heart surgery procedures based on three components: the potential for mortality, the potential for morbidity, and the technical difficulty of the procedure. The grading of individual procedures was subjectively determined and represents the consensus opinion of 50 surgeon experts. Since 2002, this methodology has been used by both the Society of Thoracic Surgeons (STS) and the European Association of Cardiothoracic Surgery (EACTS) in their yearly analysis and reporting of outcomes for a current aggregate of more than 80,000 operations [4–7].
The utility of the ABC score depends on its ability to correctly classify procedures according to their potential for morbidity, mortality, and technical difficulty. Although the difficulty of a procedure is inherently subjective and difficult to validate, the accuracy of the ABC score with respect to mortality and morbidity can be objectively determined for procedures with adequate sample size. Previous efforts to determine the mortality and morbidity potential of congenital heart procedures were hindered by the large number of procedures with small patient sample sizes. Since 1998, however, the STS and EACTS databases have grown substantially. We combined these two large multiinstitution databases to assess how well the ABC score predicts the actual morbidity and mortality potential of 131 congenital heart surgery procedures.
| Material and Methods |
|---|
The STS and EACTS Congenital Databases identify procedures by a common nomenclature published by the International Congenital Heart Surgery Nomenclature and Database Project [9, 10]. Participation in the EACTS and STS Congenital Databases is voluntary. An ongoing audit program assesses data accuracy in the EACTS database [6, 7, 11], and similar efforts are planned for the STS database. Linking of data elements in STS and EACTS is possible owing to the use of a common nomenclature and compatible data elements.
Patient Population
The study population consisted of patients undergoing cardiovascular operations who were admitted to hospitals participating in the STS or EACTS databases between January 1, 2002, and December 31, 2004. The data for 568 patients from two hospitals were excluded because these hospitals did not report discharge mortality during the study period. An initial data set was created by including all operations coded as type "CPB" (cardiopulmonary bypass) or "No-CPB cardiovascular" and excluding operations of type "thoracic," "ECMO" (extracorporeal membrane oxygenation), "interventional cardiology," or "other." To avoid double-counting mortality, only the first operation for each admission was retained. From the resultant data set (initial operations of a hospitalization that were CPB or No-CPB cardiovascular), operations were selected if they involved one of the cardiovascular procedures (listed in Table 1) for which the ABC score is defined.
In addition to the cardiovascular procedures listed in Table 1, the ABC score is also defined for 13 noncardiovascular procedures (Table 2) that were excluded from the analysis owing to their noncardiovascular focus. The decision to exclude noncardiovascular procedures was made prospectively before the data were analyzed.
The final study population consisted of 18,024 operations from 32 centers in the STS database and 17,838 operations from 56 centers in the EACTS database for a total of 35,862 operations.
End Points
The study focused on two endpoints: (1) in-hospital mortality, defined as death during the same hospitalization as the operation regardless of cause; and (2) prolonged postoperative length of hospital stay (PLOS) defined as a PLOS exceeding 21 days. Prolonged PLOS was regarded as a very general proxy measure of morbidity. Other measures of morbidity had high rates (>10%) of missing data (eg, length of mechanical ventilation time and complications such as stroke, renal failure, and heart block) or are not tracked by the database (eg, intensive care unit LOS) and were therefore not analyzed.
Aristotle Scoring System
In creating the ABC scoring system, 145 congenital procedures were subjectively graded on three components: mortality potential, morbidity potential, and technical difficulty. Each component received a score of between 0.5 and 5 points. The ABC score was defined as the sum of the three components: overall ABC score = mortality component + morbidity component + technical difficulty component. The overall ABC score ranges from 1.5 to 15 points, with larger scores indicating greater overall complexity.
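The scoring rule above can be illustrated with a minimal Python sketch. The class name, field names, and the component values used in the usage note are hypothetical and are not taken from the published Aristotle tables; only the 0.5 to 5 point range per component and the summation come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AristotleComponents:
    """Three subjectively graded components of one procedure (hypothetical container)."""

    mortality: float
    morbidity: float
    difficulty: float

    def __post_init__(self):
        # Each component is stated to lie between 0.5 and 5 points.
        for name in ("mortality", "morbidity", "difficulty"):
            value = getattr(self, name)
            if not 0.5 <= value <= 5.0:
                raise ValueError(f"{name} component must be in [0.5, 5], got {value}")

    @property
    def abc_score(self) -> float:
        # Overall ABC score = mortality + morbidity + technical difficulty.
        return self.mortality + self.morbidity + self.difficulty
```

For example, a procedure with illustrative components of 2.0, 2.5, and 3.0 would receive an overall score of 7.5, and the extreme cases reproduce the stated range of 1.5 to 15 points.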
Classification of Multiple-Procedure Operations
Operations were classified according to the 131 procedure types listed in column 1 of Table 1. Operations involving two or more procedures done concurrently were assigned to the procedure having the highest ABC score. In case of a tie, the operation was assigned to the procedure that was designated as the primary procedure by the surgeon performing the operation. If no procedure was designated as primary, or if two or more procedures were designated as primary, the tie was broken arbitrarily by assigning the first procedure listed in the database. This arbitrary assignment of primary procedure designation to the first procedure listed in the database occurred 96 times out of 35,862 operations (0.27%).
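The assignment rule described above can be sketched as follows. This is an illustration under stated assumptions, not the registries' actual code: the function name and arguments are hypothetical, and the sketch assumes that the primary-procedure designation and the "first procedure listed" fallback are both evaluated among the tied procedures only, which the text does not spell out.

```python
def classify_operation(procedures, abc_scores, primary=frozenset()):
    """Return the procedure to which a multiple-procedure operation is assigned.

    procedures: procedure names in the order listed in the database
    abc_scores: mapping from procedure name to its ABC score
    primary:    names designated as primary by the surgeon (may be empty)
    """
    if not procedures:
        raise ValueError("an operation needs at least one procedure")

    # Step 1: keep the procedure(s) with the highest ABC score.
    top_score = max(abc_scores[p] for p in procedures)
    tied = [p for p in procedures if abc_scores[p] == top_score]
    if len(tied) == 1:
        return tied[0]

    # Step 2: on a tie, prefer the single procedure designated as primary.
    tied_primary = [p for p in tied if p in primary]
    if len(tied_primary) == 1:
        return tied_primary[0]

    # Step 3: no primary, or several primaries: take the first one listed.
    return tied[0]
```

With hypothetical scores `{"A": 6.0, "B": 9.0, "C": 9.0}`, an operation listing A, B, and C with C marked primary is assigned to C, while the same operation with no primary designation falls back to B, the first of the tied procedures listed.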
Statistical Analysis
The overall strength of the association between the ABC score and each outcome was assessed graphically by plotting the observed procedure-specific rates of each outcome as a function of the ABC score. For each procedure, the observed mortality rate was calculated as the number of deaths divided by the number of patients with nonmissing mortality status. Similarly, the observed rate of prolonged PLOS was calculated as the number of patients with PLOS exceeding 21 days divided by the number of patients with nonmissing LOS. According to this definition, patients who died in the hospital before 21 days were not considered to have a prolonged stay. To ensure that the graphic results were not dominated by sampling variation, the graph was restricted to 83 procedures having at least 50 cases in the combined STS/EACTS data set (97.4% of patient operations; n = 34,927). All other analyses reported were based on all procedures, regardless of sample size.
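The rate definitions above can be made concrete with a short Python sketch. The record layout (dictionaries with `procedure`, `died`, and `plos_days` keys, with `None` marking missing values) is an assumption made for illustration and does not reflect the actual STS or EACTS data format.

```python
from collections import defaultdict

PROLONGED_PLOS_DAYS = 21  # threshold stated in the text


def observed_rates(operations):
    """Compute procedure-specific observed rates of mortality and prolonged PLOS."""
    counts = defaultdict(lambda: {"deaths": 0, "n_mort": 0, "prolonged": 0, "n_los": 0})
    for op in operations:
        c = counts[op["procedure"]]
        # Mortality denominator: patients with nonmissing mortality status.
        if op.get("died") is not None:
            c["n_mort"] += 1
            c["deaths"] += int(op["died"])
        # Prolonged-stay denominator: patients with nonmissing length of stay.
        if op.get("plos_days") is not None:
            c["n_los"] += 1
            c["prolonged"] += int(op["plos_days"] > PROLONGED_PLOS_DAYS)

    rates = {}
    for procedure, c in counts.items():
        rates[procedure] = {
            "mortality": c["deaths"] / c["n_mort"] if c["n_mort"] else None,
            "prolonged_plos": c["prolonged"] / c["n_los"] if c["n_los"] else None,
        }
    return rates
```

Note that a stay of exactly 21 days is not counted as prolonged, and a patient who dies early contributes a short stay to the length-of-stay denominator, in line with the definition in the text.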
The discrimination of the ABC score as a predictor of mortality and prolonged stay was quantified by calculating the area under the receiver operating characteristic (ROC) curve [13], or C statistic, as determined by univariable logistic regression. The C statistic represents the probability that a randomly selected patient who had the outcome of interest (ie, died or stayed more than 21 days) had a higher predicted risk of the outcome compared with a randomly selected patient who did not experience the outcome. The C statistic generally ranges from 0.5 to 1.0, with 0.5 representing no discrimination (ie, a coin flip) and 1.0 representing perfect discrimination.
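The pairwise interpretation of the C statistic given above can be computed directly, as in the following sketch. The function name is hypothetical, and counting tied predictions as one half is a common convention assumed here rather than something stated in the text. The loop over all pairs is quadratic, which is fine for illustration but slow for tens of thousands of operations.

```python
def c_statistic(predicted_risk, outcome):
    """Probability that a patient with the outcome has a higher predicted risk
    than a patient without it (ties count as 1/2)."""
    with_outcome = [p for p, y in zip(predicted_risk, outcome) if y]
    without_outcome = [p for p, y in zip(predicted_risk, outcome) if not y]
    if not with_outcome or not without_outcome:
        raise ValueError("need at least one patient with and one without the outcome")

    concordant = 0.0
    for p in with_outcome:
        for q in without_outcome:
            if p > q:
                concordant += 1.0
            elif p == q:
                concordant += 0.5  # assumed tie convention
    return concordant / (len(with_outcome) * len(without_outcome))
```

For instance, with predicted risks `[0.1, 0.4, 0.35, 0.8]` and outcomes `[0, 0, 1, 1]`, three of the four patient pairs are ordered correctly, giving C = 0.75.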
To allow for a possible nonlinear association between the ABC score and outcomes, the logistic regression model included ABC score as both a linear term (ABC score) and quadratic term (ABC score squared). These analyses were initially conducted in the overall study population and subsequently repeated in the subgroup of patients undergoing single-procedure operations.
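As a sketch of the model form described above, the predicted risk is the inverse logit of a quadratic function of the ABC score. The function and parameter names are hypothetical, and the coefficients used in the usage note are made up for illustration; the fitted coefficients are not reported in this section.

```python
from math import exp


def predicted_risk(abc_score, intercept, linear, quadratic):
    """Predicted probability of the outcome under
    logit(p) = intercept + linear * ABC + quadratic * ABC**2."""
    eta = intercept + linear * abc_score + quadratic * abc_score ** 2
    return 1.0 / (1.0 + exp(-eta))
```

With illustrative coefficients such as `intercept=-5.0`, `linear=0.3`, and `quadratic=0.0`, the predicted risk increases with the ABC score; a nonzero quadratic term lets the curve bend rather than forcing a straight line on the log-odds scale.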
In addition to assessing the discrimination of the ABC score, we also assessed the discrimination of the Aristotle mortality component as a predictor of mortality and the discrimination of the Aristotle morbidity component as a predictor of prolonged PLOS.
Finally, procedure-specific rates of mortality and prolonged PLOS were compared with predictions from the logistic regression model to identify individual procedures that violate the ideal assumption that risk can be predicted by the ABC score. Procedure-specific rates of each outcome were tabulated for each procedure along with an exact two-sided 95% binomial confidence interval. A two-sided exact binomial test was used to test the null hypothesis that the true procedure-specific risk is equal to the value predicted by the logistic regression model.
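The outlier test described above can be sketched in plain Python. The text does not say which two-sided convention was used; this sketch assumes the common one that sums the probabilities of all counts that are no more likely than the observed count under the null hypothesis. The function names and the counts in the usage note are hypothetical.

```python
from math import exp, lgamma, log, log1p


def binomial_pmf(k, n, p):
    """Binomial probability of k events in n trials, computed in log space
    so that large n does not overflow."""
    log_coeff = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return exp(log_coeff + k * log(p) + (n - k) * log1p(-p))


def exact_binomial_p_value(events, n, predicted_risk):
    """Two-sided exact binomial p-value for H0: true risk == predicted_risk."""
    if not 0.0 < predicted_risk < 1.0:
        raise ValueError("predicted_risk must be strictly between 0 and 1")
    if not 0 <= events <= n:
        raise ValueError("events must be between 0 and n")

    observed = binomial_pmf(events, n, predicted_risk)
    # Small relative tolerance so that equally likely counts are not
    # excluded by floating-point rounding.
    threshold = observed * (1.0 + 1e-7)
    total = 0.0
    for k in range(n + 1):
        pmf = binomial_pmf(k, n, predicted_risk)
        if pmf <= threshold:
            total += pmf
    return min(1.0, total)
```

As a usage example with made-up numbers, `exact_binomial_p_value(17, 200, 0.024)` asks whether 17 deaths in 200 operations is compatible with a model-predicted risk of 2.4%; a small p-value would flag the procedure as a mortality outlier.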
| Results |
|---|
| Comment |
|---|
This collaborative study was conducted to assess the accuracy of the ABC score based on prospectively collected outcomes data using the combined resources of the STS and EACTS databases. As anticipated, we found a significant positive correlation between the ABC score of a procedure and the procedure's actual empirically determined risk for mortality and prolonged stay. Because the ABC score is correlated with both procedural case mix and outcomes, stratification based on this score should reduce confounding when outcomes are compared across institutions with differing case mix. In other words, the ABC score contains useful information that can be used in the context of risk-adjustment and complexity-adjustment.
Although an overall positive correlation existed between the ABC score and outcomes, there was not a perfect one-to-one increasing relationship between the ABC score of a procedure and its empirically determined potential for mortality and morbidity. Some technically straightforward procedures that are often performed on quite ill patients had actual mortality rates that were much higher than predicted from the ABC score; for example, "PA (pulmonary artery) banding," with a predicted mortality of 2.4% and an actual observed mortality rate of 8.5% (p < 0.01), and "ASD (atrial septal defect) creation/enlargement," with a predicted mortality rate of 1.4% and an actual observed mortality of 16.9% (p < 0.01). Meanwhile, other technically challenging procedures that are often performed on relatively healthy patients had actual mortality rates that were much lower than expected from the ABC score; for example, "Ross procedure," with an expected mortality of 7.8% and an actual observed mortality of 1.9% (p < 0.01).
As a result, 19 procedures had mortality rates that were significantly (p < 0.05) less than expected from the ABC score, and 20 procedures had mortality rates that were significantly greater than expected. These findings indicate that there may be an opportunity to improve the accuracy of the ABC system in future refinements by incorporating objective data.
Other investigators have used empirical data to assess the predictive value of the ABC score, and some of these studies have made comparisons with a related risk-stratification methodology: the Risk Adjustment for Congenital Heart Surgery (RACHS-1) system [15]. In a sample of 13,675 patients from one institution, Al-Radi and colleagues [16] found that both the ABC score and RACHS-1 levels were predictive of in-hospital mortality, but the RACHS-1 levels had better discrimination (C = 0.70 for the ABC score versus 0.73 for RACHS-1; p = 0.02). The C statistic reported by Al-Radi for the ABC score was identical to the value that we obtained using STS/EACTS data (each C = 0.70). In another single-institution study (n = 1085), Kang and colleagues [17] found that ABC and RACHS-1 were both significantly associated with mortality, but the RACHS-1 levels had a larger logistic regression Wald statistic (17.7 versus 4.8). Each study found RACHS-1 to be a better predictor of mortality, but the ABC score could be calculated and applied to a larger percentage of each hospital's congenital operations (96% versus 84% in Al-Radi and colleagues; 98% versus 92% in Kang and colleagues).
The study results reported in this article should be interpreted in the context of certain limitations and methodologic considerations: First, the imperfect correlation between the ABC score and outcomes may partly reflect that the ABC score includes a technical complexity component that is not intended to predict mortality or morbidity. To address this consideration, we also analyzed the Aristotle mortality and morbidity components separately and found that the results were generally similar.
Second, although a large database was assembled for this analysis, several procedures have relatively small sample sizes, and the actual mortality and morbidity rates of these procedures may be estimated with error. To address this limitation, the data were also analyzed using Bayesian parametric and nonparametric random effects models that account for uncertainty in the estimated risk of procedures with small sample sizes. These more complicated analyses did not change the study results and were omitted for brevity and simplicity.
Third, the STS and EACTS registries are voluntary and are not necessarily representative of all centers.
Fourth, although prolonged stay was used as a proxy measure of morbidity, a long hospital LOS does not necessarily imply the occurrence of morbidity. For example, a neonate will often stay in the hospital longer than 21 days simply because of feeding or social issues. As mentioned previously, other direct measures of morbidity were not analyzed because they were not collected or had high rates of missing data.
Finally, patients often undergo multiple planned or unplanned operations during a single hospital admission. As a result, the risk associated with the first operation of the admission is confounded by the effect of subsequent operations. In other words, discrepancies between the ABC score and actual observed mortality and morbidity rates might reflect the mortality and morbidity risk caused by additional operations done during the same hospital stay.
A related methodologic issue pertains to the analysis of operations that include multiple component procedures. Although we speak of validating the ABC score, the validation actually involves the combination of the ABC score plus a rule for assigning ABC scores to operations that comprise multiple procedures. The rule used in this analysis is to define the ABC score of an operation to be the maximum of the ABC scores of all the component procedures. This rule was selected because it is currently used by the STS and EACTS database committees for creating feedback reports. In other words, the ABC score was assessed according to the way it is currently used in practice. Although STS and EACTS reports use the maximum, many other rules are possible and could be investigated.
The originator of the ABC score, François Lacour-Gayet, has led the development of an Aristotle Comprehensive Complexity score. This comprehensive version of the Aristotle score accounts for patient-specific factors such as age, birth weight, and comorbidities and considers the added complexity of multiple procedures done concomitantly [18–20]. In the current analysis, because the performance of the ABC score could depend on the rule used to assign ABC scores to operations, all of our analyses were repeated in the subset of single-procedure operations.
Important issues related to the ABC score were not addressed in this study. To keep the focus narrow, we restricted attention to quantifying the predictive accuracy of the ABC score. We did not explicitly evaluate its fitness for any particular purpose, such as making interinstitution comparisons. However, because the main purpose of the ABC score is to serve as a tool for complexity adjustment when making interinstitution comparisons, a validation study that focuses specifically on this purpose would be of interest. The developers of the ABC score proposed a relatively simple formula for making interinstitution comparisons from the ABC score. In addition to this approach, a variety of statistical methods based on the ABC score could be proposed and evaluated. Ultimately, the utility of the ABC score should be assessed by comparing complexity-adjustment methods that incorporate the ABC score with other methods that do not incorporate the ABC score.
In conclusion, the ABC score generally discriminates between low-risk and high-risk congenital procedures, making it a potentially useful covariate for case-mix adjustment in congenital heart surgery outcomes analysis. Because the ABC score is correlated with procedural risk and case mix, stratifying on this variable should reduce confounding when comparing outcomes across institutions with differing case mix. Further research is needed to determine the best method of performing interinstitution comparisons by using the ABC score.
Although our study indicated that some individual procedures are scored incorrectly, we did not propose any specific revisions to the ABC scoring system in this present study. These revisions will be handled as a separate project and will rely on a combination of objective data and subjective assessments (for procedures with inadequate sample size). Future research planned by the Aristotle developers will focus on the following six areas:
| Discussion |
|---|
DR O'BRIEN: The basic score does not include patient-specific factors, no. There is a comprehensive version that will be validated, and that one does include patient-related factors.
DR PIGULA: Okay. Because your speculation is, by not including those, a very simple procedure incurs a high risk because of other variables you are not measuring currently. Is that correct?
DR O'BRIEN: Right.
DR OSMAN AL-RADI (Toronto, Ontario, Canada): Thank you for presenting this data.
I have two questions. One, what was the error rate in the recording of mortality and length of stay in the two databases? Second, did the performance of ABC vary between the two databases?
DR O'BRIEN: The rates of missing data in the STS congenital database have decreased dramatically over time, so that when we did the analysis, we actually restricted attention to sites that had fewer than 10% missing data for discharge mortality. There were 2 sites that needed to be excluded because they didn't meet that data quality threshold. For the remaining sites, we found 195 patients with missing discharge mortality data and 40 patients with missing length of stay data.
We didn't specifically look at comparing outcomes in the EACTS database and the STS database. In general, from past analyses, we know that there is some variation in outcomes across centers and this variation is not entirely explained by case mix. In some analyses not reported here, we adjusted for this between-center variation and the results were similar. We did not go further to explain the variation between centers and we did not specifically compare outcomes across the two databases.
DR FRANÇOIS G. LACOUR-GAYET (Denver, CO): I would like to make a couple of comments on this excellent work done by Sean O'Brien at DCRI. Prediction of mortality does not necessarily validate this type of score; however, the score must reflect the reality. To solve this issue, we are going to move from subjective probability to objective data. You have to understand that it was impossible in 2000 to use objective data because there were not enough data available in the congenital databases. To "start the pump," so to say, we had to use subjective probability. We are going to stop that.
Today in 2007, we have gathered nearly 100,000 patients putting together the STS and EACTS databases and we are able to rely on objective data. We will soon produce a score of mortality and a score of morbidity that will be entirely based on objective data. It will not predict mortality because one cannot predict mortality for an individual patient or an individual surgeon. It will allow comparing yourself to an average value of the database, knowing if you are above or below this average number. The real goal of the Aristotle Score is to evaluate performance. I believe that this is a major goal for this Society. As was well explained this morning by Dr Grover, we have to evaluate performance and move toward pay-for-performance.
This instrument, the Aristotle Score, and more exactly, the Comprehensive Score that will include all the specificities of the patient, should be able to measure precisely performance. This is the aim of this research project. I want to congratulate you again, Sean, for this outstanding work.
DR WILLIAM I. DOUGLAS (Lexington, KY): Just a comment on the 21-day length of stay. I would agree that it is a pretty good surrogate for a complicated postoperative course in an elective patient 6 months old or older. But for a lot of the neonatal repairs, transpositions, and hypoplasts and the like, 21 days would represent more than the median in my experience, but it is probably still within a standard deviation of the median. As far as I am concerned, it doesn't quite represent a good surrogate for a complicated post-op course in a neonate.
DR O'BRIEN: Maybe not for all patients, and there may be some misclassification there. That is something as a topic that Jeff Jacobs can respond to because that is something we addressed in the discussion section of our manuscript.
DR JEFFREY P. JACOBS (St. Petersburg, FL): Yes, Sean, I think that Dr Douglas is absolutely right with this point. For this particular study, postoperative length of stay of more than 21 days was the best surrogate for morbidity that we could create. We now have a large MultiSocietal Database Committee for Pediatric and Congenital Heart Disease that involves surgeons, cardiologists, anesthesiologists, and intensive care doctors. This MultiSocietal Committee is trying to create a methodology within the STS database that will have better surrogates for morbidity than the one we used in the study because we agree exactly with the point that Dr Douglas has raised.
So, "postoperative length of stay of more than 21 days" is what we have to work with now, but we are hoping that within a year, we have a better estimate and tool to quantify morbidity in our database. To be honest, it is crucially important that we develop this tool, because one of the functions of the database is to function as a tool to evaluate the quality of care, and if we just focus on mortality, we measure the quality of care in only 4% of the patients. We must have good tools to estimate morbidity, so that we can measure the quality of care delivered to the surviving 96% of our patients. I guess the best answer to your question is that the topic that you raise is exactly what we are working on right now.
| References |
|---|