a Division of Cardiothoracic Surgery, Department of Surgery, Oregon Health and Science University, Portland, Oregon
b Division of Cardiac Surgery, Department of Surgery, University of Michigan School of Medicine, Ann Arbor, Michigan
c Division of Pediatric Cardiothoracic Surgery, Department of Surgery, Case Western Reserve University, Cleveland, Ohio
Accepted for publication August 20, 2009.
* Address correspondence to Dr Welke, Division of Cardiothoracic Surgery L353, Oregon Health and Science University, 3181 SW Sam Jackson Park Rd, Portland, OR 97239-3098 (Email: welkek{at}ohsu.edu).
Presented at the Forty-fifth Annual Meeting of The Society of Thoracic Surgeons, San Francisco, CA, Jan 26–28, 2009. Winner of the J. Maxwell Chamberlain Memorial Award for Congenital Heart Surgery.
| Abstract |
|---|
Methods: Pediatric cardiac surgical operations performed at U.S. hospitals were identified in the Nationwide Inpatient Sample (NIS) Database 2000 to 2005 (21,709 operations from 161 hospitals). Hospital annual surgical volumes and in-hospital mortality rates for Risk Adjustment for Congenital Heart Surgery, version 1 (RACHS-1) categories and select individual operations were calculated. The actual case volumes were compared with thresholds necessary to detect a doubling and a 5 percentage point increase in the mortality rate.
Results: No hospital had a sufficient annual case volume to determine a doubling of or 5 percentage point increase in mortality for any individual operation and a minority (0% to 5.6%) had sufficient volume to detect these differences for specific RACHS-1 categories. Minimum hospital case volumes needed to detect a doubling of mortality from a benchmark ranged from 11 for RACHS-1 category 5 to 2,935 for RACHS-1 category 1. Minimum case volumes necessary to detect a 5 percentage point difference in mortality between two hospitals ranged from 173 for RACHS-1 category 1 to 1,483 for RACHS-1 category 5. Five hundred twenty-five patients were needed to detect a doubling of overall hospital mortality rate compared with another hospital. Only 1.6% (n = 4) of hospitals met this minimum caseload.
Conclusions: Pediatric cardiac surgery operations are either performed too infrequently or have mortality rates that are too low to allow valid hospital quality comparisons to be based on mortality.
| Introduction |
|---|
While adult cardiac surgery was the initial focus of public reporting efforts, pediatric cardiac surgery is increasingly being included. However, there are important differences between the two specialties that make comparisons of pediatric cardiac surgery mortality rates problematic. In order to detect statistically relevant differences, adequate sample sizes and event rates are essential [6]. Pediatric cardiac surgical operations are performed relatively infrequently at individual hospitals and in general the mortality rates are low. The combination of small sample sizes and low event rates limits the statistical power of a comparison of an individual hospital's mortality rate to either a mean or benchmark or to that of another hospital. As a result, important differences may not be identified (a type II error).
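The effect of low volume on statistical power can be illustrated with a minimal sketch. It computes the exact power of a 1-tailed binomial test of one hospital's mortality against a fixed benchmark; the type II error rate is 1 minus this power. The function names, the 5% benchmark, the 10% true rate, and the 0.05 significance level are illustrative assumptions and are not taken from this study.

```python
from math import comb

def binom_tail(n, p, k):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def exact_power(n, p0, p1, alpha=0.05):
    """Power of a 1-tailed exact binomial test of H0: rate = p0 when the true rate is p1."""
    # Smallest death count whose upper-tail probability under the benchmark is <= alpha.
    critical = next(k for k in range(n + 2) if binom_tail(n, p0, k) <= alpha)
    return binom_tail(n, p1, critical)

# Hypothetical annual case volumes; benchmark 5%, true mortality 10% (a doubling).
for volume in (20, 50, 200):
    power = exact_power(volume, 0.05, 0.10)
    print(f"n={volume}: power={power:.2f}, type II error={1 - power:.2f}")
```

Under these assumptions, a hospital with a small annual volume would usually fail to be flagged even if its true mortality were double the benchmark.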
Our hypothesis for this investigation is that pediatric cardiac surgical case volumes and mortality rates are too low to allow the use of mortality to differentiate between hospitals. We will obtain actual hospital annual case volumes and national mean mortality rates from a national administrative database and test our hypothesis using both 1-tailed and 2-tailed tests.
| Material and Methods |
|---|
Congenital cardiac surgical procedures performed on patients under 18 years of age were identified by International Classification of Diseases, 9th Revision, Clinical Modification (ICD-9-CM) diagnosis and procedure codes. In order for a patient to be included in this study the procedure code had to match to a plausible diagnostic code. Operations were categorized by the Risk Adjustment for Congenital Heart Surgery, version 1 (RACHS-1) method [8]. This risk stratification system groups the varied congenital cardiac surgical case mix into six categories based on similar expected short-term mortality rates. Category 1 has the lowest risk of death and category 6 the highest. Category 1 contains atrial septal defect repair, patent ductus arteriosus closure on patients over 30 days of age, and aortic coarctation repair on patients over 30 days of age. Category 2 contains operations such as ventricular septal defect closure, pulmonary valve replacement, total repair of tetralogy of Fallot, and Glenn shunt. Category 3 includes operations such as aortic valve replacement, Fontan procedure, and arterial switch. Category 4 includes complex neonatal surgery such as repair of transposition of the great arteries with ventricular septal defect and repair of truncus arteriosus. Category 5 includes repair of neonatal Ebstein's anomaly and repair of truncus arteriosus with interrupted arch. Category 6 includes the Norwood procedure and the Damus-Kaye-Stansel procedure. The RACHS-1 method is a widely used risk stratification methodology for congenital heart surgery [8–12]. The methodology has been validated and is included in the Society of Thoracic Surgeons Congenital Heart Surgery Database reports [5]. Selected individual operations were also identified by ICD-9-CM codes.
Mortality was defined as in-hospital mortality as indicated by the discharge disposition. Hospital volume was defined as the number of operations performed in a year. If a hospital appeared in the NIS in 2 or more years, each entry was treated separately. Hospital volume was not averaged over the 6-year period because not all hospitals were sampled in all years and it is possible that personnel and systems changes occurred over this time that impacted volume and quality of care.
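The hospital-year definition of volume can be sketched as a simple aggregation. In this minimal example each hospital and year pair is kept as a separate entry, with its own volume and in-hospital mortality rate. The record layout, identifiers, and values are hypothetical and do not reflect the NIS file format or the SAS code used for the analysis.

```python
from collections import defaultdict

# Hypothetical discharge records: (hospital_id, year, died_in_hospital)
records = [
    ("H1", 2000, False), ("H1", 2000, True), ("H1", 2001, False),
    ("H2", 2000, False), ("H2", 2000, False),
]

def hospital_year_summary(rows):
    """Return {(hospital_id, year): (volume, deaths, mortality_rate)}."""
    counts = defaultdict(lambda: [0, 0])  # [volume, deaths] per hospital-year
    for hospital_id, year, died in rows:
        counts[(hospital_id, year)][0] += 1
        counts[(hospital_id, year)][1] += int(died)
    return {key: (n, d, d / n) for key, (n, d) in counts.items()}

summary = hospital_year_summary(records)
for (hospital_id, year), (volume, deaths, rate) in sorted(summary.items()):
    print(hospital_id, year, volume, deaths, f"{rate:.2f}")
```

Hospital H1 contributes two separate entries here (2000 and 2001) rather than one entry averaged across years.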
The actual annual hospital case volume for each RACHS-1 category and selected individual operation was compared with thresholds necessary to detect a statistically significant (as measured by 95% confidence intervals) doubling and a 5 percentage point increase in the mortality rate. Both 1-tailed and 2-tailed t tests were used. A 1-tailed test is appropriate when the hypothesis states the direction of the difference or relationship. For example, a 1-tailed test would be appropriate for testing whether or not a hospital's mortality rate was significantly worse than a mean or benchmark. A 2-tailed test is used when the direction of the difference or relationship is not specified. For example, a 2-tailed test would be used to determine whether or not the mortality rate at one hospital was significantly different, either higher or lower than the mortality rate of another hospital. Analyses were performed using SAS version 9.1 (SAS, Inc, Cary, NC).
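The two kinds of threshold described above can be approximated with textbook sample size formulas for proportions. The following is a minimal sketch under stated assumptions: it uses the normal approximation with an assumed power of 80% and a significance level of 0.05, neither of which is reported in this section, so it is not the authors' calculation and should not be expected to reproduce the published thresholds. The function names and example rates are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist

def n_vs_benchmark(p0, p1, alpha=0.05, power=0.80):
    """Cases needed for a 1-tailed test of one hospital (true rate p1) against a benchmark p0."""
    z_a = NormalDist().inv_cdf(1 - alpha)   # 1-tailed critical value
    z_b = NormalDist().inv_cdf(power)
    num = z_a * sqrt(p0 * (1 - p0)) + z_b * sqrt(p1 * (1 - p1))
    return ceil((num / (p1 - p0)) ** 2)

def n_between_hospitals(p1, p2, alpha=0.05, power=0.80):
    """Cases needed at each hospital for a 2-tailed comparison of two hospitals."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)  # 2-tailed critical value
    z_b = NormalDist().inv_cdf(power)
    p_bar = (p1 + p2) / 2
    num = (z_a * sqrt(2 * p_bar * (1 - p_bar))
           + z_b * sqrt(p1 * (1 - p1) + p2 * (1 - p2)))
    return ceil((num / (p2 - p1)) ** 2)

# Hypothetical baseline mortality rates for a low-risk and a high-risk category.
for baseline in (0.01, 0.20):
    doubling = n_vs_benchmark(baseline, 2 * baseline)
    plus_five = n_between_hospitals(baseline, baseline + 0.05)
    print(f"baseline {baseline:.0%}: doubling n={doubling}, +5 points n={plus_five}")
```

In this sketch a doubling is hardest to detect when the baseline rate is low, whereas a fixed 5 percentage point difference is hardest to detect when the baseline rate is high, because the variance of a proportion grows as the rate approaches 50%.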
| Results |
|---|
| Comment |
|---|
The numbers of individual operations performed annually were too small for quality comparisons. Even when operations were aggregated into groups (RACHS-1 categories) or totaled for each hospital, the sample sizes were still too small. Because mortality rates for the majority of pediatric cardiac surgical operations are low, grouping dilutes the influence of more infrequent, higher mortality operations such as the Norwood procedure and results in the need for a sample size more similar to a low complexity operation. There are additional problems with grouping operations. When dissimilar operations are grouped together, information is lost. Although there are some system level similarities that affect patients who undergo different operations, there are factors specific to each operation that are crucial to achieving a good outcome. By grouping vastly different operations together, the ability to investigate operation specific factors is lost. Larger sample sizes can be achieved by collecting data over longer periods of time. However, over time not only do personnel and systems at hospitals change, but treatments change as well. As a result, the utility of the collected data is reduced.
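The dilution effect of grouping can be shown with a short worked example. The pooled mortality rate of a group is the volume-weighted average of its component rates, so a rare, high-mortality operation moves the pooled rate only slightly. The case mix below is entirely hypothetical and is not drawn from the study data.

```python
# Hypothetical case mix: (label, annual cases, mortality rate)
case_mix = [
    ("low-risk operation", 90, 0.01),
    ("high-risk operation", 10, 0.20),
]

def pooled_mortality(groups):
    """Volume-weighted mortality rate across grouped operations."""
    total_cases = sum(n for _, n, _ in groups)
    expected_deaths = sum(n * p for _, n, p in groups)
    return expected_deaths / total_cases

pooled = pooled_mortality(case_mix)
print(f"pooled mortality: {pooled:.1%}")
```

With these illustrative numbers the pooled rate sits much closer to the low-risk rate than to the high-risk rate, so the sample size needed for the group resembles that of a low complexity operation.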
Our thresholds of a doubling of or 5 percentage point increase in mortality were chosen as targets that most individuals would agree represent clinically significant quality differences between programs. As one often wishes to detect finer gradations of difference, these thresholds may seem excessive. However, as the disparity in mortality rates one wishes to detect is reduced, the volume of cases needed to detect a difference increases. In order to detect a 1.5 times or 2 percentage point difference in mortality, case volumes would have to be even larger than the thresholds we used for a doubling or 5 percentage point increase. Given our findings, such comparisons cannot be made with current annual hospital volumes.
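The relationship between the size of the difference and the required volume can be made explicit with a standard normal-approximation formula for comparing two proportions. This is offered as an illustration only; it is not the calculation used in this study, and the significance level $\alpha$ and power $1-\beta$ are assumed quantities. For mortality rates $p_1$ and $p_2$ at two hospitals, the approximate number of cases needed at each hospital is

$$
n \approx \frac{\left(z_{1-\alpha/2} + z_{1-\beta}\right)^{2}\,\left[\,p_1(1-p_1) + p_2(1-p_2)\,\right]}{\left(p_1 - p_2\right)^{2}}
$$

Because the difference $p_1 - p_2$ enters as a square in the denominator, halving the difference one wishes to detect roughly quadruples the required case volume when the variance term in the numerator changes little.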
Another way of addressing the problem of small sample size is to investigate outcomes that occur more frequently than mortality, such as postoperative complications. At present, this is complex for several reasons. Unlike a relatively objective outcome, such as mortality, many complications are more subjective. Collection of complications data, either by administrative coders or clinical personnel, is time consuming and errors are likely to occur. In addition, definitions of complications are less standard than for mortality and collected data are more difficult to validate. Although more common than mortality, most complications still occur infrequently and may be associated with only one or a subset of operations. If complications are grouped into composite scores, or a surrogate for complications such as length of stay is used, the ability to link the resulting data to specific clinical issues or processes of care is reduced. Outcomes such as quality of life and neurologic status are experienced by all patients, alleviating the problem with small event rates. However, at present, due in part to cost and practicality, such data are not widely available.
Rather than a focus on outcomes, an alternative approach to determining quality is to look at structural components and process measures [13]. In order for the chosen measures to be valuable they must impact outcome and be modifiable. It is of no use to scrutinize a process measure that does not impact outcome, nor is it useful to consider a structural measure that cannot be changed. Unfortunately, the numbers of structural components and process measures demonstrated to influence outcome in congenital cardiac surgery are small and those that are known are often procedure specific.
We chose to use the RACHS-1 method to classify operations for this investigation. The RACHS-1 was developed to compare the mortality for groups of patients undergoing congenital cardiac surgery. The RACHS-1 method is a widely used risk stratification methodology for congenital heart surgery [8–12]. The methodology does not allow for conditions that may be risk factors for specific operations. However, for this analysis, RACHS-1 was used for categorization only, not risk adjustment. While risk adjustment is essential for accurate comparison of mortality rates, it is secondary to sample size. If there are too few cases available for a valid comparison to be made, the risk adjustment methodology employed is irrelevant.
Our analysis has several limitations. First, although the NIS is the largest all-payer inpatient care database in the United States, it is a sample and not a complete database of all hospital discharges. As a result, although it is designed to be representative of national practice, there is the possibility for error. However, the NIS is the largest collection of real data that can currently be used for addressing the present question. Second, the NIS is an administrative database. Administrative data were designed for claims data collection and billing, not health care research, and can be limited by erroneous coding of congenital diagnoses [14, 15]. As a result, we may have miscategorized some cases. However, given the large sample size, random miscoding would be unlikely to significantly influence our findings. We reduced error from miscoding of data in this study by only including patient records in which the procedure code matched to a plausible diagnostic code.
Notwithstanding these limitations, our study was conducted using a large, national dataset. This gave us adequate power to generate current, stable mortality rates. Because the NIS is a sample of all hospitals performing congenital cardiac surgery in the United States, rather than a select sample that has chosen to participate in a voluntary database, the mortality rates and hospital case volumes are likely an unbiased representation of the overall practice in the country.
Presently available data and current statistical methodology limit the use of mortality as a metric of quality in pediatric cardiac surgery. The structure of the United States health care system, with a large number of low volume pediatric cardiac centers, contributes to the difficulty in assessing quality [16]. The majority of hospital mortality rates are clustered, or regressed, around a mean value. Adequate sample size is therefore critical to allow discrimination: both identification of hospitals performing exceptionally well and of those performing poorly. Regionalization of pediatric cardiac care or regional collaboration among centers not only might improve outcomes by concentrating experience, but also would facilitate quality assessment by increasing surgical volumes [17]. Performance metrics other than mortality may provide a more meaningful measurement of quality. The reliance on mortality alone to represent quality of pediatric cardiac surgery misleads health care providers and hospitals who may incorrectly assume they are providing quality care and patients and their families who are falsely reassured.
| Discussion |
|---|
DR SHAHIAN: First, I would not completely abandon the concept of mortality measurement, as there are many adjuncts and refinements that may enhance its utility. For example, hierarchical models are increasingly used in outcomes profiling to provide better estimates of true performance when sample sizes are small. Two graphical methods have also been utilized to supplement traditional mortality reports. Sequential, real-time mortality monitoring using CUSUM [cumulative sum] plots may provide early warning of deteriorating performance, and congenital heart surgery was a noteworthy early application of this method. Funnel plots graph mortality against procedural volumes. Their superimposed funnel-shaped confidence intervals focus attention on the increased random variability associated with small sample size.
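The two graphical methods mentioned above can be sketched in a few lines. The first function computes funnel plot limits around a benchmark mortality rate using a normal approximation to the binomial. The second computes a running observed-minus-expected sum over consecutive cases, which is a simplified stand-in for a formal CUSUM chart with decision limits. The function names, the 5% benchmark, and the case sequence are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def funnel_limits(benchmark, volume, level=0.95):
    """Approximate limits for a hospital's mortality rate at a given case volume."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    half_width = z * sqrt(benchmark * (1 - benchmark) / volume)
    # Clamp to the valid range for a proportion.
    return max(0.0, benchmark - half_width), min(1.0, benchmark + half_width)

def observed_minus_expected(outcomes, expected_risk):
    """Running sum of (death indicator - expected risk), one value per consecutive case."""
    total, path = 0.0, []
    for died in outcomes:
        total += int(died) - expected_risk
        path.append(total)
    return path

# Limits widen sharply as volume falls (hypothetical 5% benchmark).
for volume in (20, 100, 500):
    low, high = funnel_limits(0.05, volume)
    print(f"n={volume}: {low:.3f} to {high:.3f}")

# Hypothetical sequence of cases; True marks an in-hospital death.
print(observed_minus_expected([False, False, True, False], expected_risk=0.05))
```

A rising path in the second function indicates more deaths than expected; a formal monitoring scheme would add thresholds that trigger review.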
Effective performance monitoring should also encompass other measures in addition to mortality. The Institute of Medicine advocates more comprehensive multidimensional measures of quality, and this principle motivated development of the STS CABG [coronary artery bypass grafting] composite score, consisting of 11 NQF [National Quality Forum]-endorsed measures. In congenital heart surgery, I have listed a few of the many potential outcomes measures that might be utilized in addition to mortality, as well as several structure and process measures. Such measures could be used individually, or preferably aggregated into a composite together with mortality. In addition to providing a more comprehensive perspective on quality, such composites effectively increase the number of endpoints and thus statistical power.
Notwithstanding the incremental value of all these approaches, there is no escaping the fact that mortality will always remain a central component of surgical quality and assurance, which brings me to my question for Dr Welke.
Karl, in studies last year you demonstrated that many hospitals in the U.S. perform very low volumes of congenital procedures, and that, in contrast to CABG surgery, there appears to be a significant volume outcome association, at the very least for the more complex cases. Today you have shown that it would be almost impossible to effectively monitor mortality for low and very low volume programs. Viewed in aggregate, do your findings over the last several years suggest a role for volume thresholds or regionalization in congenital heart surgery, especially for more complex cases? Thank you.
DR WELKE: Thank you, Dr Shahian, for your kind comments and your question. I agree with everything that you have said. The issue of volume is a touchy one. In the papers that we wrote last year, we did see a relationship between volume and mortality when appropriately adjusted for risk. An important caveat to the thresholds in those papers is that although in aggregate large volume hospitals performed better than small volume hospitals, there were many small volume hospitals that performed equally well, at least equally well given the limits of our current data and methodology, relative to large volume hospitals. So it is difficult to use the thresholds that we had in our papers as absolute cutoffs.
What is noteworthy from those papers, however, is the number of very small volume hospitals doing congenital heart surgery. When we looked at the Nationwide Inpatient Sample, for instance, 188 out of the 307 hospitals performed less than 20 cases per year. Even in the Society of Thoracic Surgeons Congenital Heart Surgery Database, 15 of the 48 hospitals performed under 150 cases per year. So there are a lot of very small volume hospitals out there.
The aggregation of those cases into larger centers may have an effect on mortality. This was looked at a couple of years ago using California administrative data from 1995 to 1997. Theoretically, 24% of congenital cardiac surgical deaths could be eliminated by moving cases from small volume to large volume hospitals. So consolidation and regionalization may be attractive not only from an economy of scale perspective, but also to reduce mortality. What the actual volume threshold should be, however, is still to be determined.
DR PETER MCKEOWN (Pikeville, KY): I want to compliment Dr Welke on a very pertinent and a very well presented paper. The difficulty you presented with mortality is the fact that it is a discrete variable. It is an all or none phenomenon. And so I guess I have got a couple of questions. Number one is, did you look at the observed-to-expected (O/E) outcomes such as used in the STS and NSQIP [National Surgical Quality Improvement Program] databases and how did that influence the data?
The second thing that you brought out very nicely is the need for us as societies to find ways to benchmark and share, particularly now that in many cases we are being judged by claims data, which is not the same as outcome data or performance data. So that is the second thing.
And then the third thing is, how can we measure competency as a measure of performance and quality and did you think about that in your review of the data? Thank you very much.
DR WELKE: Thank you for your comments. With regard to your first question, we did not look at observed-to-expected ratios. In fact, these numbers are not adjusted. There is no need in this case. When making mortality comparisons, one must first have an adequate number of patients in the sample. Then risk adjustment can be addressed. The former is necessary before the latter is a concern. Our analysis addresses sample size.
With regard to measurement of competency, there is going to be board certification in congenital heart surgery very soon. Individuals are currently going through fellowship training in preparation for that examination and currently practicing surgeons will have the opportunity to become certified as well. So there will be a measurement of surgeon competency, at least at the certification level, from the American Board of Thoracic Surgery. That will be an important innovation. I am interested to see the changes in the practice of congenital cardiac surgery across the country over the next 10 to 15 years that result from the designation of a group of certified surgeons.
DR MCKEOWN: And then the other part was how do we share the benchmark data, how do we get better by sharing competencies or outcomes as the New England group has done?
DR WELKE: Our specialty is different than adult cardiac surgery in that there are a lot fewer of us and our centers are more widely dispersed. In some ways that facilitates collaboration because, in general, we have fewer competition issues with programs across town or in the state. What we need to do is to take the regional quality improvement model and move it to a national quality improvement model. That may be led by this Society, it may be led by the Congenital Heart Surgeons Society, but that would be a real addition to what we do.
DR DOUGLAS E. WOOD (Seattle, WA): I have one additional question related to your answers to both discussants. Dr Shahian and you discussed the aspect of regionalization of care, which is an automatic conclusion of this type of report, but you have also pointed out the wide geographic separation of congenital centers. Although it seems logical that there would be decreased mortality with regionalization of care, can you comment on the unintended consequences of regionalization, more specifically, the potential lack of access to care that you have just alluded to in the wider geographic distribution of congenital centers?
DR WELKE: Thank you, Dr Wood. That is an excellent point. There is always going to be a compromise as to where you draw the line. It is easy in a large city to say from a patient care perspective, "we are going to reduce the number of centers." It is more difficult where you and I live, in the Northwest, to say there will only be congenital cardiac care in Seattle and Denver, for instance. Interestingly, in the California study that I mentioned earlier, if all the patients at hospitals that did under 70 cases per year were moved to hospitals that did over 170 cases per year, the average added travel distance was 12.7 miles. So there are a lot of areas where we could regionalize without an increased distance component. However, any regionalization plan should take into account and minimize the longer travel distances and related negative impacts, such as family stress and potential delays in treatments, which patients and families in more sparsely populated areas might experience.
| References |
|---|