Ann Thorac Surg 2001;72:169-175
© 2001 The Society of Thoracic Surgeons
Address reprint requests to Dr Stark, Great Ormond Street Hospital for Children, NHS Trust, London WC1N 3JH, England
e-mail: jarda{at}freeuk.com
Presented at the Thirty-seventh Annual Meeting of The Society of Thoracic Surgeons, New Orleans, LA, Jan 29–31, 2001.
| Abstract |
|---|
|
|
|---|
Methods. Data relating to all operations (2,718) carried out at the five centers from April 1, 1997 through March 31, 1999 were analyzed. Clearly defined criteria were agreed for the classification of patients into various subgroups.
Results. The overall hospital mortality was 4.4% (95% confidence interval 3.7%–5.3%). Mortality for open operations was 12.6% in neonates, 5.1% in infants, and 3.5% in children. Mortality rates were 1.1% for coarctation, 0.4% for ventricular septal defect, 4.1% for atrioventricular septal defect, 2.9% for Fallot, 0.9% for switch, and 15.6% for truncus arteriosus. Although individual surgeons' mortality rates ranged from 1.8% to 7.5%, none of the 12 surgeons' data were above the 95% confidence intervals. For individual surgeons, the change in mortality rates between the 2 years ranged between -3.3% and +3.8%.
Conclusions. With 2 years' data available, estimates of mortality rates are more precise, as reflected by tighter confidence intervals. There were relatively small data sets for individual hospitals and surgeons, which made statistical evaluation difficult. For setting standards, data from more departments for a longer period will be required. Statistical methods alone cannot be used as a sole arbiter of what is considered acceptable performance.
| Introduction |
|---|
|
|
|---|
Obviously, the events surrounding this affair have sparked considerable debate among cardiac surgeons. It was recognized that, in the absence of standards for mortality rates and methods for assessment of results in pediatric cardiac surgery (PCS), identification of divergent performance was difficult.
To investigate the current mortality rates in PCS, five UK centers agreed in 1997 to participate in a long-term study that aimed to gather information about the results of all operations for congenital heart defects (CHD). The findings from the first year of the study have already been published [1]. Here, we report experience from 2 years. Mortality rates for all operations, for selected groups of operations, and for individual departments and surgeons were estimated in three age groups. In addition, we have investigated methods for comparing changes in mortality rates from one year to the next and discussed methods that could facilitate assessment of surgeons' performance.
| Material and methods |
|---|
|
|
|---|
The presented analysis includes data for the 2-year period April 1, 1997 through March 31, 1999. Each center provided results of operations for the three age groups: neonates (0 to 28 days), infants (29 days to 1 year), and children (older than 1 year of age). No upper age limit was set; thus the data include a few older patients operated on for CHD. The definitions and classifications used in the study were as follows:
The six "marker operations" were selected arbitrarily to reflect procedures of different complexity; they included repair of isolated coarctation of the aorta, closure of single or multiple ventricular septal defect (VSD), repair of Fallot's tetralogy, repair of complete atrioventricular septal defect (AVSD), arterial switch operation for "simple" transposition of the great arteries (TGA), and repair of truncus arteriosus with or without interrupted aortic arch. Detailed inclusion and exclusion criteria adopted for the purposes of this study were described previously [1].
Each center provided summary tables of results that were collated centrally for statistical analysis. As with any statistical analysis, there was imprecision in the estimate of mortality. We illustrated this imprecision in mortality rate estimates by plotting exact 95% confidence intervals (CI) against sample size for annual case loads typical in this study, using the overall mortality rate from all five centers as a single baseline estimate of mortality. This method was used for illustrative purposes only and did not represent a formal statistical hypothesis test.
The 95% CI for the difference in mortality rates from one year to the next was calculated using a method described by Newcombe [2] based on the Wilson [3] scoring method. Technical details of this calculation are available on the Internet [4].
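As an illustration of how the width of an exact 95% CI depends on sample size, the following is a minimal sketch, assuming that "exact" refers to the Clopper-Pearson binomial interval (the text does not name the method). The function names `binom_cdf` and `exact_ci`, the case loads of 50, 150, and 500, and the rounding of deaths to whole numbers are hypothetical choices made for this example; only the 4.4% baseline rate comes from the study.

```python
from math import exp, lgamma, log


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), summed in log space for stability."""
    if p <= 0.0:
        return 1.0
    if p >= 1.0:
        return 1.0 if k >= n else 0.0
    total = 0.0
    for i in range(k + 1):
        log_term = (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
                    + i * log(p) + (n - i) * log(1.0 - p))
        total += exp(log_term)
    return min(total, 1.0)


def exact_ci(deaths, n, alpha=0.05):
    """Clopper-Pearson (exact) two-sided interval for a proportion (assumed method)."""

    def solve(k, target):
        # binom_cdf(k, n, p) decreases as p increases, so bisect on p.
        lo, hi = 0.0, 1.0
        for _ in range(60):
            mid = (lo + hi) / 2.0
            if binom_cdf(k, n, mid) > target:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2.0

    lower = 0.0 if deaths == 0 else solve(deaths - 1, 1.0 - alpha / 2.0)
    upper = 1.0 if deaths == n else solve(deaths, alpha / 2.0)
    return lower, upper


# Hypothetical annual case loads, using the overall 4.4% rate as the baseline.
for n in (50, 150, 500):
    deaths = round(0.044 * n)
    lower, upper = exact_ci(deaths, n)
    print(f"n={n:4d} deaths={deaths:3d} 95% CI {lower:.3f} to {upper:.3f}")
```

The printed intervals narrow as the case load grows, which is the point the plot of CI against sample size is meant to convey.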
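The calculation can be sketched as follows. This is an illustrative sketch of the commonly cited Newcombe hybrid score interval, which combines the two single-year Wilson intervals; it is assumed here, not confirmed by the text, that this is the variant the authors used. The function names `wilson_ci` and `newcombe_change_ci` are hypothetical, and the counts in the usage line are a generic worked example, not data from this study.

```python
from math import sqrt

Z95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def wilson_ci(x, n, z=Z95):
    """Wilson score interval for a single proportion x/n."""
    p = x / n
    denom = 1.0 + z * z / n
    centre = (p + z * z / (2.0 * n)) / denom
    half = z * sqrt(p * (1.0 - p) / n + z * z / (4.0 * n * n)) / denom
    return centre - half, centre + half


def newcombe_change_ci(x1, n1, x2, n2, z=Z95):
    """Interval for the change p2 - p1 (year 2 minus year 1).

    Hybrid score method: the limits of the two Wilson intervals are
    combined in quadrature around the observed difference.
    """
    p1, p2 = x1 / n1, x2 / n2
    l1, u1 = wilson_ci(x1, n1, z)
    l2, u2 = wilson_ci(x2, n2, z)
    d = p2 - p1
    lower = d - sqrt((p2 - l2) ** 2 + (u1 - p1) ** 2)
    upper = d + sqrt((u2 - p2) ** 2 + (p1 - l1) ** 2)
    return d, lower, upper


# Generic worked example (not study data): 48/80 in year 1, 56/70 in year 2.
d, lower, upper = newcombe_change_ci(48, 80, 56, 70)
print(f"change {d:+.3f}, 95% CI {lower:+.4f} to {upper:+.4f}")
```

Unlike the simple normal approximation, this interval does not have to be symmetric about the observed difference, which matters when death counts are small.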
| Results |
|---|
|
|
|---|
The numbers of marker operations performed by individual surgeons in the 2-year period remained small and made assessment of results difficult. As Table 4 shows, one death could result in a mortality rate of 2%, 10%, 50%, or even 100%, depending on the size of the group.
| Comment |
|---|
|
|
|---|
The additional data available following the second year of this study have improved our knowledge in terms of the precision of mortality rate estimates. The estimates for all operations, and for open and closed operations, became more precise, as reflected by the reduced width of the 95% CI (Fig 2).
The availability of the additional data has also reduced the width of 95% CI for all six marker operations. However, the number of operations remained small in each group and the 95% CI continued to be rather wide. The results of the six marker operations illustrate some of the problems associated with preparation of standards even for the most common operations. As the numbers of procedures in each category performed by an individual department in one year are small, preparation of meaningful standards would require aggregation of data over a longer time period or over more centers.
The results of repair of truncus arteriosus show the difficulties in evaluating mortality rates for rare operations. The mortality rate was 28.6% in the first year and 5.6% in the second year, resulting in a mortality rate of 15.6% for the 2-year period. Each estimate has wide 95% CI. Such results are not useful to clinicians, because they make advising parents about the risk of the proposed operation difficult. They are also not helpful for the assessment of surgeons' performance.
The topic of neonatal cardiac surgery has received a lot of attention during recent years, yet reliable information about the numbers and results of operations in neonates has been sparse. Figure 1 shows that the estimate for mortality rate for the neonatal group is considerably higher than that for other age groups. In itself, this finding is perhaps not surprising. Unlike operations in infants and children, many of which can be performed electively, most operations in neonates are urgent, life-saving procedures, which cannot be delayed beyond the neonatal period. It should be stressed that case mix factors, which may play an important role in influencing mortality, have not been taken into account within the present study. The extent of their influence cannot be assessed from this study. Deeper evaluation of outcomes in this age group requires larger numbers and a more sophisticated approach.
Now that data are available for individual surgeons for the 2 consecutive years, an interesting question to be asked is whether systematic differences in mortality rates are discernible from one year to the next. With the small sample size, the statistical methods available for computing differences in mortality rates and associated confidence intervals are technically demanding and require specialized statistical techniques [2–4] not yet included in standard texts on medical statistics. Using such methods, we found that the 95% CI were wide (Fig 4). They are thus difficult to use as a basis for judging a change in performance. This finding illustrates the difficulty in detecting systematic differences in mortality for individual surgeons, even if differences were to exist. The 2 surgeons whose mortality increased the most (surgeons J and I) both happened to have below-average mortality in the first year of the study, and the evidence in Figure 4 certainly does not establish a real decline in their performance.
Consider a hypothetical surgeon performing 150 operations a year with a mortality of, say, 4%, which is close to the group's average. If such a surgeon did suffer a real decline in performance, then his or her mortality would have to more than double before the change would be detectable purely by statistical comparison of mortality data over 2 successive years.
Another statistical problem in relation to performance assessment concerns the notion of testing differences in mortality, or indeed examining the divergence of one surgeon's mortality figures from the national average. If performance is assessed based on 95% CI, then a competent surgeon whose performance is actually average stands a 5% chance that his or her mortality data in a given year might appear anomalous. In other words, there is a 1 in 20 chance of statistical performance assessment recording a falsely positive indication of poor performance. Given that most surgeons would expect to be operating over the course of a 20-year career or longer, this means that they should expect to be under suspicion of poor performance at least once.
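The hypothetical case above can be explored numerically. The sketch below is one possible check, not the authors' own derivation, which the text does not describe. It assumes the Newcombe hybrid score interval for the year-to-year change, takes 6 deaths in 150 operations (4%) as year 1, and searches for the smallest year-2 death count at which the 95% CI for the increase excludes zero. All function names are hypothetical.

```python
from math import sqrt

Z95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def wilson_ci(x, n, z=Z95):
    """Wilson score interval for a single proportion x/n."""
    p = x / n
    denom = 1.0 + z * z / n
    centre = (p + z * z / (2.0 * n)) / denom
    half = z * sqrt(p * (1.0 - p) / n + z * z / (4.0 * n * n)) / denom
    return centre - half, centre + half


def change_ci(x1, n1, x2, n2, z=Z95):
    """Newcombe hybrid score interval for p2 - p1 (assumed method)."""
    p1, p2 = x1 / n1, x2 / n2
    l1, u1 = wilson_ci(x1, n1, z)
    l2, u2 = wilson_ci(x2, n2, z)
    d = p2 - p1
    lower = d - sqrt((p2 - l2) ** 2 + (u1 - p1) ** 2)
    upper = d + sqrt((u2 - p2) ** 2 + (p1 - l1) ** 2)
    return lower, upper


def smallest_detectable_deaths(base_deaths, n):
    """Smallest year-2 death count whose increase has a CI excluding zero."""
    for k in range(base_deaths + 1, n + 1):
        lower, _ = change_ci(base_deaths, n, k, n)
        if lower > 0.0:
            return k
    return None


k = smallest_detectable_deaths(6, 150)  # 6/150 = 4% in year 1
print(f"year-2 deaths needed: {k} of 150 ({k / 150:.1%})")
```

Under these assumptions the search returns 15 deaths, a mortality of 10%, which is two and a half times the starting rate and agrees with the statement that mortality would have to more than double.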
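The arithmetic behind this argument can be sketched in a few lines. The sketch assumes that each year's assessment is independent and carries a fixed 5% false-positive rate, which the text implies but does not state; the function name `false_flag_summary` is hypothetical.

```python
def false_flag_summary(years=20, alpha=0.05):
    """Expected number of false flags, and the chance of at least one.

    Assumes independent yearly assessments, each with false-positive
    probability alpha (an assumption made for this illustration).
    """
    expected_flags = years * alpha
    p_at_least_one = 1.0 - (1.0 - alpha) ** years
    return expected_flags, p_at_least_one


expected_flags, p_at_least_one = false_flag_summary()
print(f"expected false flags over the career: {expected_flags:.2f}")
print(f"probability of at least one false flag: {p_at_least_one:.3f}")
```

With 20 yearly assessments the expected number of false flags is 1, and the probability of being flagged at least once is about 0.64, so an average surgeon is more likely than not to come under suspicion at some point.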
The development of new generations of CHD databases incorporating unified nomenclature and an agreed-upon data set will undoubtedly facilitate the introduction of more sophisticated methods, which could take into account case mix and various risk factors. We are aware that assessment of mortality is a rather blunt analysis tool and that other outcome measures such as duration of postoperative ventilation, length of hospital stay, morbidity, and so on may be preferred, particularly for operations with very low risk of mortality; De Leval and coworkers [5] demonstrated that investigation of "near misses" rather than deaths may help to detect early unfavorable trends in performance.
Statistical methods alone cannot be used as a sole arbiter of what is considered acceptable performance. Statistical analysis is only one element of a broad range of information contributing to the improvement of care. As surgeons do not work in isolation but are part of a team, the care provided by the whole team should be evaluated. A new approach based on examination of patients' records was recently published in the UK [6]. A panel of specialists (surgeon, cardiologist, nurse, anesthetist, pathologist) reviewed the care of children operated on for CHD from admission to discharge or death to ascertain that the care provided for individual children was adequate.
To compare our data with the results published in the literature would be difficult. Most reports about mortality rates for CHD are not concerned with the performance of individual departments or surgeons [7–9]. Overall results of operations for CHD from well-defined populations were published from the state of New York [8] and from Sweden [10]. The overall mortality rate was 6.7% in New York between 1992 and 1995 [8]; in Sweden it was 1.9% for 1995 through 1997 [10].
Results from five departments collected over the 2-year period provided a good basis for estimating overall mortality rates for open and closed operations. The size of the data set was less than optimal for assessment of differences among departments and surgeons. Such assessment would require aggregation of larger numbers. New statistical and nonstatistical methods for evaluation of surgeons' performance are needed.
| Acknowledgments |
|---|
|
|
|---|
| Discussion |
|---|
|
|
|---|
The mortality rates for their diagnosis-driven study involving coarctation, VSD, A-V canal, tetralogy of Fallot, transposition, and truncus arteriosus are all similar to the standard published reports from the more well-known centers around the world. In addition, these results from the UK database are strikingly similar to the 4-year North American STS Summit Database Harvest that was published in 1998.
The authors correctly point out that increased numbers of cases will allow more sophisticated statistical data analysis, which could result in further systems analysis, which, of course, can lead to better clinical results. The cooperative effort that has resulted from the International Nomenclature and Database project has already established unified diagnosis and procedure lists, which will allow meaningful analyses amongst large populations in North America, Europe, Asia, and Australia. Efforts are being made to include South America and, in a not too distant future, Africa.
Doctor Stark's beautiful study shows congenital heart centers can cooperate and share data. What is done with these data is a challenge for the future. The virtues of such studies lie in the willingness of individual centers not only to share data but to share the systems by which they achieve their excellent results. The model from the New England Coronary Artery Group showed that mortality and morbidity significantly decreased when the participating centers visited one another. The visiting team included nursing staff, perfusionists, anesthesiologists, and surgeons. Small changes in systems and techniques were responsible for the material results.
I have a few questions for Dr Stark. Is your group planning any such intervisit groups for the purpose of comparing systems and procedures? These data, no doubt, will be seen in the government, the press, and various spin groups. Do you plan to have a Web site to answer questions as they arise? And how did you fund this endeavor? Can you suggest any developmental models for funding these kinds of studies in the future?
Congratulations, Dr Stark, on a well-controlled and important study that I am sure will stimulate other studies as time goes by.
DR STEPHEN J. LAHEY (Worcester, MA): I think this is a very, very important paper for a variety of reasons. I am not a congenital heart surgeon, but this particular issue of data reporting and interpretation is important to all of us, especially in adult cardiac surgery. I am also on the executive committee of the Northern New England Cardiovascular Disease Study Group and have been for many years.
Eventually, all surgeons in all of our states are going to have to confront this enormous issue of data reporting and subsequent use by state and federal agencies for the purpose of establishing report cards for all of us. This study accurately points out the inherent limitations of taking one moment in time to look at a particular outcome for the purpose of determining whether or not a surgeon is competent to continue performing a procedure.
I would, however, disagree with the idea that the methods traditionally used to assess statistical significance should no longer be used because they may be misinterpreted. The statistical analysis is absolutely correct, but it is our obligation, as we are now trying to do in Massachusetts, to educate government leaders on how to interpret these data. We must not abandon traditional statistical methods because the press, who will inevitably obtain all of these data through the Freedom of Information Act, might misuse the results, but rather work hard to educate people on the proper way to interpret statistical significance.
DR STARK: I would like to thank Dr Mavroudis for his kind words. Regarding his questions: we were of course aware of the New England coronary study. No funding was available for our study; all expenses were covered by the participating departments, and therefore we did not visit each other's departments. However, before the study was started, one of us had visited all departments to ensure that the departmental databases were suitable for the proposed study.
Regarding making the data available to the public: in the United Kingdom it is now a requirement that individual surgeons' data are available to patients or parents. This requirement has been stipulated by the General Medical Council and the Department of Health. Because of the lack of funds we do not have a dedicated Web site, but we are planning to make results from our study available as widely as possible. This study was started because of the absence of reliable national data in the United Kingdom. We wanted to show that the data can be collected. However, as we were unable to obtain data from all 12 departments operating on children with congenital heart defects, we have studied the data from five departments only. All surgeons participating in the study have agreed to make their individual results available to the public.
I believe I have already answered the last question. We cannot suggest any developmental models for funding similar projects as we were not funded ourselves.
I would like to make one additional comment. The International Nomenclature and Database project under the leadership of Dr Mavroudis proposed a congenital heart data set and nomenclature of diagnoses and operations. This proposal was published in The Annals of Thoracic Surgery and accepted by The Society of Thoracic Surgeons (STS) and the European Association for Cardiothoracic Surgery (EACTS). Unfortunately the situation in Europe remains controversial. In Germany the decision has been made that all departments of pediatric cardiac surgery would use the more extensive data set and the larger list of diagnoses and operations that were developed by the Association of European Paediatric Cardiologists. This data set has more than 350 fields compared with 28 fields in the STS/EACTS proposal. In the United Kingdom it was agreed last year to use the government-supported Central Cardiac Audit Database (CCAD), which used yet another data set and another nomenclature. Therefore we have a long way to go before we are able to pool and compare data and start cooperative studies in Europe.
With regard to Dr Lahey's question: there are major differences in analyzing the results of coronary artery surgery and congenital heart operations. In the former, several studies evaluated thousands of coronary patients, while in congenital heart surgery we are presenting single or double figures for some operations performed during the 2-year period.
I did not advocate abandoning the existing statistical methods; that would be a mistake. I tried to explain that with such small numbers the existing statistical methods are not always suited for assessment of congenital heart surgeons' performance. We have suggested that new statistical as well as nonstatistical methods would be needed.
The final comment: When evaluating the results, we, as well as the public, should not center all our attention on the surgeon's performance. The outcome of every operation clearly depends on the performance of the whole team; the surgeon is just one of the team's members. The Bristol Royal Infirmary Inquiry recently used and published an innovative approach to this difficult problem. They selected hospital charts of 80 patients; these charts were evaluated by several groups of experts. Each expert group consisted of a pediatric cardiac surgeon, a pediatric cardiologist, an anesthetist/intensivist, a nurse, and a pathologist. The purpose of this evaluation was to assess the standard of care from admission to discharge or death. They did not investigate negligence. It was interesting that in cases in which the standard of care was considered less than adequate, surgeons could be held responsible only in a minority of cases. Wrong diagnoses, problems during anesthesia, postoperative care, and problems in overall organization all contributed to suboptimal care. This approach is an example of how new techniques in assessment can be helpful.
| References |
|---|
|
|
|---|