Ann Thorac Surg 1999;67:635-640
© 1999 The Society of Thoracic Surgeons
a Catalan Agency for Health Technology Assessment, Catalan Health Service, Department of Health, Generalitat de Catalunya, Barcelona, Spain
b Catalan Institute of Oncology, Catalan Health Service, Department of Health, Generalitat de Catalunya, Barcelona, Spain
Accepted for publication July 29, 1998.
Address reprint requests to Dr Pons, Catalan Agency for Health Technology Assessment, Travessera de les Corts 131-159, Pavelló Ave Maria, 08028 Barcelona, Spain.
Presented at the Fourteenth Annual Meeting of the International Society of Technology Assessment in Health Care, Ottawa, Ont, Canada, June 7-10, 1998.
| Abstract |
|---|
Methods. Predictive discrimination of both risk assessments (surgeons' and model) was compared through the area under the receiver operating characteristic curve. Logistic regression analysis was used to assess the relation of surgeons' and model predictions to actual outcomes. Calibration of the subjective estimates was evaluated with a χ² test.
Results. Overall, the area under the receiver operating characteristic curve was 0.76 for the statistical model and 0.70 for the subjective assessment. Logistic regression analysis showed that the statistical model remained significant after accounting for the subjective assessment. Calibration of subjective mortality predictions was poor.
Conclusions. Surgeons' risk assessment tends to cluster in the middle ranges of risk. Subjective assessment seems accurate in identifying the two extremes of risk but is inaccurate for intermediate risk levels. A multivariate statistical model improves the accuracy of subjective predictions.
| Introduction |
|---|
Therapeutic decision making requires an accurate assessment of the prognosis of disease. Prognosis is generally affected by a large number of factors, including the patient's age, gender, severity of symptoms, extent of disease, and coexisting medical disorders. Consequently, techniques are needed to identify prognostic factors, quantify their strength or relative importance, and construct a risk stratification model.
Multivariate analysis of observational data can be used to adjust for the known imbalances in clinical characteristics of prognostic importance between patient groups [3]. This tool has been widely used in intensive care units [4] and for some forms of surgical therapy such as open heart intervention and, specifically, coronary artery bypass grafting [5]. Another important use of a predictive model or clinical index has been to compare health care outcomes among different providers [6, 7] or between medical interventions outside the randomized controlled trial.
Cardiac surgical therapy entails a short-term risk of operative mortality, yet offers long-term benefit in terms of survival or quality of life. When considering surgical intervention as a treatment option, surgeons should transmit accurate knowledge to patients about the extent to which they are at risk for surgical mortality, so that patients (and relatives) can make informed decisions. Although the process of therapeutic recommendations may seem highly rational, surgeons do not usually make decisions in this way. Beyond pathophysiologic reasoning, published research, and experts' opinion, surgeons apply their personal experience and information to assess risk and to arrive at therapeutic recommendations [3].
The aim of the present study was to evaluate the predictive accuracy of surgeons' subjective risk assessment of surgical mortality in patients undergoing open heart procedures and to compare this estimate with a statistical predictive risk model specifically designed for this population.
| Material and methods |
|---|
-month period (February 15 to August 31, 1994) and were incorporated into a two-part data sheet. The first part included patient sociodemographic and clinical variables (medical antecedents, morphologic and functional studies, special circumstances or states, priority of intervention) and was completed by the surgeon before the intervention. Procedural (reoperation, type of open heart procedure) and follow-up variables were collected by a research assistant for inclusion in the second part of the data sheet. Just before the intervention, sheets were placed in a locked mailbox located in surgical areas to prevent any modification of the data. According to the study protocol agreed on by the Catalan Cardiac Surgeons Society and, despite identification of the surgeon carrying out the procedure, surgeon identity was discarded at the time of introducing and processing the data. Therefore, data presented are at the hospital level, identified by a number.
Surgical mortality, the main outcome analyzed, was defined as any death occurring within the 30 days after the intervention or during the hospital stay, irrespective of length of stay.
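The outcome definition above can be written as a small predicate. This is only a sketch under stated assumptions: the function name and its date arguments are hypothetical, and counting a death on day 30 itself as "within the 30 days" is a boundary choice the text does not specify.

```python
from datetime import date
from typing import Optional


def is_surgical_death(operation: date, discharge: date, death: Optional[date]) -> bool:
    """Return True if a death counts as surgical mortality.

    Assumed rule (from the definition in the text): death within 30 days
    of the intervention, or death during the hospital stay regardless of
    how long that stay lasted.
    """
    if death is None:
        return False  # patient is alive at last follow-up
    # Assumption: day 30 is included in the 30-day window.
    within_30_days = 0 <= (death - operation).days <= 30
    during_stay = operation <= death <= discharge
    return within_30_days or during_stay
```

For example, a patient who dies in hospital 70 days after the operation still counts, whereas a patient discharged on day 9 who dies three months later does not.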
Subjective risk assessment of surgical mortality was evaluated through the following question, posed to the attending cardiac surgeon in the first part of the sheet: "From your point of view, what is your prognosis of the surgical mortality risk of the patient?" Answers were categorized into five levels of increasing risk: low, fair, high, very high, and extremely high.
A more objective and comparable measure of surgical mortality risk was provided by a statistical predictive risk model based on the presence and severity of selected clinical and procedural variables. This model, described elsewhere [8], was based on a logistic multivariate regression of clinical and procedural variables. The model, developed in a reference group (70% of the sample), was cross-validated in the rest of the sample. Patients were allocated to any of the subsamples at random. Depending on the presence of some risk factors (Table 1), a patient's score was generated that was stratified into categories of increasing surgical mortality risk: low, fair, high, very high, and extremely high. There was a good agreement (p = 0.34) between predicted and observed surgical mortality risk for each risk level in the validation group [8].
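The random allocation into a reference group (70%) and a validation group (the remainder) could be sketched as follows. The function name, the fixed seed, and the use of a simple shuffle are illustrative assumptions; the article does not describe the actual randomization mechanism.

```python
import random


def split_reference_validation(patient_ids, reference_fraction=0.7, seed=0):
    """Randomly allocate patients to a reference and a validation subsample.

    Hypothetical helper: shuffles a copy of the ids with a seeded generator
    and cuts the list at the requested fraction.
    """
    rng = random.Random(seed)  # seeded so the split is reproducible
    shuffled = list(patient_ids)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * reference_fraction)
    return shuffled[:cut], shuffled[cut:]


reference, validation = split_reference_validation(range(100))
```

The model would then be fitted on `reference` only, and its predicted rates compared with observed rates in `validation`.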
Calibration of the subjective risk estimates, that is, the extent to which the predicted probability matches what is observed, was evaluated through the same cross-validation procedure used to develop and test the statistical predictive model. Predicted and observed mortality rates in the validation subsample were compared with a χ² test.
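One common way to compare predicted and observed mortality per risk level is a Pearson-type goodness-of-fit statistic summed over deaths and survivors in each level. The sketch below assumes that form; the article does not state which χ² variant was used, and the function name and input layout are hypothetical.

```python
def calibration_chi_square(levels):
    """Pearson-type calibration statistic across risk levels.

    `levels` is a list of (n_patients, predicted_rate, observed_deaths)
    tuples, one per risk level in the validation subsample. Predicted
    rates must lie strictly between 0 and 1.
    """
    statistic = 0.0
    for n, predicted_rate, observed_deaths in levels:
        expected_deaths = n * predicted_rate
        expected_survivors = n - expected_deaths
        observed_survivors = n - observed_deaths
        # Contribution of deaths and of survivors in this risk level.
        statistic += (observed_deaths - expected_deaths) ** 2 / expected_deaths
        statistic += (observed_survivors - expected_survivors) ** 2 / expected_survivors
    return statistic
```

A statistic near zero means observed deaths track the predicted rates closely; larger values indicate poorer calibration. The degrees of freedom used to obtain a p value depend on the test variant, so none is computed here.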
A statistical comparison of the relation of surgeons' and model predictions with actual outcomes was also performed using logistic regression. The dependent variable was binary, depending on whether the main outcome analyzed (surgical mortality) occurred. The independent variables were the predictions, either the surgeons' subjective predictions or the model prediction, or both, coded as dummy variables with the first risk category as reference. In this way, we assessed which predictions (surgeons' or model) had the strongest association with the observed or actual outcome and, additionally, whether the model predictions contributed any prognostic information to the surgeons' predictions and whether the surgeons' predictions contributed any prognostic information to the model prediction.
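The dummy coding described above, with the first risk category as reference, can be illustrated with a short sketch. The helper names are hypothetical, and the design-row layout (intercept, then surgeon dummies, then model dummies) is just one reasonable arrangement for the combined regression.

```python
RISK_LEVELS = ["low", "fair", "high", "very high", "extremely high"]


def dummy_code(level, levels=RISK_LEVELS):
    """Encode a risk level as indicator variables, first level as reference.

    The reference level ("low") maps to all zeros; every other level sets
    exactly one indicator to 1.
    """
    if level not in levels:
        raise ValueError(f"unknown risk level: {level!r}")
    return [1 if level == other else 0 for other in levels[1:]]


def design_row(surgeon_level, model_level):
    """Build one regression row: intercept + surgeon dummies + model dummies."""
    return [1] + dummy_code(surgeon_level) + dummy_code(model_level)


row = design_row("fair", "low")
```

Fitting the regression with only the surgeon columns, only the model columns, or both corresponds to the three analyses described in the text.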
| Results |
|---|
A detailed description of the study population has been given elsewhere [8]. The most important characteristics, however, are displayed in Table 2. Overall, the surgical mortality rate was 10.5%. For the different types of open heart intervention, the mortality rate was 8.2% for isolated coronary artery bypass grafting (from 4.3% for elective to 14.3% for 14 emergent cases); 10.1% for isolated valve operation (from 8.2% for elective to 57.1% for 7 emergent cases); and 23.8% for combined valve and coronary operation (from 26% for elective to 100% in 3 emergent cases).
Logistic regression analysis, as displayed in Table 4, showed that the subjective risk assessment was statistically significant (p < 0.001) on its own, with crude odds ratios ranging from 1.5 to 23 for each risk level with respect to the low-risk category. However, when the statistical predictive model was included in the logistic regression, the subjective risk assessment lost its significance (p = 0.21), and the corresponding adjusted odds ratios decreased to a statistically nonsignificant level. When adjusted by the subjective risk assessment, the statistical model remained significant (p < 0.001), although the odds ratios decreased slightly because of the obvious collinearity between both risk models. No statistically significant interaction was found between subjective predictions and centers.
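For a single categorical predictor coded with dummy variables, the crude odds ratio for a risk level is the odds of death in that level divided by the odds of death in the reference (low-risk) level. A minimal sketch follows; the function name is hypothetical and the counts in the usage line are invented purely for illustration, not taken from Table 4.

```python
def crude_odds_ratio(deaths, survivors, ref_deaths, ref_survivors):
    """Odds of death in one risk level relative to the reference level."""
    odds = deaths / survivors
    ref_odds = ref_deaths / ref_survivors
    return odds / ref_odds


# Hypothetical counts: 10 deaths and 40 survivors in a higher-risk level,
# 5 deaths and 95 survivors in the low-risk reference level.
example_or = crude_odds_ratio(10, 40, 5, 95)
```

Adjusted odds ratios, by contrast, come from the regression that includes both sets of predictions and cannot be read directly from such a two-by-two table.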
The χ² test between predicted and observed rates showed poor calibration, with a probability value near the limits of statistical significance (p = 0.07).
| Comment |
|---|
These comparisons are important because if the statistical model predicts outcomes more accurately than surgeons, its usefulness could then be defined in terms of the ability to provide information, both to patients (and relatives) and doctors, and help in the decision-making process [11]. The statistical predictive model has also been used to assess performance of open heart interventions in the hospitals with the greatest concentration of patients in Catalonia, Spain [8]. Differences between published "ideal" results and results in a specific setting may alter therapeutic recommendations for individual patients and highlight opportunities for improving the delivery of care [3].
Our results suggest that the performance of surgeons' predictions is good, especially for surgeons working at some centers that showed an area under the ROC curve higher than 0.8. Some of these centers were also those with a lower proportion of surgical deaths, and they differ from the other centers in their characteristics. These differences among centers in predictive discrimination capability could be due to case-mix heterogeneity, but the most important risk factors seem to be captured by the statistical model. The existence of a more homogeneous criterion among surgeons in some centers for assessing surgical risk cannot be excluded. The possibility of a smaller number of surgeons and unequal experience or training among them could also contribute to the differences observed. However, our data show that a multivariate statistical model developed from extensive clinical experience can improve prognostic predictions of surgical mortality over those made by surgeons in most of the centers.
Several previous studies have compared diagnostic or short-term predictions made from computerized databases with those of expert clinicians in cardiovascular diseases [12]. Results differ widely: some studies found that physician predictions were comparable to [13, 14], occasionally better than [1], and frequently worse than [15, 16] those from such databases. The field of prediction technology has also experienced considerable growth and improvement [17]. Other studies have suggested that clinicians' predictive ability may be enhanced through exposure to information, particularly in the case of younger or less experienced physicians [18]. The information provided by a predictive model may also be useful for allocating resources to more critical patients or assigning certain patients to clinicians more experienced in their special needs.
Ideally, experience and continuing practice should produce more accurate perception of risk and benefit. However, studies that try to assess whether experience improves perceptions and, consequently, predictions have yielded divergent results [19]. The design of our study did not permit analysis of this factor.
Open heart surgical intervention remains one of the most complex and risky procedures in clinical medicine. Surgeons apply their personal experience and information from published reports to arrive at surgical recommendations. Surgeons' recall of personal clinical experience is notoriously selective and is too heavily influenced by recent cases or particularly bad or good outcomes [3]. The phenomenon of subjective probabilities has been explained, in part, by the application of certain heuristic rules of reasoning that lead to systematic errors or biases, as in judging an event more probable if instances of its occurrence are more easily recalled [20].
There were substantial differences in patient distribution by risk level between the subjective assessment and the statistical model. Whereas the subjective assessment considered most patients to be at the fair or high risk level, the statistical model placed most patients at the low risk level. This difference could be attributed to a preventive attitude on the part of some surgeons who preferred to assign patients, even those considered at low surgical risk, to an intermediate risk category to allow for an unexpected complication arising during or soon after operation. In other words, the hope of eliminating risk (perfect safety) may be illusory, given the presence of some other known or unknown hazards [2].
Subjective assessment seems to be accurate in identifying the two extremes of risk (low versus very high), as displayed in Figure 2, with better discrimination than calibration. These results are consistent with those found by others [21] and could be explained, in part, because subjective assessment includes a number of factors that are difficult to measure objectively but are indeed related to mortality.
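Discrimination, as opposed to calibration, asks only whether patients who died were ranked at higher risk than those who survived. For an ordinal risk scale, the area under the ROC curve equals the proportion of death/survivor pairs in which the death received the higher risk level, with ties counted as one half. The sketch below assumes risk levels are encoded as integers (for example 1 = low to 5 = extremely high); the function name and encoding are illustrative, not the authors' software.

```python
def auc_from_ordinal(scores, died):
    """Area under the ROC curve for ordinal risk scores.

    `scores` are integer risk levels, `died` are matching booleans (or 0/1).
    Computed as the fraction of (death, survivor) pairs ranked correctly,
    counting ties as half a correct pair.
    """
    deaths = [s for s, d in zip(scores, died) if d]
    survivors = [s for s, d in zip(scores, died) if not d]
    if not deaths or not survivors:
        raise ValueError("need at least one death and one survivor")
    concordant = 0.0
    for death_score in deaths:
        for survivor_score in survivors:
            if death_score > survivor_score:
                concordant += 1.0
            elif death_score == survivor_score:
                concordant += 0.5
    return concordant / (len(deaths) * len(survivors))
```

A value of 0.5 corresponds to no discrimination and 1.0 to perfect separation. Because the measure depends only on ranks, a scale can discriminate well while still being poorly calibrated, which is the pattern described above for the subjective estimates.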
There are several limitations to our study that should be considered when assessing our results. We used qualitative rather than quantitative estimates of subjective surgical risk, without the help of any numerical guidance. It has been said that quantitative assessment of prognosis may be more meaningful and less variable among physicians than qualitative descriptions, such as the ordinal scale used in our study [16, 22]. A further limitation is that subjective risk assessment was analyzed by center only and not by surgeon. Therefore, this risk assessment represents a pool of different surgeons' subjective, individual evaluations that might be influenced by experience, continuous practice, and different interpretations of the risk categories. Thus, our study provides an overall quantitative estimate for this subjective and qualitative assessment.
Other limitations are common to any risk predictive model: such models cannot consider all clinical factors that, at least for some patients, might be strong determinants of risk; they do not consider other nonclinical factors of patients (lifestyle, culture, and socioeconomic factors) and providers (organizational and managerial practices, technical and therapeutic resources); and they do not always take into account the appropriateness of the procedure. Finally, the cross-validation procedure used must be kept in mind when interpreting these results. The subjective nature of surgeons' predictions makes them less suitable and reliable than an explicit rule-based tool when comparing risk and mortality among centers. The fact that the χ² test for calibration was near the limits of statistical significance precludes considering it a good fit for subjective assessment.
One advantage of our study is that we evaluated the surgeons' predictive accuracy in real cases, not written scenarios that may not reflect real-life practice. Surgeons made their predictions at the actual patient encounter and thus had personal interaction with the patient; we therefore estimated the accuracy of surgeons in practice. Surgeons' predictions are also based on individual patient cases, whereas model predictions are based on groups of homogeneous patients or average cases. For instance, for a particular patient with an extremely high surgical mortality risk, there is no model that can predict whether they will or will not survive the procedure.
In conclusion, subjective estimates tend to cluster in the middle ranges of risk, suggesting a preventive attitude so as to avoid unexpected hazards. Surgeons' predictions show better discrimination than calibration, allowing differentiation of the two extremes of risk, but seem inaccurate for the intermediate risk levels. Overall, a multivariate statistical model specifically developed for this population improves the accuracy of subjective predictions. This tool has a role in the clinical decision-making process and may be useful for those with a stake in health care system policies.