Ann Thorac Surg 2002;74:641-649
© 2002 The Society of Thoracic Surgeons
a Department of Cardiovascular Surgery, VA Boston Healthcare System, West Roxbury, Massachusetts, USA
* Address reprint requests to Dr Khuri, Department of Cardiovascular Surgery, VA Boston Healthcare System, 1400 Veterans of Foreign Wars Parkway, West Roxbury, MA 02132 USA
e-mail: shukri.khuri{at}med.va.gov
Presented at the Thirty-eighth Annual Meeting of The Society of Thoracic Surgeons, Fort Lauderdale, FL, Jan 28-30, 2002.
Thank you, President Orringer, for your gracious introduction and for giving me the honor of delivering the Thomas B. Ferguson Lecture for 2002. Tom Ferguson is a giant in our field, exemplifying all that a surgeon and a human being should aspire to be. It is most fitting that we celebrate his legacy today with a discourse on issues that should matter very deeply to us as surgeons: the assessment of the quality of our care, the tools that enable us to advocate successfully for the well-being of our profession, and the dire need for us to shape the healthcare policies that affect us. Tom Ferguson would agree that there is a common thread weaving through all of these imperatives, the same thread that I hope to weave through my talk to you this morning. At the outset, I would also like to acknowledge my numerous colleagues in the Department of Veterans Affairs (VA) National Surgical Quality Improvement Program (NSQIP), including its executive committee, the more than 120 chiefs of surgical services, and the equally numerous clinical nurse reviewers. In particular I would like to acknowledge my co-chair of the NSQIP, Jennifer Daley, MD, and our lead biostatistician William Henderson, PhD.
These are hard times for surgeons. As we strive to improve the care of our patients and advance the boundaries of our respective fields, forces external to the surgical profession are setting for us standards for the care of our own patients, dictating to us the minutiae of our day-to-day management of these patients, deciding for us what is acceptable and what is unacceptable quality of care, and determining for us equitable compensation schemes. Take, for example, the latest of these external infringements: the standards recently set by the Leapfrog Group. The Leapfrog Group is a conglomeration of purchasers of health care that includes more than 72 Fortune 500 companies. As a group, they have in excess of 24 million employees who represent $45 billion in healthcare expenditure. This group of employers has set standards that urban hospitals must meet before the group will contract with them for the care of its millions of employees nationwide. These standards include (1) 24-hour-a-day, 7-day-a-week coverage of intensive care units by intensivists; (2) implementation of electronic physician order entry; and (3) what was referred to as evidence-based hospital referrals based on minimum volume requirements in five major operations, including coronary artery bypass grafting, esophagectomy, and abdominal aortic aneurysmectomy. The latter standard implied that a high volume of surgery was commensurate with better outcomes. I will have more to say about these standards later in this talk. However, irrespective of whether you agree with them or not, the fact that they were set not by our own professional societies but by a conglomerate of healthcare purchasers should be alarming to each one of us, just as we should be alarmed by standards and policies that are being set for us by various states, by the federal government, and even by the Joint Commission for the Accreditation of Healthcare Organizations.
The road to the National Surgical Quality Improvement Program
External standard-setting and infringement on our surgical specialties are déjà vu for VA surgeons. In the mid-1980s, the quality of VA surgery came under a barrage of criticism from the media, which claimed that postoperative outcomes of surgery in VA hospitals were worse than those in the private sector. Those of us in the VA then were convinced that the discrepancy in unadjusted outcomes between the VA and the private sector did not reflect a difference in quality of care, but was primarily related to the fact that patients referred to the VA, as a group, were sicker and more complex than patients referred to the private sector. However, there were no reliable data to support this conviction. The criticism of VA surgery prompted the US Congress, late in 1986, to pass Public Law 99-166, which mandated that the VA report its surgical outcomes in comparison with the national average, and that these outcomes be risk-adjusted to account for differences in severity of illness between VA and non-VA populations. The response to this congressional mandate, and what evolved from it over the years, will be the subject of my talk, because it exemplifies a paradigm in which surgeons took it upon themselves to establish a reliable infrastructure for the comparative assessment of the quality of care of their patients, and for addressing advocacy and healthcare policy issues that mattered to them and their profession.
The VA is the largest single-provider healthcare delivery system in the United States. The Veterans Health Administration (VHA) comprises 128 medical centers that perform major surgery, of which 42 perform open heart surgery. Cardiac surgery in the VA provided the impetus and the road map for the response to the congressional mandate. Two years before the congressional mandate was issued, two visionary members of the VA Cardiac Surgery Consultants Committee, Fred Grover, MD, and Karl Hammermeister, MD, started a prospective data collection in all VA cardiac surgical centers and developed a novel system for risk adjustment of outcomes in cardiac surgery. However, this effort was hampered by inadequate funding and achieved only 60% to 70% completion of data collection. Thus, when in 1988 a group of us were consulted by the VA to advise it on how best to respond to the congressional mandate, we pointed out that at that time there were no known acceptable national averages for outcomes of the various surgical specialties, and that, except for a limited experience in cardiac surgery, there were no known credible models for risk adjustment of surgical outcomes. We argued, however, that the VA was in a unique position to lead the nation in developing national norms and risk-adjustment models because of its centralized administrative structure, its uniform information technology infrastructure, and its experience with risk adjustment in cardiac surgery. Our committee succeeded in convincing the VA to initiate, in 1991, the National VA Surgical Risk Study, with the goal of developing and validating risk-adjustment models for the prediction of surgical outcome and for the comparative assessment of the quality of major surgical care among the VA surgical centers.
Donabedian [1], in his classic treatise on quality of care, defined three dimensions of health care that can be used in the assessment of quality: structure, which describes the attributes of how healthcare systems are organized; process, which describes what we do to and for our patients; and outcome, which describes the changes in patients' health status that may be attributed to the healthcare process. Surgery is ideally suited to the use of outcome measures in the assessment of quality, because surgical care revolves primarily around a predictable single event (the operation), which, in most cases, has an expected and measurable outcome. Hence, the rationale underlying the National VA Surgical Risk Study was based on what Iezzoni [2], a leading expert on risk adjustment of outcomes, terms the "algebra of effectiveness," a conceptual framework in which the outcome of health care is determined by the sum of three major factors: the patient's risk factors before surgery, the effectiveness (or quality) of the patient's care, and random variation. If one accounts for the severity of the patient's illness by proper risk adjustment, and for random events by proper statistical methods, one can then equate outcome with effectiveness of care. Accordingly, to enable the use of outcome as a measure of quality of surgical care, the National VA Surgical Risk Study had to (1) develop a reliable clinical database of the patients' relevant preoperative risk factors and postoperative outcomes, and (2) develop analytic tools for proper risk adjustment and for accounting for random events (Fig 1).
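Written schematically, the algebra of effectiveness reads as follows. This is only an informal restatement of the framework just described: the terms are verbal labels, not measured quantities or a fitted model.

```latex
\begin{aligned}
\text{Outcome} &= \text{Patient risk factors} + \text{Effectiveness of care} + \text{Random variation},\\
\text{Effectiveness of care} &= \text{Outcome} - \text{Patient risk factors} - \text{Random variation}.
\end{aligned}
```

Risk adjustment estimates the first term on the right, and statistical testing bounds the third, so whatever departure from the expected outcome remains is attributed to the effectiveness of care.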
The outcome variables included death and complications within the first 30 days postoperatively. The complications were prospectively categorized into 22 groups. After 11 morbidity scoring schemes had been developed and compared, the final statistical models were built with a dichotomous morbidity score based on whether or not a patient had one or more complications. To account for variation in the complexity of the operations among institutions and subspecialties, panels of 6 expert surgeons in each specialty developed a complexity score for each of the more than 3,000 CPT codes contained in the database. Interrater reliability was assessed by two traveling coordinators who visited each medical center and re-abstracted a sample of cases from each site. The resultant kappa statistics indicated good interrater reliability for all types of variables collected. Complete data were collected on 103,342 operations between October 1, 1991, and December 31, 1993. The cardiac operations were analyzed and reported separately. For non-cardiac surgery, nine predictive models of 30-day mortality were constructed [3], one for all operations and one for each of eight major surgical subspecialties. High C-indices, ranging from 0.79 to 0.91, indicated excellent predictability for all of these models. (A C-index of 1.0 indicates perfect predictability, and a C-index of 0.5 indicates no predictability beyond chance.) Similar models were constructed for 30-day morbidity [4]. Preoperative serum albumin was by far the most important predictor of 30-day mortality and morbidity in the all-operations model. Each model generated a beta coefficient for each of its predictive variables, and the resulting logit equation was used to determine an expected mortality or morbidity rate for any given population of patients. Knowing the actual, or observed, mortality and morbidity rate for that patient population, one could then generate an O/E (observed to expected) ratio for each of these outcomes.
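To make the last two steps concrete, here is a minimal Python sketch. It assumes a hypothetical two-variable logistic model with toy coefficients chosen for easy arithmetic; they are not the published NSQIP coefficients, and the real models used many more variables. Each patient's predicted probability comes from the logit equation, the sum of those probabilities is the expected number of deaths, and the observed count divided by that sum is the O/E ratio.

```python
import math


def predicted_probability(intercept, betas, risk_factors):
    """Logistic (logit) model: p = 1 / (1 + exp(-(b0 + sum of b_i * x_i)))."""
    linear_predictor = intercept + sum(b * x for b, x in zip(betas, risk_factors))
    return 1.0 / (1.0 + math.exp(-linear_predictor))


def observed_to_expected(outcomes, probabilities):
    """O/E ratio: observed events divided by expected events (sum of predicted probabilities)."""
    observed = sum(outcomes)
    expected = sum(probabilities)
    return observed / expected


# Toy coefficients chosen for easy arithmetic; NOT the published NSQIP coefficients.
INTERCEPT = -1.0
BETAS = [1.0, 0.5]  # hypothetical weights for two binary risk factors

# Hypothetical patients: (risk-factor values, 1 if the patient died within 30 days else 0)
patients = [
    ([1, 0], 0),
    ([0, 1], 1),
    ([1, 1], 1),
    ([0, 0], 0),
]

probabilities = [predicted_probability(INTERCEPT, BETAS, x) for x, _ in patients]
outcomes = [died for _, died in patients]

observed_rate = sum(outcomes) / len(patients)
expected_rate = sum(probabilities) / len(patients)
oe_ratio = observed_to_expected(outcomes, probabilities)

# → observed rate 0.500, expected rate 0.442, O/E 1.13
print(f"observed rate {observed_rate:.3f}, expected rate {expected_rate:.3f}, O/E {oe_ratio:.2f}")
```

An O/E ratio above 1 means more deaths than the patients' risk profile predicts; whether that excess exceeds random variation is a separate statistical question.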
Figure 2 shows the overall mortality O/E ratio for each of the 44 participating medical centers during the 27-month period of the study. The hospitals are arranged in order of increasing O/E ratio, which ranged from 0.49 to 1.53. Asterisks indicate the statistically significant outlier hospitals at the 90% confidence level. The high-outlier hospitals on the right side had an observed mortality rate that was significantly higher than that accounted for by the severity of illness of their respective patient populations. The low-outlier hospitals on the left side had an observed mortality rate that was significantly lower than that accounted for by the severity of illness of their respective patient populations. The implication of this figure was that high-outlier hospitals provided inferior quality of care and low-outlier hospitals provided superior quality of care. One of the major accomplishments of the National VA Surgical Risk Study was that it validated this implication: a study led by Jennifer Daley, MD, which included site visits by teams of surgeons, health services researchers, and nurses, demonstrated that hospitals with significantly low mortality and morbidity O/E ratios indeed had superior structures and processes of care, whereas hospitals with significantly high mortality and morbidity O/E ratios had inferior structures and processes of care.
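The lecture does not state how the 90% confidence limits were computed. One simple approach, assumed here purely for illustration and not necessarily the method used in the study, treats each patient's death as an independent event with the model's predicted probability and compares the observed death count with its expected value using a normal approximation.

```python
import math


def oe_outlier_status(observed_deaths, predicted_probs, z_critical=1.645):
    """Return (O/E ratio, z statistic, label) for one hospital.

    Under the null hypothesis that the hospital performs exactly as the risk model
    predicts, the observed death count has mean E = sum(p_i) and variance
    V = sum(p_i * (1 - p_i)) (independent Bernoulli outcomes). z_critical = 1.645
    is the two-sided cutoff for a 90% confidence level.
    """
    expected = sum(predicted_probs)
    variance = sum(p * (1.0 - p) for p in predicted_probs)
    z = (observed_deaths - expected) / math.sqrt(variance)
    if z > z_critical:
        label = "high outlier"
    elif z < -z_critical:
        label = "low outlier"
    else:
        label = "not an outlier"
    return observed_deaths / expected, z, label


# Hypothetical hospital: 400 patients, each with a predicted 30-day mortality of 3%.
risks = [0.03] * 400
for deaths in (20, 14, 5):
    oe, z, label = oe_outlier_status(deaths, risks)
    print(f"{deaths} deaths: O/E = {oe:.2f}, z = {z:.2f} -> {label}")
```

With 12 expected deaths, 20 observed deaths (O/E about 1.67) is flagged as a high outlier, 14 is not flagged, and 5 (O/E about 0.42) is flagged as a low outlier. The normal approximation is crude when expected counts are small, which is one reason exact or simulation-based limits are often preferred.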
Thomas Garthwaite, MD, Under Secretary for Health in the Department of Veterans Affairs and the highest-ranking person in the Veterans Health Administration, has repeatedly stated, "In the past, we needed to attend to complaints about surgery more than any other discipline. Since the NSQIP was established, we have not had to spend time on such complaints, because we now have reliable data with which we can properly address these issues."
The NSQIP has established a peer-review mechanism allowing VA researchers to interrogate and analyze its rich database and to address important questions related to medical care, healthcare quality, advocacy, and healthcare policy. To date, the program has contributed 33 peer-reviewed journal articles and five book chapters to the literature. It has made 55 presentations at national meetings and is currently conducting 62 research studies. Its contributions to the literature and to VA health policy have been wide-ranging. For example, because health policy research in the VA depended heavily on data obtained from its administrative database, and because risk adjustment is critical to all research that uses surgical outcomes as end points, it was important for the NSQIP to determine whether the information contained in the VA administrative database was adequate for proper risk adjustment of outcomes after surgery. To this end, Best and associates [7] compared patient preoperative variables and 30-day mortality and morbidity in the NSQIP database, which served as the reference standard, with the corresponding variables in the VA's Patient Treatment File (PTF), which served as the test. Both the sensitivity and the positive predictive value of the PTF in capturing risk factors and postoperative outcomes were calculated. To justify replacing the NSQIP with the PTF, each of these measures would have had to equal or exceed 0.9. Despite the relative clinical robustness of the VA PTF, these values were nowhere near 0.9! The average sensitivity and positive predictive value of the PTF in capturing the preoperative risk factors were 0.28 and 0.41, respectively. Worse still, the average sensitivity and positive predictive value of the PTF in capturing the postoperative outcomes were 0.17 and 0.18, respectively.
This study was instrumental in silencing critics of the NSQIP within the VA who had claimed that data collection in the NSQIP was too expensive and superfluous, and that the same data could be obtained from the PTF.
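For readers less familiar with the two measures, the sketch below scores one variable at a time, with the NSQIP record as the reference standard and the PTF record as the test. The ten patient flags are hypothetical; the 0.9 threshold is the one stated above. The study reports averages over many variables, which would simply repeat this calculation per variable.

```python
def sensitivity_and_ppv(reference, test):
    """Score a test data source against a reference standard, record by record.

    reference, test: equal-length sequences of booleans (True = risk factor or
    outcome recorded for that patient).
    Sensitivity = TP / (TP + FN): share of reference-positive records the test also flags.
    PPV         = TP / (TP + FP): share of test-positive records the reference confirms.
    """
    pairs = list(zip(reference, test))
    tp = sum(1 for r, t in pairs if r and t)
    fn = sum(1 for r, t in pairs if r and not t)
    fp = sum(1 for r, t in pairs if t and not r)
    sensitivity = tp / (tp + fn) if tp + fn else float("nan")
    ppv = tp / (tp + fp) if tp + fp else float("nan")
    return sensitivity, ppv


THRESHOLD = 0.9  # both values must reach this before the test source could replace the reference

# Hypothetical flags for one postoperative complication in ten patients.
nsqip = [True, True, True, True, False, False, False, False, True, False]
ptf = [True, False, False, False, True, False, False, False, False, True]

sens, ppv = sensitivity_and_ppv(nsqip, ptf)
adequate = sens >= THRESHOLD and ppv >= THRESHOLD
# → sensitivity 0.20, PPV 0.33, adequate replacement: False
print(f"sensitivity {sens:.2f}, PPV {ppv:.2f}, adequate replacement: {adequate}")
```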
Another major NSQIP study addressed a debate that raged in 1997 regarding the regionalization of surgical referrals to high-volume centers. The VA by then had reorganized into 22 new autonomous networks, and a new cost allocation system had been implemented that favored primary over tertiary care. This prompted a number of network directors to recommend closing small-volume surgical services, arguing that better quality of surgical care would prevail in larger volume hospitals. To address this, the NSQIP examined the relation of surgical volume to outcome in eight common operations: abdominal aortic aneurysm repair, infrainguinal vascular reconstruction, carotid endarterectomy, lung resection, open and laparoscopic cholecystectomy, colectomy, and total hip arthroplasty [8]. Four types of statistical analyses showed no relationship between the 30-day mortality O/E ratio and procedure volume in any of the eight operations examined. Automatic interaction detection analysis also failed to identify a volume threshold below which risk-adjusted 30-day mortality was adversely affected in any of the eight operations. In this study, for example, the hospital with the highest volume of colectomies, 52 cases per year, was one of the three high-outlier hospitals, whereas a hospital with fewer than one third of these cases was the lowest outlier hospital, i.e., the best performer in the whole group! In the face of these compelling data, managers in the VA could no longer invoke quality improvement as a justification for the closure of small-volume surgical centers. More importantly, these managers have accepted NSQIP risk-adjusted outcomes as the measures for quality of surgical care and as the basis for major decision making regarding surgery.
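The lecture does not name the four statistical analyses used in [8], apart from automatic interaction detection. As an illustration only, one common first check of a volume-outcome relationship is a rank correlation between hospital volume and risk-adjusted O/E ratio; a volume effect would show up as a clearly negative correlation (higher volume, lower O/E). The six hospitals below are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values 1..n in ascending order; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # convert 0-based positions to 1-based ranks
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in rx))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in ry))
    return cov / (sd_x * sd_y)


# Hypothetical annual colectomy volumes and risk-adjusted mortality O/E ratios for six hospitals.
volumes = [60, 40, 31, 25, 16, 12]
oe_ratios = [1.3, 0.7, 1.0, 1.2, 0.8, 0.9]
print(f"Spearman rho = {spearman_rho(volumes, oe_ratios):.2f}")  # → Spearman rho = 0.31
```

Here rho is weakly positive rather than negative, so these made-up data give no sign that volume improves outcome; a real analysis would also report a p-value or confidence interval and, as in [8], use several complementary methods.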
The debate over whether higher volume improves the outcomes of surgery is unlikely to be settled soon. Almost all major studies that have found a direct relationship between volume and outcome in non-cardiac surgery have been based on administrative and claims databases. The NSQIP studies, which have repeatedly failed to show such a relationship, raise serious questions about the validity of the risk adjustment in studies that use administrative and claims databases. More importantly, the NSQIP studies have repeatedly underscored the fact that quality resides in systems of care, and that referral centers with high volumes of surgery may exhibit good quality of care not because of high volume per se but because these large referral centers generally have good systems of care. By providing a direct outcome-based measure of the quality of surgical care, the NSQIP has eliminated the need in the VA to use volume of surgery as a proxy measure of quality, as the Leapfrog Group did in setting its standards for the private sector.
A vision for the future of the National Surgical Quality Improvement Program
Within its overall strategic plan, the NSQIP is still in its infancy and will require several years to mature into a fully comprehensive system for the comparative assessment and improvement of the quality of surgical care. It uses only one of the three quality-related dimensions of health care, outcome, and within that, only 30-day morbidity and mortality (Fig 6A). The strategic plan of the NSQIP (Fig 6B) calls for the incorporation of additional instruments and tools for the measurement of postoperative functional status, quality of life, and patient satisfaction. The ability of the NSQIP to reliably measure risk-adjusted outcomes gives it a unique opportunity to identify elements within the other two dimensions of health care, process and structure, that can serve as meaningful measures of the quality of surgical care. An article in the Wall Street Journal, published in December 2001, underscored the failure of the accreditation process of the Joint Commission for the Accreditation of Healthcare Organizations to depict quality of care, because the commission relied almost exclusively on arbitrary processes and structures that had not been shown to relate meaningfully to outcomes. Only processes and structures that have been demonstrated to affect surgical outcome should be used as measures of quality of care. Nor can cost be excluded from any quality measurement system. Here again, by relating cost to risk-adjusted outcomes, the NSQIP should be able to provide, perhaps for the first time, meaningful measures of cost-effectiveness, thus completing, hopefully in the near future, the big picture of comprehensive quality improvement shown in Figure 6B.
Should and can there be an NSQIP outside the VA? The surgical community today has as many reasons to set up an NSQIP-like program as VA surgeons did 15 years ago. Outcomes without risk adjustment continue to be used (and abused) by the lay press as ipso facto measures of the quality of surgical care. The NSQIP has shown that the use of unadjusted outcomes can lead to an error in judging the quality of a specific hospital in 60% of cases. Consumer groups are now invoking the Freedom of Information Act and publishing on the Internet grades of hospitals and providers, based on partial administrative and claims databases that preclude adequate risk adjustment. After visiting www.healthgrades.com, who would like to be treated at a less-than-5-star hospital?! The US News and World Report provides an annual listing of the "best" hospitals that has the trust of only those of us whose hospitals make it onto the list! Those who do not make it should be reassured that risk-adjusted outcomes are not among the performance criteria used in this report. The alarming trend among various states to rate individual surgeons on the basis of their outcomes, which started in cardiac surgery, is now rapidly expanding to other fields, and all types of report cards are being proposed to grade surgeons, mostly with little or no input from the surgical community. Surgical specialty boards are having a hard time with the mandate issued by the Accreditation Council for Graduate Medical Education and the American Board of Medical Specialties that each specialty board develop measures of provider competence to be incorporated into the certification and recertification processes. In the VA, we have discouraged the generation of surgeon-specific risk-adjusted outcomes, mainly because of two serious pitfalls. First, the average surgeon does not perform enough operations annually to provide a statistically meaningful sample size for the generation of stable O/E ratios.
Second, and probably more importantly, one cannot separate the performance of a provider from that of his or her institution, because quality is highly dependent on institutional systems. The most competent surgeon will have poor outcomes in inferior systems of care. For this and many other reasons, outcome-based individual report cards have very little value in quality improvement. They will harm NSQIP-like efforts because they alienate and disenfranchise the surgeons in the field. If the Accreditation Council for Graduate Medical Education mandate should result in the development of outcome-based report cards for surgeons, it is imperative that the quality of both the surgeon and his or her institution be measured interdependently, another reason for setting up an NSQIP-like program nationally.
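The first pitfall is a sample-size argument, and a little arithmetic shows its force. The sketch below assumes, hypothetically, that every operation carries the same 3% predicted 30-day mortality, and it uses a normal approximation to the binomial (crude at small counts) to compute how far an O/E ratio can drift from 1.0 by chance alone at a 90% level.

```python
import math


def null_oe_half_width(n_operations, expected_rate, z=1.645):
    """Half-width of the range of O/E ratios expected by chance alone.

    Assumes every operation carries the same predicted risk `expected_rate` and uses
    a normal approximation to the binomial; z = 1.645 gives a two-sided 90% range.
    """
    expected = n_operations * expected_rate
    sd = math.sqrt(n_operations * expected_rate * (1.0 - expected_rate))
    return z * sd / expected


# Hypothetical 3% expected 30-day mortality.
for n in (50, 200, 1000, 5000):
    hw = null_oe_half_width(n, 0.03)
    low = max(0.0, 1.0 - hw)  # an O/E ratio cannot be negative
    print(f"{n:>5} operations: chance alone spans O/E {low:.2f} to {1.0 + hw:.2f}")
```

Under these assumptions, a surgeon with 50 cases could land anywhere from about 0 to 2.3 by chance alone, whereas the band narrows to roughly 0.87 to 1.13 at 5,000 cases. The band shrinks only with the square root of the caseload, which is why single-surgeon O/E ratios are so unstable.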
Most alarmingly, in the absence of an authoritative professional surgical organization that sets national standards, industry and managed care are setting our standards by default, standards that may be harmful to the surgical community. A recent study by Birkmeyer and associates attempted to justify the Leapfrog Group's volume standards by calculating, from published studies, the number of lives that would have been saved had patients in these studies been referred only to high-volume hospitals. Seventy-five percent of the lives saved were calculated on the basis of two studies that had shown a direct relationship between volume and outcome, one in abdominal aortic aneurysmectomy in the VA and one in coronary artery bypass grafting in the state of New York. The VA study used by Birkmeyer and coworkers in their analysis had used the VA's administrative database for risk adjustment. When the NSQIP data on the same patients were analyzed a few years later, no relationship between volume and outcome could be elicited [8]. Likewise, the New York State study used by Birkmeyer and colleagues in their analysis was supplanted a few years later by another study from the same group that showed no relationship between volume and outcome of coronary artery bypass grafting [9]. What is most alarming about this exercise is not that the volume standards set for the surgical community by the Leapfrog Group were scientifically ill-grounded, but that we now depend on Microsoft and General Motors to set our standards of care, a point I underscored in an editorial that accompanied the article by Birkmeyer and associates. Surgeons, and only surgeons, need to set standards of surgical care, and for that they will need an NSQIP.
Many have said that the NSQIP would not work in the private sector, where there would be no central authority to mandate it, where it would be expensive to hire dedicated nurses for data collection, and where the patient population and the predictive models would be different. It was by no means a central mandate that brought about the NSQIP. In fact, in its first 4 years the NSQIP was fiercely fought by some senior managers in the VA who almost succeeded in killing it. It was the chiefs of surgery in the field who willed it and made it happen, because they had realized, as a group, that they would not be able to advocate for themselves or withstand the onslaught of byzantine policies imposed on them without proper data, a realization that Fred Grover tells me has become a driving force for The Society of Thoracic Surgeons database as well. Of course, there were many chiefs of surgery in the VA who were initially skeptical and viewed this as "Big Brother" breathing down their necks. The argument that won over these skeptics is very apropos to the surgical community as a whole today: if we ourselves do not do this, somebody else will do it for us, and you can be sure they will not do it better. If the will is present in the private sector, we have enough professional organizations in surgery that can provide the necessary mandate.
Is the NSQIP too expensive to be applied in the private sector? It is not. The total annual expenditure of the program, including the salaries of the nurses in the 128 participating hospitals, is less than $5 million, averaging $38 per major operation assessed, nearly the cost of two 7-0 Prolene sutures! More important than cost in a program of this nature is perceived value. Providers will take part in an NSQIP only if they find value in it. Chiefs of surgery in the VA have found value in obtaining reliable comparative data that characterize the quality of their performance and enable them to evaluate and improve the systems of care at their local facilities. There is value in the NSQIP because, unlike in industry and manufacturing, where quality generally costs more, in health care quality costs less, because it prevents costly morbidity. One of the interesting current studies of the NSQIP is an assessment of the savings to the VA that have been realized by a 47% decrease in morbidity during the 10 years since the inception of the NSQIP. We estimate these savings to be in the billions of dollars, certainly much more than what the VA has already spent on the NSQIP. A program that is designed to, and can, effectively improve the quality of surgical care cannot be too expensive.
To answer the question of whether the NSQIP predictive models were applicable to non-VA populations, an NSQIP Private Sector Initiative (PSI) was started more than 2 years ago, involving three non-VA institutions: the departments of surgery at the University of Michigan in Ann Arbor, the University of Kentucky in Lexington, and Emory University in Atlanta. A dedicated nurse was trained at each of these facilities, and new software was developed that allowed the nurses to collect and transmit the data over the Internet; the data collection instrument itself was identical to the one used in the VA but limited to general and vascular surgery. After a year and a half of data collection, predictive models were built on the VA data alone, on the PSI data alone, on all the data combined, and on the VA top 10 predictors only. All of these models had high C-indices, indicating excellent predictability. When the VA top 10 predictors model, the simplest, was applied to the PSI data, it yielded a C-index of 0.95, as high as that of the model based on the PSI data alone. Considering that 1.0 is perfect predictability, these results clearly indicated that the VA models were highly applicable to the patient populations of these three non-VA medical centers. Encouraged by these results, the NSQIP partnered with the American College of Surgeons, and together we recently secured a $5.2 million grant from the Agency for Healthcare Research and Quality to investigate, in the 128 VA surgical services and 10 additional private sector institutions, the efficacy of the NSQIP as a reporting system to improve patient safety in surgery. One of the main objectives of this study is to explore further the applicability of the NSQIP to the private sector, in the hope of opening up the NSQIP in the future to the surgical community at large.
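The C-index quoted here for a VA-built model applied to PSI patients can be computed directly from the predictions and the observed outcomes. The sketch below uses the standard pairwise definition for a binary outcome (equivalent to the area under the ROC curve); the six predictions are hypothetical. The pairwise loop is quadratic in the number of patients, which is fine for illustration, while rank-based formulas are the usual choice for large data sets.

```python
from itertools import product


def c_index(predicted, outcomes):
    """C-index (area under the ROC curve) for a binary outcome.

    Among all pairs made of one patient with the event and one without, count the
    share in which the patient with the event received the higher predicted risk;
    ties in predicted risk count as half. 1.0 = perfect discrimination, 0.5 = chance.
    """
    events = [p for p, y in zip(predicted, outcomes) if y == 1]
    non_events = [p for p, y in zip(predicted, outcomes) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one patient with and one without the event")
    score = 0.0
    for e, ne in product(events, non_events):
        if e > ne:
            score += 1.0
        elif e == ne:
            score += 0.5
    return score / (len(events) * len(non_events))


# Hypothetical predictions from a model built on one data set, scored on patients from another.
predicted = [0.02, 0.10, 0.05, 0.40, 0.08, 0.30]
died = [0, 1, 0, 1, 0, 0]
print(f"C-index = {c_index(predicted, died):.3f}")  # 7 of 8 pairs concordant → 0.875
```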
The NSQIP will be applicable to the private sector, as a comprehensive tool for the assessment and improvement of the quality of care in all of surgery, if, and only if, it retains in the private sector its most essential ingredient: the trust of the surgeon in the field and his or her pride in its accomplishments. When a system like the NSQIP gains the trust of the surgeons as a means by which they can reliably identify and study the strengths and weaknesses in the quality of the care they deliver, they are much more likely to participate in it and much less likely to game it than a system bent on providing an audit primarily to identify high outliers and poor performers. The surgical community does not need another audit system. It needs a trustworthy outcome-based data-driven quality improvement program.
Conclusion: the common thread
In conclusion, I have presented to you this morning the NSQIP as the first national, validated, outcome-based, risk-adjusted, and peer-controlled state-of-the-art program for the measurement and enhancement of the quality of surgical care. I have shown you how this system evolved from the need of VA surgeons in the mid-1980s to advocate for themselves against disenfranchisement and adverse policy. I have drawn a parallel between what VA surgeons faced then and what the surgical community at large is facing today, while trying to underscore the need for the surgical community as a whole to take part in a valid, truly national system for measuring and enhancing the quality of surgical care: by surgeons, for surgeons. Only with such a system will we be empowered to shape our destiny and the fate of our profession.
Finally, fellow colleagues, we will never be able to measure the quality of surgical care reliably, or advocate effectively for our profession and against adverse healthcare policies, without the common denominator, the thread that weaves through them all: reliable data. Last year, Woodrow Myers, MD, then Director of Health Care Management of the Ford Motor Company, addressed members of the American Board of Surgery during a retreat dedicated to a discussion of the measurement of surgeon competence. After presenting the data analyses that formed the basis for the referral of Ford employees to specific healthcare providers, he must have guessed what most of us in the audience were thinking, and said, "Some of you are saying to yourselves, but these are flawed data! Yes, they are, in part, flawed data, but we will continue to use flawed data until you, the surgical community, provide us with better data." It never ceases to amaze me how far we, as a profession, are willing to go to ensure the reliability and accuracy of the data we submit to basic peer-reviewed journals every day, while we remain mostly oblivious to the quality of the data that actually determine our livelihood and the very nature of our profession. It is not enough to view data as "medicine's new weapon," as Business Week put it. Only reliable data are medicine's new weapon. Unreliable data are a weapon that has hurt, and continues to hurt, us immeasurably as surgeons and healthcare professionals.
Thank you, President Orringer and fellow colleagues, for giving me the splendid honor of delivering the 2002 Thomas B. Ferguson Lecture.
References