Ann Thorac Surg 2002;73:2005-2011
© 2002 The Society of Thoracic Surgeons
a Neuropsychology Laboratory, Mental Health Research Institute of Victoria, Parkville, Australia
b Behavioural Neurology Laboratory, Mental Health Research Institute of Victoria, Parkville, Australia
c Center for Neuroscience, The University of Melbourne, Parkville, Australia
d School of Psychological Science, La Trobe University, Bundoora, Australia
e Centre for Anesthesia and Cognitive Function, St. Vincent's Hospital, Melbourne, Victoria, Australia
* Address reprint requests to Dr Collie, Neuropsychology Laboratory, Mental Health Research Institute of Victoria, Locked Bag 11, Parkville, Victoria 3052, Australia
e-mail: alex{at}neuro.mhri.edu.au
| Abstract |
|---|
| Introduction |
|---|
Conventionally, statistical comparisons in coronary surgery studies are made using the 1-standard deviation (SD) method or standard deviation index (SDI), where the individual is said to have significant cognitive decline if their postoperative cognitive test score is worse than their preoperative score by more than 1 SD of an appropriate reference group [6, 7]. Another statistical technique employed commonly is the 20% change method, where significant cognitive decline is said to have occurred if postoperative test score deteriorates by greater than 20% of the preoperative score [6, 8], or on more than 20% of tests administered. A large number of studies have used these analytic techniques, leading to their general acceptance as a de facto standard for determining the significance of change in cognitive test score after cardiac surgery (for review, see reference 9). Multiple publications have utilized these methods, including a recent article in the New England Journal of Medicine [2]. Both the SDI and 20% change methods have important shortcomings that may lead to false conclusions (discussed below). This situation has arisen despite the availability of alternative and demonstrably superior statistical techniques for determining the significance of change in cognitive test score (Table 1). While many of these alternative techniques have been employed in the neuropsychological and psychiatric literature for at least a decade, only a small number of studies have investigated their validity for determining the incidence and extent of cognitive decline after cardiac surgery [10, 11]. Further, only one of these alternative techniques was discussed in an otherwise elegant review of methodological issues associated with detecting change in cognitive status after cardiac surgery [12].
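The two conventional criteria described above can be written as a minimal sketch. The function names, the `higher_is_better` flag, and the example scores are hypothetical and chosen only for illustration; the sketch covers the per-test form of the 20% rule, not the "more than 20% of tests" variant.

```python
def sdi_decline(pre, post, reference_sd, higher_is_better=True):
    """1-SD (SDI) rule: decline if the score worsens by more than 1 SD
    of the reference group."""
    change = post - pre if higher_is_better else pre - post
    return change < -reference_sd


def percent_decline(pre, post, threshold=0.20, higher_is_better=True):
    """20% rule: decline if the score worsens by more than 20% of the
    preoperative score."""
    change = post - pre if higher_is_better else pre - post
    return change < -threshold * abs(pre)


# Hypothetical patient: preoperative 30, postoperative 23, reference SD 10.
pre, post, ref_sd = 30.0, 23.0, 10.0
flag_sdi = sdi_decline(pre, post, ref_sd)   # drop of 7 is within 1 SD (10)
flag_pct = percent_decline(pre, post)       # drop of 7 exceeds 20% of 30 (6)
print(flag_sdi, flag_pct)
```

In this made-up case the same patient is classified differently by the two rules, and neither rule uses any information about test reliability or expected practice effects.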
The aim of the present article is to describe the statistical techniques that have been developed in the neuropsychological and psychiatric literature for differentiating "true" change in test score from measurement error, practice effects, and regression to the mean (Table 1). These techniques differ according to whether they may be applied to groups or to individuals, whether they require appropriate control data, and whether they account for the potential effects of assessment-related factors as described above. This article first briefly describes the issues associated with serial cognitive assessment, critically reviews a number of statistical techniques in relation to these issues, and then summarizes the results of some recent studies that have compared the clinical utility of these techniques. A further aim is to compare the SDI and percent change methods commonly employed in the field of cardiac surgery to other available statistical techniques. The application of these statistical techniques to cognitive test data is addressed, rather than their mathematical or theoretical aspects. It is hoped that this review will lead to the consideration of more appropriate analytical techniques in future investigations of cognitive outcomes after cardiac surgery.
| Issues associated with serial cognitive assessment |
|---|
Regression toward the mean is a statistical term that describes the phenomenon whereby an extreme test score derived from an individual at one assessment tends to revert toward the mean of the group of which that individual is a member at a follow-up assessment [16]. Thus, the test score of an individual who performs highly at one assessment is likely to decline at a subsequent assessment, while the test score of an individual who scores poorly at one assessment is likely to improve at a subsequent assessment, without any intrinsic change in that individual's specific abilities. Regression to the mean may therefore confound interpretation of cognitive data when the SDI and percent change methods are employed. Specifically, individuals whose preoperative test score is better than the group mean (ie, extreme) are more likely to be rated as having cognitive decline postoperatively than individuals with average or poor preoperative performance [3]. For example, the effects of regression to the mean were evident in a recent study by Newman and colleagues [2], who reported that a significant predictor of postoperative cognitive decline after CABG was a high preoperative cognitive score. The magnitude of regression to the mean is exacerbated when the test used to rate cognitive status has poor reliability, as greater amounts of measurement error result in greater regression to the mean.
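The effect can be illustrated with a small simulation, offered here only as a sketch under stated assumptions: each simulated person has a fixed true score (no real change), both assessments add independent normally distributed error, and test reliability is the share of observed variance due to true scores. The sample size, score scale, and seed are arbitrary.

```python
import random
import statistics


def simulate_retest(n=5000, mean=100.0, sd=15.0, reliability=0.7, seed=1):
    """Simulate baseline/retest pairs with no true change between them."""
    rng = random.Random(seed)
    true_sd = sd * reliability ** 0.5          # SD of true scores
    error_sd = sd * (1 - reliability) ** 0.5   # SD of measurement error
    pairs = []
    for _ in range(n):
        true_score = rng.gauss(mean, true_sd)
        baseline = true_score + rng.gauss(0, error_sd)
        retest = true_score + rng.gauss(0, error_sd)
        pairs.append((baseline, retest))
    return pairs


def mean_change_by_baseline(pairs, fraction=0.10):
    """Mean retest-minus-baseline change for the lowest and highest
    baseline scorers (bottom and top `fraction` of the sample)."""
    ordered = sorted(pairs, key=lambda p: p[0])
    k = max(1, int(len(ordered) * fraction))

    def mean_change(group):
        return statistics.mean(retest - base for base, retest in group)

    return mean_change(ordered[:k]), mean_change(ordered[-k:])


for r in (0.5, 0.9):
    low, high = mean_change_by_baseline(simulate_retest(reliability=r))
    print(f"reliability={r}: lowest scorers {low:+.1f}, highest scorers {high:+.1f}")
```

Even though no simulated individual changes, the highest baseline scorers decline on average and the lowest improve, and the shift is larger for the less reliable test.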
As mentioned above, an individual's cognitive test score may improve with repeated assessment purely due to increased test familiarity (ie, practice effects). The magnitude of these practice effects may be modulated by the length of the test-retest interval, as longer intervals result in reduced practice effects and vice versa [17, 18]. Conventionally, investigators have adopted alternate forms of a test in order to reduce the magnitude of practice effects. However, practice effects are still observed in studies where alternate forms of the same test have been used [13]. The concurrent assessment of an appropriate control group, or the availability of control group data, is therefore essential for accurate clinical decision making. However, serially acquired normative data collected at clinically relevant testing intervals are rare for most cognitive tests. Erroneous statistical analysis may be a consequence of failure to employ an adequate control group. For example, one recent coronary surgery study [19] employed a 0.5-SD criterion for cognitive decline after CABG (compared with the conventional 1 SD) on the basis of longitudinal cognitive test data described by Mitrushina and Satz [20], who reported that the average effect of practice on cognitive tests was of the order of 0.5 SD. These authors assumed that an observed postoperative decline of 0.5 SD was therefore equivalent to a "true" decline of greater than 1 SD when the practice effect was taken into account (ie, 0.5-SD decline = 1-SD decline - 0.5-SD practice improvement). However, the control data used in this study are inadequate, being from an entirely different sample from that under study (ie, coronary surgery patients). Other methodological differences between these studies may also have affected the magnitude of the practice effect, and therefore the accuracy of any conclusions.
For example, there were no intermediate assessments in the 1-year test-retest interval employed by Mitrushina and Satz [20], while the CABG patients were reassessed at 1 and 12 months [19]. The intermediate assessment may have acted to increase the magnitude of the practice effect in the CABG study, which in this case would have led to underestimation of the incidence of cognitive decline after coronary surgery.
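The arithmetic behind this concern can be made explicit with a short sketch. The 0.5-SD practice effect and 0.5-SD criterion come from the studies discussed above; the 0.8-SD practice effect is a hypothetical value used only to show what happens if an intermediate assessment inflates practice, and the simple additive model is an assumption.

```python
def observed_change_sd(true_change_sd, practice_effect_sd):
    """Observed change (in SD units) as true change plus practice gain."""
    return true_change_sd + practice_effect_sd


def meets_criterion(observed_sd, criterion_sd=-0.5):
    """True if the observed decline is at least as large as the criterion."""
    return observed_sd <= criterion_sd


true_decline = -1.0  # a "true" 1-SD decline

# Practice effect as assumed from the borrowed control data (0.5 SD).
assumed = observed_change_sd(true_decline, 0.5)
# Hypothetical larger practice effect (0.8 SD) in the surgical sample.
inflated = observed_change_sd(true_decline, 0.8)

print(assumed, meets_criterion(assumed))
print(inflated, meets_criterion(inflated))
```

With the assumed practice effect the true 1-SD decline is detected, but with the larger hypothetical practice effect the same true decline falls short of the 0.5-SD criterion and would be missed.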
Another common strategy for controlling practice effects is to adopt a dual baseline, and use the second assessment as the "true" baseline for subsequent comparison. This strategy assumes that practice effects operate only between the first and second administrations of a test; however, this assumption has not been tested in any systematic manner. The dual-baseline approach also fails to account for other factors, such as regression to the mean, which will continue to operate between any two assessments (eg, second baseline and postoperative assessments), regardless of the number of preceding assessments. Even this brief review reveals that there are few adequate methodological strategies for reducing error in test-retest studies. This has led to the development of statistical approaches that attempt to partial out "true" changes in cognitive test score due to an independent variable (eg, brain damage) from artificial or test-related changes and measurement error.
| Statistical techniques for determining the significance of change in cognitive test score |
|---|
Reliable change index
Reliable Change Indices (RCI) are calculated by dividing the individual's test-retest difference score by the standard error of that difference score (SEdiff) [21]. In turn, the SEdiff can be calculated from the standard error of measurement (SEM). The "SEdiff describes the range of the distribution of change scores that would be expected if no actual change had occurred" [21]. RCIs are usually treated as standardized scores; therefore, if no true change has occurred, an RCI exceeding ±1.96 is expected in fewer than 5% of cases. Advantages of RCI include that they account for test reliability at both baseline and follow-up assessments, thus allowing for regression to the mean. That is, the less reliable the test, the greater the test-retest difference score required for a significant change. Also, this RCI may be calculated for individuals without reference to control group data. However, the standard RCI does not correct for the effects of measurement error due to practice or other confounding variables. This requires manipulation of the numerator and has led to the application of modified RCIs.
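A minimal sketch of the calculation follows. It assumes the commonly used forms SEM = SD × sqrt(1 − r) and SEdiff = sqrt(2 × SEM²), where r is the test-retest reliability; the exact formulas of Table 1 are not reproduced in this text, so these forms and all example values are assumptions.

```python
import math


def sem(sd_baseline, reliability):
    """Standard error of measurement (assumed form: SD * sqrt(1 - r))."""
    return sd_baseline * math.sqrt(1.0 - reliability)


def se_diff(sd_baseline, reliability):
    """Standard error of the difference (assumed form: sqrt(2 * SEM^2))."""
    return math.sqrt(2.0 * sem(sd_baseline, reliability) ** 2)


def rci(baseline, retest, sd_baseline, reliability):
    """Reliable change index: difference score divided by SEdiff."""
    return (retest - baseline) / se_diff(sd_baseline, reliability)


# Hypothetical patient dropping from 50 to 42 on a test with SD 10.
for r in (0.80, 0.95):
    value = rci(50.0, 42.0, 10.0, r)
    print(f"reliability={r}: RCI={value:.2f}, exceeds 1.96: {abs(value) > 1.96}")
```

The same 8-point drop is within the range expected from measurement error on the less reliable test but exceeds it on the more reliable one, which is the dependence on reliability described above.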
Modified reliable change indices (MRCI1 and MRCI2)
In modified RCIs, a constant is placed in the numerator to reflect the extent of change expected to occur as a result of a confounding variable, or some alteration is made to the denominator to compensate for measurement error. An example is the RCI described by Chelune and associates [22], in which the RCIs calculated for epilepsy patients undergoing temporal lobectomy were corrected for the effects of practice, by first calculating the magnitude of the practice effect in a group of matched but nonsurgical epilepsy patients, and then adding this value to the numerator in the RCI (see Table 1). Although this method has the advantage of the practice effect correction, it also requires that data be available for an appropriate control group at a similar test-retest interval. Unfortunately, such data are rarely available in clinical settings. Another modified RCI is that described by Zegers and Hafkenscheid [23], who suggest replacing the raw change score in the Chelune and associates [22] equation (numerator) with an estimated "true" change score, and the standard error of the difference (denominator) with an estimated "true" standard error of the difference (formula in Table 1). Although this RCI provides further correction for measurement error, it requires that control data be available. Furthermore, it also requires knowledge of the reliability of the cognitive test in an appropriate control group. These modified RCIs may also be limited by their use of control group data to correct for individual practice effects, as prior research suggests that the magnitude of practice effects may vary considerably between individuals [24].
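The practice-corrected variant can be sketched as below. This assumes the practice effect is estimated as the mean retest-minus-baseline change in the control group and is removed from the individual's change score before dividing by SEdiff; the sign convention, the function name, and the example values are assumptions, since Table 1 is not reproduced in this text.

```python
def practice_corrected_rci(baseline, retest,
                           control_mean_baseline, control_mean_retest,
                           se_diff):
    """RCI with the control group's mean change (practice effect)
    removed from the individual's change score."""
    practice_effect = control_mean_retest - control_mean_baseline
    return ((retest - baseline) - practice_effect) / se_diff


# Hypothetical patient with no raw change (50 -> 50), while matched
# controls improve by 4 points on average; SEdiff assumed to be 2.
uncorrected = (50.0 - 50.0) / 2.0
corrected = practice_corrected_rci(50.0, 50.0, 50.0, 54.0, 2.0)
print(uncorrected, corrected)
```

In this made-up case an unchanged raw score corresponds to a corrected index beyond −1.96, because the patient failed to show the improvement seen in controls.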
Reliability-stability index
The reliability-stability index (RSI) described by Bruggemans and colleagues [10] subtracts the output from the RCI of Zegers and Hafkenscheid [23] described above from a modified version of this RCI where the individual patient's baseline and retest scores are replaced by the mean baseline and retest scores of a small (approximately n = 10) appropriate matched control group. More simply, the RSI of Bruggemans and colleagues represents the RCI of a matched control group subtracted from the RCI of the individual patient. Data are treated as with previously described RCIs. Being a combination of previously described RCIs, this method corrects for measurement error, individual variability, and practice effects. However, as with all other RCIs, it is not applicable to the single case, as control and appropriate test-retest reliability data are required.
Simple regression
In the simple regression method described by McSweeney and colleagues [14], a linear equation is calculated on the basis of the mean baseline and retest data from the groups of subjects described by Chelune and associates [22]. This equation is then applied to the individual patient's baseline data and a predicted retest score is obtained. The difference between the predicted and observed retest score is then divided by the standard error of the estimate (SEest) of the control group regression equation. McSweeney and colleagues identified a significant change at retesting when this value was greater than a certain criterion. Although this method allows for both practice effects and individual variability, the raw statistic does not adequately control for measurement error. That is, like the SDI and SEM techniques described above, the simple regression method incorrectly assumes that the baseline score is perfectly reliable (ie, free from measurement error). This may be remedied by including some estimate of reliability in the regression equation.
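The steps above can be sketched as follows, assuming an ordinary least-squares fit of retest on baseline in the control group and SEest computed with n − 2 degrees of freedom. The control scores and the patient's scores are hypothetical, and no particular significance criterion is implied.

```python
import math


def fit_simple_regression(baselines, retests):
    """Least-squares fit of retest = intercept + slope * baseline.
    Returns (intercept, slope, standard error of the estimate)."""
    n = len(baselines)
    mean_x = sum(baselines) / n
    mean_y = sum(retests) / n
    sxx = sum((x - mean_x) ** 2 for x in baselines)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(baselines, retests))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((y - (intercept + slope * x)) ** 2
              for x, y in zip(baselines, retests))
    se_est = math.sqrt(sse / (n - 2))
    return intercept, slope, se_est


def regression_change_score(baseline, retest, model):
    """(Observed - predicted retest) divided by SEest of the control model."""
    intercept, slope, se_est = model
    predicted = intercept + slope * baseline
    return (retest - predicted) / se_est


# Hypothetical control group showing a modest practice effect.
control_baseline = [40.0, 45.0, 50.0, 55.0, 60.0, 65.0]
control_retest = [44.0, 47.0, 55.0, 57.0, 63.0, 68.0]
model = fit_simple_regression(control_baseline, control_retest)

# Hypothetical patient: baseline 50, retest 48.
score = regression_change_score(50.0, 48.0, model)
print(model, score)
```

Because the control model predicts a retest score above 50 for this baseline, the patient's small raw drop translates into a larger standardized shortfall than the raw difference suggests.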
Multiple regression
The multiple regression method described by Temkin and colleagues [25] is similar to the simple regression method of McSweeney and associates [14], in that predicted retest scores are obtained on the basis of mean baseline data collected in a group of subjects. The multiple regression method is, therefore, subject to the same limitations as the simple regression method. However, multiple regression predictions of performance attempt to account for many sources of individual variability (eg, age, level of education, gender) by including variables defining the influence of these factors in the regression equation (Table 1). The major advantage of this method is that it allows the derivation of predicted scores (and subsequently decisions regarding the normality of observed predicted-obtained score differences) for individuals of different ages, levels of education, gender, etc. However, large groups are required to formulate accurate multiple regression estimates of change.
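A rough sketch of the idea, under stated assumptions: retest scores in a simulated control sample are regressed on baseline score, age, and years of education by ordinary least squares (solved here through the normal equations), and a patient's observed retest score is compared with the prediction. The control data, the generating coefficients, and the choice of predictors are all hypothetical.

```python
import math
import random


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_multiple_regression(predictors, retests):
    """Least-squares fit with an intercept. Returns (coefficients, SEest)."""
    rows = [[1.0] + list(p) for p in predictors]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, retests)) for i in range(k)]
    coefs = solve_linear_system(xtx, xty)
    sse = sum((y - sum(c * v for c, v in zip(coefs, r))) ** 2
              for r, y in zip(rows, retests))
    return coefs, math.sqrt(sse / (len(rows) - k))


def multiple_regression_change_score(predictor_values, retest, coefs, se_est):
    """(Observed - predicted retest) divided by SEest."""
    predicted = coefs[0] + sum(c * v for c, v in zip(coefs[1:], predictor_values))
    return (retest - predicted) / se_est


# Simulated control sample: (baseline, age, years of education).
rng = random.Random(0)
controls, control_retests = [], []
for _ in range(200):
    baseline, age, education = rng.gauss(50, 10), rng.gauss(65, 8), rng.gauss(12, 3)
    controls.append((baseline, age, education))
    control_retests.append(5 + 0.9 * baseline - 0.1 * age + 0.5 * education
                           + rng.gauss(0, 2))

coefs, se_est = fit_multiple_regression(controls, control_retests)
z = multiple_regression_change_score((55.0, 70.0, 10.0), 40.0, coefs, se_est)
print(coefs, se_est, z)
```

The simulation uses 200 controls; with far fewer, the coefficient estimates become unstable, which is consistent with the requirement for large groups noted above.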
| Examples from published reports |
|---|
Bruggemans and colleagues [10] applied six statistical techniques to test-retest cognitive data acquired from 63 patients undergoing CABG. These included the SDI, the RCI, the MRCI1 and MRCI2, the RSI, and the simple regression model. Control data were collected from the spouses of the cardiac patients. These authors determined the deterioration rates in the CABG patients for each cognitive test administered according to each of these models. Techniques that correct for practice effects were observed to provide the best estimates of deterioration rates when test-retest reliability was high and when large practice effects were observed in the control group. In contrast, techniques that correct for measurement error (and therefore regression to the mean) were observed to operate best when test-retest reliability was low. These results indicate that the psychometric properties of the cognitive test employed may need to be considered when selecting a method to determine the clinical significance of an observed change in an individual performing that test.
Arndt and colleagues [26] compared the ability of simple change, simple regression, tau-a, tau-b, and the nonparametric slope (among other measures) to measure the symptom course of patients with schizophrenia and affective disorders. These authors used measures of effect size, statistical power, and Type 1 error rates derived from data sets submitted to bootstrapping techniques as their outcome variables. The ability of these change methods to detect correlations between symptom course and independent variables (eg, age, gender) was also determined. Both Kendall's tau methods provided acceptable estimates of symptom course and provided the greatest statistical power to detect any change in course. Both tau measures were able to detect correlations with independent variables, and also recorded acceptable Type 1 error rates (approximately 5%). This important study highlights the advantages of an alternative approach to those conventionally considered in the neuropsychological literature.
| Summary and conclusions |
|---|
It should be clear from this review that in research studies, including those of cognitive change subsequent to coronary surgery, the selection of a statistical technique to determine change in cognitive status must be made on the basis of the psychometric properties of the tests used, but also with respect to the methodological design of the individual study. For example, analysis of the psychometric properties of cognitive tests commonly used in CABG research suggests that the SDI is inappropriate for determining the significance of change in test score (Table 2). This was partially acknowledged very recently by Murkin [28], in an editorial where the RCI method was put forward as a possible alternative or adjunct to the SDI and percent change methods. However, we propose that the RCI method will not always be appropriate and that other statistical techniques should be given due consideration.
We have summarized here the statistical techniques currently employed in the neuropsychological literature to differentiate "true" change in test score from change due to measurement error and practice effects. Although some techniques perform quite well and facilitate accurate clinical decisions, others fail to adequately account for possible confounding factors. These techniques may be differentiated by the extent to which they account for measurement error, regression to the mean, and practice effects. The development of the more complex and competent techniques was initiated because of the unreliability and error inherent in many conventional cognitive tests. More accurate assessment of cognitive function in CABG research may be gained through application of these techniques to data gained from cognitive tasks that allow accurate serial assessment [27].
| References |
|---|