Ann Thorac Surg 2005;79:1104-1109
© 2005 The Society of Thoracic Surgeons
a Department of Biostatistics, The Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland
b Department of Neurology, The Johns Hopkins University School of Medicine, Baltimore, Maryland
c Department of Surgery, The Johns Hopkins University School of Medicine, Baltimore, Maryland
e Department of Neuroscience, The Johns Hopkins University School of Medicine, Baltimore, Maryland
d Zanvyl Krieger Mind Brain Institute, The Johns Hopkins University, Baltimore, Maryland
* Address reprint requests to Ms Barry, Department of Biostatistics, The Johns Hopkins School of Public Health, 615 N Wolfe St, Baltimore, MD 21205 (E-mail: sbarry@jhsph.edu).
| Abstract |
|---|
We use an "analyze then summarize" approach whereby we estimate the intervention effects separately for each cognitive test and then pool them, taking appropriate account of their statistical correlations. The model accounts for dropouts at follow-up, the chance of which may be related to past cognitive score, by implicitly imputing the missing data from individuals' past scores and group patterns. We apply this approach to a study of the effects of CABG on the time course of cognitive function as measured by 16 separate neuropsychological test scores, clustered into 8 cognitive domains. The study includes measurements on 140 CABG patients and 92 nonsurgical controls at baseline, and at 3, 12, and 36 months.
Our "analyze then summarize" method allows us to identify differences between the treatment groups in individual tests as well as in aggregate measures. It takes into account the correlation structure of the data and thereby produces more precise results than summarizing before analyzing.
The methods used have application to a wide range of intervention studies in which multiple biomarkers are followed over time to quantify health effects. Software to implement the methods in the R statistical package is available from the authors at http://www.biostat.jhsph.edu/
sbarry/software/ATSrcode.pdf.
| Introduction |
|---|
We present an "analyze then summarize" approach in which tests are analyzed individually before summarization of the results of these initial analyses, taking into account correlation within people over time and across tests. This method allows us to study both the individual and aggregate results.
As an illustration of our methodology, we will apply it to data from an ongoing study that allows comparison of patients receiving CABG with a group of patients who have established coronary artery disease, but do not have surgery; these nonsurgical controls (NSC) have an incidence of risk factors for vascular disease similar to that of the CABG group. In the accompanying paper by Selnes and colleagues [3], we compare the longitudinal performance of these two groups at baseline and at 3, 12, and 36 months after surgery or enrollment.
A first question in this example is whether the pattern of cognitive change in the CABG group differs from that observed in the NSC group. A second question is whether any differences are likely caused by the surgery.
In this paper, we discuss a hierarchical statistical model [4] that can be used to quantify differences in change in cognitive function over time between the CABG and control groups. We use the statistical model to estimate the average cognitive function performance on each test over time for the surgery and control groups, after adjusting for known differences in potential confounding variables, specifically age, sex, education, and the presence of symptoms of depression.
Finally, we use bootstrapping [5] to combine the estimates of the surgery effects across many measures into domain-specific estimates of group differences. This method uses prior knowledge about the domains of cognitive function measured by each test as well as the correlation structure of the data.
The statistical approaches we have used for evaluation of prospective, longitudinal data comparing patients after CABG with a nonsurgical control group have general applicability to other clinical studies in which there are multiple outcomes and the goal is the evaluation of the impact of an intervention.
| Analyze, Then Summarize |
|---|
Analyze
STUDY DESIGN
The data used to illustrate our method come from an observational study of 140 patients undergoing CABG and 92 nonsurgical cardiac controls. Surgical patients (CABG) and nonsurgical controls were recruited from September 1997 through March 1999 at the Johns Hopkins Cardiac Unit. The NSC group was identified by Johns Hopkins cardiologists as potential patients who were diagnosed with coronary artery disease by cardiac catheterization.
Study participants were administered a battery of standardized neuropsychological tests at baseline and at 3, 12, and 36 months. Patients were also administered the Center for Epidemiologic Studies Depression Scale (CES-D) at baseline and follow-up [6] so that cognitive test scores could be adjusted for possible effects of depressed mood. (See the accompanying paper by Selnes and coworkers [3] for a detailed description of the patient population and study design.)
HIERARCHICAL LINEAR STATISTICAL MODEL
This section describes a now-standard statistical model [4] designed to capture the key components of the change in cognitive function over time for the individuals in our study and for their population, and to compare the typical change for persons who do and do not receive an intervention such as CABG. As an example, we focus on a single cognitive domain, verbal memory, which comprises four tests; below we present a method for summarizing results across tests to obtain the domain values.
The model is specified by the following assumptions: (1) Each person has a unique level and time trend of cognitive function. (2) Over periods of time, such as a few years, true cognitive function changes gradually and can be approximated by a smooth function of time, such as a low-order polynomial [7]. (3) The intervention may affect people in the short term by immediately increasing or decreasing their function, and over the longer term by changing their preintervention trend. The short-term and long-term effects of intervention may vary across individuals. (4) The level of cognitive function is influenced, possibly in a nonlinear way, by other factors such as age, sex, education, and level of depressive symptoms. (5) Measurements of cognitive function are subject to a practice effect whereby a study participant's scores on quantitative tests could improve with repetition, particularly from the first to second testing, absent a change in actual cognitive function level.
Figure 1 presents a schematic of this model. The goal is to estimate the effects of an intervention from a dataset comprising repeated observations on cognitive tests over time for persons who have received the intervention and other similar persons who have not.
The proposed model has two degrees of freedom to quantify a possible effect of CABG: the rise from 0 to 3 months (short-term, or learning effect); and a difference in the slope from 3 to 36 months (long-term, or time trend effect). We use a Wald test of the null hypothesis of no CABG effect, which simultaneously tests whether the two regression coefficients are equal to zero by assuming that the coefficients are approximately normally distributed in large samples and comparing the relevant test statistic to the χ² distribution [8].
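The joint test of the two coefficients can be sketched in a few lines. This is a minimal illustration, not the authors' R code: the function name `wald_test_2df` and the numbers in the example call are hypothetical. It forms the quadratic statistic from the two effect estimates and their 2 x 2 covariance matrix and refers it to a chi-square distribution with 2 degrees of freedom, whose upper tail has the closed form exp(-W/2).

```python
import math


def wald_test_2df(effects, cov):
    """Wald test that two coefficients are jointly zero.

    effects: (short_term, long_term) estimated group differences.
    cov: 2 x 2 covariance matrix of those estimates (nested sequences).
    Returns (W, p), where W = effects' * inverse(cov) * effects.
    """
    b1, b2 = effects
    (v11, v12), (v21, v22) = cov
    det = v11 * v22 - v12 * v21
    if det <= 0:
        raise ValueError("covariance matrix must be positive definite")
    # Quadratic form via the closed-form inverse of a 2 x 2 matrix.
    w = (v22 * b1 * b1 - (v12 + v21) * b1 * b2 + v11 * b2 * b2) / det
    # With 2 degrees of freedom the chi-square upper tail is exactly exp(-w/2).
    p = math.exp(-w / 2.0)
    return w, p


# Hypothetical estimates and covariance, for illustration only.
w, p = wald_test_2df((0.30, 0.05), ((0.010, 0.002), (0.002, 0.004)))
print(f"W = {w:.2f}, p = {p:.4f}")
```

The closed-form tail holds only for 2 degrees of freedom; a test with more coefficients would need a general chi-square routine.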
When dropout is related to past cognitive score, individuals seen at all follow-up points may have a distribution of scores that differs from that of the entire group, and this is likely to cause bias. The model can reduce this bias by using information from previous time points and group patterns to internally impute missing data at later follow-up points, thereby making more precise estimates of the true group means at follow-up for all the patients who started the study.
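A small simulation may help show why this matters. The sketch below is illustrative only: the dropout rule, effect sizes, and function names are assumptions, and a simple regression of follow-up on baseline among completers stands in for the implicit imputation that the hierarchical model performs. When low baseline scorers are less likely to return, the mean of the completers drifts away from the mean of everyone who started, while filling in the missing follow-up scores from baseline largely removes that drift.

```python
import random


def ols_fit(x, y):
    """Least-squares intercept and slope for y regressed on x."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def simulate_dropout(n=20000, seed=1):
    rng = random.Random(seed)
    baseline = [rng.gauss(0.0, 1.0) for _ in range(n)]
    followup = [0.8 * b + rng.gauss(0.0, 0.6) for b in baseline]
    # Assumed dropout rule: low baseline scorers are less likely to return.
    seen = [rng.random() < (0.9 if b > 0 else 0.4) for b in baseline]

    true_mean = sum(followup) / n  # mean if nobody had dropped out
    obs_x = [b for b, s in zip(baseline, seen) if s]
    obs_y = [f for f, s in zip(followup, seen) if s]
    completer_mean = sum(obs_y) / len(obs_y)

    # Predict each missing follow-up score from that person's baseline score.
    intercept, slope = ols_fit(obs_x, obs_y)
    filled = [f if s else intercept + slope * b
              for b, f, s in zip(baseline, followup, seen)]
    imputed_mean = sum(filled) / n
    return true_mean, completer_mean, imputed_mean


true_mean, completer_mean, imputed_mean = simulate_dropout()
print(f"all starters: {true_mean:.3f}")
print(f"completers only: {completer_mean:.3f}")
print(f"with imputation: {imputed_mean:.3f}")
```

The correction works here because dropout depends only on the observed baseline score; if it also depended on the unobserved follow-up score, neither this sketch nor a model of this kind would fully remove the bias.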
ESTIMATING NATURAL HETEROGENEITY
In addition to estimating the mean curves for each intervention group, the model is used to estimate the amount of variation in the true levels and time trends of a cognitive test score among persons within groups [7]. We anticipate some variation between people in baseline scores and in change over time simply because individuals are heterogeneous; therefore, we expect scores from a particular patient over time to be more similar to one another than to scores from other people. The model takes into account and estimates this correlation among repeated observations on an individual person. We allow the degree of variation to be estimated separately for the two intervention groups to capture any extra variation that may arise in the treatment group as a result of the intervention differentially benefiting or harming subjects.
Summarize
The hierarchical model in the Appendix is estimated separately for each of the 16 measures of cognitive function. This produces, for each measure, short- and long-term intervention effect estimates from the learning effect and time trend group differences and a 2 x 2 covariance matrix that quantifies their statistical error. We estimate the mean learning or time trend "effect" for a domain as the weighted mean of the effects for the tests in that domain, where the weights depend on the correlation structure. Since the multiple test scores for an individual are correlated with one another, estimates of the CABG effects for the different measures are also correlated. To correctly estimate the standard errors of these domain-specific or overall effects, we must take this correlation into account. We use bootstrapping [5], a statistical method that involves resampling individuals, to estimate the joint covariance matrix among the 16 pairs of intervention effect estimates and to obtain valid standard errors for the domain and overall intervention effect estimates. We draw with replacement a random sample of 140 CABG and 92 NSC subjects, refit all 16 models to get test-specific learning and time trend effect estimates, average these to obtain domain and overall effect estimates, and then repeat this process 1,000 times. The variance among the 1,000 bootstrapped replicates of the domain and overall effect estimates gives a valid estimate of statistical uncertainty, used to calculate the confidence intervals of the effect estimates, as it takes appropriate account of the correlation among multiple cognitive test scores for the same individual person.
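The resampling scheme described above can be sketched as follows. This is a minimal illustration on simulated data, not the authors' software: the per-test "fit" is replaced by a simple difference in group means (a stand-in for refitting the hierarchical model), the domain effect is an unweighted average rather than the correlation-based weighted mean, and all function names are hypothetical. The key point it shows is that whole subjects, with all their test scores, are resampled within each group, so the correlation among tests carries through to the standard error.

```python
import random
import statistics


def per_test_effects(cabg, nsc):
    """Per-test effect: difference in mean score, CABG minus NSC.

    Stand-in for refitting the hierarchical model to each test.
    """
    n_tests = len(cabg[0])
    return [
        statistics.fmean(s[k] for s in cabg) - statistics.fmean(s[k] for s in nsc)
        for k in range(n_tests)
    ]


def domain_effect(cabg, nsc):
    """Domain effect: unweighted mean of the per-test effects (a simplification)."""
    return statistics.fmean(per_test_effects(cabg, nsc))


def bootstrap_se(cabg, nsc, n_boot=1000, seed=0):
    """Resample whole subjects with replacement within each group."""
    rng = random.Random(seed)
    replicates = []
    for _ in range(n_boot):
        cabg_star = rng.choices(cabg, k=len(cabg))
        nsc_star = rng.choices(nsc, k=len(nsc))
        replicates.append(domain_effect(cabg_star, nsc_star))
    return statistics.stdev(replicates), replicates


def make_group(n, shift, rng, n_tests=4):
    """Simulated subjects whose test scores share a subject-level component."""
    group = []
    for _ in range(n):
        u = rng.gauss(0.0, 1.0)  # shared term induces correlation across tests
        group.append([shift + u + rng.gauss(0.0, 0.5) for _ in range(n_tests)])
    return group


rng = random.Random(42)
cabg = make_group(140, 0.3, rng)  # simulated, not study data
nsc = make_group(92, 0.0, rng)
estimate = domain_effect(cabg, nsc)
se, reps = bootstrap_se(cabg, nsc, n_boot=1000)
ci = (estimate - 1.96 * se, estimate + 1.96 * se)
print(f"domain effect {estimate:.3f}, 95% CI ({ci[0]:.3f}, {ci[1]:.3f})")
```

Resampling individual scores instead of subjects would treat the four tests as independent and would typically understate the standard error of the domain effect.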
| Results |
|---|
Figure 2 is a "spaghetti" plot of the standardized and covariate-adjusted scores over time for the Verbal Memory domain stratified by intervention group. The cognitive test scores were standardized such that the NSC group had a mean score of zero and standard deviation of 1 at baseline, and were adjusted for age, sex, education level, and depressive symptoms. The Verbal Memory domain is made up of the total score, delayed recall (Trial 8), retention score, and corrected recognition from the Rey Auditory Verbal Learning Test [12]. The group mean scores at each time are also shown. For graphical simplicity the plot is drawn using the 3-, 12-, and 36-month time points rather than the actual times at which patients were tested. The model may be fitted using the actual times if desired, but since our testing times were close to 3, 12, and 36 months, treating them as exact has little effect on the results. A learning effect is evidenced by the increase in mean score in both groups from baseline to 3 months. There is little change from 3 to 36 months in the mean response for either group after the initial rise.
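The standardization step can be sketched as below. This is a minimal illustration with made-up scores; the function name is hypothetical, the use of the sample standard deviation is an assumption, and the covariate adjustment for age, sex, education, and depressive symptoms is not shown. Every score, from either group and at any visit, is expressed relative to the control group's baseline mean and standard deviation.

```python
import statistics


def standardize_to_controls(scores, control_baseline):
    """Express scores in units of the control group's baseline distribution."""
    mu = statistics.fmean(control_baseline)
    sd = statistics.stdev(control_baseline)  # sample SD (an assumption)
    return [(s - mu) / sd for s in scores]


# Made-up raw scores, for illustration only.
nsc_baseline = [42.0, 50.0, 47.0, 55.0, 46.0]
cabg_3_months = [49.0, 53.0, 44.0]

nsc_z = standardize_to_controls(nsc_baseline, nsc_baseline)
cabg_z = standardize_to_controls(cabg_3_months, nsc_baseline)
print([round(z, 2) for z in cabg_z])
```

Putting all 16 tests on this common scale is what allows effects from tests with different raw units to be averaged within a domain.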
Panel C is a plot of the difference between the intervention (CABG) and control group (NSC) curves in panel B at each time. These differences are the essential evidence relevant to assessing the intervention (CABG) effect, as they show the disparities between the cognitive scores of the group that has had surgery and a group with similar risk factors that has not. Without this comparison, we cannot attribute any change in scores for the CABG group to the surgery. They still require some adjustment, however, because they are averages of observed data and do not take account of the fact that the number of dropouts may differ between the two groups.
Panel D presents improved estimates of the difference at each time between the CABG and NSC group curves shown in panel C. The results in panel D are obtained from the hierarchical model in the Appendix. The small differences in the curves in panels C and D reflect changes from taking appropriate account of missing data. In the case of this domain, the CABG patients remaining at 36 months are those with poorer cognitive scores at baseline than the patients who dropped out, with the opposite effect in the NSC group, a phenomenon likely to bias the mean group score at 36 months toward lower values. The hierarchical model implicitly imputes the missing data by using the information for the individual patients at previous times and the patterns for their group. The means in panel C ignore the missing data and are biased unless the chance of dropping out is unrelated to past cognitive score, which is an unlikely situation.
Panel D shows evidence of a difference in population mean Verbal Memory value between the intervention and control groups, with the CABG group having a greater improvement from baseline than the NSC group, both in the short term at 3 months and in the long term at 36 months. The test of the null hypothesis that both the short- and long-term effects are zero has a p value of 0.01, indicating that this hypothesis can be rejected, with the estimated difference favoring the CABG group.
Figure 4 shows the estimated difference in the cognitive function time course between the intervention and control groups (panel D in Fig 3 for Verbal Memory) for all 8 of the cognitive domain measures. The accompanying paper by Selnes and associates [3] presents the corresponding figure for the 16 cognitive subtests. The p values on the plots in Figure 4 again result from tests of zero difference in the trends over time between the groups and make it clear that in this dataset, there is little or no evidence consistent with a detrimental effect of CABG on cognitive function as measured by these 8 scores.
| Comment |
|---|
The model described here estimates the average difference between the CABG and control groups in the change in cognitive function from baseline. We use the model to adjust for baseline differences between the groups in test scores and for differences over time that are attributable to demographics and depression symptoms, the latter as measured by the CES-D, as well as natural heterogeneity between people.
However, no model can adjust for unmeasured differences between the two groups that are more likely to arise in observational studies where subjects choose their treatment in consultation with their physician rather than having it assigned by a known, random mechanism. Hence, we must be cautious in our interpretation of the evidence, asking what other factors might account for the differences or lack thereof between the two groups.
The hierarchical model allows one to take appropriate account of dropouts, a common phenomenon in longitudinal studies such as this one. The model includes terms that acknowledge the correlation among repeated observations for each individual. Having done so, it can internally impute missing values by predictions based upon the earlier responses and other covariates [7]. Failure to use a model that accounts for within-person correlation can lead to biased estimates of treatment effects except when the dropout process is independent of the past responses, which is unlikely. A limitation of the method, however, is the assumption that patients would have followed their initial trend since most of the missing data are at the 36-month follow-up point.
We have presented an approach to the difficult problem of how to estimate the effect of CABG on the performance of 16 cognitive measures by first analyzing each of them separately with a hierarchical model and then pooling the effect estimates to obtain domain effects. We refer to this as the "analyze then summarize" approach. An alternative is to "summarize then analyze" the data by using factor analysis [10] or some other method to create summary scores from the 16 test results and then to use hierarchical models with the summary measures. We prefer our approach because it produces a separate treatment effect for every measure so that unanticipated patterns can be discovered. An example of the serious limitations of the "summarize then analyze" approach could occur when combining the results of two tests, one of which shows a decline over time in the treatment group while the other shows an improvement, but for both tests the control group remains stable. Summarizing before analyzing would allow the treatment effects to be canceled out, and we would have no way of identifying them.
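The cancellation described in this example can be made concrete with a few lines of arithmetic. The numbers below are hypothetical and the summary score is assumed to be a simple average of the two tests; the point is only that the per-test effects survive when each test is analyzed first, and vanish when the tests are averaged first.

```python
# Hypothetical mean changes from baseline, in standardized units.
treatment = {"test_A": -0.5, "test_B": +0.5}
control = {"test_A": 0.0, "test_B": 0.0}

# Analyze then summarize: one treatment effect per test.
per_test = {name: treatment[name] - control[name] for name in treatment}

# Summarize then analyze: average the tests first, then compare groups.
summary_effect = (sum(treatment.values()) / len(treatment)
                  - sum(control.values()) / len(control))

print(per_test)        # {'test_A': -0.5, 'test_B': 0.5}
print(summary_effect)  # 0.0
```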
Another approach to this problem is repeated measures analysis of variance (ANOVA). For this problem, we prefer our method because repeated measures ANOVA includes only a random intercept and therefore has a less flexible model of the covariance structure among the repeated observations [7]. A further, commonly used approach to the problem involves the use of "SD" methods, whereby patients are classified as having cognitive decline if they, for example, are 1 standard deviation worse at follow-up in 20% of the tests than they were at baseline [11]. We believe, however, that purely by chance some patients will do worse at follow-up than at baseline and likewise some patients will improve. To consider only those who are declining at follow-up misses important information about patients who are improving and who on average may balance out the amount of change the groups are showing. This method also fails to quantify the type of correlation between repeated observations that can be of interest in its own right.
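The claim that some patients will appear to decline purely by chance can be illustrated with a short calculation. This sketch rests on strong assumptions that are not taken from the study: no true change in any patient, normally distributed scores, an assumed test-retest reliability of 0.7 for every test, and independence across the 16 tests (which the correlation among tests would in practice violate). Under those assumptions it computes the probability that a patient is 1 standard deviation worse on at least 20% of the tests.

```python
import math
from statistics import NormalDist


def chance_classification_rate(n_tests=16, reliability=0.7,
                               cutoff_sd=1.0, fraction=0.2):
    """Probability of meeting an "SD" decline rule with no true change.

    Assumes normal scores, equal reliability, and independent tests.
    """
    # SD of a change score in baseline SD units when nothing truly changes.
    sd_change = math.sqrt(2.0 * (1.0 - reliability))
    # Chance that one test falls at least cutoff_sd below baseline.
    p_one = NormalDist(0.0, sd_change).cdf(-cutoff_sd)
    # Smallest number of tests that makes up the required fraction.
    k_min = math.ceil(fraction * n_tests)
    # Upper tail of a binomial distribution.
    return sum(
        math.comb(n_tests, k) * p_one ** k * (1.0 - p_one) ** (n_tests - k)
        for k in range(k_min, n_tests + 1)
    )


rate = chance_classification_rate()
print(f"classified as declined by chance: {rate:.1%}")
```

By symmetry, the same proportion would be expected to meet a corresponding "improvement" rule, which is why counting only decliners discards information.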
Furthermore, our method avoids the difficult problem of how to choose the best summary scores. Typically, summarization is based upon the correlation among test results at one time and does not take appropriate account of the longitudinal information.
The methods used here have wide application to a variety of longitudinal studies comparing intervention groups or groups defined in other ways when there are multiple outcomes. To facilitate the application of these methods, software to implement the analyses presented here has been posted to our Webpage (available at: http://www.biostat.jhsph.edu/
sbarry/software/ATSrcode.pdf).
| Appendix |
|---|
A. Cognitive function = Intercept + covariate effects at baseline + test-retest improvement for second test and beyond + linear time trend.
B. Intercepts differ between treatment groups; test-retest improvements differ between treatment groups; linear trends differ between treatment groups.
C. Subjects vary within treatment groups in their true: intercepts; time trends.
D. The degree of subject-to-subject variation in the intercepts and slopes (in C.) differs between the two treatment groups.
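One way to write components A through D as a single equation is sketched below. The notation is an assumption introduced here, not the authors' own: $Y_{ij}$ is the score of subject $i$ at visit $j$, $G_i$ indicates the CABG group, $\mathbf{x}_i$ holds the baseline covariates, $R_{ij}$ equals 1 for the second and later testing occasions, and $t_{ij}$ is time. Normality of the random effects and residuals, and the origin of the time scale, are also assumptions.

```latex
\begin{aligned}
Y_{ij} &= \beta_0 + \beta_1 G_i + \mathbf{x}_i^{\top}\boldsymbol{\gamma}
          + (\beta_2 + \beta_3 G_i)\,R_{ij}
          + (\beta_4 + \beta_5 G_i)\,t_{ij}
          + b_{0i} + b_{1i}\,t_{ij} + \varepsilon_{ij}, \\
(b_{0i},\, b_{1i})^{\top} &\sim N\!\left(\mathbf{0},\, \mathbf{D}_{G_i}\right),
\qquad \varepsilon_{ij} \sim N\!\left(0,\, \sigma^{2}\right).
\end{aligned}
```

In this notation $\beta_3$ is the short-term (learning) difference and $\beta_5$ the long-term (time trend) difference between groups, the two coefficients tested jointly by the Wald test, and the group-specific covariance matrix $\mathbf{D}_{G_i}$ expresses item D.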
R program code specifying this model is available on the web at: http://www.biostat.jhsph.edu/
sbarry/software/ATSrcode.pdf.