Ann Thorac Surg 2005;80:1169
© 2005 The Society of Thoracic Surgeons


The statistician's page

Statistical Techniques for Validating Logistic Regression Models

William N. Anderson, PhD *

* Address reprint requests to Dr Anderson, 21672 Montbury Dr, Lake Forest, CA 92630 (Email: wnilesanderson{at}aol.com).

The article by Karkouti and associates [1] in this issue of The Annals of Thoracic Surgery makes effective use of two analytic methods that are often overlooked in analyzing clinical data series. Both involve assessment of the quality of the logistic regression model, which is central to the conclusions drawn in the article. The methodology of this article can serve as a good example for other articles submitted to The Annals.

Logistic regression is used in the article to study the relationship of hematocrit to stroke. The article presents both the c-index and the Hosmer-Lemeshow test to assess the accuracy of the logistic model, and the fit is certainly reasonable in this case. Both this use of a logistic model and these validation methods are standard.
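For readers unfamiliar with the two measures, the following is a minimal Python sketch of what each computes. It assumes the model's predicted probabilities and the observed 0/1 outcomes are already available as lists; the function names and the default of 10 groups are illustrative choices, not details taken from the article. The c-index is the proportion of (stroke, no-stroke) patient pairs in which the patient with the stroke received the higher predicted probability. The Hosmer-Lemeshow statistic compares observed with expected event counts within groups of patients sorted by predicted risk.

```python
import math  # not strictly needed here; kept for parity with related sketches


def c_index(probs, outcomes):
    """Concordance (c) index: fraction of (event, non-event) pairs in which
    the event received the higher predicted probability; ties count 1/2.
    Assumes at least one event and one non-event."""
    events = [p for p, y in zip(probs, outcomes) if y == 1]
    nonevents = [p for p, y in zip(probs, outcomes) if y == 0]
    concordant = 0.0
    for pe in events:
        for pn in nonevents:
            if pe > pn:
                concordant += 1.0
            elif pe == pn:
                concordant += 0.5
    return concordant / (len(events) * len(nonevents))


def hosmer_lemeshow(probs, outcomes, groups=10):
    """Hosmer-Lemeshow chi-square statistic.

    Patients are sorted by predicted probability (a stable sort, so tied
    probabilities keep their input order) and split into roughly equal-size
    groups; each group contributes (O - E)^2 / (n * pbar * (1 - pbar)).
    """
    pairs = sorted(zip(probs, outcomes), key=lambda t: t[0])
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        observed = sum(y for _, y in chunk)
        expected = sum(p for p, _ in chunk)
        mean_p = expected / len(chunk)
        variance = len(chunk) * mean_p * (1.0 - mean_p)
        if variance > 0:
            stat += (observed - expected) ** 2 / variance
    return stat
```

A c-index of 0.5 means no discrimination and 1.0 means perfect discrimination. The Hosmer-Lemeshow statistic is conventionally compared with a chi-square distribution on (groups - 2) degrees of freedom, with large values indicating poor calibration.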

However, logistic regression depends on an important assumption that is frequently overlooked. The logistic model assumes that the logit of the probability of stroke is a linear function of the hematocrit. [The logit of a probability p is defined as log(p/(1 - p)), where log denotes the natural logarithm.] Figure 2 of the article provides a visual check of this linearity assumption: the logit of the probability is reasonably close to linear, and it lies comfortably within the confidence limits produced by the logistic model. In the article, the graph serves as a useful complement to the formal goodness-of-fit statistics.
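The logit and the linearity check can be made concrete with a short Python sketch. Assuming the data are lists of hematocrit values and 0/1 stroke indicators (hypothetical inputs, not the article's data), the function sorts patients by hematocrit, splits them into groups of roughly equal size, and returns each group's mean hematocrit and empirical logit. The 0.5 added to each count is a common continuity correction, assumed here so that a group with no strokes does not produce log(0); the function names and number of groups are illustrative.

```python
import math


def logit(p):
    """log(p / (1 - p)), with log the natural logarithm; requires 0 < p < 1."""
    return math.log(p / (1.0 - p))


def empirical_logits(xs, ys, groups=5):
    """Return (mean predictor, empirical logit) for each group of subjects.

    Subjects are sorted by the predictor and split into roughly equal-size
    groups. The +0.5 added to the event and non-event counts (an assumed
    continuity correction) keeps the logit finite for all-0 or all-1 groups.
    """
    data = sorted(zip(xs, ys), key=lambda t: t[0])
    n = len(data)
    points = []
    for g in range(groups):
        chunk = data[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        events = sum(y for _, y in chunk)
        nonevents = len(chunk) - events
        mean_x = sum(x for x, _ in chunk) / len(chunk)
        points.append((mean_x, math.log((events + 0.5) / (nonevents + 0.5))))
    return points
```

If the linearity assumption holds, these points should scatter around a straight line when plotted against mean hematocrit; a systematic bend suggests that hematocrit should be transformed before it enters the model.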

For a different data set, the plot might not have been close to linear. The shape of the graph would then have suggested a transformation of hematocrit that would produce a more useful logistic regression model. The c-index and the Hosmer-Lemeshow test alone do not provide such information, because they do not distinguish between nonlinearity and random noise when checking model fit.

Such an analysis is not as easy as it may sound, because the relationship between hematocrit and stroke must be estimated by some method other than the logistic model itself. The basis for deriving the relationship is the histogram of Figure 1: the histogram could be drawn with more groups, and the stroke probability in each group could be plotted on the logit scale. However, such a plot would be quite jumpy, so the article used cubic splines to smooth the curve. The exact details of how cubic splines smooth a curve are not important, and other smoothing methods may work as well. Whatever method is used, the important point is that the central curve in Figure 2 was not derived from the logistic regression model; instead, it is used to validate the assumptions underlying that model.
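The key idea, estimating the risk curve without the logistic model and then viewing it on the logit scale, can be sketched with a simpler smoother than cubic splines. This Python sketch substitutes a Gaussian kernel (Nadaraya-Watson) smoother of the 0/1 outcome; it is not the method used in the article, and the `bandwidth` parameter and function name are illustrative assumptions that the analyst would need to choose. Because no logistic regression is fitted, the resulting curve can be compared against the model's straight line.

```python
import math


def smoothed_logit_curve(xs, ys, grid, bandwidth):
    """Kernel-smoothed estimate of the event probability at each grid point,
    returned on the logit scale as (grid point, logit) pairs.

    Uses a Gaussian-kernel weighted average of the 0/1 outcomes (a stand-in
    for the article's cubic-spline smoother). No logistic model is fitted.
    Assumes the bandwidth is not so small that all weights underflow to 0.
    """
    curve = []
    for g in grid:
        weights = [math.exp(-0.5 * ((x - g) / bandwidth) ** 2) for x in xs]
        total = sum(weights)
        p = sum(w * y for w, y in zip(weights, ys)) / total
        # Clip away from 0 and 1 so that the logit stays finite.
        p = min(max(p, 1e-6), 1.0 - 1e-6)
        curve.append((g, math.log(p / (1.0 - p))))
    return curve
```

A wider bandwidth gives a smoother but possibly biased curve, and a narrower one reproduces the jumpiness of the raw grouped data; with a large data set such as the one in the article, a fairly narrow bandwidth can still give a stable curve.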

The analysis was aided by the large size of the data set; admittedly, the situation would not be so simple with even a moderate-size data set. Nevertheless, graphical analysis of the logistic regression model is a tool that all analysts should consider using when logistic regression is crucial to the analysis of a clinical data series.

The article made effective use of another often-overlooked validation method: repeating the analysis on bootstrap samples. Each bootstrap sample is drawn with replacement from the original data set and so differs somewhat from it; repeating the analysis on many such samples shows how sensitive the conclusion is to small changes in the data. Here the conclusion proved to be quite robust, increasing the reader's confidence that the detected relationship is real.
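A bootstrap analysis of a one-predictor logistic regression can be sketched in plain Python. The sketch assumes a single predictor (for example, hematocrit) and 0/1 outcomes; it fits the model logit(p) = a + b*x by Newton-Raphson, then refits it on bootstrap samples drawn with replacement and collects the slopes. The function names, number of replicates, seed, and iteration count are illustrative assumptions; the article's own analysis may have included other covariates and a different number of repetitions.

```python
import math
import random


def fit_logistic(xs, ys, iterations=25):
    """Maximum-likelihood fit of logit(p) = a + b*x by Newton-Raphson.
    Assumes the data are not perfectly separated and x is not constant."""
    a, b = 0.0, 0.0
    for _ in range(iterations):
        # Score vector (u0, u1) and information matrix [[i00, i01], [i01, i11]].
        u0 = u1 = i00 = i01 = i11 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            w = p * (1.0 - p)
            u0 += y - p
            u1 += (y - p) * x
            i00 += w
            i01 += w * x
            i11 += w * x * x
        det = i00 * i11 - i01 * i01
        # Newton step: (a, b) += inverse(information) @ score
        a += (i11 * u0 - i01 * u1) / det
        b += (i00 * u1 - i01 * u0) / det
    return a, b


def bootstrap_slopes(xs, ys, replicates=200, seed=1):
    """Refit the model on `replicates` bootstrap samples of size n,
    each drawn with replacement, and return the fitted slopes."""
    rng = random.Random(seed)
    n = len(xs)
    slopes = []
    for _ in range(replicates):
        idx = [rng.randrange(n) for _ in range(n)]
        _, b = fit_logistic([xs[i] for i in idx], [ys[i] for i in idx])
        slopes.append(b)
    return slopes


def percentile_interval(values, level=0.95):
    """Simple bootstrap percentile interval (nearest-rank approximation)."""
    ordered = sorted(values)
    k = len(ordered) - 1
    return ordered[int((1 - level) / 2 * k)], ordered[int((1 + level) / 2 * k)]
```

If nearly all bootstrap slopes share the sign of the original estimate, the detected relationship does not hinge on a few particular patients; the percentile interval gives a rough sense of how much the slope varies across resamples.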

Of course, bootstrapping is not a cure-all, and the article still suffers from being a single-center study, albeit of a very large series. Bootstrapping also cannot replace validation in a new study. In spite of these limitations, bootstrapping is a useful and easily implemented technique that all analysts should consider.

Both statistical techniques are described in references [2–5] and are also cited in the article by Karkouti and associates [1]. Their usefulness is not limited to logistic regression; Katz [4] describes their application in related situations.


References

  1. Karkouti K, Djaiani G, Borger MA, et al. Low hematocrit during cardiopulmonary bypass is associated with increased risk of perioperative stroke in cardiac surgery. Ann Thorac Surg 2005;80:1381-1387.
  2. Devlin TF, Weeks BJ. Spline functions for logistic regression modeling. In: Proceedings of the 11th Annual SAS Users Group International Conference. Cary, NC: SAS Institute Inc; 1986:646-651.
  3. Harrell FE. SAS macros and data step programs useful in survival analysis and logistic regression. 1999. http://biostat.mc.vanderbilt.edu/twiki/bin/view/Main/SASMacros
  4. Katz MH. Multivariable analysis: a practical guide for clinicians. Cambridge: Cambridge University Press; 1999.
  5. Efron B, Tibshirani RJ. An introduction to the bootstrap. New York: Chapman & Hall; 1993.

Related Article

Low Hematocrit During Cardiopulmonary Bypass is Associated With Increased Risk of Perioperative Stroke in Cardiac Surgery
Keyvan Karkouti, George Djaiani, Michael A. Borger, William S. Beattie, Ludwik Fedorko, Duminda Wijeysundera, Joan Ivanov, and Jacek Karski
Ann Thorac Surg 2005;80:1381-1387.



This article has been cited by other articles:


R. T. Mikolajczyk, A. DiSilvesto, and J. Zhang
Evaluation of Logistic Regression Reporting in Current Obstetrics and Gynecology Literature
Obstet. Gynecol., February 1, 2008; 111(2): 413 - 419.