Article,

Internal validation of predictive models: efficiency of some procedures for logistic regression analysis.

E. Steyerberg, F. Harrell, G. Borsboom, M. Eijkemans, Y. Vergouwe, and J. Habbema.
Journal of clinical epidemiology, 54 (8): 774-81 (August 2001)

Abstract

The performance of a predictive model is overestimated when simply determined on the sample of subjects that was used to construct the model. Several internal validation methods are available that aim to provide a more accurate estimate of model performance in new subjects. We evaluated several variants of split-sample, cross-validation and bootstrapping methods with a logistic regression model that included eight predictors for 30-day mortality after an acute myocardial infarction. Random samples with a size between n = 572 and n = 9165 were drawn from a large data set (GUSTO-I; n = 40,830; 2851 deaths) to reflect modeling in data sets with between 5 and 80 events per variable. Independent performance was determined on the remaining subjects. Performance measures included discriminative ability, calibration and overall accuracy. We found that split-sample analyses gave overly pessimistic estimates of performance, with large variability. Cross-validation on 10% of the sample had low bias and low variability, but was not suitable for all performance measures. Internal validity could best be estimated with bootstrapping, which provided stable estimates with low bias. We conclude that split-sample validation is inefficient, and recommend bootstrapping for estimation of internal validity of a predictive logistic regression model.
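As a rough illustration of the bootstrapping approach the abstract recommends, the sketch below applies one common variant, optimism-corrected bootstrap validation, to the c-statistic (a measure of discriminative ability). It is not the authors' code: the abstract does not say which bootstrap variants were evaluated, the data are simulated rather than taken from GUSTO-I, the model has two predictors instead of eight, and all function names (`simulate`, `fit_logistic`, `c_statistic`, `bootstrap_validated_c`) are hypothetical. The logistic fit uses plain gradient ascent only to keep the example free of third-party dependencies.

```python
import math
import random


def predict_one(w, xi):
    """Predicted event probability for one subject; w = [intercept, coef_1, ...]."""
    z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))
    z = max(-30.0, min(30.0, z))  # clamp so exp() cannot overflow
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, iters=200, lr=1.0):
    """Approximate maximum-likelihood fit by gradient ascent (illustrative only)."""
    n, p = len(X), len(X[0])
    w = [0.0] * (p + 1)
    for _ in range(iters):
        grad = [0.0] * (p + 1)
        for xi, yi in zip(X, y):
            err = yi - predict_one(w, xi)
            grad[0] += err
            for j in range(p):
                grad[j + 1] += err * xi[j]
        w = [wj + lr * gj / n for wj, gj in zip(w, grad)]
    return w


def c_statistic(w, X, y):
    """Share of (event, non-event) pairs ranked correctly; ties count one half."""
    preds = [predict_one(w, xi) for xi in X]
    events = [p for p, yi in zip(preds, y) if yi == 1]
    nonevents = [p for p, yi in zip(preds, y) if yi == 0]
    score = sum((1.0 if e > q else (0.5 if e == q else 0.0))
                for e in events for q in nonevents)
    return score / (len(events) * len(nonevents))


def bootstrap_validated_c(X, y, n_boot=30, seed=0):
    """Return (apparent c, optimism-corrected c)."""
    rng = random.Random(seed)
    n = len(y)
    # Apparent performance: model developed and evaluated on the same sample.
    apparent = c_statistic(fit_logistic(X, y), X, y)
    optimism = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        Xb = [X[i] for i in idx]
        yb = [y[i] for i in idx]
        if sum(yb) in (0, n):  # skip resamples with only one outcome class
            continue
        wb = fit_logistic(Xb, yb)
        # Optimism: performance in the bootstrap sample minus performance
        # of the same model in the original sample.
        optimism.append(c_statistic(wb, Xb, yb) - c_statistic(wb, X, y))
    return apparent, apparent - sum(optimism) / len(optimism)


def simulate(n, seed=0):
    """Hypothetical data: two standard-normal predictors, binary outcome."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        xi = [rng.gauss(0, 1), rng.gauss(0, 1)]
        prob = predict_one([-1.0, 1.0, -0.5], xi)  # assumed true coefficients
        X.append(xi)
        y.append(1 if rng.random() < prob else 0)
    return X, y


if __name__ == "__main__":
    X, y = simulate(150, seed=1)
    apparent, corrected = bootstrap_validated_c(X, y)
    print(f"apparent c = {apparent:.3f}, optimism-corrected c = {corrected:.3f}")
```

Unlike a split-sample analysis, every subject contributes to both model development and evaluation here, which is one plausible reading of why the abstract calls split-sample validation inefficient.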

BibTeX key: Steyerberg2001
entry type: article
year: 2001
month: 8
journal: Journal of clinical epidemiology
number: 8
pages: 774-81
volume: 54
city: Center for Clinical Decision Sciences, Ee 2091, Department of Public Health, Erasmus University, P.O. Box 1738, 3000 DR, Rotterdam, The Netherlands. steyerberg@mgz.fgg.eur.nl
isbn: 0895-4356; 0895-4356
pmid: 11470385
issn: 0895-4356
url: http://www.ncbi.nlm.nih.gov/pubmed/11470385
note: 6555; LR: 20061115; JID: 8801383; ppublish; Diagnostic tests; Predictive models; Logistic regression

Cite this publication

@article{Steyerberg2001,
  abstract = {The performance of a predictive model is overestimated when simply determined on the sample of subjects that was used to construct the model. Several internal validation methods are available that aim to provide a more accurate estimate of model performance in new subjects. We evaluated several variants of split-sample, cross-validation and bootstrapping methods with a logistic regression model that included eight predictors for 30-day mortality after an acute myocardial infarction. Random samples with a size between n = 572 and n = 9165 were drawn from a large data set (GUSTO-I; n = 40,830; 2851 deaths) to reflect modeling in data sets with between 5 and 80 events per variable. Independent performance was determined on the remaining subjects. Performance measures included discriminative ability, calibration and overall accuracy. We found that split-sample analyses gave overly pessimistic estimates of performance, with large variability. Cross-validation on 10% of the sample had low bias and low variability, but was not suitable for all performance measures. Internal validity could best be estimated with bootstrapping, which provided stable estimates with low bias. We conclude that split-sample validation is inefficient, and recommend bootstrapping for estimation of internal validity of a predictive logistic regression model.},
  added-at = {2023-02-03T11:44:35.000+0100},
  author = {Steyerberg, E W and Harrell, F E and Borsboom, G J and Eijkemans, M J and Vergouwe, Y and Habbema, J D},
  biburl = {https://www.bibsonomy.org/bibtex/2ad9b79702103326e666af506201e97da/jepcastel},
  city = {Center for Clinical Decision Sciences, Ee 2091, Department of Public Health, Erasmus University, P.O. Box 1738, 3000 DR, Rotterdam, The Netherlands. steyerberg@mgz.fgg.eur.nl},
  interhash = {a6d4b6a0c722653bf3428d3f04180f72},
  intrahash = {ad9b79702103326e666af506201e97da},
  isbn = {0895-4356; 0895-4356},
  issn = {0895-4356},
  journal = {Journal of clinical epidemiology},
  keywords = {Aged Bias(Epidemiology) Female Humans LogisticModels Male MyocardialInfarction MyocardialInfarction:mortality PredictiveValueofTests ReproducibilityofResults},
  month = {8},
  note = {6555; LR: 20061115; JID: 8801383; ppublish; Diagnostic tests; Predictive models; Logistic regression},
  number = 8,
  pages = {774-81},
  pmid = {11470385},
  timestamp = {2023-02-03T11:44:35.000+0100},
  title = {Internal validation of predictive models: efficiency of some procedures for logistic regression analysis.},
  url = {http://www.ncbi.nlm.nih.gov/pubmed/11470385},
  volume = 54,
  year = 2001
}
