Category:
Life Science

Lecturer:
Christine Wallisch, Institute of Clinical Biometrics, Medizinische Universität Wien, Austria

Place:
DKFZ, ATV Seminar Room, Im Neuenheimer Feld 242

Host:
Institut für Medizinische Biometrie und Informatik

Description:
Data-driven variable selection methods, e.g., backward elimination or Lasso, are commonly used to identify relevant explanatory variables and to reduce the number of explanatory variables in multivariable statistical models. Applying these algorithms may lead to false inclusion of variables (resulting in increased variance) or false exclusion (resulting in possible bias). Hence, stability investigations are needed to assess the robustness of a model with regard to slight modifications of the data. Stability investigations can be carried out by using resampling methods to assess variable selection frequencies, model selection frequencies, and distributions of selected regression coefficients.

Recently, we proposed to compute two additional measures (BiomJ, 2018):
• the relative conditional bias (RCB), which quantifies the bias that is induced by variable selection relative to the coefficient in a global model, and
• the root mean squared difference ratio (RMSDR), which expresses the additional uncertainty induced by variable selection by comparing the variance of selected regression coefficients with the variance of the regression coefficients in a global model.

We conducted a comprehensive simulation study to investigate whether bootstrap or subsampling is better suited to estimate these quantities. Bootstrap-based estimation was found to be suboptimal for variable and model selection frequencies, whereas subsampling-based estimation was inappropriate for the RMSDR. Thus, both resampling approaches have to be employed to calculate reliable stability measures.

Stability investigations supply important information for data analysts conducting data-driven variable selection. They should become a routine step whenever a statistical model is developed.
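As a rough illustration of the quantities named above, the following Python sketch shows how one resample could be drawn under either scheme and how the stability measures for a single variable could be summarized once a selection procedure has been run on each resample. It is a minimal sketch under stated assumptions, not the lecturer's implementation: the function names, the subsampling fraction of 0.5, the convention of counting a non-selected coefficient as zero in the RMSDR, and the exact formulas for RCB and RMSDR are assumptions made for illustration. The abstract only describes these measures verbally.

```python
import math
import random


def resample_indices(n, scheme, rng, subsample_fraction=0.5):
    """Return row indices for one resample of a data set with n rows.

    subsample_fraction is a hypothetical default, not taken from the abstract.
    """
    if scheme == "bootstrap":
        # Bootstrap: n draws with replacement.
        return [rng.randrange(n) for _ in range(n)]
    if scheme == "subsampling":
        # Subsampling: m < n draws without replacement.
        m = max(1, int(round(subsample_fraction * n)))
        return rng.sample(range(n), m)
    raise ValueError("scheme must be 'bootstrap' or 'subsampling'")


def stability_measures(resampled_coefs, global_coef, global_se):
    """Summarize resampling results for one explanatory variable.

    resampled_coefs: one entry per resample; the estimated coefficient if the
        variable was selected in that resample, or None if it was not.
    global_coef, global_se: estimate and standard error from the global
        (no selection) model.
    """
    n_resamples = len(resampled_coefs)
    selected = [b for b in resampled_coefs if b is not None]

    # Share of resamples in which the variable was selected.
    inclusion_frequency = len(selected) / n_resamples

    # Assumed RCB: mean estimate conditional on selection, relative to the
    # global-model coefficient, minus 1 (multiply by 100 for percent).
    if selected:
        rcb = (sum(selected) / len(selected)) / global_coef - 1.0
    else:
        rcb = float("nan")

    # Assumed RMSDR: root mean squared difference between resampled estimates
    # (non-selected counted as 0) and the global estimate, divided by the
    # global-model standard error.
    squared_diffs = [((b or 0.0) - global_coef) ** 2 for b in resampled_coefs]
    rmsdr = math.sqrt(sum(squared_diffs) / n_resamples) / global_se

    return {
        "inclusion_frequency": inclusion_frequency,
        "rcb": rcb,
        "rmsdr": rmsdr,
    }
```

In line with the abstract's conclusion, such a helper would be fed subsampling results when the inclusion frequency is of interest and bootstrap results when the RMSDR is of interest; model fitting and the selection algorithm itself are deliberately left out of the sketch.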
