Publications
Simultaneous selection of variables and smoothing parameters in structured additive regression models
Authors: Christiane Belitz and Stefan Lang
Source: Computational Statistics & Data Analysis, Volume 53, Issue 1, 15 September 2008, Pages 61-81
Topic(s): Data models
Country: More than one region
Published: SEP 2008
Abstract: In recent years, considerable research has been devoted to developing complex regression models that can deal simultaneously with nonlinear covariate effects and time trends, unit- or cluster-specific heterogeneity, spatial heterogeneity and complex interactions between covariates of different types. Much less effort, however, has been devoted to model and variable selection. The paper develops a methodology for the simultaneous selection of variables and the degree of smoothness in regression models with a structured additive predictor. These models are quite general, containing additive (mixed) models, geoadditive models and varying coefficient models as special cases. This approach allows one to decide whether a particular covariate enters the model linearly or nonlinearly or is removed from the model. Moreover, it is possible to decide whether a spatial or cluster-specific effect should be incorporated into the model to cope with spatial or cluster-specific heterogeneity. Particular emphasis is also placed on selecting complex interactions between covariates and effects of different types. A new penalty for two-dimensional smoothing is proposed that allows for ANOVA-type decompositions into main effects and an interaction effect without explicitly specifying the main effects. The penalty is an additive combination of other penalties. Fast algorithms and software are developed that can handle even situations with many covariate effects and observations. The algorithms are related to backfitting and Markov chain Monte Carlo (MCMC) techniques, which use a divide-and-conquer strategy to split the problem into smaller pieces. Confidence intervals that take model uncertainty into account are based on the bootstrap in combination with MCMC techniques.