International Journal of Advanced and Applied Sciences

Int. j. adv. appl. sci.

EISSN: 2313-3724

Print ISSN: 2313-626X

Volume 3, Issue 12 (December 2016), Pages: 26-31


Title: Variable selection with genetic algorithm and multivariate adaptive regression splines in the presence of multicollinearity

Author(s): Betul Kan Kilinc 1,*, Baris Asikgil 2, Aydin Erar 2, Berna Yazici 1

Affiliation(s):

1 Department of Statistics, Science Faculty, Anadolu University, Eskisehir, Turkey
2 Department of Statistics, Faculty of Science and Letters, Mimar Sinan Fine Arts University, Istanbul, Turkey

https://doi.org/10.21833/ijaas.2016.12.004


Abstract:

This paper aims to identify the true regressors that explain the dependent variable in multiple linear regression models, and to find the best model, under low, medium and high multicollinearity. Two approaches are compared for this purpose: the genetic algorithm and multivariate adaptive regression splines (MARS). A comprehensive Monte Carlo experiment is performed to examine the performance of these approaches. The study shows that nonparametric methods can be preferred for variable selection, in order to obtain the best model, when multicollinearity is present in small, medium or large data sets.
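The abstract does not state how the genetic algorithm encodes candidate models, which fitness criterion it optimizes, or how the collinear data are generated, so the following is only a minimal Python sketch under explicit assumptions, not the authors' procedure. Each chromosome is a 0/1 mask over the regressors (an intercept is always included); fitness is the BIC of the OLS fit (an assumption; the paper may use another criterion); the GA uses binary tournament selection, uniform crossover, bit-flip mutation and elitism. The simulated design follows the McDonald-Galarneau style scheme x_j = sqrt(1 - rho^2) z_j + rho z_shared, which gives pairwise correlation rho^2 (also an assumption). The names `simulate`, `bic` and `ga_select`, the true coefficients and all GA settings are hypothetical. MARS is not reproduced here.

```python
import math
import random

def simulate(n, p, rho, rng):
    """Hypothetical design: x_j = sqrt(1 - rho^2) * z_j + rho * z_shared,
    so each pair of regressors has correlation rho^2."""
    X = []
    for _ in range(n):
        shared = rng.gauss(0, 1)
        X.append([math.sqrt(1 - rho ** 2) * rng.gauss(0, 1) + rho * shared
                  for _ in range(p)])
    # Assumed true model: only the first three regressors enter y.
    y = [1.0 + 2.0 * r[0] - 1.5 * r[1] + 1.0 * r[2] + rng.gauss(0, 1) for r in X]
    return X, y

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    m = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(m):
        piv = max(range(c, m), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, m):
            f = M[r][c] / M[c][c]
            for k in range(c, m + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * m
    for r in range(m - 1, -1, -1):
        x[r] = (M[r][m] - sum(M[r][k] * x[k] for k in range(r + 1, m))) / M[r][r]
    return x

def bic(X, y, mask):
    """BIC of the OLS fit of y on an intercept plus the regressors selected by mask."""
    cols = [j for j, bit in enumerate(mask) if bit]
    D = [[1.0] + [row[j] for j in cols] for row in X]
    k = len(cols) + 1
    XtX = [[sum(d[a] * d[b] for d in D) for b in range(k)] for a in range(k)]
    Xty = [sum(d[a] * yi for d, yi in zip(D, y)) for a in range(k)]
    beta = solve(XtX, Xty)  # normal equations
    rss = sum((yi - sum(bj * dj for bj, dj in zip(beta, d))) ** 2
              for d, yi in zip(D, y))
    n = len(y)
    return n * math.log(rss / n) + k * math.log(n)

def ga_select(X, y, pop_size=20, generations=30, p_mut=0.1, seed=1):
    """Search over 0/1 masks with a simple GA, minimizing BIC."""
    rng = random.Random(seed)
    p = len(X[0])
    cache = {}

    def fit(mask):  # memoize BIC per distinct mask
        key = tuple(mask)
        if key not in cache:
            cache[key] = bic(X, y, mask)
        return cache[key]

    pop = [[rng.randint(0, 1) for _ in range(p)] for _ in range(pop_size)]
    for _ in range(generations):
        children = [min(pop, key=fit)[:]]  # elitism: keep the best mask unchanged
        while len(children) < pop_size:
            # binary tournament selection of two parents
            a = min(rng.sample(pop, 2), key=fit)
            b = min(rng.sample(pop, 2), key=fit)
            # uniform crossover, then bit-flip mutation
            child = [ai if rng.random() < 0.5 else bi for ai, bi in zip(a, b)]
            child = [1 - g if rng.random() < p_mut else g for g in child]
            children.append(child)
        pop = children
    best = min(pop, key=fit)
    return best, fit(best)

# Usage: six regressors with strong (rho = 0.95) collinearity.
X, y = simulate(n=100, p=6, rho=0.95, rng=random.Random(42))
best, score = ga_select(X, y)
print("selected mask:", best, "BIC:", round(score, 2))
```

Repeating such a run over many simulated data sets and several values of rho, and counting how often the selected mask equals the true one, is one way a Monte Carlo comparison of this kind could be organized.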

© 2016 The Authors. Published by IASE.

This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

Keywords: Variable selection, Multicollinearity, Genetic algorithm, Multivariate adaptive regression splines

Article History: Received 2 October 2016, Received in revised form 20 November 2016, Accepted 21 November 2016


Citation:

Kilinc BK, Asikgil B, Erar A, and Yazici B (2016). Variable selection with genetic algorithm and multivariate adaptive regression splines in the presence of multicollinearity. International Journal of Advanced and Applied Sciences, 3(12): 26-31.

http://www.science-gate.com/IJAAS/V3I12/Kilinc.html

