Volume 8, Issue 9 (September 2021), Pages: 29-38
----------------------------------------------
Original Research Paper
Title: A framework for predicting employee health risks using ensemble model
Author(s): Nicholas Khin-Whai Chan (1), Angela Siew-Hoong Lee (1, *), Zuraini Zainol (2)
Affiliation(s):
(1) Department of Computing and Information Systems, Sunway University, Sunway, Malaysia
(2) Department of Computer Science, Universiti Pertahanan Nasional Malaysia, Kuala Lumpur, Malaysia
* Corresponding Author.
Corresponding author's ORCID profile: https://orcid.org/0000-0003-3388-2372
Digital Object Identifier:
https://doi.org/10.21833/ijaas.2021.09.004
Abstract:
Big data and data analytics have made it possible to collect, store, process, analyze, and visualize immense amounts of information. Healthcare is recognized as one of the most information-intensive sectors, and the rapid growth of data within it has spurred interest in analytics. Most employers in Malaysia provide medical benefits to their employees through a medical insurance plan. The data collected, such as the history of medical claims, are stored with the Human Resources (HR) department, which makes it possible to analyze trends in medical claims and better understand the healthcare use and overall health of the employee population. Patients at higher risk generally go on to become high-cost patients. Hence, early intervention for these patients allows employers to potentially minimize costs and plan preventive steps. Decision trees and regression are typical techniques in predictive analysis. The proposed framework incorporates an ensemble technique known as stacking. Compared with a single predictive model, an ensemble predictive model is expected to yield better performance and accuracy. The objective of this paper is therefore to review current practices and past research within the healthcare sector and to propose a practical framework for classification ensemble modeling. Preliminary findings indicate that an ensemble model can produce higher predictive accuracy and performance than a single model.
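To make the stacking idea in the abstract concrete, the following is a minimal sketch in plain Python and not the authors' implementation. Everything in it is an assumption made for illustration: the synthetic data, the two hypothetical features (age and number of claims), the labeling rule for "high risk", the choice of one-feature decision stumps as base learners, and the logistic regression meta-learner. It shows the core mechanism of stacked generalization: base learners produce out-of-fold predictions, and a meta-learner is trained on those predictions.

```python
import math
import random

def fit_stump(X, y, feature):
    """Base learner: best single-threshold rule on one feature (a depth-1 decision tree)."""
    best = (-1.0, 0.0, 1)  # (training accuracy, threshold, polarity)
    for t in sorted({row[feature] for row in X}):
        for polarity in (1, -1):
            preds = [1 if polarity * (row[feature] - t) >= 0 else 0 for row in X]
            acc = sum(p == label for p, label in zip(preds, y)) / len(y)
            if acc > best[0]:
                best = (acc, t, polarity)
    _, t, polarity = best
    return lambda row: 1.0 if polarity * (row[feature] - t) >= 0 else 0.0

def fit_logistic(Z, y, epochs=500, lr=0.5):
    """Meta-learner: logistic regression trained by batch gradient descent."""
    w = [0.0] * (len(Z[0]) + 1)  # last entry is the bias term
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for z, label in zip(Z, y):
            score = sum(wi * zi for wi, zi in zip(w, z)) + w[-1]
            p = 1.0 / (1.0 + math.exp(-score))
            for j, zj in enumerate(z):
                grad[j] += (p - label) * zj
            grad[-1] += p - label
        w = [wi - lr * g / len(Z) for wi, g in zip(w, grad)]
    return lambda z: 1 if sum(wi * zi for wi, zi in zip(w, z)) + w[-1] >= 0 else 0

def fit_stacking(X, y, base_fitters, k=5):
    """Stacking: train the meta-learner on out-of-fold base predictions."""
    n = len(X)
    meta_features = [[0.0] * len(base_fitters) for _ in range(n)]
    for fold in range(k):
        held_out = [i for i in range(n) if i % k == fold]
        train = [i for i in range(n) if i % k != fold]
        X_tr, y_tr = [X[i] for i in train], [y[i] for i in train]
        for j, fitter in enumerate(base_fitters):
            model = fitter(X_tr, y_tr)
            for i in held_out:
                meta_features[i][j] = model(X[i])
    meta = fit_logistic(meta_features, y)
    bases = [fitter(X, y) for fitter in base_fitters]  # refit on all training data
    return lambda row: meta([b(row) for b in bases])

def accuracy(model, X, y):
    return sum(int(model(row)) == label for row, label in zip(X, y)) / len(y)

# Synthetic, purely illustrative data: [age, number_of_claims] per employee.
rng = random.Random(0)
X = [[rng.uniform(20, 60), float(rng.randint(0, 10))] for _ in range(300)]
# Assumed labeling rule (hypothetical): high risk if older than 45 or more than 6 claims.
y = [1 if age > 45 or claims > 6 else 0 for age, claims in X]
X_train, y_train, X_test, y_test = X[:200], y[:200], X[200:], y[200:]

base_fitters = [lambda X, y, f=f: fit_stump(X, y, f) for f in (0, 1)]
base_accs = [accuracy(fit(X_train, y_train), X_test, y_test) for fit in base_fitters]
stacked = fit_stacking(X_train, y_train, base_fitters)
stacked_acc = accuracy(stacked, X_test, y_test)
print("single stumps:", base_accs, "stacked:", stacked_acc)
```

On this toy data each stump can see only one of the two risk factors, so the meta-learner has room to improve on both by combining them. Whether a similar gain appears on real claims data depends on the data and on the base models chosen.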
© 2021 The Authors. Published by IASE.
This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
Keywords: Data analytics, Predictive analysis, Ensemble modeling, Stacking, Framework
Article History: Received 16 December 2020, Received in revised form 20 April 2021, Accepted 10 June 2021
Acknowledgment
No Acknowledgment.
Compliance with ethical standards
Conflict of interest: The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Citation:
Chan NKW, Lee ASH, and Zainol Z (2021). A framework for predicting employee health risks using ensemble model. International Journal of Advanced and Applied Sciences, 8(9): 29-38