
Volume 12, Issue 2 (February 2025), Pages: 72-79

----------------------------------------------
Original Research Paper
Classifying chronic kidney disease using selected machine learning techniques
Author(s):
Abrahem P. Anqui *
Affiliation(s):
College of Technology, Cebu Technological University, Cebu, Philippines
* Corresponding Author.
Corresponding author's ORCID profile: https://orcid.org/0009-0000-1231-569X
Digital Object Identifier (DOI)
https://doi.org/10.21833/ijaas.2025.02.008
Abstract
Chronic kidney disease (CKD) is a serious global health problem with high mortality rates, often due to late diagnosis. Early detection and classification are essential to improve treatment outcomes and slow disease progression. This study evaluates four machine learning algorithms for classifying CKD: linear discriminant analysis (LDA), Naïve Bayes, C4.5 decision tree, and Random Forest. The algorithms were applied to a Kaggle dataset of 1,659 instances and 52 features covering demographic, lifestyle, and clinical data. After data pre-processing, the classification accuracy of each algorithm was assessed. Before hyperparameter tuning, LDA showed the highest accuracy at 92.8%, followed by Naïve Bayes (92.1%), C4.5 (92.0%), and Random Forest (91.9%). After tuning, C4.5 reached 92.5% and Random Forest 92.2%, while Naïve Bayes remained at 92.1%. Even after tuning, however, no model surpassed LDA, which remained the most accurate. The key features contributing to CKD classification were serum creatinine, glomerular filtration rate (GFR), muscle cramps, protein in urine, fasting blood sugar, itching, systolic blood pressure, blood urea nitrogen (BUN), HbA1c, edema, total cholesterol, body mass index (BMI), and gender. These findings indicate that LDA outperforms the other algorithms in CKD classification on this dataset without the need for tuning, and they underline the value of machine learning in improving early diagnosis and management of CKD.
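The evaluation step summarized in the abstract (fit a classifier, then score it by accuracy on held-out records) can be illustrated with a minimal sketch. It is not the study's code: the data are synthetic two-feature Gaussian blobs rather than the Kaggle CKD dataset, the 70/30 hold-out split is an assumption, and all function names (`make_synthetic`, `fit_gaussian_nb`, `predict`, `accuracy`) are hypothetical. Only a Gaussian Naïve Bayes model, one of the four algorithm families named above, is shown.

```python
import math
import random
from statistics import mean, pvariance


def make_synthetic(n_per_class, rng):
    """Two Gaussian blobs standing in for non-CKD (0) and CKD (1) records."""
    rows = []
    for label, center in ((0, 0.0), (1, 3.0)):
        for _ in range(n_per_class):
            features = [rng.gauss(center, 1.0), rng.gauss(center, 1.0)]
            rows.append((features, label))
    rng.shuffle(rows)
    return rows


def fit_gaussian_nb(rows):
    """Estimate the prior, feature means, and feature variances per class."""
    model = {}
    for label in sorted({y for _, y in rows}):
        feats = [x for x, y in rows if y == label]
        columns = list(zip(*feats))
        model[label] = {
            "prior": len(feats) / len(rows),
            "means": [mean(c) for c in columns],
            # A small floor keeps the log-density finite for near-constant features.
            "vars": [max(pvariance(c), 1e-9) for c in columns],
        }
    return model


def predict(model, x):
    """Return the class with the highest log-posterior under the NB assumption."""
    def log_posterior(params):
        total = math.log(params["prior"])
        for value, mu, var in zip(x, params["means"], params["vars"]):
            total += -0.5 * math.log(2 * math.pi * var)
            total -= (value - mu) ** 2 / (2 * var)
        return total
    return max(model, key=lambda label: log_posterior(model[label]))


def accuracy(model, rows):
    """Fraction of rows whose predicted label matches the true label."""
    return sum(predict(model, x) == y for x, y in rows) / len(rows)


rng = random.Random(0)              # fixed seed so the run is repeatable
data = make_synthetic(300, rng)     # 600 synthetic records in total
split = int(0.7 * len(data))        # assumed 70/30 hold-out split
train, test = data[:split], data[split:]

nb_model = fit_gaussian_nb(train)
nb_accuracy = accuracy(nb_model, test)
print(f"Gaussian Naive Bayes hold-out accuracy: {nb_accuracy:.3f}")
```

Comparing several algorithms, as the study does, would amount to repeating the fit and score steps for each model on the same split and ranking the resulting accuracies.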
© 2025 The Authors. Published by IASE.
This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
Keywords
Chronic kidney disease, Machine learning algorithms, Early diagnosis, Classification accuracy, Key clinical features
Article history
Received 19 September 2024, Received in revised form 7 January 2025, Accepted 22 January 2025
Acknowledgment
No Acknowledgment.
Compliance with ethical standards
Conflict of interest: The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Citation:
Anqui AP (2025). Classifying chronic kidney disease using selected machine learning techniques. International Journal of Advanced and Applied Sciences, 12(2): 72-79
----------------------------------------------