Optimizing Hypertension Prediction Using Logistic Regression Based on Borderline-SMOTE and Model Explanation with SHAP
DOI: https://doi.org/10.61132/prosemnasproit.v2i2.139
Keywords: Borderline-SMOTE, Explainable Artificial Intelligence, Hypertension, Logistic Regression, SHAP
Abstract
Hypertension is a major global health risk that requires accurate early detection, yet conventional methods struggle with complex, imbalanced health datasets. This study aims to optimize hypertension prediction using a Logistic Regression model combined with Borderline-SMOTE to improve recall, and to provide model transparency through SHAP (SHapley Additive exPlanations). The method uses the BRFSS dataset, applying Borderline-SMOTE to address class imbalance near the decision boundary and explainable AI (XAI) techniques for global and local interpretation. The model achieved an accuracy of 0.719, an AUC of 0.800, and a recall of 0.756, which the study reports as a significant improvement. SHAP analysis identified age, high cholesterol, and BMI as the most influential risk factors, while waterfall plots clarified individual predictions at the extremes, with predicted probabilities ranging from 1.72% to 99.43%. These results suggest that the proposed approach offers a sensitive and transparent screening tool for public health practitioners, balancing statistical efficiency with clinical accountability.
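To illustrate how a waterfall-style explanation of a logistic regression prediction adds up to a probability, here is a minimal sketch using only the Python standard library. It assumes features are treated as independent, in which case the SHAP value of feature i for a linear model is w_i * (x_i - mean_i) in log-odds space. The coefficients, intercept, background means, and the example individual are hypothetical values chosen for illustration; they are not the study's fitted model or data.

```python
import math

# Hypothetical standardized coefficients; NOT the values fitted in the study.
WEIGHTS = {"age": 0.9, "high_chol": 0.7, "bmi": 0.5}
INTERCEPT = -0.4
# Hypothetical feature means of the background (training) data.
BACKGROUND_MEAN = {"age": 0.0, "high_chol": 0.4, "bmi": 0.0}


def sigmoid(z: float) -> float:
    """Map a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def explain(x: dict) -> tuple:
    """Return (base log-odds, per-feature contributions, predicted probability).

    Assuming independent features, each contribution is
    w_i * (x_i - mean_i), so base + sum(contributions) equals the
    model's log-odds for x.
    """
    base = INTERCEPT + sum(WEIGHTS[k] * BACKGROUND_MEAN[k] for k in WEIGHTS)
    phi = {k: WEIGHTS[k] * (x[k] - BACKGROUND_MEAN[k]) for k in WEIGHTS}
    log_odds = base + sum(phi.values())
    return base, phi, sigmoid(log_odds)


if __name__ == "__main__":
    # Hypothetical individual: older, with high cholesterol and elevated BMI.
    person = {"age": 1.5, "high_chol": 1.0, "bmi": 1.0}
    base, phi, prob = explain(person)
    print(f"base log-odds: {base:+.2f}")
    for name, value in sorted(phi.items(), key=lambda kv: -abs(kv[1])):
        print(f"  {name:>9}: {value:+.2f}")
    print(f"predicted probability: {prob:.2%}")
```

Each printed contribution corresponds to one bar of a waterfall plot: starting from the base log-odds, the bars sum to the individual's log-odds, and the sigmoid converts that total into the reported probability.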
License
Copyright (c) 2025 Prosiding Seminar Nasional Ilmu Teknik

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.





