Performance Evaluation of Machine Learning in Heart Disease Classification Using Data Balancing Techniques
DOI:
https://doi.org/10.61132/prosemnasproit.v2i2.59
Keywords:
Heart Disease, Machine Learning, Oversampling, Random Oversampling, SMOTE
Abstract
Imbalanced data remains a significant issue in heart disease classification using machine learning, because models tend to favor the majority class while neglecting minority classes that carry high clinical value. This can reduce accuracy and weaken the model's ability to detect disease cases. This study therefore assesses the effectiveness of two oversampling techniques, Random Oversampling and the Synthetic Minority Oversampling Technique (SMOTE), in improving the performance of the K-Nearest Neighbors (KNN), Naive Bayes (NB), and Random Forest (RF) algorithms. The dataset comes from Kaggle and consists of 918 records with 12 attributes representing patient information related to heart disease prediction. The research stages are data preprocessing, baseline model testing, and re-evaluation using the two oversampling methods. Experimental results show that oversampling can improve the performance of all three algorithms. KNN achieved its best results with SMOTE, with an accuracy of 72.98% and an F1-score of 75.39%. For Naive Bayes, both oversampling techniques produced relatively stable performance, with the highest F1-score of 73.56% obtained using SMOTE. Random Forest performed best when combined with Random Oversampling, with an accuracy of 79.19% and an F1-score of 81.51%. These findings confirm that the success of a data balancing technique depends strongly on the characteristics of the classification algorithm used, and they offer practical guidance for choosing a strategy to handle imbalanced data in health research.
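To make the two balancing techniques and the reported metrics concrete, the following is a minimal pure-Python sketch, not the authors' implementation. It assumes purely numeric features and a binary label; the function names (`random_oversample`, `smote`, `accuracy_and_f1`), the neighbour count `k`, and the toy data are all hypothetical choices made for illustration. Random Oversampling duplicates existing minority rows, while the SMOTE-style routine creates new rows by interpolating between a minority row and one of its nearest same-class neighbours.

```python
import math
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows of each smaller class until all classes match the largest."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        rows = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            X_out.append(list(rng.choice(rows)))  # exact copy of an existing row
            y_out.append(label)
    return X_out, y_out


def smote(X, y, k=5, seed=0):
    """SMOTE-style oversampling: synthesize rows on the segment between a
    minority row and one of its k nearest same-class neighbours."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        rows = [x for x, lab in zip(X, y) if lab == label]
        if n == target or len(rows) < 2:
            continue  # nothing to add, or no neighbour to interpolate with
        for _ in range(target - n):
            base = rng.choice(rows)
            others = [r for r in rows if r is not base]
            # k nearest same-class rows by Euclidean distance
            neighbours = sorted(others, key=lambda r: math.dist(base, r))[:k]
            other = rng.choice(neighbours)
            gap = rng.random()  # position along the segment, in [0, 1)
            X_out.append([b + gap * (o - b) for b, o in zip(base, other)])
            y_out.append(label)
    return X_out, y_out


def accuracy_and_f1(y_true, y_pred, positive=1):
    """Accuracy and F1-score for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    return accuracy, f1


# Toy data (invented for illustration): six majority rows, three minority rows.
X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5], [0.2, 0.8],
     [5.0, 5.0], [6.0, 5.0], [5.0, 6.0]]
y = [0, 0, 0, 0, 0, 0, 1, 1, 1]

X_ros, y_ros = random_oversample(X, y)
X_sm, y_sm = smote(X, y, k=2)
print(Counter(y_ros), Counter(y_sm))
```

In practice, oversampling is normally applied only to the training split, so that duplicated or synthetic rows do not leak into the data used to compute accuracy and F1-score. Whether the study followed this protocol is not stated in the abstract.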
License
Copyright (c) 2026 Prosiding Seminar Nasional Ilmu Teknik

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.





