Diabetes Risk Detection in Pregnant Women Using the Random Forest Algorithm

Case Study: Pima Indian Dataset

Authors

  • Yazid Ichwanuddin, Universitas Dinamika Bangsa
  • Maria Rosario B, Universitas Dinamika Bangsa
  • Erissya Rasywir, Universitas Dinamika Bangsa

DOI:

https://doi.org/10.61132/prosemnasproit.v2i2.62

Keywords:

Decision Support System, Feature Engineering, Gestational Diabetes Mellitus, Random Forest, Risk Prediction

Abstract

Gestational Diabetes Mellitus (GDM) is a pregnancy-related metabolic disorder that endangers both mother and fetus if it is not detected early. Accurate prediction methods are therefore needed to support early screening and clinical decision-making. This study applies the Random Forest algorithm to detect GDM risk using clinical data from the Pima Indian Dataset. Preprocessing included handling missing values, standardization, feature engineering, and a 70:30 train–test split. Two models were developed: a baseline model and an optimized model whose hyperparameters were tuned with GridSearchCV and validated with 5-fold cross-validation. Performance was assessed with a classification report, a confusion matrix, and ROC–AUC. The optimized model outperforms the baseline, achieving 88% accuracy, an AUC of 0.93, and average recall values of 81%–85%. Compared with previous studies, this approach shows improved predictive performance. The findings indicate that combining Random Forest with comprehensive preprocessing, feature engineering, and model optimization is an effective and feasible basis for a medical decision support system for early GDM risk screening.
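The preprocessing and evaluation steps summarized in the abstract can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code. It assumes the Kaggle CSV column names (`Glucose`, `BloodPressure`, `SkinThickness`, `Insulin`, `BMI`, `Age`, ...). It also assumes that zeros in the first five of those columns mean "missing" and fills them with the training-set median. That is a common convention for this dataset, but the abstract does not say which imputation strategy the authors used. The sketch then z-scores each feature with training-set statistics, makes a stratified 70:30 split, and computes the confusion matrix, per-class recall, and ROC–AUC. All helper names (`stratified_split`, `fit_preprocessor`, `transform`, `confusion_matrix`, `recall_per_class`, `roc_auc`) are hypothetical. The Random Forest and the GridSearchCV tuning are not reproduced here. In scikit-learn they would correspond to `RandomForestClassifier` and `GridSearchCV(..., cv=5)`.

```python
import random
import statistics

# Assumption: Kaggle column names; zeros in these columns are treated as
# missing (a common convention for the Pima data, not confirmed by the abstract).
ZERO_AS_MISSING = ("Glucose", "BloodPressure", "SkinThickness", "Insulin", "BMI")


def _is_missing(col, value):
    return col in ZERO_AS_MISSING and value == 0


def stratified_split(labels, test_size=0.3, seed=42):
    """Hypothetical helper: (train_idx, test_idx) for a stratified 70:30 split."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_size)
        test_idx += idx[:n_test]
        train_idx += idx[n_test:]
    return sorted(train_idx), sorted(test_idx)


def fit_preprocessor(train_rows, columns):
    """Learn (median, mean, std) per column from the training rows only."""
    params = {}
    for c in columns:
        observed = [r[c] for r in train_rows if not _is_missing(c, r[c])]
        median = statistics.median(observed)
        filled = [median if _is_missing(c, r[c]) else r[c] for r in train_rows]
        std = statistics.pstdev(filled) or 1.0  # guard against constant columns
        params[c] = (median, statistics.fmean(filled), std)
    return params


def transform(rows, params):
    """Impute zero-coded values with the training median, then z-score each column."""
    out = []
    for r in rows:
        vec = []
        for c, (median, mean, std) in params.items():
            v = median if _is_missing(c, r[c]) else r[c]
            vec.append((v - mean) / std)
        out.append(vec)
    return out


def confusion_matrix(y_true, y_pred):
    """Binary confusion matrix laid out as [[TN, FP], [FN, TP]]."""
    m = [[0, 0], [0, 0]]
    for t, p in zip(y_true, y_pred):
        m[t][p] += 1
    return m


def recall_per_class(m):
    """Recall (sensitivity) for class 0 and class 1 from a confusion matrix."""
    return {0: m[0][0] / sum(m[0]), 1: m[1][1] / sum(m[1])}


def roc_auc(y_true, scores):
    """AUC = P(random positive scores above random negative); ties count 1/2."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy values for illustration only; these are not results from the study.
    y_true = [0, 0, 1, 1]
    print(confusion_matrix(y_true, [0, 1, 1, 1]))  # [[1, 1], [0, 2]]
    print(roc_auc(y_true, [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

The `scores` passed to `roc_auc` would be the trained model's predicted probability of the positive class on the test split (for example `predict_proba(X_test)[:, 1]` in scikit-learn). The medians and scaling statistics are fitted on the training split only, which keeps test-set information out of preprocessing.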

References

Alzubaidi, L., Zhang, J., Humaidi, A. J., Al-Dujaili, A., Duan, Y., Al-Shamma, O., Santamaría, J., Fadhel, M. A., Al-Amidie, M., & Farhan, L. (2021). Review of deep learning: concepts, CNN architectures, challenges, applications, future directions. Journal of Big Data, 8(1). https://doi.org/10.1186/s40537-021-00444-8

Wang, H., Li, N., Chivese, T., Werfalli, M., Sun, H., Yuen, L., & Yang, X. (2022). IDF diabetes atlas: Estimation of global and regional gestational diabetes mellitus prevalence for 2021 by International Association of Diabetes in Pregnancy Study Group’s criteria. Diabetes Research and Clinical Practice, 183.

International Diabetes Federation. (2024). IDF Diabetes Atlas. IDF.

Joseph, V. R., & Vakayil, A. (2022). SPlit: An Optimal Method for Data Splitting. Technometrics, 64(2), 166–176. https://doi.org/10.1080/00401706.2021.1921037

Kaya, Y., Bütün, Z., Çelik, Ö., Salik, E. A., Tahta, T., & Yavuz, A. A. (2024). The early prediction of gestational diabetes mellitus by machine learning models. BMC Pregnancy and Childbirth, 24(1). https://doi.org/10.1186/s12884-024-06783-7

Mantri, N., Goel, A. D., Patel, M., Baskaran, P., Dutta, G., Gupta, M. K., Yadav, V., Mittal, M., Shekhar, S., & Bhardwaj, P. (2024). National and regional prevalence of gestational diabetes mellitus in India: a systematic review and Meta-analysis. BMC Public Health, 24(1). https://doi.org/10.1186/s12889-024-18024-9

Mori, R., & Pandey, A. (2022). Global burden of early pregnancy gestational diabetes mellitus (eGDM): prevalence, risk factors and outcomes. Acta Diabetologica, 59(4), 453–462. https://pubmed.ncbi.nlm.nih.gov/34743219/

Nassiwa, F., & Zeng, J. (n.d.). Evaluating Traditional Machine Learning Models for Predicting Diabetes Onset Using the Pima Indians Dataset. https://ssrn.com/abstract=4878052

Pham, H. H., Nguyen, H. Q., Nguyen, H. T., Le, L. T., & Lam, K. (2023). Evaluating the impact of an explainable machine learning system on the interobserver agreement in chest radiograph interpretation. http://arxiv.org/abs/2304.01220

Reddy, A. A., & Kumar, P. (2023). Feature selection and feature engineering strategies for diabetes prediction. Journal of Biomedical Informatics.

UCI Machine Learning. (2021). Pima Indians Diabetes Database. Kaggle. https://www.kaggle.com/datasets/uciml/pima-indians-diabetes-database

Wang, W. (2024). Principles of Machine Learning: The Three Perspectives. Springer Nature.

World Health Organization. (2023). Diabetes fact sheet. WHO. https://www.who.int/news-room/fact-sheets/detail/diabetes

Xu, Y. (2024). Random Forest-based clinical decision support for gestational diabetes prediction and feature interpretation. IEEE Access, 12.

Zhang, Z., Yang, L., Han, W., Wu, Y., Zhang, L., Gao, C., Jiang, K., Liu, Y., & Wu, H. (2022). Machine Learning Prediction Models for Gestational Diabetes Mellitus: Meta-analysis. Journal of Medical Internet Research, 24(3). https://doi.org/10.2196/26634

Published

2025-12-29

How to Cite

Ichwanuddin, Y., Maria Rosario B, & Erissya Rasywir. (2025). Deteksi Risiko Diabetes Pada Wanita Hamil Menggunakan Algoritma Random Forest: Studi Kasus: Pima Indian Dataset [Diabetes risk detection in pregnant women using the Random Forest algorithm: Case study: Pima Indian Dataset]. Prosiding Seminar Nasional Ilmu Teknik, 2(2), 559–570. https://doi.org/10.61132/prosemnasproit.v2i2.62