IndoBERT-Based Sentiment Analysis of CapCut Application Reviews with Cross-Validation

Tasya Nurdin; Dodo Zaenal Abidin; Kurniabudi Kurniabudi

doi:10.61132/prosemnasproit.v2i2.157

Authors

Tasya Nurdin, Universitas Dinamika Bangsa
Dodo Zaenal Abidin, Universitas Dinamika Bangsa
Kurniabudi Kurniabudi, Universitas Dinamika Bangsa

Keywords:

CapCut, Cross Validation, IndoBERT, Sentiment Analysis, Transformer

Abstract

This study performs sentiment analysis of Indonesian user reviews of the CapCut application using IndoBERT and compares two evaluation schemes: a single 80/20 train–test split and stratified 5-fold cross-validation (k=5). A total of 1,048,575 reviews were collected from the Google Play Store through web scraping and labeled into three sentiment classes based on rating: negative (1–2), neutral (3), and positive (4–5). After preprocessing (cleaning, case folding, banned-word removal, and normalization) and duplicate removal, 517,962 reviews were retained. IndoBERT Base P1 was fine-tuned with fixed hyperparameters (batch size 32, learning rate 2e-5, up to 4 epochs, early stopping with patience 2), and undersampling was applied to the training set to address class imbalance. Performance was assessed using accuracy, precision, recall, F1-score, and ROC-AUC, supported by confusion-matrix and ROC-curve visualizations. The single split achieved an accuracy of 0.756, whereas cross-validation produced a mean accuracy of 0.740. Under both schemes, the positive class performed best (F1-score 0.850; ROC-AUC 0.918–0.919), while the neutral class remained the most difficult (precision 0.198–0.206; F1-score 0.280–0.283). Overall, cross-validation is recommended for reporting because it reduces dependence on a single partition and gives a more representative estimate across multiple splits.
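The data-handling steps named in the abstract (rating-based labeling, stratified 5-fold splitting, and undersampling of the training portion only) can be illustrated with a minimal standard-library sketch. This is not the authors' code: the function names, the toy ratings, the random seed, and the choice to undersample every class down to the size of the smallest class are assumptions made for illustration, and the IndoBERT fine-tuning step is omitted entirely.

```python
import random
from collections import Counter, defaultdict


def rating_to_label(rating: int) -> str:
    """Map a 1-5 star rating to the three classes named in the abstract."""
    if rating <= 2:
        return "negative"
    if rating == 3:
        return "neutral"
    return "positive"


def stratified_folds(labels, k=5, seed=42):
    """Assign every index to one of k folds, class by class, so that each
    fold has roughly the same class proportions as the full data set."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    return folds


def undersample(indices, labels, seed=42):
    """Assumed strategy: randomly keep as many items per class as the
    smallest class has, so the returned training indices are balanced."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx in indices:
        by_class[labels[idx]].append(idx)
    target = min(len(members) for members in by_class.values())
    kept = []
    for members in by_class.values():
        kept.extend(rng.sample(members, target))
    return sorted(kept)


# Hypothetical toy ratings (not the study's data), skewed toward positive.
ratings = [5] * 60 + [4] * 20 + [3] * 10 + [2] * 5 + [1] * 5
labels = [rating_to_label(r) for r in ratings]

folds = stratified_folds(labels, k=5)
for i, test_idx in enumerate(folds):
    # Train on the other four folds; the held-out fold is left untouched.
    train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
    balanced_train = undersample(train_idx, labels)
    train_counts = Counter(labels[j] for j in balanced_train)
    test_counts = Counter(labels[j] for j in test_idx)
    print(f"fold {i}: train={dict(train_counts)} test={dict(test_counts)}")
```

In this sketch each held-out fold keeps the original class skew while only the training indices are balanced, so the reported metrics would reflect the real class distribution. That is one plausible reading of "undersampling was applied to the training set"; the abstract does not state the exact procedure.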
