Comparison of Semi-Supervised Learning Performance in Indonesian Sentiment Analysis: An Empirical Study between Statistical Machine Learning and Deep Learning Approaches
DOI:
https://doi.org/10.31098/cset.v4i1.957
Keywords:
semi-supervised learning, sentiment analysis, statistical machine learning, Bi-LSTM, pseudo-labeling
Abstract
The limited availability of labeled data is a significant challenge in developing sentiment analysis models, especially for Indonesian, which still has few annotated resources. Semi-supervised learning (SSL) addresses this by exploiting large amounts of unlabeled data. This study compares the performance of two main SSL paradigms, Statistical Machine Learning (SML) and Deep Learning (DL), for Indonesian text sentiment classification. Four SML models (KNN, Naïve Bayes, Random Forest, SVM), each combined with TF-IDF, Word2Vec, and FastText feature representations, were compared with a fine-tuned Bi-LSTM architecture built on FastText embeddings. Experiments were conducted on two datasets, product reviews (14,000 instances) and social media (22,000 instances), each starting with only 10% of the data labeled. Self-training was applied with a confidence threshold of 0.8 and a maximum of 3 iterations. DL consistently outperforms SML in accuracy (89.7% vs. 84.2% on large datasets), F1-score (89.4% vs. 83.6%), and efficiency in exploiting unlabeled data (95.6% vs. 90.2% of pseudo-labels accepted). However, this advantage comes at four times the computational cost and with lower interpretability. SML remains relevant in resource-constrained scenarios or when model transparency is a priority. This study recommends DL when the infrastructure is adequate, and SML when interpretability and computational efficiency are prioritized. These findings provide empirical guidance for practitioners and academics in choosing the optimal SSL approach for Indonesian-language sentiment analysis.
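The self-training procedure summarized in the abstract (pseudo-label confident predictions, retrain, repeat) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It uses a hand-rolled multinomial Naïve Bayes (one of the four SML models) over whitespace-tokenized word counts instead of TF-IDF, Word2Vec, or FastText features. It treats a prediction as confident when its top class probability is at least 0.8. It also reports the acceptance rate as accepted pseudo-labels divided by the initial unlabeled pool, which is an assumed definition. The names `SimpleMultinomialNB` and `self_train` and the toy Indonesian phrases are hypothetical.

```python
import math
from collections import Counter


class SimpleMultinomialNB:
    """Hypothetical minimal multinomial Naive Bayes with Laplace smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = {w for d in docs for w in d.split()}
        self.log_prior, self.counts, self.totals = {}, {}, {}
        for c in self.classes:
            class_docs = [d for d, y in zip(docs, labels) if y == c]
            self.log_prior[c] = math.log(len(class_docs) / len(docs))
            self.counts[c] = Counter(w for d in class_docs for w in d.split())
            self.totals[c] = sum(self.counts[c].values())
        return self

    def predict_proba(self, doc):
        """Return {class: probability}; tokens unseen in training are ignored."""
        v = len(self.vocab)
        scores = {}
        for c in self.classes:
            s = self.log_prior[c]
            for w in doc.split():
                if w in self.vocab:
                    s += math.log((self.counts[c][w] + 1) / (self.totals[c] + v))
            scores[c] = s
        top = max(scores.values())  # subtract max for a numerically stable softmax
        exps = {c: math.exp(s - top) for c, s in scores.items()}
        z = sum(exps.values())
        return {c: e / z for c, e in exps.items()}


def self_train(labeled, unlabeled, threshold=0.8, max_iter=3):
    """Pseudo-label unlabeled docs whose top probability >= threshold, retrain, repeat.

    Returns the final model and the fraction of the initial unlabeled pool that
    received a pseudo-label (an assumed definition of the acceptance rate).
    """
    docs = [d for d, _ in labeled]
    labels = [y for _, y in labeled]
    pool = list(unlabeled)
    n_unlabeled = len(pool)
    accepted = 0
    model = SimpleMultinomialNB().fit(docs, labels)
    for _ in range(max_iter):
        confident, remaining = [], []
        for d in pool:
            probs = model.predict_proba(d)
            label, p = max(probs.items(), key=lambda kv: kv[1])
            if p >= threshold:
                confident.append((d, label))
            else:
                remaining.append(d)
        if not confident:  # nothing clears the threshold: stop early
            break
        for d, y in confident:
            docs.append(d)
            labels.append(y)
        accepted += len(confident)
        pool = remaining
        # Retrain on the original labels plus all pseudo-labels accepted so far.
        model = SimpleMultinomialNB().fit(docs, labels)
    rate = accepted / n_unlabeled if n_unlabeled else 0.0
    return model, rate


if __name__ == "__main__":
    labeled = [("bagus sekali produk ini", "pos"), ("jelek dan mengecewakan", "neg")]
    unlabeled = ["produk bagus", "sangat jelek", "biasa saja"]
    model, rate = self_train(labeled, unlabeled, threshold=0.8, max_iter=3)
    print(f"pseudo-label acceptance rate: {rate:.1%}")
```

Each round retrains on the original labels plus every pseudo-label accepted so far, and the loop stops early once no remaining document clears the threshold. Lowering `threshold` admits more pseudo-labels but raises the risk of propagating wrong labels into later rounds, which is the trade-off a setting such as 0.8 controls.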
Published
2025-10-15
How to Cite
Husaini, R., Cahyana, N. H., Wiendijarti, I., & Aribowo, A. S. (2025). Comparison of Semi-Supervised Learning Performance in Indonesian Sentiment Analysis: An Empirical Study between Statistical Machine Learning and Deep Learning Approaches. RSF Conference Series: Engineering and Technology, 4(1), 64–72. https://doi.org/10.31098/cset.v4i1.957
Section
Articles