Comparison of Feature Extraction in Support Vector Machine (SVM) Based Sentiment Analysis System

Imam Fahrur Rozi; Irma Maulidia; Mamluatul Hani’ah; Rakhmat Arianto; Dika Rizky Yunianto; Ahmadi Yuli Ananta

doi:10.21107/kursor.v13i1.417

Authors

Imam Fahrur Rozi, Politeknik Negeri Malang, Indonesia
Irma Maulidia, Politeknik Negeri Malang, Indonesia
Mamluatul Hani’ah, Politeknik Negeri Malang, Indonesia
Rakhmat Arianto, Politeknik Negeri Malang, Indonesia
Dika Rizky Yunianto, Politeknik Negeri Malang, Indonesia
Ahmadi Yuli Ananta, Politeknik Negeri Malang, Indonesia

Keywords:

ANOVA, Bag of Words, Feature Extraction, Sentiment Analysis, SVM, TF-IDF, Word2Vec

Abstract

Sentiment analysis plays a crucial role in natural language processing by identifying and categorizing the opinions or emotions conveyed in textual data. It is widely applied in fields such as product review analysis, social media monitoring, and market research. To improve the accuracy and reliability of sentiment classification, various methods and feature extraction techniques have been explored. This study investigates the use of Support Vector Machine (SVM) for sentiment analysis, comparing three feature extraction techniques: Term Frequency-Inverse Document Frequency (TF-IDF), Bag of Words (BoW), and Word2Vec. Our findings indicate that SVM performs effectively with all three feature extraction methods, with TF-IDF yielding the highest accuracy at 0.79. The BoW method showed competitive results but slightly trailed TF-IDF in k-fold validation. Word2Vec exhibited the lowest performance, achieving a maximum accuracy of 0.69. A comparison of accuracy, precision, recall, and F1-score highlights TF-IDF as delivering the most consistent and accurate results, although statistical analysis using ANOVA revealed no significant differences between the models on any of the evaluation metrics. The evaluation was also conducted under several scenarios, including balanced and imbalanced datasets, varying dataset sizes, and different values of the SVM parameter C. These scenarios provided deeper insight into the factors influencing the system's performance, reinforcing that TF-IDF combined with SVM remains the most effective approach in this study.
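To illustrate how two of the compared feature extraction techniques differ, the following minimal Python sketch builds BoW and TF-IDF vectors for a tiny corpus. It is not the authors' implementation: the three sample documents, the whitespace tokenizer, and the function names are hypothetical, and the weighting used (term frequency normalized by document length, idf = ln(N / df)) is only one common TF-IDF variant, since the abstract does not state which formula the study used.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: simple lowercase whitespace tokenization, no stemming or stop words.
    return text.lower().split()


def bag_of_words(docs):
    """Return (vocabulary, count vectors): one raw term-count vector per document."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({tok for doc in tokenized for tok in doc})
    vectors = []
    for doc in tokenized:
        term_counts = Counter(doc)
        vectors.append([term_counts[term] for term in vocab])
    return vocab, vectors


def tf_idf(docs):
    """Return (vocabulary, weight vectors) using tf = count / doc length, idf = ln(N / df)."""
    vocab, count_vectors = bag_of_words(docs)
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = [sum(1 for row in count_vectors if row[j] > 0) for j in range(len(vocab))]
    idf = [math.log(n_docs / d) for d in df]
    vectors = []
    for row in count_vectors:
        doc_len = sum(row)
        vectors.append([(c / doc_len) * w for c, w in zip(row, idf)])
    return vocab, vectors


# Hypothetical three-document corpus, for illustration only.
docs = ["good product good price", "bad product", "good service"]

vocab, counts = bag_of_words(docs)
_, weights = tf_idf(docs)

print(vocab)
print(counts[0])
print([round(w, 4) for w in weights[0]])
```

In the first document, BoW gives "good" the largest value because it occurs twice, while TF-IDF gives "price" a larger weight than "good" because "price" appears in only one of the three documents. Either kind of vector can then be passed to an SVM classifier as input features.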
