Design Text Mining for Anxiety Detection using Machine Learning based-on Social Media Data during COVID-19 pandemic

Authors

  • Yuli Fauziah Universitas Pembangunan Nasional Veteran Yogyakarta
  • Shoffan Saifullah Universitas Pembangunan Nasional Veteran Yogyakarta
  • Agus Sasmito Aribowo Universiti Teknikal Malaysia Melaka, Malaysia

DOI:

https://doi.org/10.31098/ess.v1i1.117

Keywords:

anxiety detection, COVID-19, machine learning, random forest, XGBoost

Abstract

The COVID-19 pandemic has had a profound impact on all groups, including governments, agencies, and individuals. It can provoke anxiety with harmful effects, so governments need to detect anxiety in order to reduce it and improve the community's psychological well-being. This research aims to design a text mining approach that detects anxiety during the pandemic using machine learning. Two machine learning methods are designed, namely Random Forest and XGBoost. The design uses a sample of 4862 YouTube comments, consisting of 3211 negative and 1651 positive items. Negative data indicate anxiety, while positive data indicate hope (no worry). The design was evaluated in preliminary testing with three measures: accuracy, precision, and recall. The accuracies of Random Forest and XGBoost are 83% and 73%, respectively. Precision and recall, in contrast, move in opposite directions: Random Forest's precision is 45% higher than XGBoost's, whereas XGBoost's recall is ten points higher than Random Forest's. Random Forest can serve as a reference machine learning method for detecting anxiety from social media data.
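The three evaluation measures named in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the toy labels, the function names, and the choice to treat the anxiety class as the class of interest for precision and recall are all assumptions made for illustration. Only the class counts (3211 and 1651) come from the abstract.

```python
def confusion_counts(y_true, y_pred, target):
    """Count TP, FP, FN, TN with `target` treated as the class of interest."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == target and truth == target:
            tp += 1
        elif pred == target:
            fp += 1
        elif truth == target:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def evaluate(y_true, y_pred, target="anxiety"):
    """Return accuracy, precision, and recall for the `target` class."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, target)
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Hypothetical toy labels, NOT data from the study.
y_true = ["anxiety", "anxiety", "anxiety", "hope", "hope", "anxiety"]
y_pred = ["anxiety", "hope", "anxiety", "hope", "anxiety", "anxiety"]
scores = evaluate(y_true, y_pred)

# Class counts reported in the abstract: 3211 negative (anxiety), 1651 positive (hope).
n_anxiety, n_hope = 3211, 1651
# Accuracy of a trivial classifier that always predicts the larger class.
majority_baseline = max(n_anxiety, n_hope) / (n_anxiety + n_hope)

print(scores)
print(f"majority-class baseline accuracy: {majority_baseline:.3f}")
```

Because roughly two thirds of the comments are in the negative class, always predicting that class would already reach about 66% accuracy, which is a useful yardstick when reading the reported 83% and 73%.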

Published

2020-10-27

Section

Articles