Design Text Mining for Anxiety Detection using Machine Learning based-on Social Media Data during COVID-19 pandemic
Keywords: anxiety detection, COVID-19, machine learning, random forest, XGBoost
Abstract: The COVID-19 pandemic has had a profound impact on all groups, including governments, agencies, and individuals, and it can worsen anxiety. The government therefore needs to detect anxiety so that it can reduce it and improve the community's psychological well-being. This research aims to design a text mining approach that detects anxiety during the pandemic using machine learning. Two machine learning methods are designed: Random Forest and XGBoost. The design uses a sample of 4862 YouTube comments, consisting of 3211 negative and 1651 positive comments. Negative data indicate anxiety, while positive data indicate hope (no worry). The design was evaluated in preliminary testing with three measures: accuracy, precision, and recall. The accuracy of Random Forest and XGBoost is 83% and 73%, respectively. Precision and recall, in contrast, move in opposite directions between the two methods: the precision of Random Forest is more than 45% higher than that of XGBoost, whereas the recall of XGBoost is higher than that of Random Forest by ten points. Random Forest can therefore serve as a reference machine learning method for detecting anxiety from social media data.
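The three evaluation measures named in the abstract can be illustrated with a minimal sketch. It assumes that "anxiety" is treated as the class of interest when computing precision and recall, which the abstract does not state; the function names and the toy labels are hypothetical and are not the study's data. The only figures taken from the abstract are the class counts (3211 and 1651), which are used to compute the accuracy a classifier would reach by always predicting the majority class.

```python
def confusion_counts(y_true, y_pred, positive):
    """Count TP, FP, FN, TN, treating `positive` as the class of interest."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive and truth != positive:
            fp += 1
        elif pred != positive and truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def evaluate(y_true, y_pred, positive="anxiety"):
    """Return accuracy, precision, and recall for a binary labelling."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


# Toy labels for illustration only (hypothetical, not from the study).
y_true = ["anxiety", "anxiety", "anxiety", "hope", "hope", "anxiety"]
y_pred = ["anxiety", "anxiety", "hope", "hope", "anxiety", "anxiety"]
scores = evaluate(y_true, y_pred)

# Class counts reported in the abstract.
N_ANXIETY = 3211  # "negative" comments
N_HOPE = 1651     # "positive" comments
N_TOTAL = N_ANXIETY + N_HOPE

# Accuracy of a trivial classifier that always predicts the majority class.
majority_baseline = max(N_ANXIETY, N_HOPE) / N_TOTAL

print(scores)
print(f"majority-class baseline accuracy: {majority_baseline:.3f}")
```

Because about two thirds of the sample belongs to the anxiety class, accuracy alone can look favourable even for a weak model, which is one reason to read it alongside precision and recall.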
The Author gives “Yayasan Sinergi Riset dan Edukasi” (henceforth known as RSF Press) the unlimited right to publish the contribution identified above, without any restraints, in any form, at any time, directly or through others, and to reproduce, transmit, archive, lease/lend, sell, and distribute the contribution or parts thereof, individually or together with other works, in any language, revision, and version (digital and hard copy), including reprints, translations, photographic reproductions, microform, audiograms, videograms, electronic form (offline, online), or any other reproductions of a similar nature, including publication in the aforementioned book or any other book, as well as use for advertising purposes. RSF Press will ensure that the Author’s name(s) is/are always clearly associated with the manuscript, and the publisher will not make any substantial change to the manuscript without consulting the Author and asking for their consent. RSF Press is also entitled to carry out editorial changes in the contribution with the sole purpose of enhancing its overall organization and form.
The Author retains the right to publish the contribution on his/her own website and in his/her thesis, on his/her employer’s website, and to publish a similar or revised version elsewhere, as long as it is clearly stated that the contribution was first published by RSF Press and the corresponding DOI is associated with the contribution.