Text analysis of social media comments for assessing the social well-being of metropolitan residents (the case of St. Petersburg)

A. V. Chizhik

Abstract


The paper describes experiments aimed at building a methodology for analyzing text data from social networks in order to assess the social well-being of city residents. The main task was to identify the most suitable vectorization model for short texts (comments on posts) for subsequent use in sentiment analysis. The article compares three currently relevant approaches to creating vector embeddings: weighting each word by its importance in a document (TF-IDF), building word vectors from distributional semantics (Word2Vec), and language-agnostic sentence embeddings (LASER). It describes the study design, reports quality metrics, and describes the data on which the experiments were conducted. It also presents intermediate results of a follow-up study of the text data within the analysis of social well-being: topic modeling is applied to the texts, and each topic is rated on a five-point emotion scale. The experiments used data from the social network VKontakte.
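As a rough illustration of the first of the three approaches compared, TF-IDF weighting of short comments can be sketched in plain Python. This is not the pipeline used in the paper: the whitespace tokenizer, the smoothed IDF formula, the toy English comments, and the names `tfidf_vectors`, `cosine` and `COMMENTS` are all assumptions made for the example.

```python
import math
from collections import Counter


def tfidf_vectors(comments):
    """Return (vocabulary, vectors) of L2-normalised TF-IDF weights.

    Assumptions for this sketch: lower-casing plus whitespace splitting
    as the tokenizer, and a smoothed IDF of ln((1 + N) / (1 + df)) + 1.
    """
    tokenized = [comment.lower().split() for comment in comments]
    vocab = sorted({token for doc in tokenized for token in doc})
    n_docs = len(tokenized)

    # Document frequency: in how many comments each token occurs.
    df = Counter(token for doc in tokenized for token in set(doc))
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1.0 for t in vocab}

    vectors = []
    for doc in tokenized:
        counts = Counter(doc)
        total = len(doc)
        # Term frequency (relative count) times inverse document frequency.
        raw = [(counts[t] / total) * idf[t] if total else 0.0 for t in vocab]
        norm = math.sqrt(sum(w * w for w in raw))
        vectors.append([w / norm if norm else 0.0 for w in raw])
    return vocab, vectors


def cosine(u, v):
    """Cosine similarity of two vectors that are already L2-normalised."""
    return sum(a * b for a, b in zip(u, v))


# Hypothetical toy comments, not data from the study.
COMMENTS = [
    "the new park is great",
    "the metro is crowded again",
    "the bridge is closed",
]
VOCAB, VECTORS = tfidf_vectors(COMMENTS)
```

In this sketch a word that occurs in every comment (such as "the") receives a lower weight than a word specific to one comment (such as "park"), which is the property that makes TF-IDF vectors usable as input features for a sentiment classifier.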

Full Text:

PDF (Russian)





ISSN: 2307-8162