Russian text corpora for deception detection studies

T. A. Litvinova; O. V. Zagorovskaya; O. A. Litvinova

Abstract

Text-based deception detection is gaining significance, as related studies have both theoretical and practical value and a range of applications in policing, security, and customs, as well as in detecting predatory communications (e.g., Internet scams). Designing text corpora is essential for such studies. Text-based deception detection has mostly been studied using English and a few other European languages. Research on Slavic languages remains insufficient, mostly because no corresponding corpora have been available. In this article, we present an overview of existing text corpora used in studies of text-based deception detection, together with a detailed description of the available Russian corpora designed specifically for text-based deception detection.

ISSN: 2307-8162

International Journal of Open Information Technologies