LLM-Based Intent Classification in Corporate Email Communications
Abstract
Incoming replies to corporate email campaigns must be triaged: does the customer intend to continue the conversation? We compare three approaches on live traffic from a production platform (1,300+ accounts, 17 languages): a keyword classifier, an LLM with few-shot examples and structured output (tool use), and a fine-tuned multilingual encoder (XLM-RoBERTa). Ground truth is the operator’s action; the test set contains 300 genuine customer replies, filtered from the campaign’s own relay transit. Quoted text from the original message is present in 96% of replies and defeats the naive keyword baseline: on raw bodies it is statistically indistinguishable from the trivial respond-to-all strategy (F1 0.676 vs. 0.667). The LLM pipeline reaches F1 = 0.73; the encoder, fine-tuned on 11k operator decisions, reaches F1 = 0.78, significantly outperforming the LLM (McNemar p < 0.01) with nearly half as many false positives and zero per-call cost. Both the LLM and the encoder are insensitive to quoted text (p ≥ 0.75); quote stripping remains useful only for the keyword fallback and for prompt economy. The pipeline runs in production; the validity of operator-action labels and the cost trade-offs are discussed.
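The two statistics quoted in the abstract can be illustrated with a short sketch. It rests on assumptions not stated in the abstract: the 150/150 class split is only inferred from the reported respond-to-all F1 of 0.667 (answering every reply gives recall 1 and precision equal to the share of positives), and the exact binomial form of the McNemar test is used here for simplicity; the abstract does not say which variant the authors applied. The function names and the discordant-pair counts in the usage lines are hypothetical.

```python
from math import comb


def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 from confusion counts; returns 0.0 when the score is undefined."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def respond_to_all_f1(n_pos: int, n_neg: int) -> float:
    """F1 of the trivial strategy that marks every reply as positive.

    Every positive becomes a true positive, every negative a false
    positive, and there are no false negatives.
    """
    return f1_score(tp=n_pos, fp=n_neg, fn=0)


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value from discordant pair counts.

    b: test items that only classifier A labelled correctly.
    c: test items that only classifier B labelled correctly.
    Under the null hypothesis the discordant pairs split as Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0  # no disagreements, no evidence of a difference
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Assumed balanced split of the 300-reply test set (not given in the abstract).
baseline = respond_to_all_f1(n_pos=150, n_neg=150)
print(f"respond-to-all F1: {baseline:.3f}")

# Hypothetical discordant counts, for illustration only.
print(f"McNemar p-value: {mcnemar_exact(b=8, c=25):.4f}")
```

Because the McNemar test uses only the items on which the two classifiers disagree, it fits a comparison of two models on the same 300 replies better than a comparison of their F1 scores alone.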
Full Text:
PDF (Russian)
ISSN: 2307-8162