An explainability method for the BERT transformer in text classification

Pavel Nikolaev

Abstract


Neural networks, like many other machine learning methods, are often treated as black boxes: it is difficult to understand how they arrive at their decisions. As neural networks grow increasingly complex, developing methods for their explainability becomes increasingly important. This paper proposes a new method for explaining transformers (a type of deep neural network) applied to classifying Russian-language text data. The article examines a BERT transformer fine-tuned to classify books by genre based on their annotations. The fine-tuned model achieves high accuracy on a held-out test set: 84%. The paper then presents an explainability method for the BERT model based on clustering its attention heads with the HDBSCAN algorithm. The cluster analysis divides the attention heads into distinct groups, which can be used to analyze the model's behavior.
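As a rough sketch of the classification setup, the Python fragment below fine-tunes a Russian BERT checkpoint on book annotations with the HuggingFace Trainer API. The checkpoint name (DeepPavlov/rubert-base-cased), the number of genres, the toy training data, and the hyperparameters are illustrative assumptions, not the paper's actual configuration.

    # Minimal fine-tuning sketch, assuming a HuggingFace checkpoint and toy data.
    import torch
    from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                              Trainer, TrainingArguments)

    MODEL_NAME = "DeepPavlov/rubert-base-cased"   # assumed Russian BERT checkpoint
    NUM_GENRES = 5                                # assumed number of genre labels

    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModelForSequenceClassification.from_pretrained(
        MODEL_NAME, num_labels=NUM_GENRES)

    # Toy stand-in for the real corpus of book annotations and genre labels.
    train_texts = ["Аннотация детективного романа ...",
                   "Аннотация фантастической повести ..."]
    train_labels = [0, 1]

    class AnnotationDataset(torch.utils.data.Dataset):
        """Tokenized book annotations paired with integer genre labels."""
        def __init__(self, texts, labels):
            self.enc = tokenizer(texts, truncation=True,
                                 padding="max_length", max_length=256)
            self.labels = labels
        def __len__(self):
            return len(self.labels)
        def __getitem__(self, i):
            item = {k: torch.tensor(v[i]) for k, v in self.enc.items()}
            item["labels"] = torch.tensor(self.labels[i])
            return item

    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="bert-genres",
                               num_train_epochs=3,
                               per_device_train_batch_size=16),
        train_dataset=AnnotationDataset(train_texts, train_labels),
    )
    trainer.train()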
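The head-clustering step can be sketched in the same spirit: run the model over a sample of annotations with attention outputs enabled, turn every attention head into a feature vector, and cluster those vectors with HDBSCAN. The featurization used here (a flattened, batch-averaged attention map) and the clustering parameters are assumptions made for illustration; the paper's exact procedure may differ.

    # Attention-head clustering sketch; model name and parameters are assumptions.
    import torch
    import hdbscan
    from transformers import AutoTokenizer, AutoModel

    MODEL_NAME = "DeepPavlov/rubert-base-cased"   # assumed Russian BERT checkpoint
    MAX_LEN = 64

    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModel.from_pretrained(MODEL_NAME, output_attentions=True)
    model.eval()

    texts = ["Аннотация детективного романа ...",    # placeholder annotations
             "Аннотация фантастической повести ..."]
    batch = tokenizer(texts, padding="max_length", truncation=True,
                      max_length=MAX_LEN, return_tensors="pt")

    with torch.no_grad():
        out = model(**batch)

    # out.attentions: tuple of num_layers tensors, each (batch, heads, seq, seq)
    att = torch.stack(out.attentions)   # (layers, batch, heads, seq, seq)
    att = att.mean(dim=1)               # average attention maps over the sample
    layers, heads, seq, _ = att.shape

    # One feature vector per attention head: its averaged map, flattened.
    X = att.reshape(layers * heads, seq * seq).numpy()

    # Density-based clustering of the heads; noise points get label -1.
    labels = hdbscan.HDBSCAN(min_cluster_size=3).fit_predict(X)
    for idx, label in enumerate(labels):
        print(f"layer {idx // heads:2d} head {idx % heads:2d} -> cluster {label}")

Heads assigned to the same cluster attend in similar ways on the sample, so inspecting a representative head per cluster is one plausible way to relate the resulting groups to the model's behavior.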


Full Text:

PDF (Russian)

References


Devlin J. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding / J. Devlin, M. Chang, K. Lee, K. Toutanova // 2019. – URL: https://arxiv.org/abs/1810.04805 (date of access: 20.10.2025).

Radford A. Improving Language Understanding by Generative Pre-Training / A. Radford, K. Narasimhan, T. Salimans, I. Sutskever // 2018. – URL: https://cdn.openai.com/research-covers/language-unsupervised/language_understanding_paper.pdf (date of access: 20.10.2025).

Radford A. Language Models are Unsupervised Multitask Learners / A. Radford, J. Wu, R. Child, D. Luan, D. Amodei, I. Sutskever // 2019. – URL: https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf (date of access: 20.10.2025).

Brown T.B. et al. Language Models are Few-Shot Learners // 2020. – URL: https://arxiv.org/abs/2005.14165 (date of access: 20.10.2025).

Lewis M. BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension / M. Lewis, Y. Liu, N. Goyal, M. Ghazvininejad, A. Mohamed, O. Levy, V. Stoyanov, L. Zettlemoyer // 2019. – URL: https://arxiv.org/abs/1910.13461 (date of access: 20.10.2025).

Raffel C. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer / C. Raffel, N. Shazeer, A. Roberts, K. Lee, S. Narang, M. Matena, Y. Zhou, W. Li, P.J. Liu // 2023. – URL: https://arxiv.org/abs/1910.10683 (date of access: 20.10.2025).

Ribeiro M.T. "Why Should I Trust You?": Explaining the Predictions of Any Classifier / M.T. Ribeiro, S. Singh, C. Guestrin // 2016. – URL: https://arxiv.org/abs/1602.04938 (date of access: 23.10.2025).

Sundararajan M. Axiomatic attribution for deep networks / M. Sundararajan, A. Taly, Q. Yan // 2017. – URL: https://arxiv.org/abs/1703.01365 (date of access: 20.10.2025).

Lundberg S. A Unified Approach to Interpreting Model Predictions / S. Lundberg, S. Lee // 2017. – URL: https://arxiv.org/abs/1705.07874 (date of access: 20.10.2025).

Selvaraju R.R. Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization / R.R. Selvaraju, M. Cogswell, A. Das, R. Vedantam, D. Parikh, D. Batra // 2019. – URL: https://arxiv.org/abs/1610.02391 (date of access: 20.10.2025).

LIME and SHAP [Electronic resource]. – URL: https://habr.com/ru/companies/otus/articles/779430 (date of access: 20.10.2025).

Vaswani A. Attention is all you need / A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, I. Polosukhin // 2017. – URL: https://arxiv.org/pdf/1706.03762.pdf (date of access: 23.10.2025).

BERT in DeepPavlov [Electronic resource]. – URL: https://deeppavlov-docs.readthedocs.io/en/latest/features/models/bert.html (date of access: 23.10.2025).

Nikolaev P.L. Classification of books by genre based on textual descriptions via deep learning / P.L. Nikolaev // International Journal of Open Information Technologies. – 2022. – No. 1. – P. 36–40. (in Russ.)



ISSN: 2307-8162