Development of an Intelligent Method Based on Machine Learning for Classification of Vulnerabilities and Threats in Information Systems

Rimma Gorokhova, Petr Nikitin

Abstract


This study presents a method for classifying vulnerabilities and cyber threats using data from information and news resources. The authors propose applying text analysis and machine learning to automate the detection and classification of threats in information systems. Several approaches are explored, including latent semantic analysis, topic modeling, n-gram analysis, and vector representations of text, which make it possible to identify semantic relationships and thematic structures. These methods allow the analysis to be adapted to the interpretability and accuracy requirements placed on cyber threat data. The paper then considers analyzing the co-occurrence of words and phrases in texts to identify thematic clusters of cyber threats. The main analysis tool is an association measure computed between terms, which quantifies how strongly they are related and supports the construction of more effective machine learning classification models. To visualize the results, the authors propose strategic diagrams based on two key indicators: cluster centrality and cluster density. Clusters are assigned to one of four categories according to the values of these indicators, which characterizes each cluster and its relationships with other threats more precisely. To analyze how clusters change over time, directed graphs are used to track the transformation of cyber threat components from one time period to the next. The presented approach can serve as a basis for more informative cyber threat monitoring systems and, ultimately, for better protection of information systems against potential attacks. The study thus contributes to automating the analysis and classification of threats in cybersecurity.
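The co-occurrence and strategic-diagram steps described above can be illustrated with a minimal Python sketch. The abstract does not say which association measure the authors use. The sketch assumes the equivalence index common in co-word analysis, e_ab = c_ab^2 / (c_a * c_b), where c_ab is the number of texts containing both terms. It also assumes density is the mean strength of links inside a cluster and centrality is the summed strength of links leaving it. The function names, the toy documents, the cut-off values, and the four quadrant labels are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def equivalence_index(docs):
    """Return {(term_a, term_b): e_ab} with e_ab = c_ab^2 / (c_a * c_b).

    Assumed measure: the abstract does not name the one actually used.
    """
    term_freq = Counter()   # number of texts containing each term
    pair_freq = Counter()   # number of texts containing both terms
    for terms in docs:
        unique = sorted(set(terms))  # sorted so each pair has one key
        term_freq.update(unique)
        pair_freq.update(combinations(unique, 2))
    return {
        (a, b): n ** 2 / (term_freq[a] * term_freq[b])
        for (a, b), n in pair_freq.items()
    }


def centrality_and_density(cluster, links):
    """Density: mean strength of links inside the cluster.
    Centrality: summed strength of links crossing its boundary."""
    internal = [w for (a, b), w in links.items()
                if a in cluster and b in cluster]
    external = [w for (a, b), w in links.items()
                if (a in cluster) != (b in cluster)]
    density = sum(internal) / len(internal) if internal else 0.0
    return sum(external), density


def quadrant(centrality, density, c_cut, d_cut):
    """Assign one of four categories; labels and cut-offs are illustrative."""
    position = "central" if centrality >= c_cut else "peripheral"
    maturity = "well developed" if density >= d_cut else "weakly developed"
    return f"{position}, {maturity}"


# Toy corpus: each text is reduced to a list of threat-related terms.
docs = [
    ["phishing", "email", "credential"],
    ["phishing", "email", "malware"],
    ["ransomware", "malware", "encryption"],
    ["ransomware", "encryption", "backup"],
]
links = equivalence_index(docs)
phishing_cluster = {"phishing", "email", "credential"}
centrality, density = centrality_and_density(phishing_cluster, links)
label = quadrant(centrality, density, c_cut=0.4, d_cut=0.5)
print(label)
```

In practice the cut-offs would more plausibly be the median centrality and density across all clusters rather than fixed constants. The clusters themselves would come from a clustering step over the link weights, not from a hand-picked set of terms.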

Full Text:

PDF (Russian)

References


Meshcheryakov, I. V. The role of the media in the state information policy to ensure national security / I. V. Meshcheryakov // Science of Krasnoyarsk. – 2016. – Vol. 5, No. 6. – P. 70-82. – DOI 10.12731/2070-7568-2016-6-70-82. – EDN XHGMBZ.

Golovin, Yu. A. Increasing role of the media in ensuring information security / Yu. A. Golovin, A. N. Orlov // Knowledge. Understanding. Skill. – 2013. – No. 2. – P. 147-152. – EDN QYUIDN.

Burova, Yu. V. Potential of regional media in countering terrorism (on the example of the Republic of Mordovia) / Yu. V. Burova // Age of information. – 2017. – No. 2-1. – P. 180-181. – EDN YGRTCL.

Plotkina, P. V. International problems of counteracting fake news as a tool of information wars / P. V. Plotkina, E. K. Tokhchukova // Theories and problems of political research. – 2023. – Vol. 12, No. 5A-6A. – P. 54-62. – DOI 10.34670/AR.2023.94.56.012. – EDN IOCCEZ.

Obidov, O. S. Current state and prospects for the development of media in ensuring information security in modern international relations / O. S. Obidov // News of the Academy of Sciences of the Republic of Tajikistan. Department of Social Sciences. – 2019. – No. 1 (254). – P. 57-60. – EDN ELFOXT.

Zaporozhtseva, V. M. Parsing websites as a method of collecting data for linguistic research / V. M. Zaporozhtseva // Young scientist. – 2024. – No. 24(523). – P. 496-499. – EDN QCMGMO.

Methods for assessing the security of computer systems for information support of the digital economy / A. A. Grusho, N. A. Grusho, M. I. Zabezhailo, E. E. Timonina // International Journal of Open Information Technologies. – 2019. – Vol. 7, No. 4. – P. 61-66. – EDN ZCGPAD.

Eyzenakh, D. S. High performance distributed web-scraper / D. S. Eyzenakh, A. S. Rameykov, I. V. Nikiforov // Proceedings of the Institute for System Programming of the RAS. – 2021. – Vol. 33, No. 3. – P. 87-100. – DOI 10.15514/ISPRAS-2021-33(3)-7. – EDN SIPWXY.

Bogatenko, T. R. Application of machine learning and statistics to anesthesia detection from EEG data / T. R. Bogatenko, K. S. Sergeev, G. I. Strelkova // Izvestiya of Saratov University. New Series. Series: Physics. – 2024. – Vol. 24, No. 3. – P. 209-215. – DOI 10.18500/1817-3020-2024-24-3-209-215. – EDN HKYBMM.

Analysis of Machine Learning Models by Solving the Text Data Classification Problem / A. V. Pchelin, N. A. Kononov, V. S. Serova [et al.] // Journal of Computational and Engineering Mathematics. – 2021. – Vol. 8, No. 2. – P. 33-45. – DOI 10.14529/jcem210203. – EDN VREXYT.

Darinskaya, L. A. Bibliometric analysis as a way of entering the research problem / L. A. Darinskaya, A. S. Guslina // Bulletin of St. Petersburg University. – 2010. – No. 3. – P. 71-79.

Text classification using various machine learning methods to detect signs of cyber aggression / Yu. E. Gapanyuk, D. A. Popova, K. R. Rabtsevich [et al.] // Natural and technical sciences. – 2022. – No. 6 (169). – P. 277-281. – EDN OHHHNM.

Khotin, D. Yu. Identification of fake fragments in text news messages using machine learning / D. Yu. Khotin // Bulletin of science. – 2024. – Vol. 1, No. 6 (75). – P. 1568-1576. – EDN MHTVUE.

Gusev, P. Yu. Development of a text classification system by scientific specialties using machine learning methods / P. Yu. Gusev // Bulletin of the Novosibirsk State University. Series: Information Technology. – 2021. – Vol. 19, No. 1. – P. 39-47. – DOI 10.25205/1818-7900-2021-19-1-39-47. – EDN DWLTXW.

Dronov, S. V. Optimization of cluster partitions using latent class analysis technique / S. V. Dronov // Bulletin of Altai State University. – 2023. – No. 1(129). – P. 89-94. – DOI 10.14258/izvasu(2023)1-14. – EDN ZIMQKI.

Almaev, N. A. Thematic analysis of discussions using the Latent Dirichlet Allocation method / N. A. Almaev, O. V. Murasheva // Institute of Psychology of the Russian Academy of Sciences. Social and Economic Psychology. – 2022. – Vol. 7, No. 1 (25). – P. 47-69. – DOI 10.38098/ipran.sep_2022_25_1_03. – EDN YECKGR.

Zyryanov, M. S. The influence of the length of n-grams in word-by-word tokenization on the efficiency of identifying extremist texts using machine learning methods / M. S. Zyryanov // Scientific aspect. – 2024. – Vol. 22, No. 5. – P. 2949-2961. – EDN KVHJHN.

Kechik, D. A. Method of estimation of frequency variation relying on estimation of shift of spectral peaks / D. A. Kechik, Yu. P. Aslamov, I. G. Davydov // System analysis and applied informatics. – 2021. – No. 1. – P. 53-61. – DOI 10.21122/2309-4923-2021-1-53-61. – EDN CDEBVV.

Grefenstette, G. Competing Views of Word Meaning: Word Embeddings and Word Senses / G. Grefenstette, P. Hanks // International Journal of Lexicography. – 2023. – Vol. 36, No. 2. – P. 211-219. – DOI 10.1093/ijl/ecad005. – EDN NQHRQL.

Panamareva, O. N. Implementation of clustering of news flows based on vector representations of text / O. N. Panamareva, V. V. Luka, D. A. Sukharev // Bulletin of Tula State University. Technical sciences. – 2024. – No. 7. – P. 304-309. – DOI 10.24412/2071-6168-2024-7-304-305. – EDN VPTOZH.

Sandhu, T. Exploration of Word Embeddings with Graph-Based Context Adaptation for Enhanced Word Vectors / T. Sandhu, Z. Kobti // The International FLAIRS Conference Proceedings. – 2024. – Vol. 37. – DOI 10.32473/flairs.37.1.135597. – EDN PWIKWF.

Kandilas, V., Upham, S. P., Ungar, L. H. Knowledge Community Analysis Using Foreground and Background Clusters. [Electronic resource]. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.146.3141&rep=rep1&type=pdf (accessed: 02.10.2024).

Nguyen, T.T. et al. Multi-target deep reinforcement learning system // Engineering Applications of Artificial Intelligence, Vol. 96, November 2020, 103915. [Electronic resource]. - https://doi.org/10.1016/j.engappai.2020.103915 (accessed: 02.10.2024).

Palmov, S.V., Artyushkina, E.S. Deep learning: definition and distinctive features. // Forum of young scientists. 2020. No. 3 (43). P. 311-316.

Chistova, E.V., Shelmanov, A.O., Smirnov, I.V. Application of deep learning to modeling dialogue in natural language. // Proceedings of the Institute for Systems Analysis of the Russian Academy of Sciences. 2019. Vol. 69. No. 1. P. 105-115.

Potemkin, A.V. Processing heterogeneous information using deep learning of neural networks. // Soft measurements and calculations. 2019. No. 9 (22). P. 44-48.

Small, H. Tracking and forecasting growth areas in science [Electronic resource]. http://www.scimaps.org/exhibit/docs/small.pdf (accessed: 01.10.2024).





ISSN: 2307-8162