Countering Prompt Injection attacks on large language models

Ramina Mudarova, Dmitry Namiot

Abstract


Machine learning models have brought with them a new class of cyber attacks: adversarial attacks. Large language models are no exception and are also susceptible to attacks. Such attacks are becoming increasingly dangerous in the context of the use of deep learning and artificial intelligence in various fields. In the world of modern computing and artificial intelligence, security plays a key role and more and more attention is being paid to countering such attacks. Attacks on large language models include, but are not limited to, runtime attacks known as Prompt Injection. These attacks aim to disrupt large language models by injecting malicious instructions or prompts to corrupt the model's output, which can have serious consequences for the confidentiality and integrity of information. Technically, they turn out to be one of the easiest for attackers to execute. In this regard, there is a need to research and develop effective strategies to counter Prompt Injection. This article is devoted to the research and development of effective algorithms and methodologies capable of detecting and blocking Prompt Injection attacks in order to improve system security and protection from malicious influences. The key goal of the work is to implement these methods in the form of software solutions, as well as evaluate their effectiveness through experiments using various metrics on test data. The scientific novelty of this development lies in the creation of unique protection mechanisms that can ensure reliable security of language models from Prompt Injection attacks.


Full Text:

PDF (Russian)

References


Liu, Yupei, et al. "Prompt injection attacks and defenses in llm-integrated applications." arXiv preprint arXiv:2310.12815 (2023).

Shreya Goyal, Sumanth Doddapaneni, Mitesh M. Khapra, and Balaraman Ravindran . 2023. A Survey of Adversarial Defences and Robustness in NLP. 1, 1 (April 2023), 43 pages. https://arxiv.org/pdf/2203.06414.pdf.

Farzad Nourmohammadzadeh Motlagh, Mehrdad Hajizadeh, Mehryar Majd, Pejman Najafi, Feng Cheng, and Christoph Meinel. Безопасность систем машинного обучения. Large Language Models in Cybersecurity: State-of-the-Art //arXiv e-prints. – 2024. – P. arXiv:2402.00891.

Reshabh K Sharma, Vinayak Gupta, and Dan Grossman. SPML: A DSL for Defending Language Models Against Prompt Attacks School of Computer Science & Engineering //arXiv e-prints. – 2024. – P. arXiv:2402.11755.

Mittal A. Vulnerabilities and security threats faced by large language models //Artificial Intelligence. unite.ai - 2024.

Jingwei Yi, Yueqi Xie, Bin Zhu, Emre Kıcıman, Guangzhong Sun, Xing Xie, and Fangzhao Wu. Benchmarking and Defending Against Indirect Prompt Injection Attacks on Large Language Models //arXiv e-prints. – 2024. – P. arXiv:2312.14197.

Shayegani, Erfan, et al. "Survey of vulnerabilities in large language models revealed by adversarial attacks." arXiv preprint arXiv:2310.10844 (2023).

Suhomlin V. A. i dr. Model' navykov kiberbezopasnosti 2020 //Sovremennye informacionnye tehnologii i IT-obrazovanie. – 2020. – T. 16. – #. 3. – S. 695-710.telekommunikacij v nauke i obrazovanii. – 2021. – S. 452-456.

Chen X. et al. Knowprompt: Knowledge-aware prompt-tuning with synergistic optimization for relation extraction //Proceedings of the ACM Web conference 2022. – 2022. – S. 2778-2788.

Greshake K. et al. More than you've asked for: A Comprehensive Analysis of Novel Prompt Injection Threats to Application-Integrated Large Language Models //arXiv e-prints. – 2023. – S. arXiv: 2302.12173.

Hertz A. et al. Prompt-to-prompt image editing with cross attention control //arXiv preprint arXiv:2208.01626. – 2022.

KV S., Manjunath T. C. Implementation of Authorization and Authentication techniques in IoT objects for Industrial Applications //International Neurourology Journal. – 2023. – T. 27. – #. 4. – S. 500-509.

Martin E. B., Ghosh S. GitHub Copilot: A Threat to High School Security? Exploring GitHub Copilot's Proficiency in Generating Malware from Simple User Prompts //2023 International Conference on Emerging Trends in Networks and Computer Communications (ETNCC). – IEEE, 2023. – S. 1-6.

Martínez Torres J., Iglesias Comesaña C., García-Nieto P. J. Machine learning techniques applied to cybersecurity //International Journal of Machine Learning and Cybernetics. – 2019. – T. 10. – S. 2823-2836.

Martino A., Iannelli M., Truong C. Knowledge injection to counter large language model (LLM) hallucination //European Semantic Web Conference. – Cham : Springer Nature Switzerland, 2023. – S. 182-185.

Sarker I. H. Deep cybersecurity: a comprehensive overview from neural network and deep learning perspective //SN Computer Science. – 2021. – T. 2. – #. 3. – S. 154.

Shulha O. et al. Banking information resource cybersecurity system modeling //Journal of Open Innovation: Technology, Market, and Complexity. – 2022. – T. 8. – #. 2. – S. 80.

Yan J. et al. Backdooring instruction-tuned large language models with virtual prompt injection //NeurIPS 2023 Workshop on Backdoors in Deep Learning-The Good, the Bad, and the Ugly. – 2023.

Ye H. et al. Ontology-enhanced Prompt-tuning for Few-shot Learning //Proceedings of the ACM Web Conference 2022. – 2022. – S. 778-787.

Zhuang L., Fei H., Hu P. Knowledge-enhanced event relation extraction via event ontology prompt //Information Fusion. – 2023. – T. 100. – S. 101919.

Tom B Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. Language models are few-shot learners. In NeurIPS, 2020.

Introducing ChatGPT. https://openai.com/blog/chatgpt, Retrieved: Mar, 2024.

OpenAI. Gpt-4 technical report. arXiv preprint arXiv:2303.08774, 2023.

Rohan Anil, Andrew M. Dai, Orhan Firat, Melvin Johnson, Dmitry Lepikhin, Alexandre Passos, et al. Palm 2 technical report. arXiv preprint arXiv:2305.10403, 2023.

Bing Search. https://www.bing.com/, Retrieved: Mar, 2024.

ChatGPT Plugins. https://openai.com/blog/chatgpt-plugins, 2023.

ChatWithPDF. https://gptstore.ai/plugins/chatwithpdf-sdan-io, 2023.

Sundar Pichai. An important next step on our AI journey. https://blog.google/technology/ai/bard-google-ai-search-updates/, Retrieved: Mar, 2024.

Tiago A. Almeida, Jose Maria Gomez Hidalgo, and Akebo Yamakami. Contributions to the study of sms spam filtering: New collection and results. In Proceedings of the 2011 ACM Symposium on Document Engineering (DOCENG’11), 2011.

Rich Harang. Securing LLM Systems Against Prompt Injection. https://developer.nvidia.com/blog/securing-llm-systemsagainst-prompt-injection, 2023.

Yi Liu, Gelei Deng, Yuekang Li, Kailong Wang, Tianwei Zhang, Yepang Liu, Haoyu Wang, Yan Zheng, and Yang Liu. Prompt injection attack against llm-integrated applications. arXiv preprint arXiv:2306.05499, 2023.

Fábio Perez and Ian Ribeiro. Ignore previous prompt: Attack techniques for language models. In NeurIPS ML Safety Workshop, 2022.

Jose Selvi. Exploring Prompt Injection Attacks. https://research.nccgroup.com/2022/12/05/exploring-prompt-injection-attacks/, 2022.

Simon Willison. Prompt injection attacks against GPT-3. https://simonwillison.net/2022/Sep/12/prompt-injection/, Retrieved: Mar, 2024.

Simon Willison. Delimiters won’t save you from prompt injection. https://simonwillison.net/2023/May/11/delimiters-wont-save-you, Retrieved: Mar, 2024.

Hezekiah J. Branch, Jonathan Rodriguez Cefalu, Jeremy McHugh, Leyla Hujer, Aditya Bahl, Daniel del Castillo Iglesias,Ron Heichman, and Ramesh Darwishi. Evaluating the susceptibility of pre-trained language models via handcrafted adversarial examples. arXiv preprint arXiv:2209.02128, 2022.

Davey Winder. Hacker Reveals Microsoft’s New AI-Powered Bing Chat Search Secrets. https://www.forbes.com/sites/daveywinder/2023/02/13/hacker-reveals-microsofts-new-ai-powered-bing-chat-search-secrets/?sh=356646821290, Retrieved: Mar, 2024.

Namiot, D. E. Osnovanija dlja rabot po ustojchivomu mashinnomu obucheniju / D. E. Namiot, E. A. Il'jushin, I. V. Chizhov // International Journal of Open Information Technologies. – 2021. – T. 9, # 11. – S. 68-74. – EDN BAFFGK.

Cifrovaja jekonomika i Internet Veshhej - preodolenie silosa dannyh / V. P. Kuprijanovskij, A. R. Ishmuratov, D. E. Namiot [i dr.] // International Journal of Open Information Technologies. – 2016. – T. 4, # 8. – S. 36-42. – EDN WFVAPB.

Iskusstvennyj intellekt kak strategicheskij instrument jekonomicheskogo razvitija strany i sovershenstvovanija ee gosudarstvennogo upravlenija. Chast' 2. Perspektivy primenenija iskusstvennogo intellekta v Rossii dlja gosudarstvennogo upravlenija / I. A. Sokolov, V. I. Drozhzhinov, A. N. Rajkov [i dr.] // International Journal of Open Information Technologies. – 2017. – T. 5, # 9. – S. 76-101. – EDN ZEQDMT.

Greshake, Kai, et al. "Not what you've signed up for: Compromising real-world llm-integrated applications with indirect prompt injection." Proceedings of the 16th ACM Workshop on Artificial Intelligence and Security. 2023.

Prompt injection dataset https://huggingface.co/datasets/deepset/prompt-injections Retrieved: Mar, 2024.


Refbacks

  • There are currently no refbacks.


Abava  Кибербезопасность MoNeTec 2024

ISSN: 2307-8162