Sleeper Channels and Provenance Gates: Persistent Prompt Injection in Always-on Autonomous AI Agents
Abstract
References
“OpenClaw: Personal AI assistant runtime,” https://github.com/openclaw/openclaw, commit f53e789caf565e60ba29cb9751829b1b6, 2026- -27, 2026.
Nous Research, “Hermes Agent,” https://github.com/nousresearch/hermes-agent, commit d75dea5a86aec599b1e081f8bbe9170bd3f964, 2026-04-27; release v0.11.0, 2026-04-23, 2026.
K. Greshake, S. Abdelnabi, S. Mishra, C. Endres, T. Holz, and M. Fritz, “Not what you’ve signed up for: Compromising real-world LLM-integrated applications with indirect prompt injection,” in Proc. 16th ACM Workshop on Artificial Intelligence and Security (AISec), 2023, arXiv:2302.12173.
E. Debenedetti, J. Zhang, M. Balunovic, L. Beurer-Kellner, M. Fischer, and F. Tramèr, “AgentDojo: A dynamic environment to evaluate prompt injection attacks and defenses for LLM agents,” in Advances in Neural Information Processing Systems, Datasets and Benchmarks Track, 2024, arXiv:2406.13352.
S. S. Srivastava, “MemoryGraft: Persistent compromise of LLM agents via poisoned experience retrieval,” arXiv:2512.16962, Dec. 2025.
E. Hubinger et al., “Sleeper agents: Training deceptive LLMs that persist through safety training,” arXiv:2401.05566, Jan. 2024.
N. Hardy, “The confused deputy (or why capabilities might have been invented),” ACM SIGOPS Operating Systems Review, vol. 22, no. 4, pp. 36–38, 1988.
M. S. Miller, “Robust composition: Towards a unified approach to access control and concurrency control,” Ph.D. dissertation, Johns Hopkins University, 2006.
OpenClaw maintainers, “OpenClaw threat model v1.0 (MITRE ATLAS),” docs/security/THREAT-MODEL-ATLAS.md, OpenClaw repository at commit 3120401f53e789caf565e60ba29cb9751829b1b6, last updated 2026-02-04, 2026.
Anonymous community contributor, “Feature: Runtime prompt injection defenses,” Upstream issue (date, handle, and number anonymized for double-blind review), declined upstream, 2026.
F. Perez and I. Ribeiro, “Ignore previous prompt: Attack techniques for language models,” arXiv:2211.09527, 2022.
S. Toyer, O. Watkins, E. A. Mendes, J. Svegliato, L. Bailey, T. Wang, I. Ong, K. Elmaaroufi, P. Abbeel, T. Darrell, A. Ritter, and S. Russell, “Tensor Trust: Interpretable prompt injection attacks from an online game,” arXiv:2311.01011, 2023.
H. Zhang, J. Huang, K. Mei, Y. Yao, Z. Wang, C. Zhan, H. Wang, and Y. Zhang, “Agent security bench (ASB): Formalizing and benchmarking attacks and defenses in LLM-based agents,” arXiv:2410.02644, 2024.
Q. Zhan, Z. Liang, Z. Ying, and D. Kang, “InjecAgent: Benchmarking indirect prompt injections in tool-integrated large language model agents,” Findings of ACL, 2024.
Z. Chen, Z. Xiang, C. Xiao, D. Song, and B. Li, “AgentPoison: Red-teaming LLM agents via poisoning memory or knowledge bases,” Proc. NeurIPS, 2024.
W. Zou, R. Geng, B. Wang, and J. Jia, “PoisonedRAG: Knowledge corruption attacks to retrieval-augmented generation of large language models,” Proc. USENIX Security Symposium, 2024.
M. Nasr et al., “The attacker moves second: Stronger adaptive attacks bypass defenses against LLM jailbreaks and prompt injections,” arXiv:2510.09023, Oct. 2025.
N. Carlini, M. Nasr, C. A. Choquette-Choo, M. Jagielski, I. Gao, A. Awadalla, P. W. Koh, D. Ippolito, K. Lee, F. Tramèr, and L. Schmidt, “Are aligned neural networks adversarially aligned?” Proc. NeurIPS, 2024.
J. H. Saltzer and M. D. Schroeder, “The protection of information in computer systems,” Proceedings of the IEEE, vol. 63, no. 9, pp. 1278–1308, 1975.
M. S. Miller, K.-P. Yee, and J. Shapiro, “Capability myths demolished,” Tech. Rep. SRL2003-02, Johns Hopkins Univ. Systems Research Laboratory, 2003.
H. M. Levy, Capability-based computer systems. Digital Press, 1984.
Meta AI Security, “Agents rule of two: A practical approach to AI agent security,” Tech. blog, Oct. 2025.
D. E. Denning, “A lattice model of secure information flow,” Communications of the ACM, vol. 19, no. 5, pp. 236–243, 1976.
L. Wall, T. Christiansen, and J. Orwant, Programming Perl, 3rd ed. O’Reilly, 2000.
G. E. Suh, J. W. Lee, D. Zhang, and S. Devadas, “Secure program execution via dynamic information flow tracking,” in Proc. 11th Int. Conf. Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2004, pp. 85–96.
W. Enck, P. Gilbert, S. Han, V. Tendulkar, B.-G. Chun, L. P. Cox, J. Jung, P. McDaniel, and A. N. Sheth, “TaintDroid: An information-flow tracking system for realtime privacy monitoring on smartphones,” in Proc. 9th USENIX Symp. Operating Systems Design and Implementation (OSDI), 2010, pp. 393–407.
E. J. Schwartz, T. Avgerinos, and D. Brumley, “All you ever wanted to know about dynamic taint analysis and forward symbolic execution (but might have been afraid to ask),” in Proc. IEEE Symp. Security and Privacy (S&P), 2010, pp. 317–331.
M. Costa et al., “Securing AI agents with information-flow control,” arXiv:2505.23643, 2025.
E. Debenedetti, I. Shumailov, T. Fan, J. Hayes, N. Carlini, D. Fabian, C. Kern, C. Shi, F. Tramèr, and A. Terzis, “Defeating prompt injections by design,” arXiv:2503.18813, 2025.
M. Surbatovich, J. Aljuraidan, L. Bauer, A. Das, and L. Jia, “Some recipes can do more than spoil your appetite: Analyzing the security and privacy risks of IFTTT recipes,” in Proc. 26th Int. Conf. World Wide Web (WWW), 2017, pp. 1501–1510.
Q. Wang, W. U. Hassan, A. Bates, and C. A. Gunter, “Fear and logging in the Internet of Things,” in Proc. NDSS, 2018.
OWASP Foundation, “Agentic Security Initiative,” https://genai.owasp.org/initiatives/agentic-security-initiative/, accessed Apr. 2026.
Z. Deng, Y. Guo, C. Han, W. Ma, J. Xiong, S. Wen, and Y. Xiang, “AI agents under threat: A survey of key security challenges and future pathways,” arXiv:2406.02630, 2025.
F. He, T. Zhu, D. Ye, B. Liu, W. Zhou, and P. S. Yu, “The emerged security and privacy of LLM agent: A survey with case studies,” arXiv:2407.19354, 2024.
Z. Zhong, Z. Huang, A. Wettig, and D. Chen, “Poisoning retrieval corpora by injecting adversarial passages,” in Proc. EMNLP, 2023.
Trusted Computing Group, “TPM 2.0 library specification, part 1: Architecture,” Specification Version 1.59, 2019.
Y. Liu, Y. Jia, R. Geng, J. Jia, and N. Z. Gong, “Formalizing and benchmarking prompt injection attacks and defenses,” in Proc. 33rd USENIX Security Symposium, 2024, arXiv:2310.12815.
C. Collberg and T. A. Proebsting, “Repeatability in computer systems research,” Communications of the ACM, vol. 59, no. 3, pp. 62–69, 2016.
ISSN: 2307-8162