Extraction of trigger and mask from poisoned data using modified Activation Clustering and Neural Cleanse methods

Ivan Lozinskii, Vasily Kostyumov, Ekaterina Stroeva

Abstract


It has been repeatedly noted in prior work that the popular Neural Cleanse method restores triggers and masks poorly when they occupy a significant part of the image, since the method searches for the smallest change that induces the poisoned behavior. To address this problem, we propose a method for extracting a trigger from the averaged image of a poisoned cluster of images. The trigger is extracted by filtering the pixel color intensity of the averaged image. To select the clusters of images, a modification of the Activation Clustering method is used. The experiments were conducted on data from the Trojan Detection Challenge, NeurIPS 2022, in which a single trigger moves any image to the target class. Under such a poisoning model the original Activation Clustering performs poorly, so we also propose a modified version of Activation Clustering. To restore the mask for the selected trigger, a modification of the Neural Cleanse method was developed. The developed method shows significantly higher quality of trigger isolation compared with the original Neural Cleanse.
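
Below is a minimal sketch of the trigger-extraction idea described in the abstract, assuming the poisoned cluster of images has already been identified (e.g. by the modified Activation Clustering step). The function name, array shapes and the intensity threshold are illustrative assumptions, not the authors' exact procedure.

import numpy as np

def extract_trigger(cluster_images: np.ndarray, intensity_threshold: float = 0.5):
    """Estimate a trigger from a suspected poisoned cluster.

    cluster_images: array of shape (N, H, W, C) with pixel values in [0, 1];
    intensity_threshold: assumed cutoff for filtering the averaged image.
    Returns the averaged image with low-intensity pixels zeroed out (a rough
    trigger estimate) and the boolean pixel mask that survived filtering.
    """
    mean_image = cluster_images.mean(axis=0)          # (H, W, C) averaged cluster image
    intensity = mean_image.mean(axis=-1)              # per-pixel brightness, (H, W)
    mask = intensity > intensity_threshold            # pixels kept after intensity filtering
    trigger = mean_image * mask[..., None]            # candidate trigger region
    return trigger, mask

if __name__ == "__main__":
    # Usage on random data, only to show the shapes involved.
    fake_cluster = np.random.rand(32, 32, 32, 3)
    trigger, mask = extract_trigger(fake_cluster)
    print(trigger.shape, mask.sum())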



References


Unsolved Problems in ML Safety / Dan Hendrycks, Nicholas Carlini, John Schulman, Jacob Steinhardt // arXiv:2109.13916 [cs]. — 2021. — URL: http://arxiv.org/abs/2109.13916 (online; accessed: 2021-11-05).

Detecting Backdoor Attacks on Deep Neural Networks by Activation Clustering / Bryant Chen, Wilka Carvalho, Nathalie Baracaldo et al. // arXiv:1811.03728 [cs, stat]. — 2018. — URL: http://arxiv.org/abs/1811.03728 (online; accessed: 2021-11-05).

Secure Distributed Training at Scale / Eduard Gorbunov, Alexander Borzunov, Michael Diskin, Max Ryabinin // arXiv:2106.11257 [cs, math]. — 2021. — URL: http://arxiv.org/abs/2106.11257 (online; accessed: 2021-11-05).

Data Poisoning Attacks on Federated Machine Learning / Gan Sun, Yang Cong, Jiahua Dong et al. // arXiv:2004.10020 [cs]. — 2020. — URL: http://arxiv.org/abs/2004.10020 (online; accessed: 2021-12-19).

Salem Ahmed, Backes Michael, Zhang Yang. Don't Trigger Me! A Triggerless Backdoor Attack Against Deep Neural Networks // arXiv:2010.03282 [cs]. — 2020. — URL: http://arxiv.org/abs/2010.03282 (online; accessed: 2021-11-21).

BACKDOORL: Backdoor Attack against Competitive Reinforcement Learning / Lun Wang, Zaynah Javed, Xian Wu et al. // arXiv:2105.00579 [cs]. — 2021. — URL: http://arxiv.org/abs/2105.00579 (online; accessed: 2021-11-21).

Clean-Label Backdoor Attacks on Video Recognition Models / Shihao Zhao, Xingjun Ma, Xiang Zheng et al. // arXiv:2003.03030 [cs]. — 2020. — URL: http://arxiv.org/abs/2003.03030 (online; accessed: 2021-11-21).

BadNL: Backdoor attacks against NLP models with semantic-preserving improvements / Xiaoyi Chen, Ahmed Salem, Dingfan Chen et al. // Annual Computer Security Applications Conference. — ACM, 2021.

Gu Tianyu, Dolan-Gavitt Brendan, Garg Siddharth. BadNets: Identifying vulnerabilities in the machine learning model supply chain. — 2019. — arXiv: 1708.06733.

Deep Feature Space Trojan Attack of Neural Networks by Controlled Detoxification / Siyuan Cheng, Yingqi Liu, Shiqing Ma, Xiangyu Zhang // arXiv:2012.11212 [cs]. — 2021. — URL: http://arxiv.org/abs/2012.11212 (online; accessed: 2021-12-12).

Neural Cleanse: Identifying and Mitigating Backdoor Attacks in Neural Networks / Bolun Wang, Yuanshun Yao, Shawn Shan et al. // 2019 IEEE Symposium on Security and Privacy (SP). — San Francisco, CA, USA : IEEE, 2019. — P. 707–723. — URL: https://ieeexplore.ieee.org/document/8835365/ (online; accessed: 2021-12-11).

Razmi Fereshteh, Xiong Li. Classification Auto-Encoder based Detector against Diverse Data Poisoning Attacks // arXiv:2108.04206 [cs]. — 2021. — URL: http://arxiv.org/abs/2108.04206 (online; accessed: 2021-11-13).

Jia Jinyuan, Cao Xiaoyu, Gong Neil Zhenqiang. Intrinsic Certified Robustness of Bagging against Data Poisoning Attacks // arXiv:2008.04495 [cs]. — 2020. — URL: http://arxiv.org/abs/2008.04495 (online; accessed: 2021-12-20).

GangSweep: Sweep out Neural Backdoors by GAN / Liuwan Zhu, Rui Ning, Cong Wang et al. // Proceedings of the 28th ACM International Conference on Multimedia. — MM '20. — New York, NY, USA : Association for Computing Machinery, 2020. — P. 3173–3181. — URL: https://doi.org/10.1145/3394171.3413546 (online; accessed: 2021-11-28).

Trojan Detection Challenge. — URL: https://trojandetection.ai/ (online; accessed: 2023-05-01).

Zagoruyko Sergey, Komodakis Nikos. Wide residual networks. — 2017. — arXiv: 1605.07146.

Beyer Lucas, Zhai Xiaohua, Kolesnikov Alexander. Better plain ViT baselines for ImageNet-1k. — 2022. — arXiv: 2205.01580.

CatBoost. — URL: https://catboost.ai/en/docs/ (online; accessed: 2023-05-01).

Wu Baoyuan, Chen Hongrui, Zhang Mingda et al. BackdoorBench: A comprehensive benchmark of backdoor learning. — 2022. — arXiv: 2206.12654.

Subpopulation Data Poisoning Attacks / Matthew Jagielski, Giorgio Severi, Niklas Pousette Harger, Alina Oprea // Proceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security. — CCS '21. — New York, NY, USA : Association for Computing Machinery, 2021. — P. 3104–3122. — URL: https://doi.org/10.1145/3460120.3485368 (online; accessed: 2021-12-14).


