Investigating the Interpretability of Pedestrian Detection Models via Testing with Concept Activation Vectors (TCAV): A Cross-Architectural Empirical Analysis

Chenfeng Liu

Abstract


This study addresses the “black-box” issue in pedestrian detection models deployed in high-risk scenarios, such as autonomous driving and intelligent security systems, by employing the Testing with Concept Activation Vectors (TCAV) framework. We conduct a cross-layer empirical analysis of feature representations across three representative architectures: Faster R-CNN, YOLOv11, and RT-DETR. An automatic cropping strategy based on human anatomical proportions is introduced to construct semantic concept sets (head, torso, and legs), enabling quantitative evaluation of semantic evolution at different network depths. Results show that Faster R-CNN follows a progressive semantic modeling pattern from local textures to global structures while maintaining stable semantic disentanglement in deeper layers; YOLOv11 exhibits semantic latency, with structured human-body representations emerging mainly within the feature fusion module rather than the backbone network; and RT-DETR, despite strong semantic separability, demonstrates gradient sparsity induced by output saturation, limiting the effectiveness of linear interpretability methods. These findings delineate the applicability boundaries of TCAV-based interpretation across heterogeneous architectural paradigms and provide a quantitative basis for designing highly transparent pedestrian detection systems.
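The two ingredients named in the abstract, proportion-based cropping and the TCAV score, can be illustrated with a minimal sketch. It makes several assumptions that do not come from the study: the band fractions `head_frac` and `torso_frac` are hypothetical placeholders (the abstract does not state the proportions used), the concept activation vector is fitted with a plain NumPy logistic regression standing in for whatever linear classifier the author used, and layer activations and gradients are assumed to be already extracted and flattened into arrays.

```python
import numpy as np


def crop_body_parts(box, head_frac=0.15, torso_frac=0.40):
    """Split a pedestrian box (x1, y1, x2, y2) into head, torso and legs bands.

    head_frac and torso_frac are hypothetical illustration values,
    not the proportions used in the study.
    """
    x1, y1, x2, y2 = box
    height = y2 - y1
    head_end = y1 + head_frac * height
    torso_end = head_end + torso_frac * height
    return {
        "head": (x1, y1, x2, head_end),
        "torso": (x1, head_end, x2, torso_end),
        "legs": (x1, torso_end, x2, y2),
    }


def fit_cav(concept_acts, random_acts, steps=500, lr=0.1):
    """Fit a logistic regression separating concept from random activations.

    Returns the unit-length weight vector (the CAV), which points
    from the random examples toward the concept examples.
    """
    X = np.vstack([concept_acts, random_acts])
    y = np.concatenate([np.ones(len(concept_acts)), np.zeros(len(random_acts))])
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        z = np.clip(X @ w + b, -30.0, 30.0)  # clip to avoid overflow in exp
        p = 1.0 / (1.0 + np.exp(-z))
        err = p - y
        w -= lr * (X.T @ err) / len(y)
        b -= lr * err.mean()
    return w / (np.linalg.norm(w) + 1e-12)


def tcav_score(grads, cav):
    """Fraction of inputs whose directional derivative along the CAV is positive.

    grads holds, per input, the gradient of the detection score with
    respect to the layer activations (same dimensionality as the CAV).
    """
    return float(np.mean(grads @ cav > 0))


# Synthetic demonstration: "concept" activations are shifted along axis 0.
rng = np.random.default_rng(0)
random_acts = rng.normal(size=(200, 8))
concept_acts = rng.normal(size=(200, 8))
concept_acts[:, 0] += 3.0
cav = fit_cav(concept_acts, random_acts)

# Three gradients agree with the concept direction, one opposes it.
demo_grads = np.zeros((4, 8))
demo_grads[:3, 0] = 1.0
demo_grads[3, 0] = -1.0
demo_score = tcav_score(demo_grads, cav)
parts = crop_body_parts((0, 0, 10, 100))
```

Because the score depends only on the sign of each directional derivative, gradients that collapse toward zero (as the abstract reports for RT-DETR under output saturation) leave that sign dominated by numerical noise, which is one way to read the reported limitation of linear interpretability methods.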

Full Text:

PDF (Russian)






ISSN: 2307-8162