The Problem of Search Recall in B2B DIY Product Catalogs: Limitations of Semantic Embeddings and an Entity-Oriented Approach

Fedor Krasnov

Abstract


This paper examines the challenge of ensuring high search recall in B2B catalogs for DIY products (tools and materials for construction and repair). In practice, B2B search imposes significantly stricter recall requirements than B2C recommendation scenarios, as a substantial proportion of queries correspond to known-item retrieval involving exact SKUs, model numbers, and precise technical specifications. Even minor deviations in identifiers or numeric attributes may render results unusable in professional procurement contexts.

The paper traces the evolution of the search architecture from pure dense retrieval based on transformer embeddings (ModernBERT trained with triplet loss) to a hybrid system grounded in entity detection (brand, model, technical specifications) and structured entity matching. Embedding-based retrieval is shown not to guarantee recall, for three reasons: smoothing of numerical tokens, semantic compression of identifiers, and the inherent properties of approximate nearest neighbor (ANN) search. The geometry of embedding spaces makes the representations largely invariant to discrete differences that are critical in B2B scenarios, such as a single changed digit in a model number.
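
The idea of entity detection followed by structured matching can be illustrated with a minimal sketch. It is not the system described in the paper: the regular expression, the brand list, the catalog records, and the function names `extract_anchors` and `matches` are all hypothetical, chosen only to show how exact anchors separate items that differ by one discrete token.

```python
import re

# Hypothetical spec pattern: a standalone number followed by a unit.
# The lookbehind keeps digits inside identifiers (e.g. "dx18v-50") intact.
SPEC_RE = re.compile(r"(?<![\w.-])(\d+(?:\.\d+)?)\s?(mm|cm|kg|v|w)\b", re.IGNORECASE)
KNOWN_BRANDS = {"acme", "globex"}  # illustrative brand dictionary


def extract_anchors(query):
    """Split a query into brand, model, and numeric-spec anchors."""
    text = query.lower()
    specs = {(float(value), unit) for value, unit in SPEC_RE.findall(text)}
    tokens = SPEC_RE.sub(" ", text).split()
    brands = {t for t in tokens if t in KNOWN_BRANDS}
    # Assumption: a token mixing letters and digits is a model identifier.
    models = {
        t for t in tokens
        if any(c.isdigit() for c in t) and any(c.isalpha() for c in t)
    }
    return {"brand": brands, "model": models, "spec": specs}


def matches(anchors, item):
    """Require every detected anchor to match the item exactly."""
    if anchors["brand"] and item["brand"].lower() not in anchors["brand"]:
        return False
    if anchors["model"] and item["model"].lower() not in anchors["model"]:
        return False
    return anchors["spec"] <= set(item["specs"])


catalog = [
    {"sku": "A1", "brand": "Acme", "model": "DX18V-50", "specs": [(13.0, "mm")]},
    {"sku": "A2", "brand": "Acme", "model": "DX18V-90", "specs": [(13.0, "mm")]},
    {"sku": "A3", "brand": "Acme", "model": "DX18V-50", "specs": [(10.0, "mm")]},
]

anchors = extract_anchors("Acme DX18V-50 drill 13 mm")
hits = [item["sku"] for item in catalog if matches(anchors, item)]
```

In this toy catalog the three items differ only in one model suffix or one numeric attribute, which is exactly the kind of difference a dense embedding may smooth away. Exact anchor matching keeps only the intended item.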

The proposed anchor-based retrieval architecture raises Recall@10 from 0.65 to 0.97 on real-world data while preserving acceptable latency and high explainability. Experiments were conducted on an industrial catalog exceeding 10 million items and a dataset of 5,000 real B2B queries. The results indicate that the approach is practically viable for large-scale B2B e-commerce search systems.
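
For reference, Recall@10 can be computed as in the sketch below. The abstract does not state the exact definition used in the experiments, so this assumes the common one: the share of a query's relevant items found in the top k results, averaged over queries. The function names and sample data are illustrative.

```python
def recall_at_k(retrieved, relevant, k=10):
    """Fraction of relevant items that appear in the top-k retrieved list."""
    relevant = set(relevant)
    if not relevant:
        raise ValueError("each query needs at least one relevant item")
    return len(set(retrieved[:k]) & relevant) / len(relevant)


def mean_recall_at_k(runs, k=10):
    """Average recall@k over (retrieved, relevant) pairs, one pair per query."""
    return sum(recall_at_k(ret, rel, k) for ret, rel in runs) / len(runs)


# Toy evaluation set: three queries with ranked results and ground truth.
runs = [
    (["a", "b", "c"], {"a"}),       # known item found
    (["x", "y"], {"z"}),            # known item missed
    (["p", "q"], {"p", "r"}),       # one of two relevant items found
]
score = mean_recall_at_k(runs, k=10)
```

For known-item queries with a single relevant SKU, the per-query value is either 0 or 1, so the mean is simply the share of queries whose target item appears in the top 10.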








ISSN: 2307-8162