Prod2Query: Solving the Problem of Cold Start for E-Commerce Using Generative Language Modeling

Fedor Krasnov


Large online marketplaces add thousands of new items every day. To buy a new item, a user must first find it through search, yet modern search engines rank products largely by behavioral signals (purchases, clicks, and views) that do not yet exist for new products. This is the "cold start" problem of sales. Generative language models make it possible to train a model on user behavior so that it generates search queries for new products. The result is a set of synthetic behavioral data for new items that can be used to train a search engine. The main aim of this study is to measure how much the offline metrics of a search engine improve for new products when it is trained on such synthetic data. Prod2Query uses an encoder-decoder architecture built from BERT transformers. When tested on new products, the retrieval model trained with Prod2Query reached an mAP@12 of 77.2%. This score is comparable to state-of-the-art (SOTA) models, which indicates that the cold-start problem can be addressed effectively using signals from sellers at the moment new products are introduced: generating synthetic search queries from these signals and training search models on them yields high accuracy in product retrieval.
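The abstract reports retrieval quality as mAP@12. As a minimal sketch of how such a score is commonly computed, the Python below averages AP@k over a set of queries, where each query comes with the set of products it should retrieve. The function names and the normalization of each AP by min(|relevant|, k) are assumptions for illustration; the article itself may define AP@k slightly differently.

```python
from collections.abc import Hashable, Sequence, Set


def average_precision_at_k(
    ranked: Sequence[Hashable], relevant: Set[Hashable], k: int = 12
) -> float:
    """AP@k for one query: precision@i summed over ranks i <= k that hit a
    relevant item, divided by min(len(relevant), k) (assumed convention)."""
    if not relevant:
        return 0.0
    hits = 0
    score = 0.0
    seen = set()  # ignore duplicate items in the ranking
    for i, item in enumerate(ranked[:k], start=1):
        if item in relevant and item not in seen:
            hits += 1
            score += hits / i  # precision at this rank
            seen.add(item)
    return score / min(len(relevant), k)


def mean_average_precision_at_k(
    rankings: Sequence[Sequence[Hashable]],
    relevants: Sequence[Set[Hashable]],
    k: int = 12,
) -> float:
    """mAP@k: the mean of AP@k over all queries."""
    if len(rankings) != len(relevants):
        raise ValueError("rankings and relevants must have the same length")
    if not rankings:
        return 0.0
    total = sum(
        average_precision_at_k(r, rel, k) for r, rel in zip(rankings, relevants)
    )
    return total / len(rankings)


if __name__ == "__main__":
    # Two hypothetical queries with made-up product IDs.
    rankings = [["p1", "p7", "p3"], ["p9", "p2"]]
    relevants = [{"p1", "p3"}, {"p2"}]
    print(round(mean_average_precision_at_k(rankings, relevants), 4))  # 0.6667
```

In the toy example, the first query scores (1/1 + 2/3) / 2 ≈ 0.833 and the second 1/2 = 0.5, so mAP@12 ≈ 0.667. Only the top 12 positions of each ranking count toward the score.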

Full text: PDF (Russian)






ISSN: 2307-8162