Survey of multi­armed bandit algorithms applied to recommendation systems

Elena Gangan, Milos Kudus, Eugene Ilyushin

Abstract


The main goal of this paper is to introduce the reader to the multi­armed bandit algorithms of different types and to observe how the industry leveraged them in advancing recommendation systems. We present the current state of the art in RecSys and then explain what multi­armed bandits can bring to the table. First, we present the formalization of the multi­armed bandit problem and show the most common bandit algorithms such as upper confidence bound, Thompson Sampling, epsilon greedy, EXP3. Expanding on that knowledge, we review some important contextual bandits and present their benefits and usage in different applications. In this setting, context means side information about the users or the items of the problem. We also survey various techniques in multi­armed bandits that make bandits fit to the task of recommendations better; namely we consider bandits with multiple plays, multiple objective optimizations, clustering and collaborative filtering approaches. We also assess bandit backed recommendation systems implemented in the industry ­ at Spotify, Netflix, Yahoo and others. At the same time we discuss methods of bandit evaluation and present an empirical evaluation of some notorious algorithms. We conduct short experiments on 2 datasets to show how different policies compare to each other and observe the importance of parameter tuning. This paper is a survey of the multi armed bandit algorithms and their applications to recommendation systems.

Full Text:

PDF

References


J. Kawale and E. Chow, “A Multi­Armed Bandit Framework For Recommendations at Netflix,” 2018, library Catalog: SlideShare.

J. McInerney, B. Lacker, S. Hansen, K. Higley, H. Bouchard, A. Gruson, and R. Mehrotra, “Explore, Exploit, and Explain: Personalizing Explainable recommendations with Bandits,” in RecSys ’18: Proceedings of the 12th ACM Conference on Recommender Systems, 2018, pp. 31–39.

R. Mehrotra, N. Xue, and M. Lalmas, “Bandit based Optimization of Multiple Objectives on a Music Streaming Platform,” in Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. Virtual Event CA USA: ACM, Aug. 2020, pp. 3224–3233. [Online]. Available: https://dl.acm.org/doi/10.1145/3394486.3403374

L. Li, W. Chu, J. Langford, and R. E. Schapire, “A ContextualBandit Approach to Personalized News Article Recommendation,” Proceedings of the 19th international conference on World wide web ­ WWW ’10, p. 661, 2010, arXiv: 1003.0146. [Online]. Available: http://arxiv.org/abs/1003.0146

Y. Mao, M. Chen, A. Wagle, M. Natkovich, J. Pan, and D. Matheson, “A Batched Multi­Armed Bandit Approach to News Headline Testing,” in IEEE International Conference on Big Data (Big Data). Seattle, WA, USA, USA: IEEE, 2018.

Y. Koren, R. Bell, and C. Volinsky, “Matrix factorization techniques for recommender systems,” vol. 42, no. 8, p. 30–37, Aug. 2009. [Online]. Available: https://doi.org/10.1109/MC.2009.263

X. He, L. Liao, H. Zhang, L. Nie, X. Hu, and T.­S. Chua, “Neural collaborative filtering,” in Proceedings of the 26th International Conference on World Wide Web, ser. WWW ’17. Republic and Canton of Geneva, CHE: International World Wide Web Conferences Steering Committee, 2017, p. 173–182. [Online]. Available: https://doi.org/10.1145/3038912.3052569

W. Sun, X. Zhang, W. Liang, and Z. He, “High dimensional explicit feature biased matrix factorization recommendation,” in Trends and Applications in Knowledge Discovery and Data Mining, X.­L. Li, T. Cao, E.­P. Lim, Z.­H. Zhou, T.­B. Ho, and D. Cheung, Eds. Springer International Publishing, 2015, vol. 9441, pp. 66–77, series Title: Lecture Notes in Computer Science.

O. Alieva, E. Gangan, E. Ilyushin, and A. Kachalin, “Automatic evaluation of recommendation models,” International scientific journal «Modern Information Technologies and IT­Education», vol. 16, no. 2, 2020. [Online]. Available: http://sitito.cs.msu.ru/index. php/SITITO/article/view/656

S. Li, A. Karatzoglou, and C. Gentile, “Collaborative Filtering Bandits,” in SIGIR ’16: Proceedings of the 39th International ACM SIGIR conference on Research and Development in Information Retrieval, Pisa, Italy, 2016, pp. 539–548.

S. Pandey, D. Chakrabarti, and D. Agarwal, “Multi­armed bandit problems with dependent arms,” in Proceedings of the 24th international conference on Machine learning ­ ICML ’07. Corvalis, Oregon: ACM Press, 2007, pp. 721–728. [Online]. Available: http://portal.acm.org/citation.cfm?doid=1273496.1273587

S. Caron and S. Bhagat, “Mixing bandits: a recipe for improved coldstart recommendations in a social network,” 08 2013.

H. T. Nguyen and A. Kofod­Petersen, “Using Multi­armed Bandit to Solve Cold­Start Problems in Recommender Systems at Telco,” in Mining Intelligence and Knowledge Exploration, R. Prasath, P. O’Reilly, and T. Kathirvalavakumar, Eds. Cham: Springer International Publishing, 2014, vol. 8891, pp. 21–30, series Title: Lecture Notes in Computer Science. [Online]. Available: http://link.springer.com/10.1007/978­3­319­13817­63

C. Z. Felício, K. V. Paixão, C. A. Barcelos, and P. Preux, “A MultiArmed Bandit Model Selection for Cold­Start User Recommendation,”

in Proceedings of the 25th Conference on User Modeling, Adaptation and Personalization. Bratislava Slovakia: ACM, Jul. 2017, pp. 32–40. [Online]. Available: https://dl.acm.org/doi/10.1145/3079628.3079681

A. Slivkins, “Introduction to multi­armed bandits,” Foundations and Trends® in Machine Learning, vol. 12, no. 1­2, pp. 1–286, 2019. [Online]. Available: http://dx.doi.org/10.1561/2200000068

S. Bubeck and A. Slivkins, “The best of both worlds: Stochastic and adversarial bandits,” Journal of Machine Learning Research, vol. 23, 02 2012.

T. Lattimore and C. Szepesvári, Bandit Algorithms. Cambridge University Press, 2020.

J. M. White, Bandit Algorithms for Website Optimization. O’Reilly Media, Inc. [Online]. Available: https://www.oreilly.com/library/view/ bandit­algorithms­for/9781449341565/

P. Auer and N. Cesa­Bianchi, “Finite­time Analysis of the Multiarmed Bandit Problem,” Machine Learning 47, p. 22, 2002.

O. Chapelle and L. Li, “An Empirical Evaluation of Thompson Sampling,” in Processing Advances in Neural Information Systems, vol. 24, 2011, pp. 2249–2257. [Online]. Available: http://papers.nips.cc/paper/4321­an­empirical­evaluation­of­thompson­sampling.pdf

A. Gopalan, S. Mannor, and Y. Mansour, “Thompson Sampling for Complex Online Problems,” in Proceedings of the 31 st International Conference on Machine Learning, Beijing, China, 2014.

S. Agrawal and N. Goyal, “Analysis of thompson sampling for the multi­armed bandit problem,” ser. Proceedings of Machine Learning Research, S. Mannor, N. Srebro, and R. C. Williamson, Eds., vol. 23. Edinburgh, Scotland: JMLR Workshop and Conference Proceedings, 25–27 Jun 2012, pp. 39.1–39.26. [Online]. Available: http://proceedings.mlr.press/v23/agrawal12.html

D. J. Russo, B. Van Roy, A. Kazerouni, I. Osband, and Z. Wen, “A tutorial on thompson sampling,” Found. Trends Mach. Learn., vol. 11, no. 1, p. 1–96, Jul. 2018. [Online]. Available: https://doi.org/10.1561/2200000070

V. Kuleshov and D. Precup, “Algorithms for multi­armed bandit problems,” Journal of Machine Learning Research, vol. 1, 02 2014.

P. Auer, N. Cesa­Bianchi, Y. Freund, and R. E. Schapire, “The Nonstochastic Multiarmed Bandit Problem,” SIAM Journal on Computing, vol. 32, no. 1, pp. 48–77, Jan. 2002. [Online]. Available: http://epubs.siam.org/doi/10.1137/S0097539701398375

J. Langford and T. Zhang, “The epoch­greedy algorithm for multi­armed bandits with side information,” in Advances in Neural Information Processing Systems 20, J. C. Platt, D. Koller, Y. Singer, and S. T. Roweis, Eds. Curran Associates, Inc., 2008, pp. 817–824. [Online]. Available: http://papers.nips.cc/paper/3178­the­epoch­greedy­algorithm­for­multi­armed­bandits­with­side­information.pdf

D. Cortes, “Adapting multi­armed bandits policies to contextual bandits scenarios,” arXiv:1811.04383v2, 2019.

L. Zhou, “A survey on contextual multi­armed bandits,” ArXiv, vol. abs/1508.03326, 2015.

W. Chu, L. Li, L. Reyzin, and R. Schapire, “Contextual bandits with linear payoff functions,” ser. Proceedings of Machine Learning Research, G. Gordon, D. Dunson, and M. Dudík, Eds., vol. 15. Fort Lauderdale, FL, USA: JMLR Workshop and Conference Proceedings, 11–13 Apr 2011, pp. 208–214. [Online]. Available: http://proceedings.mlr.press/v15/chu11a.html

L. Zhou and E. Brunskill, “Latent Contextual Bandits and their Application to Personalized Recommendations for New User,” arXiv:1604.06743 [cs.LG], 2016.

S. LI, B. Wang, S. Zhang, and W. Chen, “Contextual Combinatorial cascading Bandits,” in Proceedings of The 33rd International Conference on Machine Learning, vol. 48, 2016.

X. Xu, F. Dong, Y. Li, S. He, and X. Li, “Contextual­Bandit Based Personalized Recommendation with Time­Varying User Interests,” arXiv:2003.00359v1 [cs.LG], 2020.

B. Hao, Y. Abbasi Yadkori, Z. Wen, and G. Cheng, “Bootstrapping upper confidence bound,” in Advances in Neural Information Processing Systems 32. Curran Associates, Inc., 2019. [Online]. Available: http:// papers.nips.cc/paper/9382­bootstrapping­upper­confidence­bound.pdf

D. Zhou, L. Li, and Q. Gu, “Neural Contextual Bandits with UCBbased Exploration,” in Proceedings of the 37 th International Conference on Machine Learning, Vienna, Austria, 2020.

S. Agrawal and N. Goyal, “Thompson Sampling for Contextual Bandits with Linear Payoffs,” in Proceedings of the 30 th International Conference on Machine Learning, vol. 28, Atlanta, Georgia, 2013.

I. Osband and B. Roy, "Bootstrapped thompson sampling and deep exploration," 07 2015.

L. Tang, Y. Jiang, L. Li, C. Zeng, and T. Li, “Personalized Recommendation via Parameter­Free Contextual Bandit,” in SIGIR ’15: Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, 2015, pp. 323–332.

L. Li, W. Chu, J. Langford, and X. Wang, “Unbiased Offline Evaluation of Contextual­bandit­based news Article Recommendation Algorithms,” WSDM’11, 2011. [Online]. Available: https://arxiv.org/pdf/1003.5956.pdf

C. Riquelme, G. Tucker, and J. Snoek, “Deep bayesian bandits showdown,” in International Conference on Learning Representations, Vancouver Canada, 2018.

M. Dudík, D. J. Hsu, S. Kale, N. Karampatziakis, J. Langford, L. Reyzin, and T. Zhang, “Efficient optimal learning for contextual bandits,” ArXiv, vol. abs/1106.2369, 2011.

L. Reyzin, “New algorithms for contextual bandits,” 2012. [Online]. Available: http://www.levreyzin.com/presentations/BGU2010.pdf

A. Agarwal, D. Hsu, S. Kale, J. Langford, L. Li, and R. E. Schapire, “Taming the Monster: A Fast and Simple Algorithm for Contextual Bandits,” in Proceedings of the 31 st International Conference on Machine Learning, Beijing, China, 2014.

T. Uchiya, A. Nakamura, and M. Kudo, “Algorithms for adversarial bandit problems with multiple plays,” in Algorithmic Learning Theory, M. Hutter, F. Stephan, V. Vovk, and T. Zeugmann, Eds. Berlin, Heidelberg: Springer Berlin Heidelberg, 2010, pp. 375–389.

J. Louedec, M. Chevalier, J. Mothe, A. Garivier, and S. Gerchinovitz, “A Multiple­play Bandit Algorithm Applied to Recommender Systems,” Association for the Advancement of Artificial Intelligence, 2015.

Y.­H. Chang and H.­T. Lin, “Pairwise Regression with Upper Confidence Bound for Contextual Bandit with Multiple Actions,” in Proceedings of the Conference on Technologies and Applications for Artificial Intelligence, 2013, pp. 19–24.

J. Komiyama, J. Honda, and H. Nakagawa, “Optimal regret analysis of thompson sampling in stochastic multi­armed bandit problem with multiple plays,” in Proceedings of the 32nd International Conference on International Conference on Machine Learning ­ Volume 37, ser. ICML’15. JMLR.org, 2015, p. 1152–1161.

R. Busa­Fekete, B. Szoenyi, P. Weng, and S. Mannor, “Multiobjective Bandits: Optimizing the Generalized Gini Index,” arXiv:1706.04933v1, 2017.

J. A. Weymark, “Generalized Gini inequality indices,” Université catholique de Louvain, Center for Operations Research and Econometrics (CORE), CORE Discussion Papers RP 453, Jan. 1981. [Online]. Available: https://ideas.repec.org/p/cor/louvrp/453.html

S. Yahyaa, M. Drugan, and B. Manderick, “Thompson sampling in the adaptive linear scalarized multi objective multi armed bandit,” vol. 2, 01 2015.

C. Tekin and E. Turgay, “Multi­objective contextual bandits with a dominant objective,” in 2017 IEEE 27th International Workshop on Machine Learning for Signal Processing (MLSP), 2017, pp. 1–6.

E. Turgay, D. Öner, and C. Tekin, “Multi­objective contextual bandit problem with similarity information,” 03 2018.

C. Gentile, S. Li, and G. Zappella, “Online Clustering of Bandits,” in Proceedings of the 31 st International Conference on Machine Learning, vol. 34, Beijing, China, 2017.

T. Nguyen, “Dynamic clustering of contextual multi­armed bandits. in proceedings of the 23rd acm international conference on conference on information and knowledge management (cikm ’14),” 11 2014.

H. Wang, Q. Wu, and H. Wang, “Factorization bandits for interactive recommendation,” in AAAI’17: Proceedings of the Thirty­First AAAI Conference on Artificial Intelligence, 2017, pp. 2695–2702.

J. Kawale, H. H. Bui, B. Kveton, L. Tran­Thanh, and S. Chawla, “Efficient Thompson Sampling for Online MatrixFactorization Recommendation,” in Advances in Neural Information Processing Systems 28, C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, and R. Garnett, Eds. Curran Associates, Inc., 2015, pp. 1297–1305. [Online]. Available: http://papers.nips.cc/paper/5985­efficient­thompson­sampling­for­online­matrix­factorization­recommendation.pdf

D. Chakrabarti, R. Kumar, E. Upfal, and F. Radlinski, “Mortal Multi­Armed Bandits,” in Advances in Neural Information Processing Systems 21, Proceedings of the Twenty­Second Annual Conference on Neural Information Processing Systems, Vancouver, British Columbia, Canada, 2008.

C. Zeng, Q. Wang, S. Mokhtari, and T. Li, “Online ContextAware Recommendation with Time Varying Multi­Armed Bandit,” in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. San Francisco California USA: ACM, Aug. 2016, pp. 2025–2034. [Online]. Available: https://dl.acm.org/doi/10.1145/2939672.2939878

R. Polikar, “Ensemble based systems in decision making,” IEEE Circuits and Systems Magazine, vol. 6, no. 3, pp. 21–45, 2006.

B. Brodén, M. Hammar, D. Paraschakis, and B. J. Nilsson, “Ensemble Recommendations via Thompson Sampling: an Experimental Study within e­Commerce,” in IUI ’18: 23rd International Conference on Intelligent User Interfaces, 2018, pp. 18–29.

D. Bouneffouf, R. Laroche, T. Urvoy, R. Feraud, and A. Robin, “Contextual bandit for active learning: Active thompson sampling,” 11 2014.

C.­C. Hsieh, J. Neufeld, T. King, and J. Cho, “Efficient approximate thompson sampling for search query recommendation,” ser. SAC ’15. New York, NY, USA: Association for Computing Machinery, 2015, p. 740–746. [Online]. Available: https://doi.org/10.1145/2695664.2695748

M. Lopes, B. Clement, D. Roy, and P.­Y. Oudeyer, “Multi­armed bandits for intelligent tutoring systems,” 10 2013.

J. Mary, P. Preux, and O. Nicol, “Improving offline evaluation of contextual bandit algorithms via bootstrapping techniques,” ArXiv, vol. abs/1405.3536, 2014.

A. Beygelzimer and J. Langford, “The offset tree for learning with partial labels,” ser. KDD ’09. New York, NY, USA: Association for Computing Machinery, 2009, p. 129–138. [Online]. Available: https://doi.org/10.1145/1557019.1557040

M. Dudík, D. Erhan, J. Langford, and L. Li, “Doubly robust policy evaluation and optimization,” Statist. Sci., vol. 29, no. 4, pp. 485–511, 11 2014. [Online]. Available: https://doi.org/10.1214/14­STS500

S. A. M. M. Y. N. Saito, Yuta, “A large­scale open dataset for bandit algorithms,” arXiv preprint arXiv:2008.07146, 2020.

D. N. Hill, H. Nassif, Y. Liu, A. Iyer, and S. Vishwanathan, “An Efficient Bandit Algorithm for Realtime Multivariate Optimization,” in Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. Halifax NS Canada: ACM, Aug. 2017, pp. 1813–1821. [Online]. Available: https://dl.acm.org/doi/10.1145/3097983.3098184

A. Chandrashekar, F. Amat, J. Basilico, and T. Jebara. Artwork personalization at netflix. [Online]. Available: https://netflixtechblog. com/artwork­personalization­c589f074ad76

M. Hejazinia, K. Eastman, S. Ye, A. Amirabadi, and R. Divvela, “Accelerated learning from recommender systems using multi­armed

bandit,” arXiv:1908.06158 [cs], Aug. 2019, arXiv: 1908.06158. [Online]. Available: http://arxiv.org/abs/1908.06158

D. Precup, R. S. Sutton, and S. P. Singh, “Eligibility traces for offpolicy policy evaluation,” in Proceedings of the Seventeenth International Conference on Machine Learning, ser. ICML ’00. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 2000, p.759–766.

W. Bendada, G. Salha, and T. Bontempelli, “Carousel personalization in music streaming apps with contextual bandits,” in Fourteenth ACM Conference on Recommender Systems, ser. RecSys ’20. New York, NY, USA: Association for Computing Machinery, 2020, p. 420–425. [Online]. Available: https://doi.org/10.1145/3383313.3412217

L. Tang, Y. Jiang, L. Li, C. Zeng, and T. Li, “Personalized recommendation via parameter­free contextual bandits,” in Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, ser. SIGIR ’15. New York, NY, USA: Association for Computing Machinery, 2015, p. 323–332. [Online]. Available: https://doi.org/10.1145/2766462.2767707

C. Zeng, Q. Wang, S. Mokhtari, and T. Li, “Online contextaware recommendation with time varying multi­armed bandit,” in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ser. KDD ’16. New York, NY, USA: Association for Computing Machinery, 2016, p. 2025–2034. [Online]. Available: https://doi.org/10.1145/2939672.2939878

D. Abensur, I. Balashov, S. Bar, I. Orlov, R. Lempel, and N. Moscovici, “Productization Challenges of Contextual Multi­Armed Bandits,” arXiv:1907.04884v1 [cs.IR], 2019.


Refbacks

  • There are currently no refbacks.


Abava  Absolutech Convergent 2020

ISSN: 2307-8162