Application of the Cross-Entropy Method to the Dynamic Assortment Optimization Problem
This work considers an assortment optimization problem under a capacity constraint and unknown demand, in which a retailer offers an assortment and observes the sale of at most one product according to a multinomial logit choice model. In this problem, known as the dynamic assortment optimization problem (DAOP), the retailer must offer different assortments in each period to learn customer preferences, and must therefore balance the exploration of new assortments against the exploitation of the best assortment found so far. Similarities between sampling and exploration are established in order to apply the cross-entropy method as a policy for solving the DAOP. The cross-entropy method seeks a probability distribution that samples an optimal solution by minimizing the cross-entropy between a target distribution and a distribution from an arbitrarily chosen parametric family. This requires formulating the DAOP as a knapsack problem with a penalty for offering assortments that exceed the capacity. The results are compared with adaptive exploration algorithms, and the cross-entropy method is experimentally competitive. These results suggest that the cross-entropy method can also be applied to other sequential decision-making problems.
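To illustrate the optimization machinery described above, the following is a minimal sketch of the cross-entropy method applied to a penalized knapsack formulation. The revenue values, capacity, penalty weight, and all hyperparameters (sample size, elite fraction, smoothing factor) are hypothetical choices for illustration, not the paper's experimental setup:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical problem data: per-product expected revenues and a capacity limit.
revenues = np.array([6.0, 3.0, 5.0, 4.0, 6.0, 2.0, 7.0])
capacity = 3
penalty = 100.0  # penalty weight for assortments exceeding capacity

def score(x):
    # Knapsack-style objective: total revenue minus a penalty
    # proportional to the capacity violation.
    return revenues @ x - penalty * max(0, x.sum() - capacity)

def cross_entropy_knapsack(n_iter=50, n_samples=200, elite_frac=0.1, alpha=0.7):
    n = len(revenues)
    p = np.full(n, 0.5)  # inclusion probabilities (the sampling distribution)
    for _ in range(n_iter):
        # Sample candidate assortments as independent Bernoulli draws.
        samples = (rng.random((n_samples, n)) < p).astype(int)
        scores = np.array([score(x) for x in samples])
        # Keep the elite fraction with the highest scores.
        n_elite = max(1, int(elite_frac * n_samples))
        elite = samples[np.argsort(scores)[-n_elite:]]
        # Cross-entropy update: new probabilities are the elite frequencies,
        # smoothed toward the old ones to avoid premature 0/1 convergence.
        p = alpha * elite.mean(axis=0) + (1 - alpha) * p
    return (p > 0.5).astype(int)

best = cross_entropy_knapsack()
```

The penalty term is what lets the method sample unconstrained binary vectors while steering the elite set toward capacity-feasible assortments, mirroring the penalized knapsack reformulation of the DAOP.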