Application of the Cross-Entropy Method to the Dynamic Assortment Optimization Problem

  • Jose M. Vera Aray ESPOL Polytechnic University, Escuela Superior Politécnica del Litoral, ESPOL, Facultad de Ciencias Naturales y Matemáticas (FCNM), Campus Gustavo Galindo Km. 30.5 Vía Perimetral P.O. Box 09-01-5863, Guayaquil - Ecuador

Resumen

This work considers an assortment optimization problem, under capacity constraint and unknown demand, where a retailer offers an assortment and observes the sale of one of the products according to a multinomial logit choice model. In this problem, named as the dynamic assortment optimization problem (DAOP), the retailer must offer different assortments in each period to learn the customer preferences. Therefore, the trade-off between exploration of new assortments and the exploitation of the best known assortment must be balanced. Similarities between sampling and exploration are established in order to apply the cross-entropy method as a policy for the solution of the DAOP. The cross-entropy method finds a probability distribution that samples an optimal solution by minimizing the cross-entropy between a target probability distribution and an arbitrarily selected probability distribution. This requires the DAOP to be formulated as a knapsack problem with a penalty for offering assortments that exceed capacity. The results are compared with adaptive exploration algorithms and, experimentally, the cross-entropy method shows competitive results. These results suggest that the cross-entropy method can be used to solve other sequential decision-making problems.

Citas

Agrawal, S. (2019). Recent advances in multiarmed bandits for sequential decision making. INFORMS TutORials in Operations Research, 167-188. doi: 10.1287/educ.2019.0204

Agrawal, S., Avadhanula, V., Goyal, V., & Zeevi, A. (2017). Thompson sampling for the mnl-bandit. Retrieved from http://arxiv.org/abs/1706.00977

Agrawal, S., Avadhanula, V., Goyal, V., & Zeevi, A. (2019). Mnl-bandit: A dynamic learning approach to assortment selection. Operations Research, 67(5), 1453–1485. doi: 10.1287/opre.2018.1832

Botev, Z. I., Kroese, D. P., Rubinstein, R. Y., & L’Ecuyer, P. (2013). The cross- entropy method for optimization from estimation to optimization. Handbook of statistics, 35–59. doi: 10.1016/j.memsci.2013.11.020

Caro, F., & Gallien, J. (2007). Dynamic assortment with demand learning for seasonal consumer goods. Management Science, 53(2), 276-292. doi: 10.1287/mnsc.1060.0613

Dua, D., & Graff, C. (2019). Uci machine learning repository. Retrieved from http://archive.ics.uci.edu/ml/datasets/car+evaluation.

Kök, A. G., & Marshall, L. F. (2007). Demand estimation and assortment optimiza- tion under substitution: Methodology and application. Operations Research, 55(6), 1001-1021. doi: 10.1287/opre.1070.0409

Lattimore, T., & Szepesva´ri, C. (2020). Bandit algorithms. Cambridge University Press.

Rubinstein, R. Y., & Kroese, D. P. (2004). The cross-entropy method: A unified approach to combinatorial optimization, monte-carlo simulation, and machine learning. Springer Science & Business Media. doi: 10.1007/
978-1-4757-4321-0

Rusmevichientong, P., Shen, Z. M., & Shmoys, D. B. (2010). Dynamic assortment optimization with a multinomial logit choice model and capacity constraint. Operations Research, 58(6), 1666-1680. doi: 10.1287/opre.1100.0866

Rusmevichientong, P., & Topaloglu, H. (2012). Robust assortment optimization in revenue management under the multinomial logit choice model. Operations Research, 60(4), 865-882. doi: 10.1287/opre.1120.1063

Sauré, D., & Zeevi, A. (2013). Optimal dynamic assortment planning with demand learning. Manufacturing & Service Operations Management, 15(3), 387-404. doi: 10.1287/msom.2013.0429

Slivkins, A. (2019). Introduction to multi-armed bandits. Foundations and Trends R in Machine Learning, 12(1-2), 1-286. doi: 10.1561/2200000068

Wang, Y., Chen, X., & Zhou, Y. (2018). Near-optimal policies for dynamic multinomial logit assortment selection models. In S. Bengio, H. Wallach, H. Larochelle, K. Grauman, N. Cesa-Bianchi, & R. Garnett (Eds.), Advances in neural information processing systems 31 (p. 3101-3110). Curran Associates, Inc. Retrieved from http://papers.nips.cc/paper/7573-near-optimal-policies-for-dynamic-mu
Publicado
2020-06-26
Sección
Articulos