
Technical note – Joint learning and optimization of multi-product pricing with finite resource capacity and unknown demand parameters. (English) Zbl 1475.91100

The authors develop two pricing heuristics for joint learning and pricing in a network revenue management (NRM) setting. A monopolist seller maximizes expected revenue by selling \(n\) types of products during a finite selling season, subject to constraints imposed by \(m\) limited resources that cannot be replenished during the season and have zero salvage value at its end. The seller must decide the price of each product at the beginning of every decision period throughout the selling horizon. Let \(k\) denote the scaling factor: for \(k=1,2,\dots\), in the \(k^{\textrm{th}}\) problem the selling horizon is divided into \(k\cdot T\) decision periods and the initial capacity levels are given by \(k\cdot C\), where \(T \in \mathbb{N}\) is the base number of decision periods and \(C\in \mathbb{R}_{+}^{m}\) is the vector of base initial capacity levels. The first pricing heuristic, called parametric self-adjusting control (PSC), achieves a rate-optimal \(O(\sqrt{k})\) regret bound for the NRM setting with a general parametric demand model and a continuum of feasible price vectors. The second heuristic, called accelerated parametric self-adjusting control (APSC), applies when the parametric demand model also satisfies an extra well-separated condition, and achieves an \(O(\log^{2}(k))\) regret bound. Here the regret is defined as the maximal revenue that a seller who knows the demand function could obtain if there were no randomness in the demand realizations, minus the expected revenue under a heuristic pricing control.
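One possible way to write the regret definition down is sketched here under assumed notation: the symbols \(\rho^{\pi}(k)\), \(J^{D}(k)\), \(R^{\pi}(k)\), \(p_t\) and the resource consumption matrix \(A\) do not appear in the review and are introduced only for illustration.
\[
\rho^{\pi}(k) = J^{D}(k) - \mathbb{E}\bigl[R^{\pi}(k)\bigr], \qquad J^{D}(k) = \max_{p_1,\dots,p_{kT}} \Bigl\{ \sum_{t=1}^{kT} p_t^{\top}\lambda(p_t,\theta) \;:\; \sum_{t=1}^{kT} A\,\lambda(p_t,\theta) \le k\cdot C \Bigr\}.
\]
Read this way, \(J^{D}(k)\) is the revenue of the deterministic problem for a seller who knows \(\theta\), \(R^{\pi}(k)\) is the random revenue collected by a pricing control \(\pi\), and the two bounds above state that \(\rho^{\pi}(k)\) is \(O(\sqrt{k})\) for PSC and \(O(\log^{2}(k))\) for APSC.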
In addition to standard smoothness assumptions, the authors assume that there exist exploration prices which induce different demands for different model parameters, and that the seller is able to use the maximum likelihood (ML) estimator to statistically identify the true parameters of the demand function.
In PSC, the seller uses the exploration prices for \(L = \lceil \sqrt{k\cdot T}\rceil\) periods, called the exploration stage, and then estimates the parameters \(\theta\) of the demand function using the ML estimator. In the subsequent periods, called the exploitation stage, the prices are adjusted to achieve the demand rate corresponding to the estimated \(\theta\).
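The explore-then-estimate structure described above can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes a single product, a hypothetical linear demand model with purchase probability \(a - b\,p\), two hypothetical exploration prices, and Bernoulli demand. With two prices and two parameters, matching the empirical purchase frequencies coincides with the ML estimate whenever the implied probabilities lie strictly between 0 and 1. All function names are illustrative.

```python
import math
import random

def demand_rate(p, theta):
    """Assumed linear model: purchase probability a - b*p, clipped to [0, 1]."""
    a, b = theta
    return min(1.0, max(0.0, a - b * p))

def exploration_length(k, T):
    """Number of exploration periods, L = ceil(sqrt(k*T))."""
    return math.ceil(math.sqrt(k * T))

def estimate_theta(k, T, theta_true, prices=(0.4, 0.8), seed=0):
    """Alternate two exploration prices for L periods, then estimate (a, b).

    Requires k*T >= 2 so that each exploration price is used at least once.
    """
    rng = random.Random(seed)
    sales, counts = [0, 0], [0, 0]
    for t in range(exploration_length(k, T)):
        i = t % 2  # alternate between the two exploration prices
        counts[i] += 1
        sales[i] += rng.random() < demand_rate(prices[i], theta_true)
    f0, f1 = sales[0] / counts[0], sales[1] / counts[1]
    # Solve a - b*p0 = f0 and a - b*p1 = f1 for (a, b).
    b_hat = (f0 - f1) / (prices[1] - prices[0])
    a_hat = f0 + b_hat * prices[0]
    return a_hat, b_hat

def price_for_rate(target, theta_hat):
    """Exploitation step: price whose estimated demand rate equals target."""
    a_hat, b_hat = theta_hat
    return (a_hat - target) / b_hat  # undefined if b_hat == 0
```

In the exploitation stage one would call `price_for_rate` with the demand rate the seller wants to induce; the self-adjusting corrections that PSC applies on top of this are omitted here.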
For APSC, the additional assumption of “well-separated demand” is made: for a range of prices \(p\), the demand function \(\lambda\) satisfies \(\|\lambda(p,\theta) - \lambda(p,\theta')\|_{2} \geq c \cdot \|\theta - \theta'\|_{2}\). In addition to the steps of PSC, APSC repeatedly re-estimates \(\theta\) during the exploitation stage. To avoid the computationally intensive calculation of the demand function corresponding to \(\theta\), only an approximation is sought, one that takes into account the estimation error in \(\theta\).
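As a hedged illustration of the well-separated condition, the sketch below checks the inequality numerically for a hypothetical two-product model \(\lambda_j(p,\theta) = \theta_j e^{-p_j}\), which is not taken from the paper. For this model the difference \(\lambda(p,\theta) - \lambda(p,\theta')\) is linear in \(\theta - \theta'\), so the condition holds with \(c = e^{-p_{\max}}\) on any price range bounded above by \(p_{\max}\). The parameter range \([1, 3]\) and the function names are assumptions.

```python
import math
import random

def demand(p, theta):
    """Hypothetical two-product demand: lambda_j = theta_j * exp(-p_j)."""
    return [th * math.exp(-pj) for pj, th in zip(p, theta)]

def l2(u, v):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def empirical_separation(p_low, p_high, n_samples=2000, seed=0):
    """Smallest observed ratio ||lambda(p,th) - lambda(p,th')|| / ||th - th'||."""
    rng = random.Random(seed)
    worst = math.inf
    for _ in range(n_samples):
        p = [rng.uniform(p_low, p_high) for _ in range(2)]
        theta = [rng.uniform(1.0, 3.0) for _ in range(2)]
        theta2 = [rng.uniform(1.0, 3.0) for _ in range(2)]
        gap = l2(theta, theta2)
        if gap < 1e-6:
            continue  # skip near-identical parameter pairs
        worst = min(worst, l2(demand(p, theta), demand(p, theta2)) / gap)
    return worst
```

On the price range \([0.5, 2]\), the returned ratio should stay at or above \(e^{-2}\), the theoretical constant for this toy model.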

MSC:

91B24 Microeconomic theory (price theory and economic markets)
90C25 Convex programming
90C31 Sensitivity, stability, parametric optimization
91B42 Consumer behavior, demand theory
