We investigate a dynamic inventory control problem involving fixed setup costs and lost sales. Under ambiguity, even the demand distribution $f\equiv (f(d))_{d=0,1,\ldots}$ itself, from which random realizations are sampled, can come from a vast and definitely non-singleton set. Lost sales and demand ambiguity together complicate the problem through censoring, namely, the inability of the firm to observe the lost portion of the demand. Our main policy idea advocates periodically ordering up to seemingly inadvisably high levels purely to learn and, in the intervening periods, exploiting the information gained in these learning periods. By regret, we mean the price paid for ambiguity in long-run average performance. When demand has finite support, we achieve a regret bound of order $\mathcal{O}(T^{2/3}\cdot (\ln T)^{1/2})$, which almost matches a known lower bound as long as inventory costs are genuinely convex. Major policy adjustments are warranted for the more complex case involving unbounded demand support, for which our regret bound is of order $\mathcal{O}(T^{8/9})$. We find it necessary to treat separately the situation where $f(0)$ gets arbitrarily close to one; there, a bound of $\mathcal{O}(T^{(2+\sqrt{2})/4})\simeq \mathcal{O}(T^{0.854})$ can be established if the firm is allowed to remove items (with immediate cost) from the inventory. We also propose other policies based on the general learning-while-doing idea using the Kaplan-Meier (KM) estimator.
Our simulation demonstrates the merits of the various policy ideas and the hurdles posed by the prospect of $f(0)\longrightarrow 1^-$.
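
To make the censoring issue concrete, the following is a minimal sketch of the textbook Kaplan-Meier (product-limit) estimator applied to censored sales data. It is not the policy proposed here, only an illustration of how censored observations can still inform an estimate of the demand distribution. The function name `km_survival`, its arguments, and the convention that a stockout reveals only that demand was at least the stocked level are assumptions made for this example.

```python
def km_survival(sales, censored, max_d):
    """Estimate P(D > d) for d = 0..max_d from censored sales.

    sales[t]    : units sold in period t (hypothetical input format)
    censored[t] : True if period t ended in a stockout, so that only
                  D_t >= sales[t] is known (assumed convention)
    """
    surv = []
    s = 1.0
    for d in range(max_d + 1):
        # Periods in which demand is known to be at least d.
        at_risk = sum(1 for x in sales if x >= d)
        # Periods in which demand is known to equal d exactly.
        events = sum(1 for x, c in zip(sales, censored) if x == d and not c)
        if at_risk > 0:
            # Multiply by the estimated P(D > d | D >= d).
            s *= 1.0 - events / at_risk
        surv.append(s)
    return surv


if __name__ == "__main__":
    sales = [0, 1, 2, 2, 3]
    censored = [False, False, True, False, False]
    print(km_survival(sales, censored, max_d=3))
```

When no period is censored, the estimate reduces to the empirical survival function of the observed demands. A censored period still counts toward the at-risk set for every demand level up to its sales, which is how the estimator extracts information from stockouts.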