TY - JOUR
TI - Methods of temporal differences for risk-averse dynamic programming and learning
DO - https://doi.org/10.7282/t3-5b3y-5c51
PY - 2020
AB - Stochastic sequential decision-making problems are commonly modeled and solved as Markov decision processes. When the decision-makers are risk-averse, their risk aversion can be incorporated into the model by means of dynamic risk measures. Such risk-averse Markov decision processes can, in theory, be solved by specialized dynamic programming methods. However, when the state space of the system becomes very large, such methods become impractical.
We consider reinforcement learning for Markov decision processes with performance evaluated by a dynamic risk measure. We use a linear value function approximation scheme and construct a projected risk-averse dynamic programming equation involving this scheme. We study the properties of this equation, and to solve it, we propose risk-averse counterparts of the methods of temporal differences and prove their convergence with probability one. We also perform an empirical study on a complex transportation problem, demonstrating that the risk-averse methods of temporal differences outperform the well-known risk-neutral methods in terms of average profit over time.
KW - Management
LA - English
ER -