TY - JOUR
TI - Methods of temporal differences for risk-averse dynamic programming and learning
DO - https://doi.org/10.7282/t3-5b3y-5c51
PY - 2020
AB - Stochastic sequential decision-making problems are generally modeled and solved as Markov decision processes. When the decision-makers are risk-averse, their risk aversion can be incorporated into the model using dynamic risk measures. Such risk-averse Markov decision processes can, in theory, be solved by specialized dynamic programming methods; however, when the state space of the system becomes very large, these methods become impractical. We consider reinforcement learning for Markov decision processes with performance evaluated by a dynamic risk measure. We use a linear value function approximation scheme and construct a projected risk-averse dynamic programming equation involving this scheme. We study the properties of this equation. To solve it, we propose risk-averse counterparts of the methods of temporal differences and prove their convergence with probability one. We also perform an empirical study on a complex transportation problem, demonstrating that the risk-averse methods of temporal differences outperform the well-known risk-neutral methods in terms of average profit over time.
KW - Management
LA - English
ER -