We develop the asymptotic variance for Markov decision processes. We first derive results expressing the asymptotic variance of a w-geometrically ergodic Markov reward process on a Borel space with randomized rewards. These results are then applied to the problem of minimizing the asymptotic variance over the space of randomized Markovian policies, which is in general a non-convex optimization problem due to the nonlinearity of the asymptotic variance in the policy. Dynamic programming algorithms and bilinear programming formulations are provided for variations of this task. The second part of this work develops reinforcement learning algorithms for this objective: temporal-difference-based estimators of the asymptotic variance are derived, and these are then used to construct an actor-critic algorithm for mean-variance optimization.