We consider the problem of taking sequential actions in an unknown and costly stochastic system. In such problems, the decision-maker sequentially selects an action from a given set, then incurs a cost and observes a response that depends stochastically on the action. Facing an unknown system, the decision-maker must learn about it by experimenting with risky actions, thereby enabling better decisions over time. We therefore consider the risk-averse optimal learning problem of dynamically choosing actions to minimize the risk of the cumulative costs of learning. Motivated by problems in clinical trial design for novel pharmaceutical agents, we formulate Bayesian statistical inference under binary responses as a Markov decision process with belief states. We introduce a class of standardized logistic models with quantile parameterizations and give general conditions under which belief states preserve stochastic order and log-concavity under Bayesian updating; we establish stronger results under additional assumptions on the policy class. We then introduce dynamic Markov risk measures, formulate the dynamic programming equations, and discuss the challenges of solving them. We propose an approximate dynamic programming (ADP) scheme based on a coarse-grid approximation within a parameterized distribution family with log-concavity constraints. We also study risk-averse lookahead policies, introducing a robust-response policy and a heuristic policy. We compare these policy classes to the state of the art in computational experiments, including the design of dose-escalation policies for three chemotherapeutic agents (bleomycin, etoposide, 5-fluorouracil). The robust-response policy performs strongly on this problem class, clarifying the role of risk measures under Bayesian belief dynamics and suggesting avenues for future research.
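To make the belief-state dynamics concrete, the following is a minimal illustrative sketch, not the paper's implementation: a grid-based Bayesian update of a belief over a single parameter of a standardized logistic dose-response model with binary outcomes. All names (`logistic_prob`, `update_belief`, the grid bounds) are hypothetical choices for illustration.

```python
import numpy as np

def logistic_prob(dose, theta):
    """P(response = 1 | dose, theta) under a standardized logistic model."""
    return 1.0 / (1.0 + np.exp(-(dose - theta)))

def update_belief(belief, grid, dose, outcome):
    """Bayes update of the discretized belief over theta after a binary outcome."""
    p = logistic_prob(dose, grid)
    likelihood = p if outcome == 1 else 1.0 - p
    posterior = belief * likelihood
    return posterior / posterior.sum()  # renormalize to a probability vector

grid = np.linspace(-3.0, 3.0, 121)            # discretized parameter space
belief = np.full(grid.size, 1.0 / grid.size)  # uniform prior belief state
belief = update_belief(belief, grid, dose=0.5, outcome=1)
belief = update_belief(belief, grid, dose=1.0, outcome=0)
```

In the paper's setting, each action (dose) both incurs a cost and refines this belief state; the decision process is Markov in the belief because the posterior summarizes the observation history.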