Bounded Policy Synthesis for POMDPs with Safe-Reachability and Quantitative Objectives
Abstract
Robots are being deployed for many real-world applications such as autonomous driving, disaster rescue, and personal assistance. Effectively planning robust executions under uncertainty is critical for building these autonomous robots. Partially Observable Markov Decision Processes (POMDPs) provide a standard approach to model many robot applications under uncertainty. A key algorithmic problem for POMDPs is the synthesis of policies that specify the actions to take contingent on all possible events. This thesis considers policy synthesis for POMDPs with two kinds of objectives: (1) boolean objectives for a correctness guarantee of accomplishing tasks and (2) quantitative objectives for optimal behaviors. For boolean objectives, this thesis focuses on a common safe-reachability objective: with a probability above a threshold, a goal state is eventually reached while keeping the probability of visiting unsafe states below a different threshold. Previous results have shown that policy synthesis for POMDPs over an infinite horizon is generally undecidable. For decidability, this thesis focuses on POMDPs over a bounded horizon. Solving POMDPs requires reasoning over a vast space of beliefs (probability distributions). To address this, this thesis introduces the notion of a goal-constrained belief space that only contains beliefs reachable under desired executions that can achieve the safe-reachability objective. Based on this notion, this thesis presents an offline approach that constructs policies over the goal-constrained belief space instead of the entire belief space. Simulation experiments show that this offline approach can scale to large belief spaces by focusing on the goal-constrained belief space. A full policy is generally costly to compute. To improve efficiency, this thesis presents an online approach that interleaves the computation of partial policies and execution.
A partial policy is parameterized by a replanning probability and only contains a sampled subset of all possible events. This online approach allows users to specify an appropriate bound on the replanning probability to balance efficiency and correctness. Finally, this thesis presents an approximate policy synthesis approach that combines the safe-reachability objective with the quantitative objective. The results demonstrate that the constructed policies not only achieve the safe-reachability objective but also are of high quality with respect to the quantitative objective.
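As a minimal illustration of the beliefs the abstract refers to, the following Python sketch implements the standard Bayesian belief update for a discrete POMDP. It is not taken from the thesis: the function name `belief_update`, the nested-list layout of the hypothetical transition table `T` and observation table `O`, and the two-state toy model are all assumptions made for this example.

```python
from typing import List, Tuple

# Hypothetical toy model (an assumption for illustration only):
# two states (0 = safe, 1 = unsafe), one action (0 = "stay"),
# two observations (0 = "looks safe", 1 = "looks unsafe").

# T[a][s][s2] = probability of moving from state s to s2 under action a.
T = [[[1.0, 0.0],
      [0.0, 1.0]]]

# O[a][s2][o] = probability of observing o after reaching s2 via action a.
O = [[[0.8, 0.2],
      [0.2, 0.8]]]


def belief_update(belief: List[float], action: int, observation: int,
                  T, O) -> Tuple[List[float], float]:
    """Return the posterior belief and the probability of the observation.

    Standard Bayes filter for a discrete POMDP:
        b'(s2) is proportional to O[a][s2][o] * sum_s T[a][s][s2] * b(s)
    """
    n = len(belief)
    unnormalized = [
        O[action][s2][observation]
        * sum(T[action][s][s2] * belief[s] for s in range(n))
        for s2 in range(n)
    ]
    prob_obs = sum(unnormalized)
    if prob_obs == 0.0:
        raise ValueError("observation has zero probability under this belief")
    return [p / prob_obs for p in unnormalized], prob_obs


if __name__ == "__main__":
    prior = [0.5, 0.5]
    posterior, prob_obs = belief_update(prior, action=0, observation=0, T=T, O=O)
    print("posterior belief:", posterior)
    print("probability of the observation:", prob_obs)
```

Each action and observation pair maps one belief to another, so the number of beliefs reachable within a horizon grows with every step. This growth is the reason the abstract describes restricting attention to the beliefs that matter for the objective.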
Citation
Wang, Yue. "Bounded Policy Synthesis for POMDPs with Safe-Reachability and Quantitative Objectives." (2018) Diss., Rice University. https://hdl.handle.net/1911/105878.