Distributionally Robust Partially Observable Markov Decision Process with Moment-based Ambiguity


We consider a distributionally robust (DR) formulation of the partially observable Markov decision process (POMDP), in which the transition and observation probabilities are random and unknown and are revealed only at the end of each time step. We construct the ambiguity set of the joint distribution of these two types of probabilities using moment information bounded via conic constraints, and we show that the value function of DR-POMDP is convex with respect to the belief state. We propose a heuristic search value iteration method for solving DR-POMDP that finds lower and upper bounds on the optimal value function. We conduct a computational analysis comparing DR-POMDP with the standard POMDP on random instances of dynamic machine repair and a ROCKSAMPLE benchmark.
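
For readers unfamiliar with the belief state mentioned above, the following is a minimal sketch of the standard Bayesian belief update used in an ordinary POMDP, assuming a single known pair of transition and observation matrices. It is not the distributionally robust formulation of the paper. The function name `belief_update`, the array layout, and the two-state numbers are all hypothetical and chosen only for illustration.

```python
import numpy as np

def belief_update(belief, T, O, action, observation):
    """Standard (nominal) POMDP belief update via Bayes' rule.

    Assumed layout (illustrative, not taken from the paper):
      belief: shape (S,), probability of each hidden state
      T: shape (A, S, S), T[a, s, s2] = P(next state s2 | state s, action a)
      O: shape (A, S, Z), O[a, s2, z] = P(observation z | next state s2, action a)
    """
    # Predict the next-state distribution under the chosen action.
    predicted = belief @ T[action]
    # Weight each next state by the likelihood of the received observation.
    unnormalized = predicted * O[action][:, observation]
    total = unnormalized.sum()
    if total == 0.0:
        raise ValueError("observation has zero probability under this belief")
    return unnormalized / total

# Hypothetical two-state example: state 0 = working, state 1 = failed.
T = np.array([[[0.9, 0.1],
               [0.0, 1.0]]])   # a single action, "operate"
O = np.array([[[0.8, 0.2],
               [0.3, 0.7]]])   # observation 0 = "looks fine", 1 = "looks faulty"

b = np.array([1.0, 0.0])       # start certain that the machine is working
b_next = belief_update(b, T, O, action=0, observation=1)
print(b_next)
```

The updated belief is again a probability vector over the hidden states, which is the argument of the value function whose convexity the abstract refers to.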
