Regret Minimization in Partially Observable Linear Quadratic Control

Abstract

We study the problem of regret minimization in partially observable linear quadratic control systems when the model dynamics are unknown a priori. We propose ExpCommit, an explore-then-commit algorithm that learns the model Markov parameters and then follows the principle of optimism in the face of uncertainty to design a controller. We propose a novel way to decompose the regret and provide an end-to-end sublinear regret upper bound for partially observable linear quadratic control. Finally, we provide stability guarantees and establish a regret upper bound of $\mathcal{O}(T^{2/3})$ for ExpCommit, where $T$ is the time horizon of the problem.
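
To give some intuition for how an explore-then-commit scheme can end up with a rate of order $T^{2/3}$, the sketch below balances two costs under an assumed, simplified model that is not stated in the abstract: exploring for `t_exp` steps costs regret proportional to `t_exp`, and the commit phase costs regret proportional to the remaining steps divided by `sqrt(t_exp)`, reflecting an estimation error that shrinks like `1/sqrt(t_exp)`. The function names `regret_proxy` and `best_exploration_length` are hypothetical, all constants are dropped, and this is a heuristic illustration rather than the ExpCommit algorithm or its analysis.

```python
import math


def regret_proxy(t_exp: int, horizon: int) -> float:
    """Heuristic regret proxy (assumed form, constants dropped).

    First term: regret paid while exploring for t_exp steps.
    Second term: regret paid over the commit phase, assumed to scale
    with the estimation error 1/sqrt(t_exp) per remaining step.
    """
    return t_exp + (horizon - t_exp) / math.sqrt(t_exp)


def best_exploration_length(horizon: int) -> int:
    """Grid-search the exploration length that minimizes the proxy."""
    return min(range(1, horizon), key=lambda t_exp: regret_proxy(t_exp, horizon))


if __name__ == "__main__":
    for horizon in (10**3, 10**4, 10**5):
        t_exp = best_exploration_length(horizon)
        scale = horizon ** (2 / 3)
        # Both ratios should stay roughly constant as the horizon grows,
        # which is what an order-T^(2/3) trade-off looks like.
        print(horizon, t_exp, t_exp / scale, regret_proxy(t_exp, horizon) / scale)
```

Under this assumed model, both the minimizing exploration length and the resulting proxy value grow like $T^{2/3}$, which matches the order of the bound quoted in the abstract; the exact form and constants of the actual bound are in the paper.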

BibTeX key: lale2020regret
entry type: article
year: 2020
url: http://arxiv.org/abs/2002.00082
note: cite arxiv:2002.00082
