Per-Decision Option Discounting

Date: June 9, 2019
Authors: Anna Harutyunyan (DeepMind), Peter Vrancx, Philippe Hamel (DeepMind), Ann Nowe (Vrije Universiteit Brussel), Doina Precup (DeepMind)

To solve complex problems, an agent must be able to reason over a sufficiently long horizon. Temporal abstraction, commonly modelled through options, offers the ability to reason at many time scales, but the horizon length is still determined by the single discount factor of the underlying Markov Decision Process. We propose a modification to the options framework that allows the agent’s horizon to grow naturally as its actions become more complex and extended in time. We show that the proposed option-step discount controls a bias-variance trade-off, with larger discounts (counter-intuitively) leading to less estimation variance.
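
To make the horizon argument concrete, the sketch below compares two ways of discounting across a sequence of options. The first is the standard rule, in which an option lasting k primitive steps discounts everything after it by gamma**k. The second applies one factor per option decision. The function names, the parameter `option_step_discount`, and the "one factor per option decision" rule are illustrative assumptions about what per-decision discounting could look like. They are not the paper's exact formulation.

```python
def effective_horizon(gamma: float) -> float:
    """Rough planning horizon 1 / (1 - gamma) implied by a discount factor."""
    if not 0.0 <= gamma < 1.0:
        raise ValueError("gamma must be in [0, 1)")
    return 1.0 / (1.0 - gamma)


def per_primitive_step_discount(gamma: float, durations: list[int]) -> float:
    """Standard options: a k-step option discounts what follows by gamma**k."""
    return gamma ** sum(durations)


def per_option_step_discount(option_step_discount: float,
                             durations: list[int]) -> float:
    """Hypothetical variant (assumption): one factor per option decision,
    regardless of how many primitive steps each option lasted."""
    return option_step_discount ** len(durations)


durations = [10, 10, 10]  # three options, ten primitive steps each
gamma = 0.9

standard = per_primitive_step_discount(gamma, durations)   # gamma**30
per_decision = per_option_step_discount(gamma, durations)  # gamma**3
```

Under the standard rule, the weight on what follows shrinks with total elapsed time, so longer options do not extend the horizon measured in primitive steps. Under the hypothetical per-decision rule, the weight depends only on the number of option decisions, so the horizon in primitive steps grows with option length.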
