Per-Decision Option Discounting
In order to solve complex problems, an agent must be able to reason over a sufficiently long horizon. Temporal abstraction, commonly modelled through options, offers the ability to reason at many time scales, but the horizon length is still determined by the single discount factor of the underlying Markov Decision Process. We propose a modification to the options framework that allows the agent’s horizon to grow naturally as its actions become more complex and extended in time. We show that the proposed option-step discount controls a bias-variance trade-off, with larger discounts (counter-intuitively) leading to less estimation variance.