Efficient Off-Policy Meta-Reinforcement Learning via Probabilistic Context Variables

Date:

Previous methods rely heavily on on-policy experience, limiting their sample efficiency.

They also lack mechanisms to reason about task uncertainty when adapting to new tasks, which hurts their effectiveness in sparse-reward problems.

This paper develops an off-policy meta-RL algorithm, PEARL, that disentangles task inference from control.

  1. Achieving excellent sample efficiency during meta-training and enabling fast adaptation by accumulating experience online
  2. Performing structured exploration by reasoning about uncertainty over tasks
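The two points above both rest on a probabilistic context encoder: recent transitions are mapped to a posterior over a latent task variable, and sampling from that posterior drives structured exploration. Below is a minimal sketch of this idea using a hypothetical linear encoder (`W_mu`, `W_sigma` are illustrative placeholders, not the paper's network); PEARL's actual encoder is a learned neural network, but the product-of-Gaussians posterior has the same form.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode_transition(transition, W_mu, W_sigma):
    # Hypothetical linear encoder: maps one flattened (s, a, r, s')
    # transition to the parameters of a Gaussian factor over the
    # latent task variable z.
    mu = W_mu @ transition
    sigma2 = np.exp(W_sigma @ transition)  # keep variances positive
    return mu, sigma2

def posterior_over_task(context, W_mu, W_sigma):
    # Permutation-invariant posterior: product of independent Gaussian
    # factors, one per context transition, times a unit-Gaussian prior.
    precisions, weighted_mus = [], []
    for t in context:
        mu, sigma2 = encode_transition(t, W_mu, W_sigma)
        precisions.append(1.0 / sigma2)
        weighted_mus.append(mu / sigma2)
    prec = np.sum(precisions, axis=0) + 1.0  # + prior precision
    post_var = 1.0 / prec
    post_mu = post_var * np.sum(weighted_mus, axis=0)
    return post_mu, post_var

def sample_z(post_mu, post_var):
    # Posterior sampling gives structured exploration: with little
    # context the posterior is broad, so the sampled z (and hence the
    # conditioned policy's behavior) varies across episodes.
    return post_mu + np.sqrt(post_var) * rng.standard_normal(post_mu.shape)
```

Note how the posterior variance shrinks as more transitions are accumulated online: each factor adds precision, so the agent's uncertainty about the task (and its exploration) narrows with experience.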

Video from the author

GTC 2020: Efficient Meta-Reinforcement Learning via Probabilistic Context Variables

Reference paper

  1. Soft Actor-Critic
  2. $\rm{RL}^{2}$
  3. Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks
  4. MAESN (Meta-Reinforcement Learning of Structured Exploration Strategies)

Powerpoint for this talk