Efficient Off-Policy Meta-Reinforcement Learning via Probabilistic Context Variables
Previous meta-RL methods rely heavily on on-policy experience, limiting their sample efficiency.
They also lack mechanisms to reason about task uncertainty when adapting to new tasks, which hurts their effectiveness in sparse-reward problems.
This paper develops an off-policy meta-RL algorithm that disentangles task inference from control.
- Achieves excellent sample efficiency during meta-training and enables fast adaptation by accumulating experience online
- Performs structured exploration by reasoning about uncertainty over tasks
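The task-inference side works by encoding a latent context variable from collected transitions: each context transition contributes an independent Gaussian factor, and the posterior over the latent is their product, which makes inference permutation-invariant over the context. Below is a minimal numpy sketch of that product-of-Gaussians combination; the function name and shapes are illustrative, not taken from the authors' code.

```python
import numpy as np

def product_of_gaussians(mus, sigmas_sq):
    """Combine N independent Gaussian factors N(mu_i, sigma_i^2)
    into a single Gaussian posterior over the latent context.

    mus, sigmas_sq: arrays of shape (N, latent_dim).
    Returns (mu, sigma_sq) of shape (latent_dim,).
    """
    # Clip variances for numerical stability before inverting.
    sigmas_sq = np.clip(sigmas_sq, 1e-7, None)
    # Precisions add across factors; the mean is precision-weighted.
    sigma_sq = 1.0 / np.sum(1.0 / sigmas_sq, axis=0)
    mu = sigma_sq * np.sum(mus / sigmas_sq, axis=0)
    return mu, sigma_sq
```

Sampling the latent from this posterior (rather than using its mean) is what gives the structured, temporally-extended exploration: one sample commits the policy to a task hypothesis for a whole episode, and the posterior narrows as more context accumulates.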
Video from the author
GTC 2020: Efficient Meta-Reinforcement Learning via Probabilistic Context Variables
Reference paper
- Soft Actor-Critic
- $\rm{RL}^{2}$
- Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks
- MAESN