Efficient Off-Policy Meta-Reinforcement Learning via Probabilistic Context Variables

Date:

Previous methods rely heavily on on-policy experience, limiting their sample efficiency.

They also lack mechanisms to reason about task uncertainty when adapting to new tasks, which hurts their effectiveness in sparse-reward problems.

This paper develops an off-policy meta-RL algorithm, PEARL, that disentangles task inference from control.

  1. Achieving excellent sample efficiency during meta-training and enabling fast adaptation by accumulating experience online
  2. Performing structured exploration by reasoning about uncertainty over tasks
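The two points above both rest on a probabilistic context encoder: recent transitions are mapped to a posterior over a latent task variable, and sampling from that posterior drives structured exploration. Below is a minimal sketch of this idea using a hypothetical linear encoder (`W_mu`, `W_sigma` are illustrative placeholders, not the paper's network); PEARL's actual encoder is a learned neural network, but the product-of-Gaussians posterior has the same form.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode_transition(transition, W_mu, W_sigma):
    # Hypothetical linear encoder: maps one flattened (s, a, r, s')
    # transition to the parameters of a Gaussian factor over the
    # latent task variable z.
    mu = W_mu @ transition
    sigma2 = np.exp(W_sigma @ transition)  # keep variances positive
    return mu, sigma2

def posterior_over_task(context, W_mu, W_sigma):
    # Permutation-invariant posterior: product of independent Gaussian
    # factors, one per context transition, times a unit-Gaussian prior.
    precisions, weighted_mus = [], []
    for t in context:
        mu, sigma2 = encode_transition(t, W_mu, W_sigma)
        precisions.append(1.0 / sigma2)
        weighted_mus.append(mu / sigma2)
    prec = np.sum(precisions, axis=0) + 1.0  # + prior precision
    post_var = 1.0 / prec
    post_mu = post_var * np.sum(weighted_mus, axis=0)
    return post_mu, post_var

def sample_z(post_mu, post_var):
    # Posterior sampling gives structured exploration: with little
    # context the posterior is broad, so the sampled z (and hence the
    # conditioned policy's behavior) varies across episodes.
    return post_mu + np.sqrt(post_var) * rng.standard_normal(post_mu.shape)
```

Note how the posterior variance shrinks as more transitions are accumulated online: each factor adds precision, so the agent's uncertainty about the task (and its exploration) narrows with experience.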

Video from the author

GTC 2020: Efficient Meta-Reinforcement Learning via Probabilistic Context Variables

Reference paper

  1. Soft Actor-Critic
  2. $\rm{RL}^{2}$
  3. Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks
  4. MAESN (Meta-Reinforcement Learning of Structured Exploration Strategies)

Powerpoint for this talk