December 01, 2022
Lab Seminar, National Yang Ming Chiao Tung University, Hsinchu City, Taiwan
Previous methods rely heavily on on-policy experience, limiting their sample efficiency.
They also lack mechanisms to reason about task uncertainty when adapting to new tasks, which hampers their effectiveness in sparse-reward problems.
This paper develops an off-policy meta-RL algorithm that disentangles task inference from control.
- Achieves excellent sample efficiency during meta-training and enables fast adaptation by accumulating experience online
- Performs structured exploration by reasoning about uncertainty over tasks (see the sketch after this list)
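A rough sketch of the posterior-sampling idea behind that structured exploration, assuming a PEARL-style latent task variable z inferred as a product of per-transition Gaussian factors. The encoder outputs below are placeholder values, not the paper's learned context encoder, so treat this only as an illustration of how the posterior narrows as context accumulates:

```python
import numpy as np

rng = np.random.default_rng(0)
latent_dim = 2

def product_of_gaussians(mus, vars_):
    """Precision-weighted product of per-transition Gaussian factors,
    giving the current posterior over the latent task variable z."""
    precisions = 1.0 / vars_
    post_var = 1.0 / precisions.sum(axis=0)
    post_mu = post_var * (precisions * mus).sum(axis=0)
    return post_mu, post_var

# Placeholder encoder outputs: one Gaussian factor per observed transition.
# In the actual method these would come from a learned context encoder.
factor_mus = rng.normal(loc=0.5, scale=0.1, size=(10, latent_dim))
factor_vars = np.full((10, latent_dim), 0.5)

mu, var = np.zeros(latent_dim), np.ones(latent_dim)  # prior over z
for n in range(len(factor_mus)):
    z = rng.normal(mu, np.sqrt(var))  # posterior sampling: act under this task hypothesis
    print(f"step {n}: z = {np.round(z, 3)}, posterior var = {np.round(var, 3)}")
    # fold the factor from the newly observed transition into the posterior
    mu, var = product_of_gaussians(factor_mus[:n + 1], factor_vars[:n + 1])
```

Early in adaptation the sampled z varies widely, producing temporally extended exploratory behavior; as transitions accumulate, the posterior concentrates and the policy specializes to the inferred task.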