Off-Policy Deep Reinforcement Learning without Exploration


This paper proposes a new algorithm for off-policy reinforcement learning that combines state-of-the-art deep Q-learning algorithms with a state-conditioned generative model for producing only previously seen actions.

Their algorithm, Batch-Constrained deep Q-learning (BCQ), uses the generative model to propose candidate actions that closely resemble those in the batch (the fixed dataset the agent learns from), and then selects the highest-valued candidate according to a learned Q-network.
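The action-selection step can be illustrated with a minimal Python sketch under stated assumptions. It uses a hypothetical `generator(state, n, rng)` as a stand-in for the trained state-conditioned generative model (the paper uses a conditional VAE) and a hypothetical `q_network(state, actions)` that returns one Q-value per candidate. The toy functions and numbers are invented for illustration, and the sketch leaves out the rest of BCQ's training procedure.

```python
import numpy as np


def select_action(state, generator, q_network, n_candidates=10, rng=None):
    """Simplified batch-constrained action selection (illustrative sketch).

    generator(state, n, rng) -> (n, action_dim) array of candidate actions,
        assumed to resemble the actions seen in the batch for this state.
    q_network(state, actions) -> (n,) array of estimated Q-values.
    """
    if rng is None:
        rng = np.random.default_rng()
    # Only actions proposed by the generative model are ever considered.
    candidates = generator(state, n_candidates, rng)
    q_values = q_network(state, candidates)
    # Pick the candidate with the highest estimated value.
    return candidates[int(np.argmax(q_values))]


# --- Toy usage (all hypothetical) ---
# Pretend the batch only contains actions near 0.5 for this state.
batch_actions = np.array([[0.45], [0.50], [0.55]])


def toy_generator(state, n, rng):
    # Stand-in for a trained generative model: resample batch actions
    # and add small Gaussian noise.
    idx = rng.integers(0, len(batch_actions), size=n)
    return batch_actions[idx] + rng.normal(0.0, 0.01, size=(n, 1))


def toy_q(state, actions):
    # Toy Q-function whose peak (a = 0.8) lies outside the batch.
    return -((actions[:, 0] - 0.8) ** 2)


action = select_action(np.zeros(3), toy_generator, toy_q,
                       n_candidates=10, rng=np.random.default_rng(0))
print(action)
```

Although the toy Q-function peaks at 0.8, the selected action stays near 0.5, because only generated candidates are scored. This is the intended effect of the batch constraint: the agent avoids actions whose value estimates are not supported by the data.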

OpenReview for this presentation

NYCU RL Theory Challenge 2023

PowerPoint for this talk


Reference Paper
