Off-Policy Deep Reinforcement Learning without Exploration

Date: May 08, 2023

This paper proposes a new off-policy reinforcement learning algorithm that learns from a fixed batch of data without further exploration. It combines state-of-the-art deep Q-learning with a state-conditioned generative model so that the agent only produces actions similar to those previously seen in the batch.
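To make "only previously seen actions" concrete, here is a minimal tabular sketch of the batch-constrained idea, assuming discrete states and actions: the max in the Q-learning target ranges only over actions that actually appear in the batch for the next state. The function name, the tuple layout, and the choice to treat next states with no batch actions as terminal are assumptions of this sketch, not the paper's deep BCQ implementation.

```python
from collections import defaultdict

def batch_constrained_q_learning(batch, gamma=0.99, alpha=0.5, epochs=50):
    """Hypothetical tabular sketch: Q-learning whose bootstrap max ranges
    only over actions observed in the batch for the next state.

    batch: list of (state, action, reward, next_state, done) tuples.
    """
    q = defaultdict(float)   # Q[(state, action)], initialised to 0
    seen = defaultdict(set)  # state -> set of actions observed in the batch
    for s, a, _, _, _ in batch:
        seen[s].add(a)
    for _ in range(epochs):
        for s, a, r, s_next, done in batch:
            if done or not seen[s_next]:
                # Terminal, or no batch action for s_next (an assumption of
                # this sketch): use the reward alone, no bootstrap.
                target = r
            else:
                # Batch constraint: only actions seen at s_next are considered.
                target = r + gamma * max(q[(s_next, b)] for b in seen[s_next])
            q[(s, a)] += alpha * (target - q[(s, a)])
    return q

batch = [
    ("s0", "left", 0.0, "s1", False),
    ("s1", "right", -1.0, "end", True),
]
q = batch_constrained_q_learning(batch, gamma=0.9, alpha=1.0, epochs=5)
print(q[("s0", "left")])  # → -0.9
```

With an unconstrained max over the action set {left, right} at s1, the target for (s0, left) would bootstrap from the never-trained Q(s1, left) = 0 and report 0. Restricting the max to batch actions keeps the estimate at -0.9, which is the extrapolation error the batch constraint is meant to avoid.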

Their algorithm, Batch-Constrained deep Q-learning (BCQ), uses the generative model to propose candidate actions that closely resemble those in the batch, then selects the highest-valued candidate with a learned Q-network.
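The propose-then-select step can be sketched as follows, assuming scalar states and actions. The generative model here is a hypothetical nearest-neighbour stand-in (sample actions from the closest batch states, plus small Gaussian noise), not the VAE used in the paper, and the paper's BCQ additionally adjusts each candidate with a learned perturbation network bounded by a hyperparameter Φ, which this sketch leaves out. The names `generate_candidates` and `select_action` are illustrative only.

```python
import random

def generate_candidates(state, batch, n, rng, k=3, noise=0.05):
    """Hypothetical stand-in for BCQ's state-conditioned generative model:
    sample actions from the k batch pairs whose states are closest to
    `state`, then add small Gaussian noise."""
    nearest = sorted(batch, key=lambda sa: abs(sa[0] - state))[:k]
    return [rng.choice(nearest)[1] + rng.gauss(0.0, noise) for _ in range(n)]

def select_action(state, batch, q_fn, n=10, seed=0):
    """Propose n batch-like candidate actions, return the one with highest Q."""
    rng = random.Random(seed)
    candidates = generate_candidates(state, batch, n, rng)
    return max(candidates, key=lambda a: q_fn(state, a))

# Batch of (state, action) pairs; near state 0.05 only actions around 0.5 exist.
batch = [(0.0, 0.4), (0.1, 0.5), (0.2, 0.6), (5.0, 2.0)]

# Toy Q-function that (wrongly) extrapolates its highest value to a = 2.0.
q_fn = lambda s, a: -(a - 2.0) ** 2

a = select_action(0.05, batch, q_fn)
print(a)  # a candidate near 0.6, not the unsupported a = 2.0
```

Because the argmax is taken only over generated candidates, the chosen action stays close to the data even when the Q-network overvalues actions the batch never covers at that state.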

OpenReview for this presentation

NYCU RL Theory Challenge 2023

Powerpoint for this talk

Reference Paper


Po-Chuan Chen
