On the Effectiveness of Offline RL for Dialogue Response Generation

Date: December 12, 2023

Many methods for training language models rely on teacher forcing (TF).

Offline RL, however, shows a clear performance improvement over teacher forcing while neither inducing training instability nor sacrificing practical training budgets.
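To make the contrast concrete, here is a minimal sketch, not the method from the paper. It assumes we already have the probabilities a model assigns to each reference token when fed the ground-truth prefix, plus a scalar positive reward per logged response. The function names `teacher_forcing_loss` and `reward_weighted_loss` are hypothetical, and reward weighting is only one simple way to use rewards from a fixed dataset.

```python
import math

def teacher_forcing_loss(token_probs):
    """Mean negative log-likelihood of the reference tokens.

    token_probs[t] is the probability the model assigns to the reference
    token at step t, given the ground-truth prefix (teacher forcing).
    """
    return -sum(math.log(p) for p in token_probs) / len(token_probs)

def reward_weighted_loss(batch):
    """Illustrative offline objective: weight each logged response by its reward.

    batch is a list of (token_probs, reward) pairs drawn from a fixed dataset.
    No new responses are sampled from the model during training.
    Assumes rewards are positive so the normalizer is non-zero.
    """
    total_reward = sum(reward for _, reward in batch)
    weighted = sum(reward * teacher_forcing_loss(probs) for probs, reward in batch)
    return weighted / total_reward

# Two logged responses with made-up probabilities and rewards.
batch = [
    ([0.9, 0.8, 0.7], 1.0),  # high-reward response
    ([0.5, 0.4, 0.3], 0.1),  # low-reward response
]
plain = sum(teacher_forcing_loss(p) for p, _ in batch) / len(batch)
weighted = reward_weighted_loss(batch)
```

Plain teacher forcing treats both responses equally, while the weighted loss is dominated by the high-reward response, so gradient updates would favor responses the reward signal prefers.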

Powerpoint for this talk

Reference Papers

Decision Transformer: Reinforcement Learning via Sequence Modeling
Offline Reinforcement Learning with Implicit Q-Learning
Proximal Policy Optimization Algorithms
A Learning Algorithm for Continually Running Fully Recurrent Neural Networks

Po-Chuan Chen

