On the Effectiveness of Offline RL for Dialogue Response Generation

Date:

Many methods for training language models rely on teacher forcing (TF).

Offline reinforcement learning (RL), by contrast, shows a clear performance improvement over teacher forcing while not inducing training instability or sacrificing practical training budgets.

PowerPoint slides for this talk

Reference Paper
