On the Effectiveness of Offline RL for Dialogue Response Generation
Date:
For language models, many methods rely on teacher forcing (TF) for training.
Offline RL, however, shows a clear performance improvement over teacher forcing without inducing training instability or requiring an impractical training budget.