Is Reinforcement Learning (Not) for Natural Language Processing

Date:

In this paper, the authors first introduce RL4LMs, an open-source modular library for optimizing language generators with RL.

They also present the GRUE (General Reinforced-language Understanding Evaluation) benchmark, which uses reward functions that capture automated measures of human preference.

Finally, they introduce NLPO (Natural Language Policy Optimization), an easy-to-use, performant RL algorithm that learns to effectively reduce the combinatorial action space in language generation.
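To make "reducing the action space" concrete, here is a minimal sketch of one way to do it: top-p (nucleus) masking, where only the smallest set of tokens whose cumulative probability reaches p stays available as actions, and the policy is renormalized over that set. This illustrates the general idea only. The function names `top_p_mask` and `masked_distribution` are hypothetical and are not part of the RL4LMs API, and the logits are made-up toy values.

```python
import math


def top_p_mask(logits, p=0.9):
    """Mark the smallest set of tokens whose probability mass reaches p."""
    # Numerically stable softmax over the full vocabulary.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Visit tokens from most to least probable until the mass reaches p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    mask = [False] * len(probs)
    cumulative = 0.0
    for i in order:
        mask[i] = True
        cumulative += probs[i]
        if cumulative >= p:
            break
    return mask


def masked_distribution(logits, mask):
    """Renormalize the softmax over the allowed tokens only."""
    m = max(x for x, keep in zip(logits, mask) if keep)
    exps = [math.exp(x - m) if keep else 0.0 for x, keep in zip(logits, mask)]
    total = sum(exps)
    return [e / total for e in exps]


# Toy vocabulary of five tokens (assumed values, for illustration only).
toy_logits = [2.0, 1.0, 0.0, -1.0, -2.0]
toy_mask = top_p_mask(toy_logits, p=0.9)
toy_probs = masked_distribution(toy_logits, toy_mask)
```

In this toy case the two least likely tokens are masked out, so the policy chooses among three actions instead of five. With a real vocabulary of tens of thousands of tokens, the reduction is far larger.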

Powerpoint for this talk

Reference Paper
