Is Reinforcement Learning (Not) for Natural Language Processing

Date: April 18, 2023

In this paper, the authors first introduce RL4LMs, an open-source modular library for optimizing language generators with reinforcement learning (RL).
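RL4LMs treats text generation as a sequential decision problem. The state is the prompt plus the tokens generated so far, each action is one vocabulary token, and a reward function scores the output. The sketch below illustrates that framing in plain Python. `TextGenEnv` and its toy reward are hypothetical and are not the RL4LMs API. For simplicity, it also assumes the task reward is given only once the sequence ends.

```python
from typing import Callable, List, Tuple


class TextGenEnv:
    """Hypothetical token-level MDP for text generation (not the RL4LMs API)."""

    def __init__(self, prompt: List[str], reward_fn: Callable[[List[str]], float],
                 max_new_tokens: int = 5, eos: str = "<eos>") -> None:
        self.prompt = list(prompt)
        self.reward_fn = reward_fn
        self.max_new_tokens = max_new_tokens
        self.eos = eos
        self.generated: List[str] = []

    def reset(self) -> List[str]:
        # The initial state is just the prompt.
        self.generated = []
        return self.prompt + self.generated

    def step(self, token: str) -> Tuple[List[str], float, bool]:
        # An action is a single token; the state grows by one token per step.
        self.generated.append(token)
        done = token == self.eos or len(self.generated) >= self.max_new_tokens
        # Sparse reward (an assumption here): only the finished output is scored.
        reward = self.reward_fn(self.generated) if done else 0.0
        return self.prompt + self.generated, reward, done


# Toy reward: count how often the word "great" was generated.
env = TextGenEnv(["the", "movie", "was"], reward_fn=lambda toks: float(toks.count("great")))
state = env.reset()
total = 0.0
done = False
for tok in ["great", "and", "great", "<eos>"]:
    state, r, done = env.step(tok)
    total += r
    if done:
        break
print(total, state)
```

In a real setup, the tokens would be sampled from the language model being trained rather than taken from a fixed list.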

They also present the GRUE (General Reinforced-language Understanding Evaluation) benchmark, which uses reward functions that capture automated measures of human preference.
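In RL4LMs-style training, such a preference metric is typically combined with a per-token KL penalty that keeps the trained policy close to the original pretrained model. The following is a minimal sketch under stated assumptions: `kl_penalized_rewards` is a hypothetical helper, the penalty coefficient `beta` is fixed (adaptive schedules are also common), and the KL term is the usual single-sample estimate log π(a|s) − log π₀(a|s).

```python
from typing import List


def kl_penalized_rewards(task_reward: float, logp_policy: List[float],
                         logp_ref: List[float], beta: float = 0.1) -> List[float]:
    """Hypothetical helper: per-token rewards for one generated sequence.

    Every token pays a KL penalty against the frozen reference model; the
    sequence-level task score (e.g. a preference metric) is added on the last token.
    """
    if len(logp_policy) != len(logp_ref):
        raise ValueError("log-prob lists must have equal length")
    rewards = []
    last = len(logp_policy) - 1
    for t, (lp, lr) in enumerate(zip(logp_policy, logp_ref)):
        kl_t = lp - lr  # single-sample estimate of log(pi / pi_0) at step t
        r = -beta * kl_t
        if t == last:
            r += task_reward  # the automated metric only scores the full output
        rewards.append(r)
    return rewards


# Two generated tokens: the policy is more confident than the reference on the first.
rewards = kl_penalized_rewards(1.0, [-1.0, -2.0], [-1.5, -2.0], beta=0.1)
print(rewards)
```

The penalty discourages the policy from drifting toward degenerate text that scores well on the metric but no longer resembles fluent language.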

Finally, they introduce NLPO (Natural Language Policy Optimization), an easy-to-use, performant RL algorithm that learns to effectively reduce the combinatorial action space in language generation.
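Roughly, NLPO builds on PPO by adding a masking policy. This is a copy of the current policy that is refreshed only every μ iterations, and it restricts each step's choice to its top-p (nucleus) set of tokens. The sketch below shows only that masking step for a single position, using toy logits and hypothetical function names. The PPO update and the periodic refresh of the masking model are omitted.

```python
import math
from typing import List


def top_p_mask(mask_logits: List[float], p: float) -> List[bool]:
    """Keep the smallest set of tokens whose probability under the masking
    policy sums to at least p (the nucleus). True marks a kept token."""
    m = max(mask_logits)
    exps = [math.exp(x - m) for x in mask_logits]  # numerically stable softmax
    z = sum(exps)
    probs = [e / z for e in exps]
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep = [False] * len(probs)
    cum = 0.0
    for i in order:
        keep[i] = True
        cum += probs[i]
        if cum >= p:
            break
    return keep


def masked_policy_probs(policy_logits: List[float], keep: List[bool]) -> List[float]:
    """Renormalize the current policy over kept tokens; masked tokens get 0."""
    masked = [x if k else -math.inf for x, k in zip(policy_logits, keep)]
    m = max(masked)
    exps = [math.exp(x - m) for x in masked]  # math.exp(-inf) == 0.0
    z = sum(exps)
    return [e / z for e in exps]


# Masking policy (a stale copy of the policy) vs. the current policy.
keep = top_p_mask([2.0, 1.0, 0.0, -3.0], p=0.9)
probs = masked_policy_probs([0.0, 0.0, 5.0, 1.0], keep)
print(keep, probs)
```

Even though the current policy prefers token 2 in this toy example, that token falls outside the masking policy's nucleus and cannot be sampled. This illustrates how the masking shrinks the action space the RL update has to explore.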

PowerPoint for this talk


Po-Chuan Chen
