Is Reinforcement Learning (Not) for Natural Language Processing
In this paper, the authors first introduce RL4LMs, an open-source, modular library for optimizing language generators with reinforcement learning (RL).
They also present the GRUE (General Reinforced-language Understanding Evaluation) benchmark, which uses reward functions that capture automated measures of human preference.
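To make "reward functions that capture automated measures" concrete, here is a minimal sketch of how an automated metric can be turned into an RL reward for a generated sequence. It assumes a sparse reward: every intermediate token gets 0, and the metric score is given only once the full sequence has been generated. The unigram-F1 scorer is a hypothetical stand-in for the task-specific metrics and learned models GRUE actually uses, and the names `unigram_f1` and `episode_rewards` are illustrative, not part of the RL4LMs API.

```python
from collections import Counter
from typing import List


def unigram_f1(generated: str, reference: str) -> float:
    """Hypothetical automated metric: unigram F1 overlap with a reference."""
    gen = generated.lower().split()
    ref = reference.lower().split()
    # Multiset intersection counts each shared word at most min(count) times.
    overlap = sum((Counter(gen) & Counter(ref)).values())
    if overlap == 0:
        return 0.0  # also guards against division by zero on empty input
    precision = overlap / len(gen)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def episode_rewards(generated_tokens: List[str], reference: str) -> List[float]:
    """Sparse per-token rewards: 0 everywhere, metric score on the last token."""
    rewards = [0.0] * len(generated_tokens)
    if rewards:
        rewards[-1] = unigram_f1(" ".join(generated_tokens), reference)
    return rewards


if __name__ == "__main__":
    print(episode_rewards(["the", "cat", "sat"], "the cat sat on the mat"))
```

Because the signal arrives only at the end of the sequence, the RL algorithm has to assign credit back across every token choice, which is one reason the size of the per-step action space matters.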
Finally, they introduce NLPO (Natural Language Policy Optimization), an easy-to-use, performant RL algorithm that learns to effectively reduce the combinatorial action space in language generation.
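As we understand the paper, NLPO reduces the action space by masking out any token that falls outside the top-p nucleus of a separate masking policy, which is a copy of the current policy that is refreshed only periodically during training. The sketch below shows only that masking step on toy logits over a four-token vocabulary. The function names, the plain-list representation, and the choice of p are illustrative assumptions, not RL4LMs code.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax; -inf logits map to probability 0."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_mask(mask_logits: List[float], p: float) -> List[bool]:
    """Keep the smallest set of tokens whose cumulative probability reaches p."""
    probs = softmax(mask_logits)
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep = [False] * len(probs)
    cumulative = 0.0
    for i in order:
        keep[i] = True
        cumulative += probs[i]
        if cumulative >= p:
            break
    return keep


def masked_policy_probs(
    policy_logits: List[float], mask_logits: List[float], p: float
) -> List[float]:
    """Renormalize the current policy over tokens the masking policy allows."""
    keep = top_p_mask(mask_logits, p)
    masked = [x if k else float("-inf") for x, k in zip(policy_logits, keep)]
    return softmax(masked)


if __name__ == "__main__":
    mask_logits = [2.0, 1.0, 0.1, -1.0]  # hypothetical masking-policy logits
    policy_logits = [0.5, 0.5, 0.5, 3.0]  # hypothetical current-policy logits
    print(top_p_mask(mask_logits, 0.9))  # [True, True, True, False]
    print(masked_policy_probs(policy_logits, mask_logits, 0.9))
```

Note that token 3 has the highest logit under the current policy but is still excluded, because the masking policy places it outside the 0.9 nucleus. Keeping the mask tied to a slower-moving copy of the policy is what prevents the policy from simply un-masking whatever it currently prefers.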