Quark: Controllable Text Generation with Reinforced [Un]learning
Large language models can generate content that is misaligned with users' expectations, such as toxic language, repetitive text, or otherwise undesired responses.
This paper addresses the challenge with Quark, an algorithm that optimizes a reward function quantifying an (un)wanted property of the generated text.
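At a high level, Quark scores the model's own samples with the reward function, sorts them into quantile bins, and prepends a learned control token for each bin before fine-tuning; at inference, generation is conditioned on the best-reward token. The sketch below illustrates only the quantize-and-tag step with toy data; the token naming `<rk_k>` and the helper functions are my own illustrative choices, not the paper's implementation.

```python
# Toy sketch of Quark's reward-quantization step. Assumptions: reward scores
# are given directly here; in Quark they come from scoring model samples with
# a reward model, and the quantile tokens are learned embeddings.

def quantize_rewards(scores, n_bins=5):
    """Assign each score a quantile index (0 = lowest reward bin)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    bins = [0] * len(scores)
    for rank, i in enumerate(order):
        bins[i] = rank * n_bins // len(scores)
    return bins

def tag_examples(texts, scores, n_bins=5):
    """Prepend a control token (hypothetical name <rk_k>) based on reward bin."""
    bins = quantize_rewards(scores, n_bins)
    return [f"<rk_{b}> {t}" for t, b in zip(texts, bins)]
```

Training then maximizes the likelihood of each tagged sample (plus a KL penalty to the initial model, per the paper), and decoding conditions on the highest-reward token to steer generation toward the wanted property.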