GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers
Generative Pre-trained Transformer models not only achieve breakthrough performance across complex language modelling tasks, but also come with extremely high computational and storage costs.
While there is emerging work on relieving this pressure via model compression, the applicability and performance of existing compression techniques are limited by the scale and complexity of GPT models.
In this paper, the authors propose GPTQ, a new one-shot weight quantization method based on approximate second-order information.
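To give a rough sense of what "second-order information" means here, the sketch below quantizes a linear layer's weights one column at a time and spreads each column's quantization error over the not-yet-quantized columns using the inverse Hessian of the layer-wise reconstruction error. This is a simplified illustration under assumed details (symmetric per-row scales, a fixed inverse Hessian, no lazy batch updates or Cholesky reformulation), not the paper's exact algorithm; all function names are hypothetical.

```python
import numpy as np

def quantize_rtn(w, scale):
    """Round-to-nearest quantization with a fixed per-row scale (illustrative grid)."""
    return scale * np.round(w / scale)

def gptq_like_quantize(W, X, bits=4):
    """Greedy column-wise quantization with second-order error compensation.

    W: (rows, cols) weight matrix of a linear layer.
    X: (cols, n_samples) calibration inputs, so the layer computes W @ X.
    Returns a quantized copy of W.
    """
    rows, cols = W.shape
    Wq = W.astype(np.float64).copy()

    # Hessian of the layer-wise squared error ||W X - Wq X||^2 w.r.t. one row of W.
    H = 2.0 * X @ X.T
    # Small dampening on the diagonal for numerical stability (assumed value).
    H += np.eye(cols) * 0.01 * np.mean(np.diag(H))
    Hinv = np.linalg.inv(H)

    # Simple symmetric per-row scales (an illustrative choice, not the paper's grid).
    maxq = 2 ** (bits - 1) - 1
    scale = np.abs(Wq).max(axis=1, keepdims=True) / maxq

    for j in range(cols):
        w_col = Wq[:, j].copy()
        q_col = quantize_rtn(w_col[:, None], scale)[:, 0]
        Wq[:, j] = q_col
        # Second-order update: distribute this column's error over remaining columns.
        err = (w_col - q_col) / Hinv[j, j]
        if j + 1 < cols:
            Wq[:, j + 1:] -= np.outer(err, Hinv[j, j + 1:])

    return Wq

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.standard_normal((8, 16))
    X = rng.standard_normal((16, 128))
    Wq = gptq_like_quantize(W, X, bits=4)
    print("relative output error:",
          np.linalg.norm(W @ X - Wq @ X) / np.linalg.norm(W @ X))
```

The key point the example tries to convey is that each weight is not rounded in isolation: the calibration-data Hessian tells the method how much each rounding error matters for the layer's output, so later columns are adjusted to compensate.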