GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers
Generative Pre-trained Transformer models not only achieve breakthrough performance across complex language modelling tasks, but also come with extremely high computational and storage costs.
While there is emerging work on relieving this pressure via model compression, the applicability and performance of existing compression techniques are limited by the scale and complexity of GPT models.
In this paper, the authors propose GPTQ, a new one-shot weight quantization method based on approximate second-order information.
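To give a rough sense of what "second-order information" means here, the sketch below quantizes a linear layer's weights one column at a time and spreads each column's quantization error over the not-yet-quantized columns using the inverse Hessian of the layer-wise reconstruction error. This is a simplified illustration under assumed details (symmetric per-row scales, a fixed inverse Hessian, no lazy batch updates or Cholesky reformulation), not the paper's exact algorithm; all function names are hypothetical.

```python
import numpy as np

def quantize_rtn(w, scale):
    """Round-to-nearest quantization with a fixed per-row scale (illustrative grid)."""
    return scale * np.round(w / scale)

def gptq_like_quantize(W, X, bits=4):
    """Greedy column-wise quantization with second-order error compensation.

    W: (rows, cols) weight matrix of a linear layer.
    X: (cols, n_samples) calibration inputs, so the layer computes W @ X.
    Returns a quantized copy of W.
    """
    rows, cols = W.shape
    Wq = W.astype(np.float64).copy()

    # Hessian of the layer-wise squared error ||W X - Wq X||^2 w.r.t. one row of W.
    H = 2.0 * X @ X.T
    # Small dampening on the diagonal for numerical stability (assumed value).
    H += np.eye(cols) * 0.01 * np.mean(np.diag(H))
    Hinv = np.linalg.inv(H)

    # Simple symmetric per-row scales (an illustrative choice, not the paper's grid).
    maxq = 2 ** (bits - 1) - 1
    scale = np.abs(Wq).max(axis=1, keepdims=True) / maxq

    for j in range(cols):
        w_col = Wq[:, j].copy()
        q_col = quantize_rtn(w_col[:, None], scale)[:, 0]
        Wq[:, j] = q_col
        # Second-order update: distribute this column's error over remaining columns.
        err = (w_col - q_col) / Hinv[j, j]
        if j + 1 < cols:
            Wq[:, j + 1:] -= np.outer(err, Hinv[j, j + 1:])

    return Wq

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.standard_normal((8, 16))
    X = rng.standard_normal((16, 128))
    Wq = gptq_like_quantize(W, X, bits=4)
    print("relative output error:",
          np.linalg.norm(W @ X - Wq @ X) / np.linalg.norm(W @ X))
```

The key point the example tries to convey is that each weight is not rounded in isolation: the calibration-data Hessian tells the method how much each rounding error matters for the layer's output, so later columns are adjusted to compensate.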