Agile-Quant: Activation-Guided Quantization for Faster Inference of LLMs on the Edge

Cite

Text

Shen et al. "Agile-Quant: Activation-Guided Quantization for Faster Inference of LLMs on the Edge." AAAI Conference on Artificial Intelligence, 2024. doi:10.1609/AAAI.V38I17.29860

Markdown

[Shen et al. "Agile-Quant: Activation-Guided Quantization for Faster Inference of LLMs on the Edge." AAAI Conference on Artificial Intelligence, 2024.](https://mlanthology.org/aaai/2024/shen2024aaai-agile/) doi:10.1609/AAAI.V38I17.29860

BibTeX

@inproceedings{shen2024aaai-agile,
  title     = {{Agile-Quant: Activation-Guided Quantization for Faster Inference of LLMs on the Edge}},
  author    = {Shen, Xuan and Dong, Peiyan and Lu, Lei and Kong, Zhenglun and Li, Zhengang and Lin, Ming and Wu, Chao and Wang, Yanzhi},
  booktitle = {AAAI Conference on Artificial Intelligence},
  year      = {2024},
  pages     = {18944--18951},
  doi       = {10.1609/AAAI.V38I17.29860},
  url       = {https://mlanthology.org/aaai/2024/shen2024aaai-agile/}
}