
Large Multimodal Model Compression via Iterative Efficient Pruning and Distillation

  • Maolin Wang
  • Yao Zhao
  • Jiajia Liu
  • Jingdong Chen
  • Chenyi Zhuang*
  • Jinjie Gu
  • Ruocheng Guo
  • Xiangyu Zhao*
  • *Corresponding author for this work
  • City University of Hong Kong
  • Ant Group
  • ByteDance Research

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; peer-reviewed

Abstract

The deployment of Large Multimodal Models (LMMs) within Ant Group has significantly advanced multimodal tasks in payment, security, and advertising, notably enhancing advertisement audition tasks in Alipay. However, the deployment of such sizable models introduces challenges, particularly in increased latency and carbon emissions, which are antithetical to the ideals of Green AI. This paper introduces a novel multi-stage compression strategy for our proprietary LLM, AntGMM. Our methodology pivots on three main aspects: employing small training sample sizes, addressing multi-level redundancy through multi-stage pruning, and introducing an advanced distillation loss design. In our research, we constructed a dataset, the Multimodal Advertisement Audition Dataset (MAAD), from real-world scenarios within Alipay, and conducted experiments to validate the reliability of our proposed strategy. Furthermore, the effectiveness of our strategy is evident in its operational success in Alipay’s real-world multimodal advertisement audition for three months from September 2023. Notably, our approach achieved a substantial reduction in latency, decreasing it from 700ms to 90ms, while maintaining online performance with only a slight performance decrease. Moreover, our compressed model is estimated to reduce electricity consumption by approximately 75 million kWh annually compared to the direct deployment of AntGMM, demonstrating our commitment to green AI initiatives.

Original language: English
Title of host publication: WWW 2024 Companion - Companion Proceedings of the ACM Web Conference
Publisher: Association for Computing Machinery, Inc
Pages: 235-244
Number of pages: 10
ISBN (Electronic): 9798400701726
DOI
Publication status: Published - 13 May 2024
Externally published: Yes
Event: 33rd Companion of the ACM World Wide Web Conference, WWW 2024 - Singapore, Singapore
Duration: 13 May 2024 to 17 May 2024

Publication series

Name: WWW 2024 Companion - Companion Proceedings of the ACM Web Conference

Conference

Conference: 33rd Companion of the ACM World Wide Web Conference, WWW 2024
Country/Territory: Singapore
City: Singapore
Period: 13/05/24 to 17/05/24

UN Sustainable Development Goals

This output contributes to the following UN Sustainable Development Goals (SDGs):

  1. SDG 7 - Affordable and Clean Energy
