
Large Multimodal Model Compression via Iterative Efficient Pruning and Distillation

  • Maolin Wang
  • Yao Zhao
  • Jiajia Liu
  • Jingdong Chen
  • Chenyi Zhuang*
  • Jinjie Gu
  • Ruocheng Guo
  • Xiangyu Zhao*
  • *Corresponding author for this work
  • City University of Hong Kong
  • Ant Group
  • ByteDance Research

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

The deployment of Large Multimodal Models (LMMs) within Ant Group has significantly advanced multimodal tasks in payment, security, and advertising, notably enhancing advertisement audition tasks in Alipay. However, the deployment of such sizable models introduces challenges, particularly in increased latency and carbon emissions, which are antithetical to the ideals of Green AI. This paper introduces a novel multi-stage compression strategy for our proprietary LLM, AntGMM. Our methodology pivots on three main aspects: employing small training sample sizes, addressing multi-level redundancy through multi-stage pruning, and introducing an advanced distillation loss design. In our research, we constructed a dataset, the Multimodal Advertisement Audition Dataset (MAAD), from real-world scenarios within Alipay, and conducted experiments to validate the reliability of our proposed strategy. Furthermore, the effectiveness of our strategy is evident in its operational success in Alipay’s real-world multimodal advertisement audition for three months from September 2023. Notably, our approach achieved a substantial reduction in latency, decreasing it from 700ms to 90ms, while maintaining online performance with only a slight performance decrease. Moreover, our compressed model is estimated to reduce electricity consumption by approximately 75 million kWh annually compared to the direct deployment of AntGMM, demonstrating our commitment to green AI initiatives.

Original language: English
Title of host publication: WWW 2024 Companion - Companion Proceedings of the ACM Web Conference
Publisher: Association for Computing Machinery, Inc
Pages: 235-244
Number of pages: 10
ISBN (Electronic): 9798400701726
State: Published - 13 May 2024
Externally published: Yes
Event: 33rd Companion of the ACM World Wide Web Conference, WWW 2024 - Singapore, Singapore
Duration: 13 May 2024 → 17 May 2024

Publication series

Name: WWW 2024 Companion - Companion Proceedings of the ACM Web Conference

Conference

Conference: 33rd Companion of the ACM World Wide Web Conference, WWW 2024
Country/Territory: Singapore
City: Singapore
Period: 13/05/24 → 17/05/24

UN SDGs

This output contributes to the following UN Sustainable Development Goals (SDGs)

  1. SDG 7 - Affordable and Clean Energy

Keywords

  • Distillation
  • Efficient Inference
  • Large Language Model
  • Large Multimodal Model
  • Model Compression
  • Pruning
