ERL-MR: Harnessing the Power of Euler Feature Representations for Balanced Multi-modal Learning

  • Weixiang Han
  • Chengjun Cai
  • Yu Guo
  • Jialiang Peng*

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Multi-modal learning leverages data from diverse perceptual media to obtain enriched representations, thereby empowering machine learning models to complete more complex tasks. However, recent research results indicate that multi-modal learning still suffers from "modality imbalance": certain modalities' contributions are suppressed by dominant ones, consequently constraining the overall performance enhancement of multi-modal learning. To tackle this issue, current approaches attempt to mitigate modality competition in various ways, but their effectiveness is still limited. To this end, we propose an Euler Representation Learning-based Modality Rebalance (ERL-MR) strategy, which reshapes the underlying competitive relationships between modalities into mutually reinforcing win-win situations while maintaining stable feature optimization directions. Specifically, ERL-MR employs Euler's formula to map original features to complex space, constructing cooperatively enhanced non-redundant features for each modality, which helps reverse the situation of modality competition. Moreover, to counteract the performance degradation resulting from optimization drift among modalities, we propose a Multi-Modal Constrained (MMC) loss based on the cosine similarity of complex feature phases and the cross-entropy loss of individual modalities, guiding the optimization direction of the fusion network. Extensive experiments conducted on four multi-modal multimedia datasets and two task-specific multi-modal multimedia datasets demonstrate the superiority of our ERL-MR strategy over state-of-the-art baselines, achieving modality rebalancing and further performance improvements.
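The core idea described in the abstract — mapping real-valued features onto the complex plane via Euler's formula and measuring phase alignment between modalities — can be sketched as follows. This is a minimal illustration of the general technique, not the authors' implementation; the function names `euler_map` and `phase_cosine_similarity` and the toy feature vectors are hypothetical.

```python
import numpy as np

def euler_map(x):
    # Euler's formula: e^{i x} = cos(x) + i sin(x).
    # Maps a real feature vector onto the unit circle in complex space.
    return np.cos(x) + 1j * np.sin(x)

def phase_cosine_similarity(za, zb):
    # Cosine similarity between the phase components of two complex
    # feature vectors -- a plausible proxy for the phase-alignment term
    # of an MMC-style loss (assumed form, for illustration only).
    pa, pb = np.angle(za), np.angle(zb)
    return float(np.dot(pa, pb) /
                 (np.linalg.norm(pa) * np.linalg.norm(pb) + 1e-8))

# Toy feature vectors standing in for two modality embeddings
# (e.g., audio and visual features after their encoders).
xa = np.array([0.2, 0.5, -0.3])
xb = np.array([0.25, 0.45, -0.2])

za, zb = euler_map(xa), euler_map(xb)
sim = phase_cosine_similarity(za, zb)  # higher => phases better aligned
```

Because every mapped feature lies on the unit circle, magnitude information is factored out and the phase carries the representation, which is what makes a phase-based similarity constraint meaningful.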

Original language: English
Title of host publication: MM 2024 - Proceedings of the 32nd ACM International Conference on Multimedia
Publisher: Association for Computing Machinery, Inc
Pages: 4591-4600
Number of pages: 10
ISBN (Electronic): 9798400706868
DOIs
State: Published - 28 Oct 2024
Event: 32nd ACM International Conference on Multimedia, MM 2024 - Melbourne, Australia
Duration: 28 Oct 2024 – 1 Nov 2024

Publication series

Name: MM 2024 - Proceedings of the 32nd ACM International Conference on Multimedia

Conference

Conference: 32nd ACM International Conference on Multimedia, MM 2024
Country/Territory: Australia
City: Melbourne
Period: 28/10/24 – 1/11/24

Keywords

  • Euler formula
  • modality imbalance
  • multi-modal constrained loss
  • multi-modal learning
