Exploring Recommender System Evaluation: A Multi-Modal LLM Agent Framework for A/B Testing

  • Wenlin Zhang
  • Xiangyang Li
  • Qiyuan Ge
  • Kuicai Dong
  • Pengyue Jia
  • Xiaopeng Li
  • Zijian Zhang
  • Maolin Wang
  • Yichao Wang*
  • Huifeng Guo
  • Ruiming Tang
  • Xiangyu Zhao*
  • *Corresponding author for this work
  • City University of Hong Kong
  • Huawei Technologies Co., Ltd.
  • Jilin University

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-review

Abstract

In recommender systems, online A/B testing is a crucial method for evaluating the performance of different models. However, conducting online A/B testing often presents significant challenges, including substantial economic costs, user experience degradation, and considerable time requirements. Given the powerful capabilities of Large Language Models (LLMs), LLM-based agents show great potential to replace traditional online A/B testing. Nonetheless, current agents fail to simulate the perception process and interaction patterns, due to the lack of real environments and visual perception capability. To address these challenges, we introduce a multi-modal user agent for A/B testing (A/B Agent). Specifically, we construct a recommendation sandbox environment for A/B testing, enabling multimodal and multi-page interactions that align with real user behavior on online platforms. The designed agent leverages multimodal information perception and fine-grained user preferences, and integrates profiles, action memory retrieval, and a fatigue system to simulate complex human decision-making. We validated the potential of the agent as an alternative to traditional A/B testing from three perspectives: model, data, and features. Furthermore, we found that the data generated by A/B Agent can effectively enhance the capabilities of recommendation models. Our code is publicly available at https://github.com/Applied-Machine-Learning-Lab/ABAgent.

Original language: English
Title of host publication: KDD 2026 - Proceedings of the 32nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining V.1
Publisher: Association for Computing Machinery
Pages: 2878-2889
Number of pages: 12
ISBN (Electronic): 9798400722585
DOIs
State: Published - 20 Apr 2026
Externally published: Yes
Event: 32nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining V.1, KDD 2026 - Jeju Island, Korea, Republic of
Duration: 9 Aug 2026 to 13 Aug 2026

Publication series

Name: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
Volume: 1-A
ISSN (Print): 2154-817X

Conference

Conference: 32nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining V.1, KDD 2026
Country/Territory: Korea, Republic of
City: Jeju Island
Period: 9/08/26 to 13/08/26

Keywords

  • a/b testing
  • multimodal user agent
  • recommender system
