Exploring Recommender System Evaluation: A Multi-Modal LLM Agent Framework for A/B Testing

  • Wenlin Zhang
  • Xiangyang Li
  • Qiyuan Ge
  • Kuicai Dong
  • Pengyue Jia
  • Xiaopeng Li
  • Zijian Zhang
  • Maolin Wang
  • Yichao Wang*
  • Huifeng Guo
  • Ruiming Tang
  • Xiangyu Zhao*
  • *Corresponding author for this work
  • City University of Hong Kong
  • Huawei Technologies Co., Ltd.
  • Jilin University

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; Peer-reviewed

Abstract

In recommender systems, online A/B testing is a crucial method for evaluating the performance of different models. However, conducting online A/B testing often presents significant challenges, including substantial economic costs, user experience degradation, and considerable time requirements. Given the powerful capabilities of Large Language Models (LLMs), LLM-based agents show great potential to replace traditional online A/B testing. Nonetheless, current agents fail to simulate users' perception processes and interaction patterns because they lack real environments and visual perception capability. To address these challenges, we introduce a multi-modal user agent for A/B testing (A/B Agent). Specifically, we construct a recommendation sandbox environment for A/B testing, enabling multimodal and multi-page interactions that align with real user behavior on online platforms. The designed agent leverages multimodal information perception and fine-grained user preferences, and integrates profiles, action memory retrieval, and a fatigue system to simulate complex human decision-making. We validated the potential of the agent as an alternative to traditional A/B testing from three perspectives: model, data, and features. Furthermore, we found that the data generated by A/B Agent can effectively enhance the capabilities of recommendation models. Our code is publicly available at https://github.com/Applied-Machine-Learning-Lab/ABAgent.

Original language: English
Title of host publication: KDD 2026 - Proceedings of the 32nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining V.1
Publisher: Association for Computing Machinery
Pages: 2878-2889
Number of pages: 12
ISBN (Electronic): 9798400722585
DOI
Publication status: Published - 20 Apr 2026
Externally published
Event: 32nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining V.1, KDD 2026 - Jeju Island, South Korea
Duration: 9 Aug 2026 to 13 Aug 2026

Publication series

Name: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
1-A
ISSN (Print): 2154-817X

Conference

Conference: 32nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining V.1, KDD 2026
Country/Territory: South Korea
Jeju Island
Period: 9 Aug 2026 to 13 Aug 2026
