TY - GEN
T1 - Exploring Recommender System Evaluation
T2 - 32nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining V.1, KDD 2026
AU - Zhang, Wenlin
AU - Li, Xiangyang
AU - Ge, Qiyuan
AU - Dong, Kuicai
AU - Jia, Pengyue
AU - Li, Xiaopeng
AU - Zhang, Zijian
AU - Wang, Maolin
AU - Wang, Yichao
AU - Guo, Huifeng
AU - Tang, Ruiming
AU - Zhao, Xiangyu
N1 - Publisher Copyright: © 2026 Owner/Author.
PY - 2026/4/20
Y1 - 2026/4/20
N2 - In recommender systems, online A/B testing is a crucial method for evaluating the performance of different models. However, conducting online A/B testing often presents significant challenges, including substantial economic costs, degraded user experience, and considerable time requirements. Given the strong capabilities of Large Language Models (LLMs), LLM-based agents show great potential to replace traditional online A/B testing. Nonetheless, current agents fail to simulate the perception process and interaction patterns because they lack realistic environments and visual perception capabilities. To address these challenges, we introduce a multimodal user agent for A/B testing (A/B Agent). Specifically, we construct a recommendation sandbox environment for A/B testing that enables multimodal, multi-page interactions aligned with real user behavior on online platforms. The agent combines multimodal information perception and fine-grained user preferences with profiles, action memory retrieval, and a fatigue system to simulate complex human decision-making. We validate the potential of the agent as an alternative to traditional A/B testing from three perspectives: model, data, and features. Furthermore, we find that the data generated by A/B Agent can effectively enhance the capabilities of recommendation models. Our code is publicly available at https://github.com/Applied-Machine-Learning-Lab/ABAgent.
AB - In recommender systems, online A/B testing is a crucial method for evaluating the performance of different models. However, conducting online A/B testing often presents significant challenges, including substantial economic costs, degraded user experience, and considerable time requirements. Given the strong capabilities of Large Language Models (LLMs), LLM-based agents show great potential to replace traditional online A/B testing. Nonetheless, current agents fail to simulate the perception process and interaction patterns because they lack realistic environments and visual perception capabilities. To address these challenges, we introduce a multimodal user agent for A/B testing (A/B Agent). Specifically, we construct a recommendation sandbox environment for A/B testing that enables multimodal, multi-page interactions aligned with real user behavior on online platforms. The agent combines multimodal information perception and fine-grained user preferences with profiles, action memory retrieval, and a fatigue system to simulate complex human decision-making. We validate the potential of the agent as an alternative to traditional A/B testing from three perspectives: model, data, and features. Furthermore, we find that the data generated by A/B Agent can effectively enhance the capabilities of recommendation models. Our code is publicly available at https://github.com/Applied-Machine-Learning-Lab/ABAgent.
KW - a/b testing
KW - multimodal user agent
KW - recommender system
UR - https://www.scopus.com/pages/publications/105038087056
U2 - 10.1145/3770854.3785688
DO - 10.1145/3770854.3785688
M3 - Conference contribution
AN - SCOPUS:105038087056
T3 - Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
SP - 2878
EP - 2889
BT - KDD 2026 - Proceedings of the 32nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining V.1
PB - Association for Computing Machinery
Y2 - 9 August 2026 through 13 August 2026
ER -