TY - GEN
T1 - Multi-task Offline Reinforcement Learning for Online Advertising in Recommender Systems
AU - Liu, Langming
AU - Wang, Wanyu
AU - Zhang, Chi
AU - Li, Bo
AU - Yin, Hongzhi
AU - Wei, Xuetao
AU - Su, Wenbo
AU - Zheng, Bo
AU - Zhao, Xiangyu
N1 - Publisher Copyright:
© 2025 ACM.
PY - 2025/8/3
Y1 - 2025/8/3
N2 - Online advertising in recommendation platforms has gained significant attention, with a predominant focus on channel recommendation and budget allocation strategies. However, current offline reinforcement learning (RL) methods face substantial challenges when applied to sparse advertising scenarios, primarily due to severe overestimation, distributional shifts, and overlooking budget constraints. To address these issues, we propose MTORL, a novel multi-task offline RL model that targets two key objectives. First, we establish a Markov Decision Process (MDP) framework specific to the nuances of advertising. Then, we develop a causal state encoder to capture dynamic user interests and temporal dependencies, facilitating offline RL through conditional sequence modeling. Causal attention mechanisms are introduced to enhance user sequence representations by identifying correlations among causal states. We employ multi-task learning to decode actions and rewards, simultaneously addressing channel recommendation and budget allocation. Notably, our framework includes an automated system for integrating these tasks into online advertising. Extensive experiments on offline and online environments demonstrate MTORL's superiority over state-of-the-art methods. The code is available online at https://github.com/Applied-Machine-Learning-Lab/MTORL.
AB - Online advertising in recommendation platforms has gained significant attention, with a predominant focus on channel recommendation and budget allocation strategies. However, current offline reinforcement learning (RL) methods face substantial challenges when applied to sparse advertising scenarios, primarily due to severe overestimation, distributional shifts, and overlooking budget constraints. To address these issues, we propose MTORL, a novel multi-task offline RL model that targets two key objectives. First, we establish a Markov Decision Process (MDP) framework specific to the nuances of advertising. Then, we develop a causal state encoder to capture dynamic user interests and temporal dependencies, facilitating offline RL through conditional sequence modeling. Causal attention mechanisms are introduced to enhance user sequence representations by identifying correlations among causal states. We employ multi-task learning to decode actions and rewards, simultaneously addressing channel recommendation and budget allocation. Notably, our framework includes an automated system for integrating these tasks into online advertising. Extensive experiments on offline and online environments demonstrate MTORL's superiority over state-of-the-art methods. The code is available online at https://github.com/Applied-Machine-Learning-Lab/MTORL.
KW - advertising
KW - multi-task learning
KW - offline reinforcement learning
UR - https://www.scopus.com/pages/publications/105014320423
U2 - 10.1145/3711896.3737250
DO - 10.1145/3711896.3737250
M3 - Conference contribution
AN - SCOPUS:105014320423
T3 - Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
SP - 4635
EP - 4646
BT - KDD 2025 - Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining
PB - Association for Computing Machinery
T2 - 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD 2025
Y2 - 3 August 2025 through 7 August 2025
ER -