TY - GEN
T1 - On-demand Edge Inference Scheduling with Accuracy and Deadline Guarantee
AU - She, Yechao
AU - Li, Minming
AU - Jin, Yang
AU - Xu, Meng
AU - Wang, Jianping
AU - Liu, Bin
N1 - Publisher Copyright:
© 2023 IEEE.
PY - 2023
Y1 - 2023
N2 - To meet the increasing demand for machine-learning-based applications, pushing inference services to the network edge has become a trend. This work aims to design an on-demand edge inference scheduler with accuracy and deadline guarantees for repetitive tasks. Specifically, we consider an edge server preinstalled with multiple early-exit Deep Neural Networks (DNNs), where each DNN-exit pair can provide inference service of a different quality. We also consider tasks' diversity in quality-of-service requirements and the associated utility. We aim to maximize the system's total utility by optimizing service assignment and time scheduling subject to resource, accuracy, and deadline constraints. We present an integer linear programming formulation of this problem and show that it is NP-hard even in the offline case. The problem is challenging due to the coupled effect of service assignment and time scheduling. To derive low-complexity scheduling solutions, we introduce a task-service graph and convert the problem into a service assignment selection problem with schedulability constraints. We then design a polynomial-time algorithm with a $\frac{\rho}{\delta}$-approximation ratio for the offline problem, where $\rho$ denotes the task-wise utility ratio and $\delta$ denotes the maximum number of concurrent tasks. To handle the online problem, we propose an online heuristic algorithm. Simulation results show that the proposed algorithms outperform state-of-the-art baseline algorithms.
AB - To meet the increasing demand for machine-learning-based applications, pushing inference services to the network edge has become a trend. This work aims to design an on-demand edge inference scheduler with accuracy and deadline guarantees for repetitive tasks. Specifically, we consider an edge server preinstalled with multiple early-exit Deep Neural Networks (DNNs), where each DNN-exit pair can provide inference service of a different quality. We also consider tasks' diversity in quality-of-service requirements and the associated utility. We aim to maximize the system's total utility by optimizing service assignment and time scheduling subject to resource, accuracy, and deadline constraints. We present an integer linear programming formulation of this problem and show that it is NP-hard even in the offline case. The problem is challenging due to the coupled effect of service assignment and time scheduling. To derive low-complexity scheduling solutions, we introduce a task-service graph and convert the problem into a service assignment selection problem with schedulability constraints. We then design a polynomial-time algorithm with a $\frac{\rho}{\delta}$-approximation ratio for the offline problem, where $\rho$ denotes the task-wise utility ratio and $\delta$ denotes the maximum number of concurrent tasks. To handle the online problem, we propose an online heuristic algorithm. Simulation results show that the proposed algorithms outperform state-of-the-art baseline algorithms.
UR - https://www.scopus.com/pages/publications/85167789670
U2 - 10.1109/IWQoS57198.2023.10188769
DO - 10.1109/IWQoS57198.2023.10188769
M3 - Conference contribution
AN - SCOPUS:85167789670
T3 - IEEE International Workshop on Quality of Service, IWQoS
BT - 2023 IEEE/ACM 31st International Symposium on Quality of Service, IWQoS 2023
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 31st IEEE/ACM International Symposium on Quality of Service, IWQoS 2023
Y2 - 19 June 2023 through 21 June 2023
ER -