NUDIF: A Non-Uniform Deployment Framework for Distributed Inference in Heterogeneous Edge Clusters

Bibliographic Details
Published in: Future Internet vol. 17, no. 4 (2025), p. 168
Main Author: Li, Peng
Other Authors: Chen, Qing; Liu, Hao
Published: MDPI AG
Online Access: Citation/Abstract
Full Text + Graphics
Full Text - PDF

MARC

LEADER 00000nab a2200000uu 4500
001 3194606736
003 UK-CbPIL
022 |a 1999-5903 
024 7 |a 10.3390/fi17040168  |2 doi 
035 |a 3194606736 
045 2 |b d20250401  |b d20250430 
084 |a 231464  |2 nlm 
100 1 |a Li, Peng  |u National Key Laboratory of Complex Aviation System Simulation, Chengdu 610036, China; qingchen_1@cetc.com.cn 
245 1 |a NUDIF: A Non-Uniform Deployment Framework for Distributed Inference in Heterogeneous Edge Clusters 
260 |b MDPI AG  |c 2025 
513 |a Journal Article 
520 3 |a Distributed inference in resource-constrained heterogeneous edge clusters is fundamentally limited by disparities in device capabilities and by load imbalance. Existing methods predominantly focus on optimizing single-pipeline allocation schemes for partitioned sub-models; however, such approaches often lead to load imbalance and suboptimal resource utilization under concurrent batch processing. To address these challenges, we propose a non-uniform deployment inference framework (NUDIF), which achieves a high-throughput distributed inference service by adapting to heterogeneous resources and balancing inter-stage processing capabilities. Formulated as a mixed-integer nonlinear programming (MINLP) problem, NUDIF plans the number of instances of each sub-model and selects the specific devices on which to deploy them, while accounting for computational capacity, memory constraints, and communication latency. This optimization minimizes inter-stage processing discrepancies and maximizes resource utilization. Experimental evaluations demonstrate that NUDIF improves system throughput by an average of 9.95% over traditional single-pipeline optimization methods across various cluster device configurations. 
653 |a Collaboration 
653 |a Dynamic programming 
653 |a Edge computing 
653 |a Communication 
653 |a Bandwidths 
653 |a Optimization 
653 |a Neural networks 
653 |a Inference 
653 |a Adaptation 
653 |a Unmanned aerial vehicles 
653 |a Linear programming 
653 |a Batch processing 
653 |a Algorithms 
653 |a Mixed integer 
653 |a Clusters 
653 |a Resource utilization 
653 |a Energy consumption 
653 |a Large language models 
653 |a Nonlinear programming 
653 |a Load balancing 
700 1 |a Chen, Qing  |u National Key Laboratory of Complex Aviation System Simulation, Chengdu 610036, China; qingchen_1@cetc.com.cn 
700 1 |a Liu, Hao  |u School of Computer Science (National Pilot Software Engineering School), Beijing University of Posts and Telecommunications (BUPT), Beijing 100876, China; liuhao@bupt.edu.cn 
773 0 |t Future Internet  |g vol. 17, no. 4 (2025), p. 168 
786 0 |d ProQuest  |t ABI/INFORM Global 
856 4 1 |3 Citation/Abstract  |u https://www.proquest.com/docview/3194606736/abstract/embedded/7BTGNMKEMPT1V9Z2?source=fedsrch 
856 4 0 |3 Full Text + Graphics  |u https://www.proquest.com/docview/3194606736/fulltextwithgraphics/embedded/7BTGNMKEMPT1V9Z2?source=fedsrch 
856 4 0 |3 Full Text - PDF  |u https://www.proquest.com/docview/3194606736/fulltextPDF/embedded/7BTGNMKEMPT1V9Z2?source=fedsrch
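
Editor's illustrative sketch. The abstract (MARC field 520) summarizes NUDIF as an MINLP that decides how many instances of each partitioned sub-model to run and on which devices, subject to compute, memory, and communication constraints. The Python sketch below is not the authors' formulation: it is a toy greedy heuristic with hypothetical device capacities (edge-a/b/c), stage costs, and memory figures, meant only to illustrate the general idea of balancing inter-stage processing capability across a heterogeneous cluster.

# A minimal sketch, not the paper's method: a toy greedy heuristic that
# illustrates non-uniform instance placement for a partitioned model on a
# heterogeneous cluster. All device capacities, stage costs, and memory
# figures are hypothetical.

from dataclasses import dataclass


@dataclass
class Device:
    name: str
    compute: float      # relative compute capacity (arbitrary units)
    memory: float       # available memory (GB)
    used_mem: float = 0.0


# Hypothetical heterogeneous edge devices.
devices = [
    Device("edge-a", compute=4.0, memory=8.0),
    Device("edge-b", compute=2.0, memory=4.0),
    Device("edge-c", compute=1.0, memory=2.0),
]

# Hypothetical pipeline stages: (name, compute cost per request, memory per instance in GB).
stages = [("stage0", 1.0, 1.5), ("stage1", 2.0, 2.0), ("stage2", 0.5, 1.0)]

# Devices hosting an instance of each stage (a device may host several stages).
placement = {name: [] for name, _, _ in stages}


def stage_throughput(name, cost):
    # Aggregate requests per unit time contributed by every instance of this stage.
    return sum(d.compute / cost for d in placement[name])


def device_load(d):
    # Number of instances already placed on device d.
    return sum(1 for hosts in placement.values() if d in hosts)


# Greedily reinforce the slowest stage until the bottleneck stage can no
# longer fit another instance anywhere; pipeline throughput equals the
# minimum per-stage throughput, so this balances inter-stage capability.
while True:
    name, cost, mem = min(stages, key=lambda s: stage_throughput(s[0], s[1]))
    candidates = [d for d in devices if d.memory - d.used_mem >= mem]
    if not candidates:
        break
    dev = max(candidates, key=lambda d: d.compute / (1 + device_load(d)))
    placement[name].append(dev)
    dev.used_mem += mem

for name, cost, _ in stages:
    hosts = [d.name for d in placement[name]]
    print(f"{name}: {len(hosts)} instance(s) on {hosts}, "
          f"throughput ~= {stage_throughput(name, cost):.2f}")
print("pipeline throughput ~=",
      f"{min(stage_throughput(n, c) for n, c, _ in stages):.2f}")

Because pipeline throughput is capped by its slowest stage, the sketch always adds capacity to the current bottleneck stage, which is the intuition behind minimizing inter-stage processing discrepancies; the paper solves the placement jointly as an MINLP rather than greedily.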