Anatomizing Deep Learning Inference in Web Browsers

Bibliographic Details
Published in: arXiv.org (Jul 25, 2024), p. n/a
Main Author: Wang, Qipeng
Other Authors: Jiang, Shiqi; Chen, Zhenpeng; Cao, Xu; Li, Yuanchun; Li, Aoyu; Ma, Yun; Cao, Ting; Liu, Xuanzhe
Published: Cornell University Library, arXiv.org
Subjects: Deep learning; Performance evaluation; Smoothness; User experience; Applications programs; Graphics processing units; Inference; Web browsers; Graphical user interface; Memory management; Run time (computers)
Online Access: Citation/Abstract
Full text outside of ProQuest

MARC

LEADER 00000nab a2200000uu 4500
001 2925286048
003 UK-CbPIL
022 |a 2331-8422 
035 |a 2925286048 
045 0 |b d20240725 
100 1 |a Wang, Qipeng 
245 1 |a Anatomizing Deep Learning Inference in Web Browsers 
260 |b Cornell University Library, arXiv.org  |c Jul 25, 2024 
513 |a Working Paper 
520 3 |a Web applications have increasingly adopted Deep Learning (DL) through in-browser inference, wherein DL inference is performed directly within Web browsers. The actual performance of in-browser inference and its impact on the quality of experience (QoE) remain unexplored and urgently require new QoE measurements beyond traditional ones, which focus mainly on page load time. To bridge this gap, we present the first comprehensive performance measurement of in-browser inference to date. We propose new metrics for measuring in-browser inference: responsiveness, smoothness, and inference accuracy. Our extensive analysis covers 9 representative DL models across Web browsers on 50 popular PC devices and 20 mobile devices. The results reveal that in-browser inference exhibits a substantial latency gap, averaging 16.9 times slower on CPU and 4.9 times slower on GPU than native inference on PC devices; the gap on mobile CPU and mobile GPU is 15.8 times and 7.8 times, respectively. Furthermore, we identify factors contributing to this latency gap, including underutilized hardware instruction sets, inherent overhead in the runtime environment, resource contention within the browser, and inefficiencies in software libraries and GPU abstractions. Additionally, in-browser inference imposes significant memory demands, at times exceeding 334.6 times the size of the DL models themselves, partly attributable to suboptimal memory management. We also observe that in-browser inference leads to a 67.2% increase in the time it takes for GUI components to render within Web browsers, substantially degrading the overall user QoE of Web applications that rely on this technology.
653 |a Deep learning 
653 |a Performance evaluation 
653 |a Smoothness 
653 |a User experience 
653 |a Applications programs 
653 |a Graphics processing units 
653 |a Inference 
653 |a Web browsers 
653 |a Graphical user interface 
653 |a Memory management 
653 |a Run time (computers) 
700 1 |a Jiang, Shiqi 
700 1 |a Chen, Zhenpeng 
700 1 |a Cao, Xu 
700 1 |a Li, Yuanchun 
700 1 |a Li, Aoyu 
700 1 |a Ma, Yun 
700 1 |a Cao, Ting 
700 1 |a Liu, Xuanzhe 
773 0 |t arXiv.org  |g (Jul 25, 2024), p. n/a 
786 0 |d ProQuest  |t Engineering Database 
856 4 1 |3 Citation/Abstract  |u https://www.proquest.com/docview/2925286048/abstract/embedded/H09TXR3UUZB2ISDL?source=fedsrch 
856 4 0 |3 Full text outside of ProQuest  |u http://arxiv.org/abs/2402.05981
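
To make the abstract's setup concrete, the sketch below shows one way latency and smoothness of the kind described might be measured for in-browser inference. This is a minimal illustration, not the paper's measurement harness: the runtime (onnxruntime-web), the model path 'model.onnx', the feed name 'input', and the 1x3x224x224 input shape are all placeholder assumptions.

// A minimal TypeScript sketch, assuming onnxruntime-web as the in-browser
// runtime; model path, input name, and shape are hypothetical placeholders.
import * as ort from 'onnxruntime-web';

async function measureLatency(): Promise<void> {
  // 'wasm' runs on the CPU via WebAssembly; 'webgl' would target the GPU.
  const session = await ort.InferenceSession.create('model.onnx', {
    executionProviders: ['wasm'],
  });

  // Dummy input tensor; 1x3x224x224 is a common image-model shape.
  const input = new ort.Tensor(
    'float32',
    new Float32Array(1 * 3 * 224 * 224),
    [1, 3, 224, 224],
  );

  // One warm-up run so compilation and first-run setup do not skew timings.
  await session.run({ input });

  // Average wall-clock latency over repeated runs.
  const runs = 20;
  const t0 = performance.now();
  for (let i = 0; i < runs; i++) {
    await session.run({ input });
  }
  console.log(`mean latency: ${((performance.now() - t0) / runs).toFixed(1)} ms`);
}

// Rough probe in the spirit of the paper's smoothness metric: flag frames
// that exceed roughly two 60 Hz frame budgets while inference is in flight.
function monitorFrames(durationMs: number): void {
  const start = performance.now();
  let last = start;
  const tick = (now: number): void => {
    if (now - last > 32) {
      console.log(`long frame: ${(now - last).toFixed(1)} ms`);
    }
    last = now;
    if (now - start < durationMs) requestAnimationFrame(tick);
  };
  requestAnimationFrame(tick);
}

monitorFrames(5000);
measureLatency();

Responsiveness would additionally require timing the first inference after page load, and inference accuracy requires comparing outputs against a native baseline; both are beyond this sketch.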