Computation Caching for Efficient Mobile Convolutional Neural Network Inference
| Published: | ProQuest Dissertations and Theses (2025) |
|---|---|
| Author: | |
| Publisher: | ProQuest Dissertations & Theses |
| Subjects: | |
| Online Access: | Citation/Abstract, Full Text - PDF |
Abstract:

Computer vision on smartphones is commonly achieved with convolutional neural networks (CNNs). CNNs offer accurate image classification but suffer high latency under resource constraints, such as on low-powered mobile devices, and older CNN models in particular struggle with image classification on smartphones. This dissertation addresses real-time image classification on smartphones through an intuitive, systems-based approach that requires no retraining of any model. Because our methods require no model modifications, they can be widely adopted, even by developers with little CNN experience.

Traditional CNN inference produces vast amounts of data from the network's many internal convolutional layers. These layers consist of many 'filters', each acting as a feature extractor for a given input. We observe that the outputs of these convolutional layers exhibit consistent patterns for a given input, and we leverage this observation to design novel caching paradigms that allow computations to be reused across many runs of the same CNN.

First, we introduce a system that exploits the predictability of CNNs and the inherent mobility of smartphones so that users in the same physical area can share patterns of CNN execution with each other, enabling computation reuse. Using a smartphone's inertial sensors and device-to-device communication, multiple users in proximity can share valuable information about their environment, allowing devices to make quicker classifications.

Second, we explore caching for CNN early-exit strategies. Based on known patterns in CNN execution, we investigate whether the class of an image can be confidently predicted without computing the entire CNN. If an image's class can be found, we 'early exit' the CNN and return the classification without finishing inference. We explore novel ways of finding such patterns, both online and offline, requiring no deep models, which lets us reduce latency with little computational overhead. This yields some of the first CNN early-exit schemes specifically designed and optimized for mobile CNN execution.

Third, we discuss a class-aware caching scheme that improves on traditional filter-pruning techniques for CNNs. We take a novel approach to filter pruning: based on the predicted class of an image, we selectively choose which filters to compute and which to pull from a cache. We develop fast ways of approximating an image's class and, guided by thorough offline profiling, determine which filters are most valuable for which classes. At runtime we compute only a subset of a CNN's filters, saving significant computation time.

Lastly, we explore online caching during CNN inference. Online caching strategies are inherently difficult during CNN execution because the ground-truth class of an input run through a CNN is not known. This causes many issues, including improperly labeled data in the cache, which leads to significant accuracy loss. We explore novel strategies to enable online caching, such as cache refresh and labeling online cache entries with confidence scores. Our advances make online cache replacement feasible for mobile CNNs, while enabling CNN computation reuse and early exit for significant latency reduction.

This dissertation addresses the challenges that many applications face in mobile image classification, ensuring that classification results are returned quickly enough for a good user experience. The research presented here is all focused on this same problem, with novel approaches and innovations that can work together to achieve fast and efficient mobile image classification. It emphasizes methods and techniques accessible to all developers, requiring no outside infrastructure, no model architecture changes, and no training or retraining of any deep neural network. In conclusion, these contributions make image classification viable on even the most resource-constrained devices running pretrained CNN models of many varying architectures.
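The early-exit scheme summarized in the abstract could look roughly like the following sketch, which walks a CNN layer by layer and exits as soon as an intermediate activation confidently matches a cached per-class signature. The matching rule (cosine similarity to per-layer class centroids), all names, and the threshold are illustrative assumptions, not the dissertation's actual method:

```python
import numpy as np

def early_exit_inference(layers, x, centroids, threshold=0.9):
    """Run the CNN one stage at a time; after each stage, compare the
    intermediate activation against cached per-class signatures for that
    layer and exit early on a confident match.

    layers:    list of callables (the CNN's convolutional stages)
    centroids: {layer_index: {class_label: flat signature vector}}
               (hypothetical cache built offline)
    Returns (label, layer_index_of_exit); (None, len(layers)) means the
    whole network was run without a confident early exit.
    """
    best_label = None
    for i, layer in enumerate(layers):
        x = layer(x)
        flat = x.ravel()
        best_label, best_sim = None, -1.0
        for label, c in centroids.get(i, {}).items():
            denom = np.linalg.norm(flat) * np.linalg.norm(c)
            sim = float(flat @ c) / denom if denom else 0.0
            if sim > best_sim:
                best_label, best_sim = label, sim
        if best_sim >= threshold:
            return best_label, i   # early exit: confident classification
    return None, len(layers)       # fell through: no early exit taken
```

Because the per-layer comparison is just a dot product against a handful of cached vectors, the overhead per stage stays small relative to the convolutions it can skip.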
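The class-aware filter caching idea can likewise be sketched as follows: given a cheap approximate class prediction, only the filters profiled offline as important for that class are computed, and every other filter's output is pulled from a cache of typical responses. The function and parameter names, and the per-filter cache layout, are hypothetical:

```python
import numpy as np

def class_aware_conv(filters, x, predicted_class, important, cache):
    """Apply a convolutional layer, computing only the filters marked as
    valuable for the (approximately) predicted class.

    filters:         list of per-filter callables
    important:       {class_label: set of filter indices to compute}
                     (built from offline profiling in this sketch)
    cache:           {filter_index: cached typical output}
    """
    outputs = []
    for i, f in enumerate(filters):
        if i in important.get(predicted_class, set()):
            outputs.append(f(x))       # compute the class-critical filter
        else:
            outputs.append(cache[i])   # reuse a cached response instead
    return np.stack(outputs)
```

The latency saving scales with the fraction of filters that offline profiling marks as unimportant for the predicted class, since those filters cost only a cache lookup at runtime.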
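Finally, the online-caching safeguards mentioned in the abstract (labeling cache entries with confidence and periodically refreshing the cache) can be illustrated with a small LRU cache. Entries are admitted only above a confidence floor, and a refresh pass evicts low-confidence entries so mislabeled data cannot accumulate; the class, policy, and thresholds here are assumptions for illustration:

```python
from collections import OrderedDict

class OnlineCache:
    """LRU cache whose entries carry a confidence score (sketch)."""

    def __init__(self, capacity=128, min_conf=0.8):
        self.capacity, self.min_conf = capacity, min_conf
        self.entries = OrderedDict()   # key -> (label, confidence)

    def put(self, key, label, confidence):
        if confidence < self.min_conf:
            return False               # never admit low-confidence labels
        self.entries[key] = (label, confidence)
        self.entries.move_to_end(key)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)   # LRU eviction
        return True

    def get(self, key):
        if key in self.entries:
            self.entries.move_to_end(key)      # mark as recently used
            return self.entries[key]
        return None

    def refresh(self, floor):
        # cache refresh: drop entries whose confidence is below `floor`
        self.entries = OrderedDict(
            (k, v) for k, v in self.entries.items() if v[1] >= floor)
```

Gating admission on confidence limits how much improperly labeled data enters the cache in the first place, while the refresh pass bounds how long any marginal entry can survive.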
| ISBN: | 9798290945897 |
|---|---|
| Source: | ProQuest Dissertations & Theses Global |