On-Device Neural Net Inference with Mobile GPUs

AI-generated keywords: Mobile GPU, On-Device Inference, TensorFlow Lite, Parallel Processing, Real-Time Inferencing

AI-generated Key Points

The license of the paper does not allow us to build upon its content, so the key points are generated from the paper metadata rather than the full article.

  • Machine learning models on mobile phones for on-device inference are increasingly popular due to lower latency and increased privacy.
  • Hardware accelerators are being used to overcome the challenges of running compute-intensive tasks solely on the mobile CPU.
  • Neural processing units are being added to high-end phones, but such phones account for only a small fraction of hand-held devices.
  • The paper titled "On-Device Neural Net Inference with Mobile GPUs" presents an approach to leverage the mobile GPU, which is found in virtually every phone, to run inference of deep neural networks in real-time for both Android and iOS devices.
  • The authors' state-of-the-art mobile GPU inference engine is integrated into the open-source project TensorFlow Lite and publicly available at https://tensorflow.org/lite.
  • Using the mobile GPU's parallel processing capabilities can yield faster inference while keeping power consumption low.
  • Most phones already have a built-in GPU, so no additional hardware costs or upgrades are needed.

Authors: Juhyun Lee, Nikolay Chirkov, Ekaterina Ignasheva, Yury Pisarchyk, Mogan Shieh, Fabio Riccardi, Raman Sarokin, Andrei Kulik, Matthias Grundmann

Computer Vision and Pattern Recognition Workshop: Efficient Deep Learning for Computer Vision 2019

Abstract: On-device inference of machine learning models for mobile phones is desirable due to its lower latency and increased privacy. Running such a compute-intensive task solely on the mobile CPU, however, can be difficult due to limited computing power, thermal constraints, and energy consumption. App developers and researchers have begun exploiting hardware accelerators to overcome these challenges. Recently, device manufacturers are adding neural processing units into high-end phones for on-device inference, but these account for only a small fraction of hand-held devices. In this paper, we present how we leverage the mobile GPU, a ubiquitous hardware accelerator on virtually every phone, to run inference of deep neural networks in real-time for both Android and iOS devices. By describing our architecture, we also discuss how to design networks that are mobile GPU-friendly. Our state-of-the-art mobile GPU inference engine is integrated into the open-source project TensorFlow Lite and publicly available at https://tensorflow.org/lite.
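
The abstract speaks of running inference "in real-time" without stating a numeric target. As a rough illustration only, the sketch below treats real time as finishing each inference within one frame interval and times an arbitrary callable against that budget. The 30 FPS default, the function names, and the `run_inference` callable are all assumptions made for this example; none of them comes from the paper or from the TensorFlow Lite API.

```python
import time
from statistics import median


def frame_budget_ms(target_fps):
    """Time available per frame, in milliseconds, at the given frame rate."""
    if target_fps <= 0:
        raise ValueError("target_fps must be positive")
    return 1000.0 / target_fps


def median_latency_ms(run_inference, warmup=3, runs=20):
    """Median wall-clock latency of `run_inference`, in milliseconds.

    `run_inference` is a hypothetical zero-argument callable, for example
    a wrapper around one forward pass of a model on the CPU or the GPU.
    """
    # Warm-up calls are executed but not timed, so one-off setup costs
    # do not skew the measurement.
    for _ in range(warmup):
        run_inference()

    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        run_inference()
        samples.append((time.perf_counter() - start) * 1000.0)
    return median(samples)


def meets_real_time(latency_ms, target_fps=30):
    """True if one inference fits within a single frame interval.

    The 30 FPS default is an assumed target, not a figure from the paper.
    """
    return latency_ms <= frame_budget_ms(target_fps)
```

At 30 frames per second the budget is 1000 / 30, or about 33.3 ms per frame, so a measured median latency of 20 ms would pass this check and 40 ms would not. The same harness could be pointed at a CPU-backed and a GPU-backed callable to compare the two on a given device.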

Submitted to arXiv on 03 Jul. 2019

Results of the summarizing process for the arXiv paper: 1907.01989v1

The paper's license does not allow us to build upon its content, so this summary was generated from the paper's metadata rather than the full article.

Running machine learning models on mobile phones for on-device inference is becoming increasingly popular because of its lower latency and increased privacy. To overcome the challenges of running such a compute-intensive task solely on the mobile CPU, app developers and researchers have started exploiting hardware accelerators. Device manufacturers are adding neural processing units to high-end phones for on-device inference, but these phones account for only a small fraction of hand-held devices.

In the paper "On-Device Neural Net Inference with Mobile GPUs," Juhyun Lee and colleagues present their approach to leveraging the mobile GPU, a ubiquitous hardware accelerator found in virtually every phone, to run inference of deep neural networks in real time on both Android and iOS devices. The authors describe their architecture and discuss how to design networks that are mobile GPU-friendly. Their state-of-the-art mobile GPU inference engine is integrated into the open-source project TensorFlow Lite and is publicly available at https://tensorflow.org/lite.

The approach offers several advantages over using the mobile CPU alone or specialized hardware accelerators such as neural processing units. By leveraging the mobile GPU's parallel processing capabilities, it can achieve faster inference while maintaining low power consumption. In addition, since most phones already have a built-in GPU, no additional hardware costs or upgrades are needed.

Overall, the paper presents a way to overcome the challenges of running machine learning models on mobile devices by using existing hardware resources efficiently. This has implications for applications that require real-time inference on handheld devices while preserving privacy and reducing latency.
Created on 06 Apr. 2023
