Pelee: A Real-Time Object Detection System on Mobile Devices

AI-generated keywords: Deep Learning, Convolutional Neural Network (CNN), Efficient Architectures, PeleeNet, Real-time Object Detection

AI-generated Key Points

⚠The license of the paper does not allow us to build upon its content and the key points are generated using the paper metadata rather than the full article.

- Demand is growing for efficient Convolutional Neural Network (CNN) models that run on mobile devices with limited computational power and memory
- Efficient architectures such as MobileNet, ShuffleNet, and MobileNetV2 rely heavily on depthwise separable convolution, which lacks an efficient implementation in most deep learning frameworks
- Robert J. Wang, Xiang Li, and Charles X. Ling introduce PeleeNet, an architecture built with conventional convolution instead of depthwise separable convolution
- On ImageNet ILSVRC 2012, PeleeNet achieves higher accuracy and runs over 1.8 times faster than MobileNet and MobileNetV2 on NVIDIA TX2 hardware, at only 66% of MobileNet's model size
- The real-time object detection system Pelee combines PeleeNet with the Single Shot MultiBox Detector (SSD) method and is optimized for speed
- Pelee achieves 76.4% mean average precision (mAP) on PASCAL VOC2007 and 22.4 mAP on MS COCO, at 23.6 FPS on iPhone 8 and 125 FPS on NVIDIA TX2
- On COCO, Pelee outperforms YOLOv2 in precision with 13.6 times lower computational cost and an 11.3 times smaller model
- The results show that efficient model design can enable real-time object detection on mobile devices without sacrificing accuracy

Authors: Robert J. Wang, Xiang Li, Charles X. Ling

arXiv: 1804.06882v3 - DOI (cs.CV)

Accepted to NeurIPS 2018

License: NONEXCLUSIVE-DISTRIB 1.0

Abstract: An increasing need of running Convolutional Neural Network (CNN) models on mobile devices with limited computing power and memory resource encourages studies on efficient model design. A number of efficient architectures have been proposed in recent years, for example, MobileNet, ShuffleNet, and MobileNetV2. However, all these models are heavily dependent on depthwise separable convolution which lacks efficient implementation in most deep learning frameworks. In this study, we propose an efficient architecture named PeleeNet, which is built with conventional convolution instead. On ImageNet ILSVRC 2012 dataset, our proposed PeleeNet achieves a higher accuracy and over 1.8 times faster speed than MobileNet and MobileNetV2 on NVIDIA TX2. Meanwhile, PeleeNet is only 66% of the model size of MobileNet. We then propose a real-time object detection system by combining PeleeNet with Single Shot MultiBox Detector (SSD) method and optimizing the architecture for fast speed. Our proposed detection system, named Pelee, achieves 76.4% mAP (mean average precision) on PASCAL VOC2007 and 22.4 mAP on MS COCO dataset at the speed of 23.6 FPS on iPhone 8 and 125 FPS on NVIDIA TX2. The result on COCO outperforms YOLOv2 in consideration of a higher precision, 13.6 times lower computational cost and 11.3 times smaller model size.

Submitted to arXiv on 18 Apr. 2018

Results of the summarizing process for the arXiv paper: 1804.06882v3

Comprehensive Summary

Demand is growing for efficient Convolutional Neural Network (CNN) models that can run on mobile devices with limited computational power and memory resources, which has spurred research into models that are both accurate and fast. Recent efficient architectures such as MobileNet, ShuffleNet, and MobileNetV2 all rely heavily on depthwise separable convolution. However, this type of convolution lacks an efficient implementation in most deep learning frameworks.

To address this, Robert J. Wang, Xiang Li, and Charles X. Ling proposed an architecture called PeleeNet, which is built using conventional convolution instead of depthwise separable convolution. In experiments on the ImageNet ILSVRC 2012 dataset, PeleeNet achieved higher accuracy and ran over 1.8 times faster than MobileNet and MobileNetV2 on NVIDIA TX2 hardware, with a model size only 66% of MobileNet's.

The researchers then developed a real-time object detection system named Pelee by combining PeleeNet with the Single Shot MultiBox Detector (SSD) method and optimizing the architecture for speed. Pelee achieved a mean average precision (mAP) of 76.4% on the PASCAL VOC2007 dataset and 22.4 mAP on the MS COCO dataset while running at 23.6 frames per second (FPS) on an iPhone 8 and 125 FPS on NVIDIA TX2 hardware. On COCO, Pelee outperformed YOLOv2 in precision with a computational cost 13.6 times lower and a model size 11.3 times smaller. These findings, accepted at NeurIPS 2018, show how efficient model design can enable real-time object detection on mobile devices without sacrificing accuracy.

Layman's Summary

- People want better computer programs that can recognize things quickly on phones and tablets.
- Some smart people made new ways to make these programs work faster on small devices.
- One group of researchers created a special program called PeleeNet that works really well and is smaller than other similar programs.
- The Pelee program can find things accurately and quickly, even on phones and tablets.
- This new program is better than some other ones in terms of accuracy, speed, and size.

Definitions

- Convolutional Neural Network (CNN): A type of computer program that helps machines recognize patterns in images or data.
- Architecture: The design or structure of something, like how a building or computer program is put together.
- Depthwise separable convolution: A method used in designing efficient neural network models by breaking down the process into separate parts for better performance.

Introduction

In recent years, there has been a growing demand for efficient Convolutional Neural Network (CNN) models that can run on mobile devices with limited computational power and memory resources. This necessity has spurred research into designing models that are both accurate and fast. One of the key obstacles is that depthwise separable convolution, which popular architectures such as MobileNet, ShuffleNet, and MobileNetV2 rely on heavily, lacks an efficient implementation in most deep learning frameworks. To address this, Robert J. Wang, Xiang Li, and Charles X. Ling proposed an architecture called PeleeNet. Unlike its counterparts, PeleeNet is built using conventional convolution instead of depthwise separable convolution. In experiments on the ImageNet ILSVRC 2012 dataset, PeleeNet achieved higher accuracy and ran over 1.8 times faster than MobileNet and MobileNetV2 on NVIDIA TX2 hardware.

The Need for Efficient CNN Models on Mobile Devices

With the widespread use of smartphones and other mobile devices in our daily lives, there is a growing demand for deep learning applications to be able to run efficiently on these devices without compromising performance or accuracy levels. However, most mobile devices have limited computational power and memory resources compared to traditional desktop computers or servers. This limitation poses a significant challenge for developers who want to deploy deep learning models on mobile platforms. To overcome this challenge, researchers have been exploring ways to design efficient CNN architectures that can meet the demands of real-time applications while running smoothly on mobile devices.

The Emergence of Efficient Architectures

Recent years have seen the emergence of several efficient architectures such as MobileNet, ShuffleNet, and MobileNetV2. These architectures rely heavily on depthwise separable convolution - a technique that decomposes a standard convolution into two separate operations: depthwise convolution and pointwise convolution. This approach reduces the number of parameters and computational cost, making it ideal for mobile platforms. However, despite its benefits, depthwise separable convolution lacks efficient implementation in most deep learning frameworks. This limitation has led researchers to explore alternative approaches to designing efficient architectures for mobile devices.
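The parameter saving from this decomposition can be illustrated with a simple weight count. The sketch below is only an illustration: the function names and the channel sizes (128 in, 128 out, 3x3 kernel) are assumptions chosen for the example and are not taken from the paper.

```python
def standard_conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a standard k x k convolution (bias omitted)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a k x k depthwise convolution followed by a 1x1 pointwise one."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1x1 convolution that mixes the channels
    return depthwise + pointwise


# Illustrative layer sizes (assumed, not from the paper).
c_in, c_out, k = 128, 128, 3
std = standard_conv_params(c_in, c_out, k)
sep = depthwise_separable_params(c_in, c_out, k)
print(f"standard: {std} weights, separable: {sep} weights, ratio: {std / sep:.1f}x")
```

For these sizes the separable form needs roughly an eighth of the weights. A smaller weight count does not guarantee a faster layer, though, since speed also depends on how well the framework implements each operation.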

PeleeNet: A Novel Architecture

To sidestep the lack of efficient depthwise separable convolution implementations, Wang et al. proposed PeleeNet, which is built using conventional convolution instead. Note that this summary was generated from the paper's metadata; according to the full paper, PeleeNet is a variant of DenseNet and does not rely on group convolutions (the technique used by ShuffleNet, which divides the input channels into groups and applies separate filters to each group). Its main components include a two-way dense layer, a stem block that downsamples the input early, and bottleneck layers with 1x1 convolutions whose width varies with the input shape, which reduce computational cost and model size while maintaining accuracy.
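The effect of a 1x1 bottleneck on model size can be sketched with a weight count. This is a minimal illustration under stated assumptions: the helper names and the channel sizes (512 input channels, a 64-channel bottleneck, 32 output channels) are hypothetical and are not PeleeNet's actual configuration.

```python
def conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a conventional k x k convolution (bias omitted)."""
    return k * k * c_in * c_out


def direct_params(c_in: int, c_out: int) -> int:
    """A single 3x3 convolution applied directly to the wide input."""
    return conv_params(c_in, c_out, 3)


def bottleneck_params(c_in: int, c_mid: int, c_out: int) -> int:
    """A 1x1 convolution that narrows the input, then a 3x3 convolution."""
    return conv_params(c_in, c_mid, 1) + conv_params(c_mid, c_out, 3)


# Hypothetical channel sizes chosen for illustration only.
c_in, c_mid, c_out = 512, 64, 32
direct = direct_params(c_in, c_out)
narrow = bottleneck_params(c_in, c_mid, c_out)
print(f"direct 3x3: {direct} weights, with 1x1 bottleneck: {narrow} weights")
```

The saving comes from running the expensive 3x3 convolution on the narrowed 64-channel tensor rather than on all 512 input channels. When the input is already narrow, a fixed-width bottleneck can cost more than it saves.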

Experimental Results

The researchers conducted experiments on the ImageNet ILSVRC 2012 dataset and compared PeleeNet's performance with that of MobileNet and MobileNetV2. PeleeNet achieved higher accuracy than both and ran over 1.8 times faster on NVIDIA TX2 hardware. Furthermore, PeleeNet's model size was only 66% of MobileNet's, making it more suitable for deployment on mobile devices with limited memory resources.

Real-Time Object Detection with Pelee

Building on PeleeNet, Wang et al. went on to develop a real-time object detection system named "Pelee" by combining PeleeNet with the Single Shot MultiBox Detector (SSD) method and optimizing the architecture for speed. Pelee achieved a mean average precision (mAP) of 76.4% on the PASCAL VOC2007 dataset and 22.4 mAP on the MS COCO dataset while running at 23.6 frames per second (FPS) on an iPhone 8 and 125 FPS on NVIDIA TX2 hardware. On COCO, Pelee outperformed YOLOv2 in precision with a computational cost 13.6 times lower and a model size 11.3 times smaller.
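To put the reported speeds in terms of a per-frame time budget, frames per second can be converted to latency. The short sketch below uses only the two FPS figures quoted above; the function name is an assumption introduced for illustration.

```python
def frame_latency_ms(fps: float) -> float:
    """Average time available per frame, in milliseconds, at a given frame rate."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


# FPS figures as reported in the abstract.
reported_fps = {"iPhone 8": 23.6, "NVIDIA TX2": 125.0}
for device, fps in reported_fps.items():
    print(f"{device}: {fps} FPS is about {frame_latency_ms(fps):.1f} ms per frame")
```

This works out to roughly 42.4 ms per frame on the iPhone 8 and 8.0 ms per frame on the TX2.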

Conclusion

The research by Wang et al., accepted at NeurIPS 2018, shows how architectural choices can lead to significant advances in deep learning applications for mobile platforms. PeleeNet and Pelee demonstrate that efficient model design can enable real-time object detection on mobile devices without sacrificing accuracy. Rather than depending on an efficient implementation of depthwise separable convolution, the work avoids the problem by building on conventional convolution, and it provides a promising direction for future efficient CNN architectures for mobile devices. With the increasing demand for deep learning applications on mobile platforms, this research opens up new possibilities for deploying accurate, fast models on resource-constrained devices.

Created on 29 Mar. 2024

Disclaimer: The AI-based summarization tool and virtual assistant provided on this website may not always provide accurate and complete summaries or responses. We encourage you to carefully review and evaluate the generated content to ensure its quality and relevance to your needs.