DAMO-YOLO : A Report on Real-Time Object Detection Design

AI-generated keywords: Object Detection

AI-generated Key Points

⚠The license of the paper does not allow us to build upon its content and the key points are generated using the paper metadata rather than the full article.

Authors introduce novel object detection method called DAMO-YOLO
Incorporates cutting-edge technologies such as Neural Architecture Search (NAS), Reparameterized Generalized-FPN, lightweight head with AlignedOTA label assignment, and distillation enhancement
Optimization of detection backbone using MAE-NAS guided by principle of maximum entropy
Structures resembling ResNet/CSP with spatial pyramid pooling and focus modules
Integration of Generalized-FPN with accelerated queen-fusion for detector neck, enhanced CSPNet with efficient layer aggregation networks (ELAN) and reparameterization
Study on detector head size impact on accuracy, favoring heavy neck with single task projection layer
Introduction of AlignedOTA to address misalignment issues in label assignment, distillation schema for performance enhancement
Development of range of models tailored to different scenarios: DAMO-YOLO-T/S/M/L for general industry requirements achieving mAPs of 43.6/47.7/50.2/51.9 on COCO dataset with latencies ranging from 2.78 to 7.95 ms on T4 GPUs; DAMO-YOLO-Ns/Nm/Nl lightweight models for edge devices achieving mAPs of 32.3/38.2/40.5 on COCO with latencies between 4.08 and 6.69 ms on X86-CPU
Outperforms existing YOLO series models in various application scenarios due to innovative technologies and scalable model designs tailored to specific needs


Authors: Xianzhe Xu, Yiqi Jiang, Weihua Chen, Yilun Huang, Yuan Zhang, Xiuyu Sun

arXiv: 2211.15444v4 (cs.CV)

Project Website: https://github.com/tinyvision/damo-yolo

License: NONEXCLUSIVE-DISTRIB 1.0

Abstract: In this report, we present a fast and accurate object detection method dubbed DAMO-YOLO, which achieves higher performance than the state-of-the-art YOLO series. DAMO-YOLO is extended from YOLO with some new technologies, including Neural Architecture Search (NAS), efficient Reparameterized Generalized-FPN (RepGFPN), a lightweight head with AlignedOTA label assignment, and distillation enhancement. In particular, we use MAE-NAS, a method guided by the principle of maximum entropy, to search our detection backbone under the constraints of low latency and high performance, producing ResNet/CSP-like structures with spatial pyramid pooling and focus modules. In the design of necks and heads, we follow the rule of "large neck, small head". We import Generalized-FPN with accelerated queen-fusion to build the detector neck and upgrade its CSPNet with efficient layer aggregation networks (ELAN) and reparameterization. Then we investigate how detector head size affects detection performance and find that a heavy neck with only one task projection layer would yield better results. In addition, AlignedOTA is proposed to solve the misalignment problem in label assignment. And a distillation schema is introduced to improve performance to a higher level. Based on these new techs, we build a suite of models at various scales to meet the needs of different scenarios. For general industry requirements, we propose DAMO-YOLO-T/S/M/L. They can achieve 43.6/47.7/50.2/51.9 mAPs on COCO with the latency of 2.78/3.83/5.62/7.95 ms on T4 GPUs respectively. Additionally, for edge devices with limited computing power, we have also proposed DAMO-YOLO-Ns/Nm/Nl lightweight models. They can achieve 32.3/38.2/40.5 mAPs on COCO with the latency of 4.08/5.05/6.69 ms on X86-CPU. Our proposed general and lightweight models have outperformed other YOLO series models in their respective application scenarios.
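As a rough illustration of how the reported trade-offs might be used, the sketch below picks the most accurate general-purpose variant that fits a latency budget. The mAP and T4 latency figures are the ones quoted in the abstract; the `Variant` type and the `pick_variant` helper are hypothetical names introduced only for this example and are not part of the DAMO-YOLO code base.

```python
from typing import NamedTuple, Optional, Sequence


class Variant(NamedTuple):
    name: str
    coco_map: float    # mAP on COCO, as quoted in the abstract
    latency_ms: float  # latency on a T4 GPU in ms, as quoted in the abstract


# General-purpose models and their figures from the abstract.
GENERAL_MODELS = [
    Variant("DAMO-YOLO-T", 43.6, 2.78),
    Variant("DAMO-YOLO-S", 47.7, 3.83),
    Variant("DAMO-YOLO-M", 50.2, 5.62),
    Variant("DAMO-YOLO-L", 51.9, 7.95),
]


def pick_variant(
    budget_ms: float, models: Sequence[Variant] = GENERAL_MODELS
) -> Optional[Variant]:
    """Return the highest-mAP variant whose latency fits the budget, or None."""
    feasible = [m for m in models if m.latency_ms <= budget_ms]
    if not feasible:
        return None
    return max(feasible, key=lambda m: m.coco_map)
```

For example, a 5 ms budget selects DAMO-YOLO-S (3.83 ms, 47.7 mAP), because DAMO-YOLO-M needs 5.62 ms. Latency on other hardware will differ, so the table would need to be re-measured for any real deployment.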

Submitted to arXiv on 23 Nov. 2022

Results of the summarizing process for the arXiv paper: 2211.15444v4

Comprehensive Summary

In their report titled "DAMO-YOLO: A Report on Real-Time Object Detection Design," authors Xianzhe Xu, Yiqi Jiang, Weihua Chen, Yilun Huang, Yuan Zhang, and Xiuyu Sun introduce an object detection method called DAMO-YOLO. The authors report that it outperforms the state-of-the-art YOLO series by combining four techniques: Neural Architecture Search (NAS), an efficient Reparameterized Generalized-FPN (RepGFPN), a lightweight head with AlignedOTA label assignment, and distillation enhancement.

The detection backbone is searched with MAE-NAS, a method guided by the principle of maximum entropy, under the constraints of low latency and high performance. The search produces ResNet/CSP-like structures with spatial pyramid pooling and focus modules. Following the design rule of "large neck, small head," the authors build the detector neck from Generalized-FPN with accelerated queen-fusion and upgrade its CSPNet with efficient layer aggregation networks (ELAN) and reparameterization. They also study how detector head size affects detection performance and find that a heavy neck with only one task projection layer yields better results. AlignedOTA is proposed to solve the misalignment problem in label assignment, and a distillation schema further improves performance.

Based on these techniques, the authors build a suite of models at several scales. For general industry requirements, DAMO-YOLO-T/S/M/L achieve 43.6/47.7/50.2/51.9 mAP on the COCO dataset with latencies of 2.78/3.83/5.62/7.95 ms on T4 GPUs. For edge devices with limited computing power, the lightweight DAMO-YOLO-Ns/Nm/Nl models achieve 32.3/38.2/40.5 mAP on COCO with latencies of 4.08/5.05/6.69 ms on X86-CPU. According to the authors, both the general and the lightweight models outperform other YOLO series models in their respective application scenarios.
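The reparameterization mentioned above can be illustrated with a minimal sketch. It assumes the RepVGG-style form of the idea, in which a 3x3 convolution and a parallel 1x1 convolution used during training are folded into a single 3x3 kernel for inference. The metadata summarized here does not say exactly which branches RepGFPN fuses, so treat this as an illustration of the general technique rather than the authors' implementation. The helpers `conv2d_same` and `merge_branches` are hypothetical names, and the example uses a single channel and plain Python lists to stay dependency-free.

```python
def conv2d_same(image, kernel):
    """Single-channel 2D cross-correlation with zero padding (odd square kernel)."""
    k = len(kernel)
    pad = k // 2
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for di in range(k):
                for dj in range(k):
                    y, x = i + di - pad, j + dj - pad
                    # Positions outside the image count as zeros (padding).
                    if 0 <= y < h and 0 <= x < w:
                        acc += image[y][x] * kernel[di][dj]
            out[i][j] = acc
    return out


def merge_branches(k3, k1):
    """Fold a 1x1 kernel into the centre tap of a 3x3 kernel (assumed fusion rule)."""
    merged = [row[:] for row in k3]
    merged[1][1] += k1[0][0]
    return merged


# Training-time view: two parallel branches whose outputs are summed.
image = [[1.0, 2.0, 0.0], [0.0, 1.0, 3.0], [2.0, 0.0, 1.0]]
k3 = [[0.5, -1.0, 0.25], [0.0, 2.0, -0.5], [1.0, 0.5, 0.0]]
k1 = [[0.75]]
two_branch = [
    [a + b for a, b in zip(row3, row1)]
    for row3, row1 in zip(conv2d_same(image, k3), conv2d_same(image, k1))
]

# Inference-time view: one convolution with the merged kernel.
single_branch = conv2d_same(image, merge_branches(k3, k1))
```

Because convolution is linear, `two_branch` and `single_branch` agree up to floating-point rounding, so the merged network computes the same function with one convolution instead of two. A full implementation would also fold batch-normalization statistics into the kernels and handle multiple channels.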

Layman's Summary
- Authors created a new way to find objects called DAMO-YOLO.
- They used advanced technologies like Neural Architecture Search (NAS) and Generalized-FPN.
- They improved the detection backbone using MAE-NAS for better results.
- The structures are similar to ResNet/CSP with special modules.
- Different models were made for different needs, with DAMO-YOLO outperforming other models.

Definitions
- Object Detection: Finding and identifying objects in images or videos.
- Neural Architecture Search (NAS): Using algorithms to automatically design neural network architectures.
- Generalized-FPN: Feature Pyramid Network that helps in object detection tasks by enhancing feature maps at different scales.
- ResNet/CSP: Types of neural network architectures commonly used in computer vision tasks.
- Model Optimization: Improving the performance of a model by adjusting its parameters or structure.

Introduction

Object detection is a fundamental task in computer vision that involves identifying and localizing objects of interest within an image. It has numerous applications, including autonomous driving, surveillance, and robotics. One widely used family of detectors is YOLO (You Only Look Once), which is popular for its real-time performance. However, the YOLO series still leaves room for improvement in the trade-off between accuracy and speed. To address this, a team of researchers from DAMO Academy at Alibaba Group has introduced a new object detection method called DAMO-YOLO. In their report titled "DAMO-YOLO: A Report on Real-Time Object Detection Design," authors Xianzhe Xu, Yiqi Jiang, Weihua Chen, Yilun Huang, Yuan Zhang, and Xiuyu Sun present this approach.

Overview of DAMO-YOLO Method

The DAMO-YOLO method combines Neural Architecture Search (NAS), an efficient Reparameterized Generalized-FPN (RepGFPN), a lightweight head with AlignedOTA label assignment, and distillation enhancement. The backbone is searched with MAE-NAS, a NAS method guided by the principle of maximum entropy, under the constraints of low latency and high performance. The resulting structures resemble ResNet/CSP with spatial pyramid pooling and focus modules. For the neck and head, the authors follow a design rule of "large neck, small head": they build the detector neck from Generalized-FPN with accelerated queen-fusion and upgrade its CSPNet with efficient layer aggregation networks (ELAN) and reparameterization.

Impact of Detector Head Size

The authors investigate how the size of the detector head affects detection performance. They find that a heavy neck combined with a head that has only one task projection layer yields better results.

AlignedOTA Label Assignment

Label assignment is the training step that decides which predictions are matched to which ground-truth objects, and misalignment in this step can hurt accuracy. To address this, the authors propose AlignedOTA.

Distillation Enhancement

The DAMO-YOLO method also incorporates a distillation schema. In knowledge distillation, a smaller student model is trained to mimic a larger teacher model, which can improve the accuracy of the student without increasing its inference cost.

Scalable Model Designs

Based on these techniques, the authors build a range of DAMO-YOLO models for different application scenarios. For general industry requirements, they propose DAMO-YOLO-T/S/M/L, which achieve mAPs (mean Average Precision) of 43.6/47.7/50.2/51.9 on the COCO dataset with latencies of 2.78/3.83/5.62/7.95 ms on T4 GPUs. For edge devices with limited computing power, they introduce the lightweight DAMO-YOLO-Ns/Nm/Nl models, which achieve mAPs of 32.3/38.2/40.5 on COCO with latencies of 4.08/5.05/6.69 ms on X86-CPU.

Conclusion

The report "DAMO-YOLO: A Report on Real-Time Object Detection Design" presents an approach to object detection that, according to its authors, outperforms other YOLO series models in their respective application scenarios. It does so by combining a NAS-searched backbone, the RepGFPN neck, a lightweight head designed under the "large neck, small head" rule, AlignedOTA label assignment, and distillation, and by offering models at several scales for both GPU and edge deployment.
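To make the teacher-student idea behind distillation concrete, here is a minimal sketch of a generic distillation loss on class logits. It assumes the classic temperature-softened formulation (KL divergence between teacher and student distributions, scaled by the squared temperature); the metadata summarized here does not specify the distillation schema used in DAMO-YOLO, which may differ. The function names `softmax` and `distillation_loss` are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits at a given temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions.

    Assumes moderate logit values so that no student probability
    underflows to zero.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(
        t * math.log(t / s)
        for t, s in zip(p_teacher, p_student)
        if t > 0.0
    )
    # Scaling by T^2 keeps gradient magnitudes comparable across temperatures.
    return temperature ** 2 * kl
```

The loss is zero when the student reproduces the teacher's logits and grows as the two distributions diverge. In practice it would be added to the ordinary detection loss with a weighting factor.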

Created on 21 Feb. 2026
