MO-YOLO: End-to-End Multiple-Object Tracking Method with YOLO and Decoder
AI-generated Key Points
⚠The license of the paper does not allow us to build upon its content and the key points are generated using the paper metadata rather than the full article.
- Recent Transformer-based end-to-end models for multi-object tracking (MOT) have shown remarkable performance on challenging datasets like DanceTrack.
- MO-YOLO, an efficient and computationally frugal end-to-end MOT model, was introduced by a team of researchers including Liao Pan, Yang Feng, Wu Di, Liu Bo, and Zhang Xingle.
- MO-YOLO combines principles from GPT, You Only Look Once (YOLO), and RT-DETR while adopting a decoder-only approach for improved efficiency.
- By leveraging the decoder architecture from RT-DETR and key components from YOLOv8, MO-YOLO achieves impressive speed and proficient MOT performance.
- On the DanceTrack dataset, MO-YOLO matches and then surpasses MOTR's performance while running at more than twice the frame rate (MOTR 9.5 FPS vs. MO-YOLO 19.6 FPS).
- MO-YOLO demonstrates significantly reduced training times and lower hardware requirements compared to MOTR.
- This research presents a promising paradigm for efficient end-to-end MOT systems that prioritize enhanced performance while maintaining resource efficiency.
Authors: Liao Pan, Yang Feng, Wu Di, Liu Bo, Zhang Xingle
Abstract: In the field of multi-object tracking (MOT), recent Transformer-based end-to-end models like MOTR have demonstrated exceptional performance on datasets such as DanceTrack. However, the computational demands of these models present challenges in training and deployment. Drawing inspiration from successful models like GPT, we present MO-YOLO, an efficient and computationally frugal end-to-end MOT model. MO-YOLO integrates principles from You Only Look Once (YOLO) and RT-DETR, adopting a decoder-only approach. By leveraging the decoder from RT-DETR and architectural components from YOLOv8, MO-YOLO achieves high speed, shorter training times, and proficient MOT performance. On the DanceTrack dataset, MO-YOLO not only matches MOTR's performance but also surpasses it, achieving over twice the frames per second (MOTR 9.5 FPS, MO-YOLO 19.6 FPS). Furthermore, MO-YOLO demonstrates significantly reduced training times and lower hardware requirements compared to MOTR. This research introduces a promising paradigm for efficient end-to-end MOT, emphasizing enhanced performance and resource efficiency.
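The "over twice the frames per second" claim can be checked directly from the two throughput figures quoted in the abstract. The sketch below is only an arithmetic illustration: the helper names `speedup` and `fps_to_latency_ms` are hypothetical and not from the paper, and the per-frame latency is simply the reciprocal of the reported FPS, not a measurement reported by the authors.

```python
def fps_to_latency_ms(fps: float) -> float:
    """Convert throughput (frames per second) to average per-frame latency in ms."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


def speedup(baseline_fps: float, new_fps: float) -> float:
    """Ratio of the new throughput to the baseline throughput."""
    if baseline_fps <= 0:
        raise ValueError("baseline_fps must be positive")
    return new_fps / baseline_fps


# Throughput figures as quoted in the abstract (DanceTrack).
MOTR_FPS = 9.5
MO_YOLO_FPS = 19.6

if __name__ == "__main__":
    print(f"speedup: {speedup(MOTR_FPS, MO_YOLO_FPS):.2f}x")                 # 2.06x
    print(f"MOTR latency: {fps_to_latency_ms(MOTR_FPS):.1f} ms/frame")       # 105.3
    print(f"MO-YOLO latency: {fps_to_latency_ms(MO_YOLO_FPS):.1f} ms/frame") # 51.0
```

The ratio comes out at roughly 2.06, consistent with the "over twice" wording, and corresponds to an implied drop from about 105 ms to about 51 ms per frame.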