MO-YOLO: End-to-End Multiple-Object Tracking Method with YOLO and Decoder

AI-generated keywords: Multi-object tracking, Transformer-based models, MO-YOLO, Efficient MOT model, Resource efficiency

AI-generated Key Points

The license of the paper does not allow us to build upon its content and the key points are generated using the paper metadata rather than the full article.

  • Recent advancements in Transformer-based end-to-end models in multi-object tracking (MOT) have shown remarkable performance on challenging datasets like DanceTrack.
  • MO-YOLO, an efficient and computationally frugal end-to-end MOT model, was introduced by a team of researchers including Liao Pan, Yang Feng, Wu Di, Liu Bo, and Zhang Xingle.
  • MO-YOLO combines principles from GPT, You Only Look Once (YOLO), and RT-DETR while adopting a decoder-only approach for improved efficiency.
  • By leveraging the decoder architecture from RT-DETR and key components from YOLOv8, MO-YOLO achieves impressive speed and proficient MOT performance.
  • On the DanceTrack dataset, MO-YOLO runs at more than twice the frame rate of MOTR (MOTR 9.5 FPS vs. MO-YOLO 19.6 FPS) while matching or exceeding its tracking performance.
  • MO-YOLO demonstrates significantly reduced training times and lower hardware requirements compared to MOTR.
  • This research presents a promising paradigm for efficient end-to-end MOT systems that prioritize enhanced performance while maintaining resource efficiency.

Authors: Liao Pan, Yang Feng, Wu Di, Liu Bo, Zhang Xingle

Abstract: In the field of multi-object tracking (MOT), recent Transformer-based end-to-end models like MOTR have demonstrated exceptional performance on datasets such as DanceTrack. However, the computational demands of these models present challenges in training and deployment. Drawing inspiration from successful models like GPT, we present MO-YOLO, an efficient and computationally frugal end-to-end MOT model. MO-YOLO integrates principles from You Only Look Once (YOLO) and RT-DETR, adopting a decoder-only approach. By leveraging the decoder from RT-DETR and architectural components from YOLOv8, MO-YOLO achieves high speed, shorter training times, and proficient MOT performance. On DanceTrack, MO-YOLO not only matches MOTR's performance but also surpasses it, achieving over twice the frames per second (MOTR 9.5 FPS, MO-YOLO 19.6 FPS). Furthermore, MO-YOLO demonstrates significantly reduced training times and lower hardware requirements compared to MOTR. This research introduces a promising paradigm for efficient end-to-end MOT, emphasizing enhanced performance and resource efficiency.
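
As a quick check of the "over twice the frames per second" claim, the two reported frame rates can be turned into a speedup ratio and a per-frame processing time. This is only an arithmetic sketch using the figures quoted in the abstract; the helper names `speedup` and `frame_time_ms` are hypothetical and are not part of MO-YOLO or MOTR.

```python
# Frame rates reported in the abstract for DanceTrack (frames per second).
MOTR_FPS = 9.5
MO_YOLO_FPS = 19.6


def speedup(baseline_fps: float, new_fps: float) -> float:
    """Return how many times faster new_fps is than baseline_fps."""
    return new_fps / baseline_fps


def frame_time_ms(fps: float) -> float:
    """Return the average time spent per frame, in milliseconds."""
    return 1000.0 / fps


ratio = speedup(MOTR_FPS, MO_YOLO_FPS)
print(f"Speedup over MOTR: {ratio:.2f}x")                      # 2.06x
print(f"MOTR:    {frame_time_ms(MOTR_FPS):.1f} ms per frame")   # 105.3 ms
print(f"MO-YOLO: {frame_time_ms(MO_YOLO_FPS):.1f} ms per frame")  # 51.0 ms
```

The ratio of about 2.06 is consistent with "over twice", and it corresponds to roughly 54 ms less processing time per frame. The abstract does not state the hardware or input resolution behind these numbers, so the comparison should be read as relative rather than absolute.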

Submitted to arXiv on 26 Oct. 2023


Results of the summarizing process for the arXiv paper: 2310.17170v3

This paper's license does not allow us to build upon its content, so the summary below was generated from the paper's metadata rather than the full article.

In multi-object tracking (MOT), recent Transformer-based end-to-end models have shown remarkable performance on challenging datasets such as DanceTrack. However, their high computational demands make training and deployment difficult. To address this, Liao Pan, Yang Feng, Wu Di, Liu Bo, and Zhang Xingle introduce MO-YOLO, an efficient and computationally frugal end-to-end MOT model. Drawing inspiration from successful models like GPT, MO-YOLO combines principles from You Only Look Once (YOLO) and RT-DETR while adopting a decoder-only approach. By using the decoder from RT-DETR together with architectural components from YOLOv8, MO-YOLO achieves high speed and proficient MOT performance. On the DanceTrack dataset, MO-YOLO is reported to match or exceed MOTR's tracking performance while running at more than twice the frame rate (MOTR 9.5 FPS vs. MO-YOLO 19.6 FPS). MO-YOLO also needs significantly less training time and less demanding hardware than MOTR. The work presents a promising paradigm for efficient end-to-end MOT that improves performance while remaining resource-efficient, and it suggests that future systems in this field can overcome computational limitations without giving up tracking capability.
Created on 09 Apr. 2024
