RoScenes: A Large-scale Multi-view 3D Dataset for Roadside Perception
AI-generated Key Points
- RoScenes is the largest multi-view roadside perception dataset designed to advance Bird's Eye View (BEV) approaches for complex traffic scenes.
- It features an expansive perception area, full scene coverage, and crowded traffic.
- It contains 21.13 million 3D annotations within 64,000 $m^2$.
- A novel BEV-to-3D joint annotation pipeline reduces the high cost of roadside 3D labeling and makes collecting this volume of data practical.
- Current BEV methods evaluated on RoScenes struggle with the vast perception area and the variation in sensor layout across scenes, so their performance falls below expectations.
- The proposed RoBEV method uses feature-guided position embedding for effective 2D-3D feature assignment; on the validation set it outperforms existing state-of-the-art methods by a large margin without extra computational overhead.
- Detailed statistics and analysis cover occlusion levels and camera properties such as focal length, pitch angle, mounting height, and road coverage.
- BEV annotations are refined to mitigate perspective distortion and jitter in the UAV imagery.
Authors: Xiaosu Zhu, Hualian Sheng, Sijia Cai, Bing Deng, Shaopeng Yang, Qiao Liang, Ken Chen, Lianli Gao, Jingkuan Song, Jieping Ye
Abstract: We introduce RoScenes, the largest multi-view roadside perception dataset, which aims to shed light on the development of vision-centric Bird's Eye View (BEV) approaches for more challenging traffic scenes. The highlights of RoScenes include significantly large perception area, full scene coverage and crowded traffic. More specifically, our dataset achieves surprising 21.13M 3D annotations within 64,000 $m^2$. To relieve the expensive costs of roadside 3D labeling, we present a novel BEV-to-3D joint annotation pipeline to efficiently collect such a large volume of data. After that, we organize a comprehensive study for current BEV methods on RoScenes in terms of effectiveness and efficiency. Tested methods suffer from the vast perception area and variation of sensor layout across scenes, resulting in performance levels falling below expectations. To this end, we propose RoBEV that incorporates feature-guided position embedding for effective 2D-3D feature assignment. With its help, our method outperforms state-of-the-art by a large margin without extra computational overhead on validation set. Our dataset and devkit will be made available at https://github.com/xiaosu-zhu/RoScenes.
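The "2D-3D feature assignment" mentioned in the abstract depends on knowing where a 3D location lands in each roadside camera image. The sketch below illustrates only that underlying geometry with a generic pinhole projection; it is not RoBEV's feature-guided position embedding, and it does not use the RoScenes devkit. The function name `project_to_image`, the coordinate convention (x right, y along the road, z up, camera looking along +y and tilted down), and all numeric values are assumptions made for illustration.

```python
import math
from typing import Optional, Tuple


def project_to_image(
    x: float, y: float, z: float, *,
    mount_height_m: float,   # camera height above the road (assumed value)
    pitch_deg: float,        # downward tilt of the optical axis (assumed value)
    focal_px: float,         # focal length in pixels (assumed value)
    image_w: int, image_h: int,
) -> Optional[Tuple[float, float]]:
    """Project a world point to pixel coordinates with an ideal pinhole camera.

    Hypothetical helper: the camera sits at (0, 0, mount_height_m), has no
    lens distortion, and its principal point is the image centre.
    Returns None if the point is behind the camera or outside the image.
    """
    s = math.sin(math.radians(pitch_deg))
    c = math.cos(math.radians(pitch_deg))
    dz = z - mount_height_m

    # World -> camera axes (x right, y down, z forward along the optical axis).
    xc = x
    yc = -s * y - c * dz
    zc = c * y - s * dz
    if zc <= 0.0:
        return None  # behind the image plane

    u = image_w / 2.0 + focal_px * xc / zc
    v = image_h / 2.0 + focal_px * yc / zc
    if not (0.0 <= u < image_w and 0.0 <= v < image_h):
        return None  # falls outside this camera's view
    return (u, v)


if __name__ == "__main__":
    # The optical axis meets the road at distance h / tan(pitch),
    # so that ground point should project to the image centre.
    d = 10.0 / math.tan(math.radians(30.0))
    print(project_to_image(0.0, d, 0.0, mount_height_m=10.0, pitch_deg=30.0,
                           focal_px=1000.0, image_w=1920, image_h=1080))
```

Under this convention, focal length, pitch angle, and mounting height jointly determine which stretch of road a camera covers, which is one reason a varying sensor layout across scenes can make the assignment of image features to 3D positions harder.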