A Stem-Agnostic Single-Decoder System for Music Source Separation Beyond Four Stems

AI-generated keywords: Audio source separation

AI-generated Key Points

  • Significant progress in audio source separation, particularly in separating vocals, drums, bass, and other (VDBO) stems
  • Most existing systems are limited to a four-stem setup, and the few that go beyond it rely on inflexible decoder configurations supporting only a fixed set of stems
  • Introduction of Banquet as a stem-agnostic single-decoder system for effectively separating multiple stems while maintaining computational feasibility
  • Banquet approached the performance of the significantly more complex 6-stem Hybrid Transformer Demucs on VDBO stems and outperformed it on guitar and piano
  • Ability of Banquet to successfully extract narrow instrument classes such as clean acoustic guitars and less common stems like reeds and organs with only 24.9 million trainable parameters
  • Experiments on the MoisesDB dataset showed that Banquet's query-based setup enables fine-level stem separations beyond traditional VDBO categories
  • Availability of Banquet's implementation for further exploration and application in audio processing tasks at https://github.com/kwatcharasupat/query-bandit

Authors: Karn N. Watcharasupat, Alexander Lerch

Submitted to the 25th International Society for Music Information Retrieval Conference (ISMIR 2024)
License: CC BY-NC-SA 4.0

Abstract: Despite significant recent progress across multiple subtasks of audio source separation, few music source separation systems support separation beyond the four-stem vocals, drums, bass, and other (VDBO) setup. Of the very few current systems that support source separation beyond this setup, most continue to rely on an inflexible decoder setup that can only support a fixed pre-defined set of stems. Increasing stem support in these inflexible systems correspondingly requires increasing computational complexity, rendering extensions of these systems computationally infeasible for long-tail instruments. In this work, we propose Banquet, a system that allows source separation of multiple stems using just one decoder. A bandsplit source separation model is extended to work in a query-based setup in tandem with a music instrument recognition PaSST model. On the MoisesDB dataset, Banquet, at only 24.9 M trainable parameters, approached the performance level of the significantly more complex 6-stem Hybrid Transformer Demucs on VDBO stems and outperformed it on guitar and piano. The query-based setup allows for the separation of narrow instrument classes such as clean acoustic guitars, and can be successfully applied to the extraction of less common stems such as reeds and organs. Implementation is available at https://github.com/kwatcharasupat/query-bandit.

Submitted to arXiv on 26 Jun. 2024


Results of the summarizing process for the arXiv paper: 2406.18747v1

In the field of audio source separation, significant progress has been made in recent years, much of it focused on separating vocals, drums, bass, and other (VDBO) stems. However, most existing systems are limited to this four-stem setup, and the few that go beyond it rely on inflexible decoder configurations that support only a fixed, pre-defined set of stems. Adding stems to such systems increases their computational complexity, which makes them impractical for long-tail instruments.

To address this limitation, a new system called Banquet has been proposed. Banquet is a stem-agnostic single-decoder system that separates multiple stems while remaining computationally feasible. It extends a bandsplit source separation model to a query-based setup that works in tandem with a PaSST music instrument recognition model. On the MoisesDB dataset, Banquet, with only 24.9 million trainable parameters, approached the performance of the significantly more complex 6-stem Hybrid Transformer Demucs on VDBO stems and outperformed it on guitar and piano.

The query-based setup also allows Banquet to separate narrow instrument classes such as clean acoustic guitars and to extract less common stems such as reeds and organs, going beyond the traditional VDBO categories. The implementation is publicly available at https://github.com/kwatcharasupat/query-bandit.
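The idea of a single decoder steered by a query can be illustrated with a small toy sketch. It is not the Banquet implementation: the summary does not say how the query is injected, so FiLM-style feature-wise scaling and shifting is assumed here as one common conditioning mechanism. The class `QueryConditionedSeparator`, the function `film`, the random weights, and the hand-written query embeddings are all hypothetical stand-ins.

```python
import random


def film(features, gamma, beta):
    """Feature-wise affine modulation: scale and shift each channel."""
    return [[g * x + b for x, g, b in zip(frame, gamma, beta)]
            for frame in features]


class QueryConditionedSeparator:
    """Toy single-decoder separator; NOT the Banquet implementation."""

    def __init__(self, n_channels, embed_dim, seed=0):
        rng = random.Random(seed)
        # Hypothetical linear maps from the query embedding to per-channel
        # scale and shift. A real system would learn these weights.
        self.w_gamma = [[rng.uniform(-0.1, 0.1) for _ in range(embed_dim)]
                        for _ in range(n_channels)]
        self.w_beta = [[rng.uniform(-0.1, 0.1) for _ in range(embed_dim)]
                       for _ in range(n_channels)]

    @staticmethod
    def _linear(weights, vector):
        return [sum(w * v for w, v in zip(row, vector)) for row in weights]

    def separate(self, features, query_embedding):
        # The same weights serve every stem; only the query changes.
        gamma = [1.0 + g for g in self._linear(self.w_gamma, query_embedding)]
        beta = self._linear(self.w_beta, query_embedding)
        return film(features, gamma, beta)


# Stand-in mixture features: 3 time frames x 4 channels.
mixture_features = [[0.5, -1.0, 2.0, 0.25],
                    [1.5, 0.0, -0.5, 1.0],
                    [-2.0, 0.75, 1.0, -1.5]]

# Stand-in query embeddings (assumption): in a real system these would
# presumably come from an instrument recognition model such as PaSST.
guitar_query = [1.0, 0.0, 0.5]
piano_query = [0.0, 1.0, -0.5]

separator = QueryConditionedSeparator(n_channels=4, embed_dim=3)
guitar_estimate = separator.separate(mixture_features, guitar_query)
piano_estimate = separator.separate(mixture_features, piano_query)
```

In this sketch, supporting another stem only requires another query embedding, whereas a fixed-decoder design would need an additional decoder or output head per stem. That is the scaling argument the summary makes, shown here only in miniature.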
Created on 15 Mar. 2025


