Melody transcription via generative pre-training

AI-generated keywords: Music Information Retrieval, Melody Transcription, Jukebox, Generative Pre-training, Sheet Sage

AI-generated Key Points

  • Melody transcription in music information retrieval is challenging due to diverse audio, instrument ensembles, and musical styles.
  • Leveraging representations from Jukebox, a generative model of broad music audio, improves melody transcription performance by 20% relative to conventional spectrogram features.
  • A new dataset with 50 hours of melody transcriptions obtained through crowdsourced annotations addresses the lack of sufficient training data.
  • Generative pre-training combined with the new dataset yields 77% stronger melody transcription performance relative to the strongest available baseline.
  • The Sheet Sage system integrates beat detection, key estimation, and chord recognition to transcribe human-readable lead sheets directly from music audio.
  • The research by Chris Donahue, John Thickstun, and Percy Liang on melody transcription via generative pre-training was published at ISMIR 2022.

Authors: Chris Donahue, John Thickstun, Percy Liang

Published as a conference paper at ISMIR 2022
License: CC BY 4.0

Abstract: Despite the central role that melody plays in music perception, it remains an open challenge in music information retrieval to reliably detect the notes of the melody present in an arbitrary music recording. A key challenge in melody transcription is building methods which can handle broad audio containing any number of instrument ensembles and musical styles - existing strategies work well for some melody instruments or styles but not all. To confront this challenge, we leverage representations from Jukebox (Dhariwal et al. 2020), a generative model of broad music audio, thereby improving performance on melody transcription by 20% relative to conventional spectrogram features. Another obstacle in melody transcription is a lack of training data - we derive a new dataset containing 50 hours of melody transcriptions from crowdsourced annotations of broad music. The combination of generative pre-training and a new dataset for this task results in 77% stronger performance on melody transcription relative to the strongest available baseline. By pairing our new melody transcription approach with solutions for beat detection, key estimation, and chord recognition, we build Sheet Sage, a system capable of transcribing human-readable lead sheets directly from music audio. Audio examples can be found at https://chrisdonahue.com/sheetsage and code at https://github.com/chrisdonahue/sheetsage.
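
The abstract reports both gains as relative improvements, that is, the gain divided by the reference score rather than a difference in percentage points. A minimal sketch of that arithmetic follows. The function name and the two scores are hypothetical and chosen only to illustrate the calculation; they are not results from the paper.

```python
def relative_improvement(new_score: float, reference_score: float) -> float:
    """Return the gain of new_score over reference_score as a fraction of the reference."""
    if reference_score <= 0:
        raise ValueError("reference_score must be positive")
    return (new_score - reference_score) / reference_score


# Hypothetical scores (NOT taken from the paper), used only to show the arithmetic.
spectrogram_score = 0.50
pretrained_score = 0.60

gain = relative_improvement(pretrained_score, spectrogram_score)
print(f"{gain:.0%}")  # prints 20%
```

With these made-up numbers, a 10-point absolute gain on a reference of 0.50 corresponds to a 20% relative gain, which is why the two ways of reporting should not be confused.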

Submitted to arXiv on 04 Dec. 2022


Results of the summarizing process for the arXiv paper: 2212.01884v1

Detecting the notes of the melody in an arbitrary music recording remains an open problem in music information retrieval. Existing melody transcription strategies work well for some melody instruments or styles but not for broad audio containing varied instrument ensembles and musical styles. To address this, the authors use representations from Jukebox, a generative model of broad music audio, which improves melody transcription performance by 20% relative to conventional spectrogram features.

A second obstacle is the lack of training data. The authors derive a new dataset containing 50 hours of melody transcriptions from crowdsourced annotations of broad music. Combining generative pre-training with this dataset yields 77% stronger melody transcription performance relative to the strongest available baseline. The study compares the proposed approach with baseline methods that incorporate note segmentation heuristics and with zero-shot models trained on related tasks such as vocal transcription.

By pairing the melody transcription approach with solutions for beat detection, key estimation, and chord recognition, the authors build Sheet Sage, a system that transcribes human-readable lead sheets directly from music audio. The research, by Chris Donahue, John Thickstun, and Percy Liang, was published as a conference paper at ISMIR 2022.
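
The summary describes Sheet Sage as a combination of melody transcription, beat detection, key estimation, and chord recognition. One way to picture how such outputs might be merged into a lead sheet is sketched below. This is an illustrative assumption, not the Sheet Sage implementation: the class names, the `seconds_to_beats` helper, the sixteenth-note snapping, and all input values are hypothetical.

```python
from bisect import bisect_right
from dataclasses import dataclass, field


@dataclass
class MelodyNote:
    onset_beats: float  # position measured in beats from the first detected beat
    midi_pitch: int


@dataclass
class ChordLabel:
    onset_beats: float
    symbol: str


@dataclass
class LeadSheet:
    key: str
    melody: list[MelodyNote] = field(default_factory=list)
    chords: list[ChordLabel] = field(default_factory=list)


def seconds_to_beats(t: float, beat_times: list[float], subdivisions: int = 4) -> float:
    """Map a time in seconds to a beat position, snapped to 1/subdivisions of a beat."""
    if len(beat_times) < 2:
        raise ValueError("need at least two beat times")
    # Find the beat interval containing t, clamped to the first/last interval.
    i = bisect_right(beat_times, t) - 1
    i = max(0, min(i, len(beat_times) - 2))
    start, end = beat_times[i], beat_times[i + 1]
    # Interpolate linearly inside the interval, then snap to the grid.
    position = i + (t - start) / (end - start)
    return round(position * subdivisions) / subdivisions


def assemble_lead_sheet(key, beat_times, note_events, chord_events) -> LeadSheet:
    """Combine hypothetical per-task outputs (all timed in seconds) into one structure."""
    melody = [MelodyNote(seconds_to_beats(t, beat_times), p) for t, p in note_events]
    chords = [ChordLabel(seconds_to_beats(t, beat_times), s) for t, s in chord_events]
    return LeadSheet(key=key, melody=melody, chords=chords)


# Made-up inputs standing in for the outputs of the four components.
sheet = assemble_lead_sheet(
    key="C major",
    beat_times=[0.0, 0.5, 1.0, 1.5, 2.0],
    note_events=[(0.02, 60), (0.51, 62), (1.26, 64)],
    chord_events=[(0.0, "C"), (1.0, "G")],
)
print(sheet)
```

Expressing note and chord onsets in beats rather than seconds is the design choice that makes the result readable as notation; how the actual system aligns and quantizes its outputs is not stated in this summary.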
Created on 15 Sep. 2024

