Long-form music generation with latent diffusion

Zach Evans, Julian D. Parker, CJ Carr, Zack Zukowski, Josiah Taylor, Jordi Pons

arXiv: 2404.10301v1 (cs.SD)

License: CC BY 4.0

Abstract: Audio-based generative models for music have seen great strides recently, but so far have not managed to produce full-length music tracks with coherent musical structure. We show that by training a generative model on long temporal contexts it is possible to produce long-form music of up to 4m45s. Our model consists of a diffusion-transformer operating on a highly downsampled continuous latent representation (latent rate of 21.5Hz). It obtains state-of-the-art generations according to metrics on audio quality and prompt alignment, and subjective tests reveal that it produces full-length music with coherent structure.
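
To give a rough sense of what "highly downsampled" means here, the sketch below estimates how many latent frames a maximum-length track would occupy. It uses only the two figures quoted in the abstract (4m45s and 21.5Hz) and assumes the sequence length is simply duration times latent rate, which the abstract does not state explicitly. The function name `latent_frames` is hypothetical and is not taken from the paper or its code.

```python
import math


def latent_frames(duration_s: float, latent_rate_hz: float) -> int:
    """Estimate the number of whole latent frames covering a clip.

    Assumption (not stated in the abstract): frames = duration * latent rate,
    rounded down to a whole number of frames.
    """
    return math.floor(duration_s * latent_rate_hz)


# Figures quoted in the abstract.
max_duration_s = 4 * 60 + 45  # 4m45s expressed in seconds (285 s)
latent_rate_hz = 21.5         # latent rate of the continuous representation

print(latent_frames(max_duration_s, latent_rate_hz))  # prints 6127
```

Under this assumption, a full-length track corresponds to a sequence of roughly six thousand latent frames for the diffusion-transformer to model.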

Submitted to arXiv on 16 Apr. 2024