Lumiere: A Space-Time Diffusion Model for Video Generation

AI-generated keywords: Lumiere, text-to-video diffusion model, Space-Time U-Net architecture, global temporal consistency, pre-trained text-to-image diffusion model

AI-generated Key Points

The license of the paper does not allow us to build upon its content and the key points are generated using the paper metadata rather than the full article.

  • Lumiere is a cutting-edge text-to-video diffusion model
  • Key contribution: Space-Time U-Net architecture for global temporal consistency
  • Incorporates spatial and temporal down- and up-sampling techniques
  • Can generate full-frame-rate, low-resolution videos
  • Demonstrates state-of-the-art results in text-to-video generation
  • Versatile in various content creation tasks and video editing applications

Authors: Omer Bar-Tal, Hila Chefer, Omer Tov, Charles Herrmann, Roni Paiss, Shiran Zada, Ariel Ephrat, Junhwa Hur, Yuanzhen Li, Tomer Michaeli, Oliver Wang, Deqing Sun, Tali Dekel, Inbar Mosseri

Webpage: https://lumiere-video.github.io/ | Video: https://www.youtube.com/watch?v=wxLr02Dz2Sc

Abstract: We introduce Lumiere -- a text-to-video diffusion model designed for synthesizing videos that portray realistic, diverse and coherent motion -- a pivotal challenge in video synthesis. To this end, we introduce a Space-Time U-Net architecture that generates the entire temporal duration of the video at once, through a single pass in the model. This is in contrast to existing video models which synthesize distant keyframes followed by temporal super-resolution -- an approach that inherently makes global temporal consistency difficult to achieve. By deploying both spatial and (importantly) temporal down- and up-sampling and leveraging a pre-trained text-to-image diffusion model, our model learns to directly generate a full-frame-rate, low-resolution video by processing it in multiple space-time scales. We demonstrate state-of-the-art text-to-video generation results, and show that our design easily facilitates a wide range of content creation tasks and video editing applications, including image-to-video, video inpainting, and stylized generation.
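
To make the idea of "multiple space-time scales" concrete, here is a minimal sketch of the shape arithmetic behind joint spatial and temporal down- and up-sampling. It is not the authors' implementation: the helper names (`contracting_shapes`, `expanding_shapes`, `cells`), the pooling factor of 2, the number of levels, and the 16-frame, 64x64 clip are all assumptions chosen for illustration, since the metadata above does not state them.

```python
from typing import List, Tuple

Shape = Tuple[int, int, int]  # (frames, height, width)


def contracting_shapes(shape: Shape, levels: int,
                       t_factor: int = 2, s_factor: int = 2) -> List[Shape]:
    """Return the (frames, height, width) shape at each scale of the contracting path.

    Hypothetical helper: every level pools time by t_factor and space by s_factor.
    """
    shapes = [shape]
    for _ in range(levels):
        t, h, w = shapes[-1]
        if t % t_factor or h % s_factor or w % s_factor:
            raise ValueError(f"{(t, h, w)} is not divisible by the pooling factors")
        shapes.append((t // t_factor, h // s_factor, w // s_factor))
    return shapes


def expanding_shapes(shapes: List[Shape]) -> List[Shape]:
    """Mirror the contracting path: up-sampling restores each earlier scale in turn."""
    return shapes[::-1]


def cells(shape: Shape) -> int:
    """Number of space-time positions per channel at a given scale."""
    t, h, w = shape
    return t * h * w


# Illustrative clip size only (assumed, not taken from the paper).
down = contracting_shapes((16, 64, 64), levels=2)
up = expanding_shapes(down)

print(down)                               # [(16, 64, 64), (8, 32, 32), (4, 16, 16)]
print(up[-1])                             # (16, 64, 64)
print(cells(down[0]) // cells(down[-1]))  # 64
```

Under these assumed factors, each level shrinks the space-time volume by 2 x 2 x 2 = 8, so the coarsest of the three scales covers the whole clip with 64 times fewer positions, and the expanding path returns to the full frame count. A network that pools only in space would keep all 16 frames at every scale.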

Submitted to arXiv on 23 Jan. 2024

Results of the summarizing process for the arXiv paper: 2401.12945v1

This paper's license does not allow us to build upon its content, so this summary was generated from the paper's metadata rather than the full article.

Lumiere is a text-to-video diffusion model that targets a central challenge in video synthesis: generating videos with realistic, diverse and coherent motion. Its key contribution is the Space-Time U-Net architecture, which generates the entire temporal duration of a video at once, in a single pass through the model. Existing video models instead synthesize distant keyframes and then apply temporal super-resolution, an approach that makes global temporal consistency difficult to achieve. By combining spatial and temporal down- and up-sampling with a pre-trained text-to-image diffusion model, Lumiere learns to directly generate full-frame-rate, low-resolution videos by processing them at multiple space-time scales. The authors report state-of-the-art text-to-video generation results and show that the design supports a range of content creation and video editing tasks, including image-to-video, video inpainting, and stylized generation.
Created on 26 Jan. 2024
