InstaFlow: One Step is Enough for High-Quality Diffusion-Based Text-to-Image Generation
AI-generated Key Points
⚠The license of the paper does not allow us to build upon its content and the key points are generated using the paper metadata rather than the full article.
- Diffusion models have revolutionized text-to-image generation with their exceptional quality and creativity.
- Previous attempts to improve sampling speed and reduce computational costs through distillation have not resulted in a functional one-step model.
- The paper "InstaFlow" introduces Rectified Flow, including the innovative reflow procedure, to transform Stable Diffusion into an ultra-fast one-step model for text-to-image generation.
- InstaFlow achieves remarkable image quality with an FID of $23.3$ on MS COCO 2017-5k dataset, surpassing the previous state-of-the-art technique by a significant margin.
- Leveraging an expanded network with 1.7B parameters further improves the FID score to $22.4$, showcasing both efficiency and effectiveness in high-quality image synthesis tasks.
- InstaFlow sets a new benchmark for speed in image generation tasks, achieving an outstanding FID of $13.1$ on MS COCO 2014-30k dataset in just $0.09$ seconds.
- Training InstaFlow only requires 199 A100 GPU days, making it powerful and cost-effective for practical implementation.
- Codes and pre-trained models for InstaFlow are available at \url{github.com/gnobitab/InstaFlow}, enabling further exploration and replication of results.
Authors: Xingchao Liu, Xiwen Zhang, Jianzhu Ma, Jian Peng, Qiang Liu
Abstract: Diffusion models have revolutionized text-to-image generation with its exceptional quality and creativity. However, its multi-step sampling process is known to be slow, often requiring tens of inference steps to obtain satisfactory results. Previous attempts to improve its sampling speed and reduce computational costs through distillation have been unsuccessful in achieving a functional one-step model. In this paper, we explore a recent method called Rectified Flow, which, thus far, has only been applied to small datasets. The core of Rectified Flow lies in its \emph{reflow} procedure, which straightens the trajectories of probability flows, refines the coupling between noises and images, and facilitates the distillation process with student models. We propose a novel text-conditioned pipeline to turn Stable Diffusion (SD) into an ultra-fast one-step model, in which we find reflow plays a critical role in improving the assignment between noise and images. Leveraging our new pipeline, we create, to the best of our knowledge, the first one-step diffusion-based text-to-image generator with SD-level image quality, achieving an FID (Frechet Inception Distance) of $23.3$ on MS COCO 2017-5k, surpassing the previous state-of-the-art technique, progressive distillation, by a significant margin ($37.2$ $\rightarrow$ $23.3$ in FID). By utilizing an expanded network with 1.7B parameters, we further improve the FID to $22.4$. We call our one-step models \emph{InstaFlow}. On MS COCO 2014-30k, InstaFlow yields an FID of $13.1$ in just $0.09$ second, the best in $\leq 0.1$ second regime, outperforming the recent StyleGAN-T ($13.9$ in $0.1$ second). Notably, the training of InstaFlow only costs 199 A100 GPU days. Codes and pre-trained models are available at \url{github.com/gnobitab/InstaFlow}.
Ask questions about this paper to our AI assistant
You can also chat with multiple papers at once here.
⚠The license of the paper does not allow us to build upon its content and the AI assistant only knows about the paper metadata rather than the full article.
Assess the quality of the AI-generated content by voting
Score: 0
Why do we need votes?
Votes are used to determine whether we need to re-run our summarizing tools. If the count reaches -10, our tools can be restarted.
The previous summary was created more than a year ago and can be re-run (if necessary) by clicking on the Run button below.
⚠The license of this specific paper does not allow us to build upon its content and the summarizing tools will be run using the paper metadata rather than the full article. However, it still does a good job, and you can also try our tools on papers with more open licenses.
Similar papers summarized with our AI tools
Navigate through even more similar papers through a
tree representationLook for similar papers (in beta version)
By clicking on the button above, our algorithm will scan all papers in our database to find the closest based on the contents of the full papers and not just on metadata. Please note that it only works for papers that we have generated summaries for and you can rerun it from time to time to get a more accurate result while our database grows.
Disclaimer: The AI-based summarization tool and virtual assistant provided on this website may not always provide accurate and complete summaries or responses. We encourage you to carefully review and evaluate the generated content to ensure its quality and relevance to your needs.