Inference-Time Scaling for Diffusion Models beyond Scaling Denoising Steps

AI-generated keywords: Diffusion Models

AI-generated Key Points

  • Researchers studied inference-time scaling behavior of diffusion models to enhance generation performance
  • Diffusion models allow adjusting computation through denoising steps, with performance gains plateauing after a threshold
  • Introduced a search problem to identify better noises for diffusion sampling process
  • Structured design space along two axes: verifiers used for feedback and algorithms for finding improved noise candidates
  • Evaluation conducted on DrawBench and T2I-CompBench datasets for text-to-image model performance
  • FLUX.1-dev model used as backbone, employing supervised verifiers like Aesthetic Score Predictor, CLIPScore, and ImageReward
  • Increasing inference-time compute enhances sample quality in diffusion models, especially in complex image generation tasks
  • Self-supervised verifiers less effective in text-to-image settings due to focus on visual quality over textual information
  • DrawBench metrics used alongside an LLM acting as a neutral evaluator for comprehensive evaluation
  • Study highlights how increased computation during inference can improve sample quality in diffusion models for text-to-image tasks

Authors: Nanye Ma, Shangyuan Tong, Haolin Jia, Hexiang Hu, Yu-Chuan Su, Mingda Zhang, Xuan Yang, Yandong Li, Tommi Jaakkola, Xuhui Jia, Saining Xie

License: CC BY 4.0

Abstract: Generative models have made significant impacts across various domains, largely due to their ability to scale during training by increasing data, computational resources, and model size, a phenomenon characterized by the scaling laws. Recent research has begun to explore inference-time scaling behavior in Large Language Models (LLMs), revealing how performance can further improve with additional computation during inference. Unlike LLMs, diffusion models inherently possess the flexibility to adjust inference-time computation via the number of denoising steps, although the performance gains typically flatten after a few dozen. In this work, we explore the inference-time scaling behavior of diffusion models beyond increasing denoising steps and investigate how the generation performance can further improve with increased computation. Specifically, we consider a search problem aimed at identifying better noises for the diffusion sampling process. We structure the design space along two axes: the verifiers used to provide feedback, and the algorithms used to find better noise candidates. Through extensive experiments on class-conditioned and text-conditioned image generation benchmarks, our findings reveal that increasing inference-time compute leads to substantial improvements in the quality of samples generated by diffusion models, and with the complicated nature of images, combinations of the components in the framework can be specifically chosen to conform with different application scenarios.
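
The search problem described in the abstract can be illustrated with the simplest possible strategy: draw several candidate noises, run the sampler from each, and keep the one the verifier scores highest. The sketch below is only an illustration under stated assumptions. The names `random_search`, `toy_generate`, and `toy_verifier` are hypothetical, the "generator" is a toy function rather than a diffusion model, and the abstract does not say which search algorithms the paper actually uses.

```python
import random
from typing import Callable, List, Sequence, Tuple

Noise = List[float]


def sample_noise(dim: int, rng: random.Random) -> Noise:
    """Draw one Gaussian noise vector (a stand-in for a latent-shaped tensor)."""
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def random_search(
    generate: Callable[[Noise], Sequence[float]],
    verifier: Callable[[Sequence[float]], float],
    n_candidates: int,
    dim: int,
    seed: int = 0,
) -> Tuple[Noise, float]:
    """Sample n_candidates noises, generate from each, keep the best-scoring one.

    Compute grows linearly with n_candidates: each candidate costs one
    full generation plus one verifier call.
    """
    if n_candidates < 1:
        raise ValueError("n_candidates must be at least 1")
    rng = random.Random(seed)
    best_noise: Noise = []
    best_score = float("-inf")
    for _ in range(n_candidates):
        noise = sample_noise(dim, rng)
        sample = generate(noise)   # hypothetical: full denoising run from this noise
        score = verifier(sample)   # hypothetical: scalar feedback, higher is better
        if score > best_score:
            best_noise, best_score = noise, score
    return best_noise, best_score


# Toy stand-ins so the sketch runs end to end (not real models).
def toy_generate(noise: Noise) -> List[float]:
    return [0.5 * x for x in noise]


def toy_verifier(sample: Sequence[float]) -> float:
    # Prefers samples close to the origin; purely illustrative.
    return -sum(x * x for x in sample)


if __name__ == "__main__":
    for n in (1, 4, 16):
        _, score = random_search(toy_generate, toy_verifier, n, dim=8)
        print(f"candidates={n:2d}  best verifier score={score:.3f}")
```

With a fixed seed, a larger budget evaluates a superset of the candidates a smaller budget sees, so the best score can only stay the same or improve as `n_candidates` grows. Whether that improvement reflects real quality depends entirely on how well the verifier matches the intended goal.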

Submitted to arXiv on 16 Jan. 2025

Results of the summarizing process for the arXiv paper: 2501.09732v1

In this study, the researchers examine the inference-time scaling behavior of diffusion models, focusing on how increased computation can improve generation performance. Unlike Large Language Models (LLMs), diffusion models can already adjust inference-time computation through the number of denoising steps, but the gains typically flatten after a few dozen steps. To go beyond this, the researchers introduce a search problem aimed at identifying better noises for the diffusion sampling process. They structure the design space along two axes: the verifiers used to provide feedback and the algorithms employed to find improved noise candidates.

The evaluation is conducted on two benchmarks, DrawBench and T2I-CompBench, which assess text-to-image models' ability to handle complex prompts and generate high-quality images. The FLUX.1-dev model serves as the backbone for this study, representing state-of-the-art text-conditioned diffusion models. Several supervised verifiers are used to evaluate different aspects of the generated images: Aesthetic Score Predictor, CLIPScore, and ImageReward. In addition, a Verifier Ensemble is created by combining these verifiers to expand evaluation capacity.

The researchers find that increasing inference-time compute significantly improves sample quality in diffusion models, especially in complex image generation tasks. Self-supervised verifiers are found to be less effective in text-to-image settings because they focus on visual quality rather than textual information. Metrics from DrawBench are used alongside an LLM acting as a neutral evaluator. Overall, the study shows that increased computation during inference can substantially improve sample quality for text-to-image generation, and that the combination of verifiers and metrics can be tailored to specific evaluation needs.
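
The summary does not say how the Verifier Ensemble combines its members. One plausible approach, shown here purely as an assumption, is to rank the candidates under each verifier and pick the candidate with the best average rank, which avoids mixing scores that live on different scales. The names `rank_scores` and `ensemble_select` and the toy verifiers are hypothetical and do not wrap the real Aesthetic Score Predictor, CLIPScore, or ImageReward models.

```python
from typing import Callable, Dict, List, Sequence, TypeVar

T = TypeVar("T")


def rank_scores(scores: Sequence[float]) -> List[int]:
    """Return the rank of each score, where rank 0 is the highest score."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranks = [0] * len(scores)
    for rank, idx in enumerate(order):
        ranks[idx] = rank
    return ranks


def ensemble_select(
    candidates: Sequence[T],
    verifiers: Dict[str, Callable[[T], float]],
) -> int:
    """Pick the index of the candidate with the lowest mean rank across verifiers."""
    if not candidates or not verifiers:
        raise ValueError("need at least one candidate and one verifier")
    per_verifier_ranks = [
        rank_scores([verify(c) for c in candidates]) for verify in verifiers.values()
    ]
    mean_ranks = [
        sum(ranks[i] for ranks in per_verifier_ranks) / len(per_verifier_ranks)
        for i in range(len(candidates))
    ]
    return min(range(len(candidates)), key=lambda i: mean_ranks[i])


# Toy candidates: each dict holds made-up scores a real verifier would compute.
toy_candidates = [
    {"aesthetic": 0.9, "alignment": 0.1},  # pretty but off-prompt
    {"aesthetic": 0.8, "alignment": 0.8},  # good on both
    {"aesthetic": 0.2, "alignment": 0.9},  # on-prompt but unattractive
    {"aesthetic": 0.5, "alignment": 0.2},
]
toy_verifiers = {
    "aesthetic": lambda c: c["aesthetic"],
    "alignment": lambda c: c["alignment"],
}

if __name__ == "__main__":
    print(ensemble_select(toy_candidates, toy_verifiers))  # prints 1
```

In this toy setup, each single verifier would pick a candidate that is poor on the other axis, while the rank average selects the candidate that is reasonable on both. That mirrors the summary's point that a verifier focused on one aspect, such as visual quality, can miss textual information.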
Created on 22 Jan. 2025
