Subject-driven Text-to-Image Generation via Apprenticeship Learning
AI-generated Key Points
⚠ The license of the paper does not allow us to build upon its content, so the key points are generated using the paper metadata rather than the full article.
- Text-to-image generation has seen significant advancements in recent years
- DreamBooth generates highly customized images of a target subject by fine-tuning an "expert model" for a given subject from a few examples
- This process is expensive as it requires learning a new expert model for each subject
- SuTI is a Subject-driven Text-to-Image generator that replaces subject-specific fine-tuning with in-context learning
- SuTI can generate novel renditions of a new subject in different scenes instantly without any subject-specific optimization
- It is powered by apprenticeship learning: a single apprentice model learns from data generated by a massive number of subject-specific expert models, each trained on one of millions of image clusters mined from the Internet
- SuTI imitates the behavior of these experts to generate high-quality, customized images 20x faster than optimization-based state-of-the-art (SoTA) methods
- SuTI outperforms existing models like InstructPix2Pix, Textual Inversion, Imagic, Prompt2Prompt, Re-Imagen and DreamBooth on challenging benchmarks like DreamBench and DreamBench-v2
- Human evaluation shows that SuTI significantly outperforms these models, especially on subject and text alignment
- Overall, SuTI presents an efficient and effective way to generate highly customized images of new subjects without expensive per-subject fine-tuning
Authors: Wenhu Chen, Hexiang Hu, Yandong Li, Nataniel Ruiz, Xuhui Jia, Ming-Wei Chang, William W. Cohen
Abstract: Recent text-to-image generation models like DreamBooth have made remarkable progress in generating highly customized images of a target subject, by fine-tuning an "expert model" for a given subject from a few examples. However, this process is expensive, since a new expert model must be learned for each subject. In this paper, we present SuTI, a Subject-driven Text-to-Image generator that replaces subject-specific fine-tuning with in-context learning. Given a few demonstrations of a new subject, SuTI can instantly generate novel renditions of the subject in different scenes, without any subject-specific optimization. SuTI is powered by apprenticeship learning, where a single apprentice model is learned from data generated by a massive number of subject-specific expert models. Specifically, we mine millions of image clusters from the Internet, each centered around a specific visual subject. We adopt these clusters to train a massive number of expert models, each specializing in a different subject. The apprentice model SuTI then learns to imitate the behavior of these fine-tuned experts. SuTI can generate high-quality and customized subject-specific images 20x faster than optimization-based SoTA methods. On the challenging DreamBench and DreamBench-v2, our human evaluation shows that SuTI significantly outperforms existing models like InstructPix2Pix, Textual Inversion, Imagic, Prompt2Prompt, Re-Imagen and DreamBooth, especially on the subject and text alignment aspects.
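The expert-then-apprentice pipeline described in the abstract can be illustrated with a deliberately tiny numeric analogy. This is only a sketch under loud assumptions, not SuTI's architecture: there are no images or diffusion models here. A "subject" is assumed to be a hidden offset, a "cluster" is a handful of (prompt, output) pairs sharing that offset, an "expert" is fit on one cluster, and the "apprentice" is a single linear model that sees the cluster in context and is trained only on expert outputs. All function names are hypothetical.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is reproducible

def make_cluster(offset, k=3):
    """Toy 'image cluster': k (prompt, output) pairs for one subject.
    Assumption: a subject is just a hidden offset, output = prompt + offset."""
    xs = [rng.uniform(-1, 1) for _ in range(k)]
    return [(x, x + offset) for x in xs]

def train_expert(cluster):
    """Subject-specific 'fine-tuning': estimate this one subject's offset."""
    offset_hat = sum(y - x for x, y in cluster) / len(cluster)
    return lambda prompt: prompt + offset_hat

def features(cluster, prompt):
    """Apprentice input: a summary of the in-context demonstrations plus the prompt."""
    mean_x = sum(x for x, _ in cluster) / len(cluster)
    mean_y = sum(y for _, y in cluster) / len(cluster)
    return [mean_y, mean_x, prompt, 1.0]

def apprentice_generate(w, cluster, prompt):
    """In-context generation: no per-subject optimization, just a forward pass."""
    return sum(wi * fi for wi, fi in zip(w, features(cluster, prompt)))

def train_apprentice(num_subjects=20000, lr=0.1):
    """Train one shared linear model by SGD to imitate many per-subject experts."""
    w = [0.0, 0.0, 0.0, 0.0]
    for _ in range(num_subjects):
        cluster = make_cluster(rng.uniform(-1, 1))   # a newly mined subject
        expert = train_expert(cluster)               # its dedicated expert
        prompt = rng.uniform(-1, 1)
        target = expert(prompt)                      # data generated by the expert
        f = features(cluster, prompt)
        err = apprentice_generate(w, cluster, prompt) - target
        w = [wi - lr * err * fi for wi, fi in zip(w, f)]
    return w

weights = train_apprentice()

# A subject never seen during training: the apprentice only gets demonstrations.
new_cluster = make_cluster(0.7)
prediction = apprentice_generate(weights, new_cluster, 0.25)
# Ground truth for this toy subject would be 0.25 + 0.7 = 0.95.
print(prediction)
```

The point of the analogy is the cost structure: `train_expert` must run once per subject, while the trained apprentice handles a new subject with a single call to `apprentice_generate` on its demonstrations.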