From Sora What We Can See: A Survey of Text-to-Video Generation

AI-generated keywords: Artificial Intelligence, Sora, Text-to-Video Generation, Survey, OpenAI

AI-generated Key Points

The license of the paper does not allow us to build upon its content and the key points are generated using the paper metadata rather than the full article.

  • Significant strides in artificial intelligence towards achieving artificial general intelligence
  • Sora by OpenAI with minute-level world-simulative capabilities as a crucial milestone
  • Challenges faced by Sora that require resolution
  • Survey conducted by authors on Sora within text-to-video generation context
  • Categorization of literature along three dimensions: evolutionary generators, excellent pursuit, and realistic panorama
  • Insights on widely used datasets and metrics in text-to-video generation domain
  • Identification of challenges and open problems, along with proposed avenues for future research and development
  • Comprehensive list for further studies available at authors' repository: https://github.com/soraw-ai/Awesome-Text-to-Video-Generation

Authors: Rui Sun, Yumin Zhang, Tejal Shah, Jiahao Sun, Shuoying Zhang, Wenqi Li, Haoran Duan, Bo Wei, Rajiv Ranjan

A comprehensive list of text-to-video generation studies in this survey is available at https://github.com/soraw-ai/Awesome-Text-to-Video-Generation

Abstract: With impressive achievements made, artificial intelligence is on the path forward to artificial general intelligence. Sora, developed by OpenAI, which is capable of minute-level world-simulative abilities, can be considered a milestone on this developmental path. However, despite its notable successes, Sora still encounters various obstacles that need to be resolved. In this survey, we start from the perspective of disassembling Sora in text-to-video generation and conduct a comprehensive review of the literature, trying to answer the question, "From Sora What We Can See". Specifically, after basic preliminaries regarding the general algorithms are introduced, the literature is categorized along three mutually perpendicular dimensions: evolutionary generators, excellent pursuit, and realistic panorama. Subsequently, the widely used datasets and metrics are organized in detail. Last but more importantly, we identify several challenges and open problems in this domain and propose potential future directions for research and development.

Submitted to arXiv on 17 May 2024

Results of the summarizing process for the arXiv paper: 2405.10674v1

The paper's license does not allow us to build upon its content, so this summary was generated from the paper's metadata rather than the full article.

Artificial intelligence has made significant strides toward artificial general intelligence. One notable development is Sora, a model developed by OpenAI whose minute-level world-simulative capabilities mark a milestone in that progress. Despite its successes, Sora still faces challenges that remain to be resolved. Rui Sun, Yumin Zhang, Tejal Shah, Jiahao Sun, Shuoying Zhang, Wenqi Li, Haoran Duan, Bo Wei, and Rajiv Ranjan survey the text-to-video generation literature from the perspective of Sora. After introducing the general algorithms, the survey categorizes the literature along three dimensions: evolutionary generators, excellent pursuit, and realistic panorama. It then organizes the widely used datasets and metrics in this domain, identifies several challenges and open problems, and proposes potential directions for future research and development. A comprehensive list of text-to-video generation studies is available in the authors' repository at https://github.com/soraw-ai/Awesome-Text-to-Video-Generation. The survey is intended as a resource for understanding the current landscape of text-to-video generation and the difficulties involved in progressing toward artificial general intelligence through systems like Sora.
Created on 11 Nov. 2025

