TVM: An Automated End-to-End Optimizing Compiler for Deep Learning

AI-generated keywords: TVM, automated end-to-end optimizing compiler, deep learning, performance portability, hardware back-ends

AI-generated Key Points

The license of the paper does not allow us to build upon its content and the key points are generated using the paper metadata rather than the full article.

  • TVM is an open-source optimizing compiler for deep learning workloads.
  • It exposes both graph-level and operator-level optimizations, whereas current frameworks rely on vendor-specific operator libraries.
  • TVM addresses the challenge of deploying workloads to various hardware devices such as mobile phones, embedded devices, FPGAs, and ASICs.
  • It addresses optimization challenges specific to deep learning: high-level operator fusion, mapping to arbitrary hardware primitives, and memory latency hiding.
  • TVM uses a learning-based cost modeling method to automate low-level program optimization based on hardware characteristics.
  • Experimental results show performance competitive with state-of-the-art, hand-tuned libraries across hardware back-ends, including low-power CPUs, mobile GPUs, and server-class GPUs.
  • TVM can target new accelerator back-ends like FPGA-based deep learning accelerators.
  • The team behind TVM includes authors Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin Zheng, Eddie Yan, Meghan Cowan, Haichen Shen, Leyuan Wang, Yuwei Hu, Luis Ceze, Carlos Guestrin, and Arvind Krishnamurthy.
  • Their work on TVM represents a significant advancement in automated optimization for deep learning compilers.
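
To make the operator fusion point concrete, here is a minimal sketch in plain Python. It is not TVM code and does not use the TVM API; the function names and the scale/bias/ReLU chain are assumptions chosen for illustration. The unfused version materializes an intermediate list after every operator, while the fused version computes the same result in a single pass.

```python
from typing import List


def scale_bias_relu_unfused(x: List[float], scale: float, bias: float) -> List[float]:
    """Run three elementwise operators one after another.

    Each step writes a full intermediate list, which is the memory
    traffic that fusion is meant to avoid.
    """
    scaled = [v * scale for v in x]          # operator 1: multiply
    shifted = [v + bias for v in scaled]     # operator 2: add
    return [max(v, 0.0) for v in shifted]    # operator 3: ReLU


def scale_bias_relu_fused(x: List[float], scale: float, bias: float) -> List[float]:
    """Compute the same chain in one loop with no intermediate lists."""
    return [max(v * scale + bias, 0.0) for v in x]


# Hypothetical input data, chosen only for illustration.
data = [-2.0, -0.5, 0.0, 1.5, 3.0]
unfused = scale_bias_relu_unfused(data, 2.0, 1.0)
fused = scale_bias_relu_fused(data, 2.0, 1.0)
print(fused)  # [0.0, 0.0, 1.0, 4.0, 7.0]
```

Both versions return identical values. The only difference is how many times the data is read and written, which is why a compiler can apply this rewrite without changing the model's output.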

Authors: Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin Zheng, Eddie Yan, Meghan Cowan, Haichen Shen, Leyuan Wang, Yuwei Hu, Luis Ceze, Carlos Guestrin, Arvind Krishnamurthy

Significantly improved version, add automated optimization

Abstract: There is an increasing need to bring machine learning to a wide diversity of hardware devices. Current frameworks rely on vendor-specific operator libraries and optimize for a narrow range of server-class GPUs. Deploying workloads to new platforms -- such as mobile phones, embedded devices, and accelerators (e.g., FPGAs, ASICs) -- requires significant manual effort. We propose TVM, a compiler that exposes graph-level and operator-level optimizations to provide performance portability to deep learning workloads across diverse hardware back-ends. TVM solves optimization challenges specific to deep learning, such as high-level operator fusion, mapping to arbitrary hardware primitives, and memory latency hiding. It also automates optimization of low-level programs to hardware characteristics by employing a novel, learning-based cost modeling method for rapid exploration of code optimizations. Experimental results show that TVM delivers performance across hardware back-ends that are competitive with state-of-the-art, hand-tuned libraries for low-power CPU, mobile GPU, and server-class GPUs. We also demonstrate TVM's ability to target new accelerator back-ends, such as the FPGA-based generic deep learning accelerator. The system is open sourced and in production use inside several major companies.
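
The idea of using a learned cost model to explore code optimizations quickly can be sketched as a small search loop. Everything below is an assumption made for illustration: the abstract does not describe the model or the search space, so this sketch substitutes a nearest-neighbour average for the cost model, a synthetic `measure` function for real hardware runs, and a hypothetical pair of tile sizes for the schedule configuration. It is not the paper's method.

```python
import math
import random
from typing import Dict, List, Tuple

Config = Tuple[int, int]  # (tile_x, tile_y): a hypothetical pair of schedule knobs


def measure(cfg: Config) -> float:
    """Stand-in for timing a compiled kernel on hardware.

    The cost surface is synthetic and invented for this sketch.
    """
    tx, ty = cfg
    return 1.0 + (math.log2(tx) - 4) ** 2 + (math.log2(ty) - 3) ** 2


def predict(cfg: Config, history: Dict[Config, float], k: int = 3) -> float:
    """Toy cost model: mean measured cost of the k nearest measured configs."""
    def dist(other: Config) -> float:
        return ((math.log2(cfg[0]) - math.log2(other[0])) ** 2
                + (math.log2(cfg[1]) - math.log2(other[1])) ** 2)

    nearest = sorted(history, key=dist)[:k]
    return sum(history[c] for c in nearest) / len(nearest)


def tune(space: List[Config], n_init: int = 4, rounds: int = 3,
         batch: int = 4, seed: int = 0) -> Dict[Config, float]:
    """Alternate between ranking candidates with the model and measuring the best."""
    rng = random.Random(seed)
    # Seed the model with a few random measurements.
    history = {cfg: measure(cfg) for cfg in rng.sample(space, n_init)}
    for _ in range(rounds):
        candidates = [c for c in space if c not in history]
        # Rank unmeasured configs by predicted cost; measure only the top few.
        candidates.sort(key=lambda c: predict(c, history))
        for cfg in candidates[:batch]:
            history[cfg] = measure(cfg)
    return history


space = [(2 ** i, 2 ** j) for i in range(1, 8) for j in range(1, 8)]  # 49 configs
history = tune(space)
best_cfg = min(history, key=history.get)
print(best_cfg, history[best_cfg], len(history))
```

The point of the design is that only 16 of the 49 configurations are ever measured; the model decides which ones are worth the cost of a real run. A better model or a larger search space changes the numbers but not the shape of the loop.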

Submitted to arXiv on 12 Feb. 2018

Results of the summarizing process for the arXiv paper: 1802.04799v3

TVM is an automated end-to-end optimizing compiler for deep learning, built to meet the growing demand for running machine learning on a wide range of hardware devices. Unlike current frameworks, which rely on vendor-specific operator libraries and are optimized for a narrow range of server-class GPUs, TVM offers performance portability by exposing graph-level and operator-level optimizations. It addresses the challenge of deploying workloads to new platforms such as mobile phones, embedded devices, and accelerators like FPGAs and ASICs, which otherwise requires significant manual effort. To handle optimization challenges specific to deep learning, TVM incorporates high-level operator fusion, mapping to arbitrary hardware primitives, and memory latency hiding. It also uses a novel learning-based cost modeling method to automate the optimization of low-level programs for a given hardware target, which allows rapid exploration of code optimizations. Experimental results show that TVM delivers performance competitive with state-of-the-art, hand-tuned libraries on low-power CPUs, mobile GPUs, and server-class GPUs. The authors also demonstrate that it can target new accelerator back-ends, such as an FPGA-based generic deep learning accelerator. The system is open source and in production use inside several major companies. The authors are Tianqi Chen, Thierry Moreau, Ziheng Jiang, Lianmin Zheng, Eddie Yan, Meghan Cowan, Haichen Shen, Leyuan Wang, Yuwei Hu, Luis Ceze, Carlos Guestrin, and Arvind Krishnamurthy.
Created on 18 Jul. 2024
