Mix and Match: A Novel FPGA-Centric Deep Neural Network Quantization Framework
AI-generated Key Points
- The paper proposes a novel FPGA-centric DNN quantization framework for building efficient DNN inference engines on FPGA devices.
- Different quantization schemes are applied to different rows of the weight matrix, which allows better utilization of the heterogeneous FPGA hardware resources (LUTs and DSPs).
- A hardware-friendly quantization scheme named sum-of-power-of-2 (SP2) is proposed for rows with Gaussian-like weight distributions, while fixed-point quantization suits rows with Uniform-like weight distributions.
- An intra-layer mixed scheme quantization (MSQ) framework, an ensemble of the SP2 and fixed-point schemes, is proposed to fully exploit the FPGA resources. Because it better matches the weight distributions, it maintains or even increases accuracy.
- The authors evaluate the framework across multiple application domains with various DNNs, including CNNs and RNNs, and report a performance improvement of 2.1× to 4.1× compared with using only DSPs for all multiplication operations.
- The work addresses model compression, a critical step for deploying DNN models on edge devices, while maintaining or even improving accuracy.
- MSQ is hardware-friendly: SP2 multiplications reduce to shifts and additions that map onto FPGA LUTs, while fixed-point multiplications map onto DSPs, enabling efficient DNN inference on FPGA-based edge platforms.
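The row-wise idea in the key points can be sketched in a few lines of NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation: the SP2 level set (sign times q1 + q2, where each of q1 and q2 is either 0 or a power of two 2^-k) follows the general sum-of-power-of-2 description, but the exact exponent ranges, the per-row max-abs scaling, and the rule that assigns each row to whichever scheme yields the lower mean squared quantization error are illustrative choices. The helper names (`fixed_point_levels`, `sp2_levels`, `quantize_row`, `mixed_scheme_quantize`) are hypothetical. The actual framework also balances how many rows go to LUT-based SP2 versus DSP-based fixed-point arithmetic, which this sketch ignores.

```python
import numpy as np


def fixed_point_levels(bits: int) -> np.ndarray:
    """Symmetric, evenly spaced levels in [-1, 1] (1 sign bit + bits-1 magnitude bits)."""
    n = 2 ** (bits - 1) - 1              # bits=4 -> 7 positive levels
    return np.arange(-n, n + 1) / n


def sp2_levels(m1: int, m2: int) -> np.ndarray:
    """Assumed SP2 levels: +/-(q1 + q2), a non-uniform grid built from powers of two.

    q1 is 0 or 2^-k for k = 1 .. 2^m1 - 1 (2^m1 choices, i.e. m1 bits); q2 likewise with m2.
    """
    q1 = [0.0] + [2.0 ** -k for k in range(1, 2 ** m1)]
    q2 = [0.0] + [2.0 ** -k for k in range(1, 2 ** m2)]
    mags = np.unique([a + b for a in q1 for b in q2])
    return np.unique(np.concatenate([-mags, mags]))  # -0.0 and 0.0 merge into one level


def quantize_row(row: np.ndarray, levels: np.ndarray) -> np.ndarray:
    """Scale a row into [-1, 1] by its max |w|, snap each weight to the nearest level, rescale."""
    alpha = np.max(np.abs(row))
    if alpha == 0:
        return np.zeros_like(row)
    idx = np.abs(row[:, None] / alpha - levels[None, :]).argmin(axis=1)
    return alpha * levels[idx]


def mixed_scheme_quantize(W, fp_bits: int = 4, sp2_m: tuple = (2, 1)):
    """Hypothetical per-row scheme choice: keep whichever scheme has the lower MSE."""
    W = np.asarray(W, dtype=float)
    fp, sp2 = fixed_point_levels(fp_bits), sp2_levels(*sp2_m)
    Wq = np.empty_like(W)
    schemes = []
    for i, row in enumerate(W):
        q_fp, q_sp2 = quantize_row(row, fp), quantize_row(row, sp2)
        if np.mean((row - q_sp2) ** 2) < np.mean((row - q_fp) ** 2):
            Wq[i], scheme = q_sp2, "SP2"     # would map to LUT shift-and-add logic
        else:
            Wq[i], scheme = q_fp, "fixed"    # would map to DSP multipliers
        schemes.append(scheme)
    return Wq, schemes
```

With the default arguments both schemes use 4 bits (for SP2, 1 sign bit plus m1 = 2 and m2 = 1 term-select bits), so the per-row comparison reflects only the shape of each level grid: evenly spaced for fixed-point, built from sums of powers of two for SP2.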
Authors: Sung-En Chang, Yanyu Li, Mengshu Sun, Runbin Shi, Hayden K. -H. So, Xuehai Qian, Yanzhi Wang, Xue Lin
Abstract: Deep Neural Networks (DNNs) have achieved extraordinary performance in various application domains. To support diverse DNN models, efficient implementations of DNN inference on edge-computing platforms, e.g., ASICs, FPGAs, and embedded systems, are extensively investigated. Due to large model sizes and computational demands, model compression is a critical step in deploying DNN models on edge devices. This paper focuses on weight quantization, a hardware-friendly model compression approach that is complementary to weight pruning. Unlike existing methods that use the same quantization scheme for all weights, we propose the first solution that applies different quantization schemes to different rows of the weight matrix. It is motivated by (1) the fact that the weight distributions differ across rows, and (2) the potential of achieving better utilization of heterogeneous FPGA hardware resources. To achieve that, we first propose a hardware-friendly quantization scheme named sum-of-power-of-2 (SP2), suitable for Gaussian-like weight distributions, in which multiplication can be replaced with logic shifters and adders, thereby enabling highly efficient implementations with FPGA LUT resources. In contrast, the existing fixed-point quantization is suitable for Uniform-like weight distributions and can be implemented efficiently by DSPs. Then, to fully exploit the resources, we propose an FPGA-centric mixed scheme quantization (MSQ) with an ensemble of the proposed SP2 and the fixed-point schemes. Combining the two schemes can maintain, or even increase, accuracy due to better matching with weight distributions.
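The claim that SP2 replaces multiplication with shifts and additions can be checked directly. The minimal sketch below assumes an integer activation `x` and an SP2 weight w = sign × (2^-k1 + 2^-k2), where either term may be zero, and computes the product in a fixed-point format with `frac_bits` fractional bits using only left shifts, one addition, and a conditional negation. The function name and argument layout are hypothetical; a hardware design would realize the same arithmetic with LUT-based shifters and adders rather than Python integers.

```python
from typing import Optional


def sp2_mul_shift_add(x: int, sign: int, k1: Optional[int], k2: Optional[int],
                      frac_bits: int) -> int:
    """Return x * w * 2**frac_bits for an SP2 weight w = sign * (2**-k1 + 2**-k2).

    Hypothetical helper: an exponent of None means that power-of-two term is 0.
    Only shifts, an addition, and a negation are used; no multiplier is needed.
    """
    acc = 0
    for k in (k1, k2):
        if k is None:
            continue  # this SP2 term is zero
        if not 1 <= k <= frac_bits:
            raise ValueError("each exponent must satisfy 1 <= k <= frac_bits")
        # x * 2**-k, pre-scaled by 2**frac_bits, is a left shift by (frac_bits - k).
        acc += x << (frac_bits - k)
    return -acc if sign < 0 else acc


# 13 * (2**-1 + 2**-3) = 8.125, i.e. 65 in a format with 3 fractional bits.
print(sp2_mul_shift_add(13, 1, 1, 3, frac_bits=3))  # 65
```

Here the result comes from (13 << 2) + (13 << 0) = 52 + 13. A fixed-point weight, by contrast, requires a general integer multiplication, which is the operation the abstract maps onto DSPs.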