Edge-ASR: Towards Low-Bit Quantization of Automatic Speech Recognition Models

AI-generated keywords: Edge-ASR, Low-Bit Quantization, Automatic Speech Recognition, Resource-Constrained Edge Devices, Post-Training Quantization

AI-generated Key Points

The license of the paper does not allow us to build upon its content and the key points are generated using the paper metadata rather than the full article.

  • Recent advancements in ASR have shown high accuracy and reliability in applications like live transcription and voice command processing.
  • Deploying ASR models on resource-constrained edge devices poses challenges due to limitations in memory, computing power, and energy consumption.
  • Post-training quantization (PTQ) is highlighted as a solution for reducing model size and inference costs without retraining.
  • The performance implications of different advanced quantization methods and bit-width configurations on ASR models are not fully understood.
  • Researchers conducted a thorough evaluation by benchmarking eight PTQ techniques on Whisper and Moonshine edge-ASR model families across seven diverse datasets.
  • Analysis focused on the impact of quantization on both weights and activations within the models to understand efficiency versus accuracy trade-offs.
  • Results showed that even 3-bit quantization can succeed on high-capacity models when advanced PTQ techniques are used.
  • The findings offer guidance for optimizing ASR models on low-power, always-on edge devices.

Authors: Chen Feng, Yicheng Lin, Shaojie Zhuo, Chenzheng Su, Ramchalam Kinattinkara Ramakrishnan, Zhaocong Yuan, Xiaopeng Zhang

License: CC BY-NC-ND 4.0

Abstract: Recent advances in Automatic Speech Recognition (ASR) have demonstrated remarkable accuracy and robustness in diverse audio applications, such as live transcription and voice command processing. However, deploying these models on resource-constrained edge devices (e.g., IoT devices, wearables) still presents substantial challenges due to strict limits on memory, compute, and power. Quantization, particularly Post-Training Quantization (PTQ), offers an effective way to reduce model size and inference cost without retraining. Despite its importance, the performance implications of various advanced quantization methods and bit-width configurations on ASR models remain unclear. In this work, we present a comprehensive benchmark of eight state-of-the-art (SOTA) PTQ methods applied to two leading edge-ASR model families, Whisper and Moonshine. We systematically evaluate model performance (i.e., accuracy, memory I/O, and bit operations) across seven diverse datasets from the open ASR leaderboard, analyzing the impact of quantization and various configurations on both weights and activations. Built on an extension of the LLM compression toolkit, our framework integrates edge-ASR models, diverse advanced quantization algorithms, a unified calibration and evaluation data pipeline, and detailed analysis tools. Our results characterize the trade-offs between efficiency and accuracy, demonstrating that even 3-bit quantization can succeed on high-capacity models when using advanced PTQ techniques. These findings provide valuable insights for optimizing ASR models on low-power, always-on edge devices.

Submitted to arXiv on 10 Jul. 2025
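To make the idea of low-bit quantization in the abstract concrete, here is a minimal sketch of symmetric round-to-nearest weight quantization, the simplest baseline that more advanced PTQ methods typically try to improve on. It is not one of the eight methods benchmarked in the paper (the abstract does not name them). The function names `quantize_symmetric` and `dequantize`, the single per-tensor scale, and the synthetic Gaussian weights are all assumptions made for illustration.

```python
import random


def quantize_symmetric(weights, bits):
    """Round-to-nearest symmetric quantization of a list of floats.

    Returns (integer codes, scale). Assumes one scale for the whole
    tensor; practical toolkits often use per-channel or per-group scales.
    """
    qmax = 2 ** (bits - 1) - 1  # largest positive code, e.g. 3 for 3-bit
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Clamp to the signed integer range representable with `bits` bits.
    codes = [max(-qmax - 1, min(qmax, round(w / scale))) for w in weights]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to approximate floating-point weights."""
    return [c * scale for c in codes]


def mean_squared_error(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


# Synthetic stand-in for one weight tensor (hypothetical data).
random.seed(0)
weights = [random.gauss(0.0, 0.02) for _ in range(4096)]

errors = {}
for bits in (8, 4, 3):
    codes, scale = quantize_symmetric(weights, bits)
    errors[bits] = mean_squared_error(weights, dequantize(codes, scale))
    print(f"{bits}-bit: reconstruction MSE = {errors[bits]:.3e}")
```

The reconstruction error grows as the bit-width shrinks, which illustrates why plain rounding tends to become unreliable at very low bit-widths and why the choice of PTQ method matters more there.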

Results of the summarizing process for the arXiv paper: 2507.07877v2

In their paper titled "Edge-ASR: Towards Low-Bit Quantization of Automatic Speech Recognition Models," authors Chen Feng, Yicheng Lin, Shaojie Zhuo, Chenzheng Su, Ramchalam Kinattinkara Ramakrishnan, Zhaocong Yuan, and Xiaopeng Zhang examine how well low-bit quantization works for speech recognition models on edge hardware. They note that recent advances in ASR have delivered strong accuracy and robustness in audio applications such as live transcription and voice command processing. Even so, deploying ASR models on resource-constrained edge devices such as IoT devices and wearables remains challenging because of strict limits on memory, compute, and power. The authors present quantization, and Post-Training Quantization (PTQ) in particular, as an effective way to reduce model size and inference cost without retraining. However, the performance implications of different advanced quantization methods and bit-width configurations on ASR models are not fully understood. To address this gap, the researchers benchmark eight state-of-the-art PTQ methods on two leading edge-ASR model families, Whisper and Moonshine. The study systematically evaluates accuracy, memory I/O, and bit operations across seven diverse datasets from the open ASR leaderboard, analyzing the impact of quantization and its configurations on both weights and activations. The framework is built on an extension of the LLM compression toolkit and integrates edge-ASR models, advanced quantization algorithms, a unified calibration and evaluation data pipeline, and detailed analysis tools. The results characterize the trade-offs between efficiency and accuracy and show that even 3-bit quantization can succeed on high-capacity models when advanced PTQ techniques are used. These findings offer insights for optimizing ASR models for low-power, always-on edge devices.
By benchmarking current PTQ methods on edge-ASR models, the study aims to help improve the efficiency of speech recognition systems in real-world applications.
Created on 22 Aug. 2025
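The efficiency side of the trade-off described above can be illustrated with simple arithmetic. The sketch below estimates weight storage and bit operations (BOPs) for a few weight/activation bit-width pairs. The exact metric definitions used in the paper are not given in the metadata, so the BOPs formula here (multiply-accumulates times weight bits times activation bits) is a commonly used convention adopted as an assumption. The parameter count, MAC count, and function names are hypothetical and do not describe Whisper or Moonshine.

```python
def weight_memory_mb(num_params, weight_bits):
    """Weight storage in MB (10**6 bytes), ignoring scales and zero-points."""
    return num_params * weight_bits / 8 / 1e6


def bit_operations(macs, weight_bits, act_bits):
    """BOPs under the assumed convention: MACs x weight bits x activation bits."""
    return macs * weight_bits * act_bits


# Hypothetical model: 100M parameters, 1G multiply-accumulates per inference.
NUM_PARAMS = 100_000_000
MACS = 1_000_000_000

# (weight bits, activation bits) pairs, written W<bits>A<bits> in the output.
configs = [(16, 16), (8, 8), (4, 8), (3, 16)]

report = {}
for w_bits, a_bits in configs:
    mem = weight_memory_mb(NUM_PARAMS, w_bits)
    bops = bit_operations(MACS, w_bits, a_bits)
    report[(w_bits, a_bits)] = (mem, bops)
    print(f"W{w_bits}A{a_bits}: {mem:7.1f} MB weights, {bops:.2e} BOPs")
```

Under these assumptions, moving the hypothetical model's weights from 16 bits to 3 bits shrinks weight storage from 200 MB to 37.5 MB, which is the kind of saving that makes low-bit quantization attractive on memory-limited devices.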
