Filler Word Detection and Classification: A Dataset and Benchmark

AI-generated keywords: Filler Word Detection

AI-generated Key Points

The license of the paper does not allow us to build upon its content and the key points are generated using the paper metadata rather than the full article.

  • Authors Ge Zhu, Juan-Pablo Caceres, and Justin Salamon address identifying and categorizing filler words in speech recordings
  • Introduce PodcastFillers dataset with 35K annotated filler words and 50K annotations of other common sounds in podcasts
  • Propose a pipeline combining Voice Activity Detection (VAD) and Automatic Speech Recognition (ASR) for identifying filler candidates
  • Evaluate pipeline on PodcastFillers, showing ASR significantly improves detection accuracy
  • Achieve state-of-the-art results in detecting and classifying filler words
  • Make PodcastFillers publicly available to establish a benchmark for future research

Authors: Ge Zhu, Juan-Pablo Caceres, Justin Salamon

Submitted to Interspeech 2022
License: CC BY-NC-ND 4.0

Abstract: Filler words such as 'uh' or 'um' are sounds or words people use to signal they are pausing to think. Finding and removing filler words from recordings is a common and tedious task in media editing. Automatically detecting and classifying filler words could greatly aid in this task, but few studies have been published on this problem. A key reason is the absence of a dataset with annotated filler words for training and evaluation. In this work, we present a novel speech dataset, PodcastFillers, with 35K annotated filler words and 50K annotations of other sounds that commonly occur in podcasts such as breaths, laughter, and word repetitions. We propose a pipeline that leverages VAD and ASR to detect filler candidates and a classifier to distinguish between filler word types. We evaluate our proposed pipeline on PodcastFillers, compare to several baselines, and present a detailed ablation study. In particular, we evaluate the importance of using ASR and how it compares to a transcription-free approach resembling keyword spotting. We show that our pipeline obtains state-of-the-art results, and that leveraging ASR strongly outperforms a keyword spotting approach. We make PodcastFillers publicly available, and hope our work serves as a benchmark for future research.
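
The abstract says that VAD and ASR are used together to detect filler candidates, but it does not spell out how the two outputs are combined. The following is a minimal sketch of one plausible reading, assuming that a candidate is any stretch of VAD-detected speech that the ASR system did not transcribe as a word. The function name `filler_candidates`, the `min_dur` threshold, and the interval format are hypothetical and are not taken from the paper.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start_sec, end_sec)


def filler_candidates(
    speech: List[Interval],
    words: List[Interval],
    min_dur: float = 0.1,  # assumed threshold, not from the paper
) -> List[Interval]:
    """Return the parts of VAD speech regions not covered by any ASR word.

    `speech` holds voiced regions from a VAD system and `words` holds
    word-level timestamps from an ASR system. Both are assumed inputs.
    """
    candidates: List[Interval] = []
    sorted_words = sorted(words)
    for s_start, s_end in sorted(speech):
        cursor = s_start  # start of the not-yet-explained speech
        for w_start, w_end in sorted_words:
            if w_end <= cursor:
                continue  # word ends before the region of interest
            if w_start >= s_end:
                break  # remaining words start after this speech region
            if w_start - cursor >= min_dur:
                candidates.append((cursor, w_start))
            cursor = max(cursor, w_end)
        if s_end - cursor >= min_dur:
            candidates.append((cursor, s_end))
    return candidates


if __name__ == "__main__":
    speech_regions = [(0.0, 5.0)]
    asr_words = [(0.0, 1.0), (1.5, 3.0), (3.0, 5.0)]
    print(filler_candidates(speech_regions, asr_words))
```

In a full system along the lines the abstract describes, each returned interval would then be passed to a classifier that decides which filler type, if any, it contains. That step is omitted here because the abstract gives no detail about the classifier.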

Submitted to arXiv on 28 Mar. 2022

Results of the summarizing process for the arXiv paper: 2203.15135v1

This paper's license does not allow us to build upon its content, so the summary below was generated from the paper's metadata rather than the full article.

In their work titled "Filler Word Detection and Classification: A Dataset and Benchmark," authors Ge Zhu, Juan-Pablo Caceres, and Justin Salamon address the issue of identifying and categorizing filler words like 'uh' or 'um' in speech recordings. These filler words are commonly used as pauses during speech, making their removal a tedious task in media editing. The authors highlight the lack of research in this area due to the absence of a comprehensive dataset with annotated filler words for training and evaluation.

To bridge this gap, the authors introduce a new speech dataset called PodcastFillers, containing 35K annotated filler words along with 50K annotations of other common sounds found in podcasts such as breaths, laughter, and word repetitions. They propose a pipeline that combines Voice Activity Detection (VAD) and Automatic Speech Recognition (ASR) to identify potential filler candidates and a classifier to differentiate between different types of filler words. The authors evaluate their pipeline on PodcastFillers, comparing it against several baseline methods and conducting a detailed ablation study to assess the impact of using ASR compared to transcription-free approaches like keyword spotting.

Their results demonstrate that leveraging ASR significantly improves detection accuracy, outperforming keyword spotting techniques. The proposed pipeline achieves state-of-the-art results in detecting and classifying filler words. By making PodcastFillers publicly available, the authors aim to establish a benchmark for future research in this field. Their work not only contributes to enhancing media editing processes by automating filler word detection but also sheds light on the importance of utilizing ASR technology for improved performance in speech analysis tasks. This study serves as a valuable resource for researchers interested in developing more efficient methods for identifying and managing filler words in audio recordings.
Created on 08 Dec. 2024
