On TasNet for Low-Latency Single-Speaker Speech Enhancement

AI-generated keywords: TasNet

AI-generated Key Points

  • TasNet is a time-domain audio separation network used for single-speaker speech enhancement.
  • TasNet improves on the state of the art for speech enhancement, with the largest gains for modulated noise sources such as speech.
  • TasNet learns an efficient inner-domain representation in which target and noise signal components are highly separable, especially when the noise is an interfering speech signal.
  • TasNet performs poorly for large frame hops; the authors conjecture that aliasing is the main cause of this performance drop.
  • Experimental simulations using speech signals contaminated by additive noise evaluate TasNet's effectiveness, with metrics like STOI, PESQ, and Scale-Invariant SDR used for assessment.
  • TasNet is tested as a 2-speaker speech separation system using the WSJ0 speech corpus for training data and mean STOI, PESQ, and SI-SDR metrics for evaluation.
  • TasNet shows promise for low-latency single-speaker speech enhancement applications by effectively separating target speech from various noise sources like modulated noise and interfering speech signals.

Authors: Morten Kolbæk, Zheng-Hua Tan, Søren Holdt Jensen, Jesper Jensen

License: CC BY 4.0

Abstract: In recent years, speech processing algorithms have seen tremendous progress primarily due to the deep learning renaissance. This is especially true for speech separation where the time-domain audio separation network (TasNet) has led to significant improvements. However, for the related task of single-speaker speech enhancement, which is of obvious importance, it is yet unknown, if the TasNet architecture is equally successful. In this paper, we show that TasNet improves state-of-the-art also for speech enhancement, and that the largest gains are achieved for modulated noise sources such as speech. Furthermore, we show that TasNet learns an efficient inner-domain representation, where target and noise signal components are highly separable. This is especially true for noise in terms of interfering speech signals, which might explain why TasNet performs so well on the separation task. Additionally, we show that TasNet performs poorly for large frame hops and conjecture that aliasing might be the main cause of this performance drop. Finally, we show that TasNet consistently outperforms a state-of-the-art single-speaker speech enhancement system.

Submitted to arXiv on 27 Mar. 2021

Results of the summarizing process for the arXiv paper: 2103.14882v1

The paper "On TasNet for Low-Latency Single-Speaker Speech Enhancement" studies TasNet, a time-domain audio separation network, for single-speaker speech enhancement. It shows that TasNet, known for its success in speech separation, also improves on the state of the art in speech enhancement, with the largest gains for modulated noise sources such as speech. The authors relate this to TasNet learning an efficient inner-domain representation in which target and noise signal components are highly separable, particularly when the noise is an interfering speech signal. The study also finds that TasNet performs poorly for large frame hops and conjectures that aliasing is the main cause.

To evaluate TasNet as an enhancement system, experimental simulations are conducted on speech signals contaminated by additive noise, and performance is assessed with STOI, PESQ, and scale-invariant SDR (SI-SDR). TasNet is also tested as a 2-speaker speech separation system to validate the implementation against results in the literature. The WSJ0 speech corpus provides the training data, and mean STOI, PESQ, and SI-SDR are reported.

Overall, the findings suggest that TasNet is promising for low-latency single-speaker speech enhancement, especially for modulated noise and interfering speech signals. Future research may address the limitation related to large frame hops to improve TasNet in real-world scenarios.
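
For readers unfamiliar with SI-SDR, the metric can be sketched in a few lines. This is a minimal illustration of the commonly used definition, not the authors' evaluation code: the function name `si_sdr`, the mean removal step, and the pure-Python list representation are assumptions made here for clarity.

```python
import math


def si_sdr(estimate, target):
    """Scale-invariant SDR in dB for two equal-length sample sequences.

    Hypothetical helper for illustration; mean removal is an assumed
    convention, not something stated in the summarized paper.
    """
    if len(estimate) != len(target) or not target:
        raise ValueError("signals must be non-empty and of equal length")

    # Remove the mean so a DC offset does not influence the score.
    mean_e = sum(estimate) / len(estimate)
    mean_t = sum(target) / len(target)
    e = [x - mean_e for x in estimate]
    t = [x - mean_t for x in target]

    target_energy = sum(x * x for x in t)
    if target_energy == 0.0:
        raise ValueError("target has zero energy after mean removal")

    # Project the estimate onto the target: the optimal scaling factor.
    alpha = sum(a * b for a, b in zip(e, t)) / target_energy
    scaled_target = [alpha * x for x in t]
    residual = [a - s for a, s in zip(e, scaled_target)]

    signal_energy = sum(s * s for s in scaled_target)
    residual_energy = sum(r * r for r in residual)
    if residual_energy == 0.0:
        return math.inf  # estimate is an exact scaled copy of the target
    return 10.0 * math.log10(signal_energy / residual_energy)
```

Because the estimate is projected onto the target before the ratio is formed, multiplying the estimate by any nonzero constant leaves the score unchanged, which is what distinguishes SI-SDR from a plain SDR.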
Created on 10 Feb. 2025
