Probing the phonetic and phonological knowledge of tones in Mandarin TTS models

AI-generated keywords: TTS models, Mandarin, Coarticulation, Sandhi, Evaluation

AI-generated Key Points

The license of the paper does not allow us to build upon its content and the key points are generated using the paper metadata rather than the full article.

  • The study investigates the phonetic and phonological knowledge of lexical tones in Mandarin TTS models
  • Two experiments use controlled stimuli to test tonal coarticulation and tone sandhi; speech samples are generated with Tacotron 2 and WaveGlow, then assessed by acoustic analysis and human evaluation
  • Both the baseline Tacotron 2 model and Tacotron 2 with BERT embeddings capture surface tonal coarticulation patterns well
  • Both models fail to consistently apply the Tone-3 sandhi rule to novel sentences
  • Incorporating pre-trained BERT embeddings into Tacotron 2 improves naturalness and prosody performance
  • The BERT-augmented model generalizes Tone-3 sandhi better to novel complex sentences, but overall Tone-3 sandhi accuracy remains low
  • The author argues that TTS models can be used to generate and validate certain linguistic hypotheses, and that linguistically informed stimuli should be included in the training and evaluation of TTS models

Authors: Jian Zhu

Submitted to Speech Prosody 2020

Abstract: This study probes the phonetic and phonological knowledge of lexical tones in TTS models through two experiments. Controlled stimuli for testing tonal coarticulation and tone sandhi in Mandarin were fed into Tacotron 2 and WaveGlow to generate speech samples, which were subject to acoustic analysis and human evaluation. Results show that both baseline Tacotron 2 and Tacotron 2 with BERT embeddings capture the surface tonal coarticulation patterns well but fail to consistently apply the Tone-3 sandhi rule to novel sentences. Incorporating pre-trained BERT embeddings into Tacotron 2 improves the naturalness and prosody performance, and yields better generalization of Tone-3 sandhi rules to novel complex sentences, although the overall accuracy for Tone-3 sandhi was still low. Given that TTS models do capture some linguistic phenomena, it is argued that they can be used to generate and validate certain linguistic hypotheses. On the other hand, it is also suggested that linguistically informed stimuli should be included in the training and the evaluation of TTS models.
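
For readers unfamiliar with the Tone-3 sandhi rule mentioned in the abstract: in Mandarin, a Tone-3 syllable is realized as Tone 2 when it is immediately followed by another Tone-3 syllable (for example, ni3 hao3 is pronounced ni2 hao3). The sketch below is only an illustration of that rule and a way to score outputs against it; it is not the paper's method. The function names `apply_tone3_sandhi` and `sandhi_accuracy` are hypothetical, and the rule is applied pairwise on underlying tones, which is a simplification: in runs of three or more Tone-3 syllables, the actual surface pattern depends on prosodic and syntactic structure.

```python
def apply_tone3_sandhi(tones):
    """Return surface tones after a simplified Tone-3 sandhi pass.

    tones: list of ints, 1-4 for the lexical tones and 5 for the neutral tone.
    Assumption: every Tone 3 that is directly followed by an underlying
    Tone 3 surfaces as Tone 2. Phrase structure is ignored.
    """
    surface = list(tones)
    for i in range(len(tones) - 1):
        # Decide on the underlying tones, not the already-changed ones.
        if tones[i] == 3 and tones[i + 1] == 3:
            surface[i] = 2
    return surface


def sandhi_accuracy(produced, expected):
    """Fraction of items whose produced tone sequence matches the expected one."""
    if not expected:
        raise ValueError("expected must not be empty")
    if len(produced) != len(expected):
        raise ValueError("produced and expected must have the same length")
    hits = sum(1 for p, e in zip(produced, expected) if list(p) == list(e))
    return hits / len(expected)


if __name__ == "__main__":
    print(apply_tone3_sandhi([3, 3]))     # ni3 hao3 -> [2, 3]
    print(apply_tone3_sandhi([1, 3, 4]))  # no adjacent Tone 3 -> [1, 3, 4]
```

A scorer like `sandhi_accuracy` would compare tone labels assigned to synthesized speech (for instance by human annotators) with the sequences the rule predicts; how the paper actually labeled and scored its samples is not described in the metadata available here.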

Submitted to arXiv on 23 Dec. 2019

Results of the summarizing process for the arXiv paper: 1912.10915v1

This study investigates the phonetic and phonological knowledge of lexical tones in Mandarin text-to-speech (TTS) models through two experiments. Controlled stimuli designed to test tonal coarticulation and tone sandhi were fed into Tacotron 2 and WaveGlow to generate speech samples, which were then subjected to acoustic analysis and human evaluation. The results show that both the baseline Tacotron 2 model and Tacotron 2 with BERT embeddings capture surface tonal coarticulation patterns well, but neither consistently applies the Tone-3 sandhi rule to novel sentences. Incorporating pre-trained BERT embeddings into Tacotron 2 improves naturalness and prosody performance and yields better generalization of Tone-3 sandhi to novel complex sentences, although overall Tone-3 sandhi accuracy remains low. Because the models do capture some linguistic phenomena, the author argues that TTS models can be used to generate and validate certain linguistic hypotheses. The author also suggests that linguistically informed stimuli should be included in both the training and the evaluation of TTS models.
Created on 17 Oct. 2023
