Is Speech Emotion Recognition Language-Independent? Analysis of English and Bangla Languages using Language-Independent Vocal Features

AI-generated keywords: Speech Emotion Recognition, Language Independence, Emotional Speech Sets, Support Vector Machine (SVM), Prosodic Features

AI-generated Key Points

  • Researchers explored whether Speech Emotion Recognition (SER) is language-independent
  • Emotions studied: happiness, anger, neutral, sadness, disgust, fear
  • Three Emotional Speech Sets used: two recorded by native Bengali speakers (one in Bangla, one in English) and the Toronto Emotional Speech Set (TESS), developed by native English speakers from Canada
  • Language-independent prosodic features classified with an SVM model
  • Three experiments conducted to test the hypothesis: measuring performance on each speech set individually, measuring the classification rate on the combined speech sets, and measuring the recognition rate when training and testing on different speech sets
  • SER is mostly language-independent, but recognition of disgust and fear differs between the two languages
  • Non-native speakers convey emotions through speech much as they do in their native language
  • Vocal features were extracted with the Praat software
  • Prosodic features included pitch median, mean, and standard deviation, along with intensity
  • Study highlights the interplay between language and emotion recognition in speech processing systems and the potential cross-linguistic applicability of SER research

Authors: Fardin Saad, Hasan Mahmud, Md. Alamin Shaheen, Md. Kamrul Hasan, Paresha Farastu

9 pages, 7 figures, currently under review in International Journal of Advanced Computer Science and Applications (IJACSA)
License: CC BY 4.0

Abstract: A language agnostic approach to recognizing emotions from speech remains an incomplete and challenging task. In this paper, we used Bangla and English languages to assess whether distinguishing emotions from speech is independent of language. The following emotions were categorized for this study: happiness, anger, neutral, sadness, disgust, and fear. We employed three Emotional Speech Sets, of which the first two were developed by native Bengali speakers in Bangla and English languages separately. The third was the Toronto Emotional Speech Set (TESS), which was developed by native English speakers from Canada. We carefully selected language-independent prosodic features, adopted a Support Vector Machine (SVM) model, and conducted three experiments to carry out our proposition. In the first experiment, we measured the performance of the three speech sets individually. This was followed by the second experiment, where we recorded the classification rate by combining the speech sets. Finally, in the third experiment we measured the recognition rate by training and testing the model with different speech sets. Although this study reveals that Speech Emotion Recognition (SER) is mostly language-independent, there is some disparity while recognizing emotional states like disgust and fear in these two languages. Moreover, our investigations inferred that non-native speakers convey emotions through speech, much like expressing themselves in their native tongue.
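The three experiments described in the abstract can be laid out as train/test plans over the three speech sets. The sketch below is only an illustration: the corpus labels are hypothetical (only TESS is named in the abstract), and it assumes the third experiment trains on one set and tests on a different one, covering every ordered pair. How data was split within the first two experiments (for example, hold-out or cross-validation) is not stated in the abstract, so the plan leaves that open.

```python
from itertools import permutations

# Hypothetical labels for the three speech sets; only TESS is named in the abstract.
CORPORA = ["bangla_set", "bengali_english_set", "tess"]


def experiment_plans(corpora):
    """Return (experiment, train_sets, test_sets) tuples for the three protocols."""
    plans = []
    # Experiment 1: each speech set is evaluated on its own.
    for name in corpora:
        plans.append(("individual", (name,), (name,)))
    # Experiment 2: all speech sets are pooled into one combined set.
    plans.append(("combined", tuple(corpora), tuple(corpora)))
    # Experiment 3 (assumed form): train on one set, test on a different one.
    for train, test in permutations(corpora, 2):
        plans.append(("cross", (train,), (test,)))
    return plans


plans = experiment_plans(CORPORA)
for experiment, train_sets, test_sets in plans:
    print(f"{experiment:10s} train={train_sets} test={test_sets}")
```

Each plan would then be handed to whatever classifier is in use (an SVM in the paper) to obtain one recognition rate per row.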

Submitted to arXiv on 21 Nov. 2021

Results of the summarizing process for the arXiv paper: 2111.10776v1

In this study, the researchers examined whether Speech Emotion Recognition (SER) is language-independent. Using Bangla and English, they assessed whether emotions can be distinguished from speech regardless of language. The emotions studied were happiness, anger, neutral, sadness, disgust, and fear. Three Emotional Speech Sets were used: two recorded by native Bengali speakers, one in Bangla and one in English, and the Toronto Emotional Speech Set (TESS), developed by native English speakers from Canada. Language-independent prosodic features were selected and classified with a Support Vector Machine (SVM) model. Three experiments tested the hypothesis: the first measured performance on each speech set individually, the second measured the classification rate on the combined speech sets, and the third measured the recognition rate when the model was trained and tested on different speech sets. The findings suggest that SER is mostly language-independent, although recognition of disgust and fear differed between Bangla and English. They also suggest that non-native speakers convey emotions through speech much as they do in their native language. Vocal features were extracted with the Praat software. The prosodic features included pitch median, mean, and standard deviation, along with intensity. The study highlights the interplay between language and emotion recognition in speech processing systems and the potential cross-linguistic applicability of SER research.
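As a rough illustration of the pitch statistics mentioned above, the sketch below summarises a pitch contour with its median, mean, and standard deviation. It is not the authors' pipeline: the paper extracted features with Praat, whereas here the contour is assumed to be already available as a list of F0 values in Hz. Treating values of 0 as unvoiced frames and using the sample standard deviation are both assumptions made for this example, and intensity is left out.

```python
import statistics


def pitch_stats(f0_hz):
    """Summarise a pitch contour given as F0 values in Hz.

    Assumption: frames with F0 <= 0 are unvoiced and are skipped.
    """
    voiced = [f for f in f0_hz if f > 0]
    if len(voiced) < 2:
        raise ValueError("need at least two voiced frames")
    return {
        "median": statistics.median(voiced),
        "mean": statistics.fmean(voiced),
        # Sample standard deviation (n - 1 denominator); an assumed choice.
        "stdev": statistics.stdev(voiced),
    }


# Made-up contour for illustration; zeros mark unvoiced frames.
contour = [0.0, 200.0, 210.0, 220.0, 0.0, 230.0, 240.0]
stats = pitch_stats(contour)
print(stats)
```

A vector of such per-utterance statistics is the kind of input a classifier like an SVM could be trained on.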
Created on 25 Sep. 2024
