Learnings from Technological Interventions in a Low Resource Language: Enhancing Information Access in Gondi

AI-generated keywords: Low-resource languages, Data collection, Technology-driven methods, Machine translation, Community involvement

AI-generated Key Points

  • Challenges of developing technologies for low-resource languages due to lack of representative data
  • Case study on deploying technology-driven data collection methods for Hindi to Gondi translations
  • Creation of linguistic resources such as dictionaries, children's stories, and an IVR platform for Gondi language
  • Development of a compressed Hindi-Gondi machine translation model for low-resource edge devices
  • Evaluation of the model's effectiveness through assistance to volunteers collecting more data
  • Importance of disseminating Gondi content through IVR systems for wider audience reach
  • Emphasis on community involvement and avoiding commodification of local languages in building language technologies
  • Reflection on the approach to building language technologies and the importance of community perspectives and inclusivity

Authors: Devansh Mehta, Harshita Diddee, Ananya Saxena, Anurag Shukla, Sebastin Santy, Ramaravind Kommiya Mothilal, Brij Mohan Lal Srivastava, Alok Sharma, Vishnu Prasad, Venkanna U, Kalika Bali

In Submission (Revised) to Language Resources and Evaluation Journal. arXiv admin note: text overlap with arXiv:2004.10270
License: CC BY 4.0

Abstract: The primary obstacle to developing technologies for low-resource languages is the lack of representative, usable data. In this paper, we report the deployment of technology-driven data collection methods for creating a corpus of more than 60,000 translations from Hindi to Gondi, a low-resource vulnerable language spoken by around 2.3 million tribal people in south and central India. During this process, we help expand information access in Gondi across two dimensions: (a) the creation of linguistic resources that can be used by the community, such as a dictionary, children's stories, Gondi translations from multiple sources, and an Interactive Voice Response (IVR) based mass awareness platform; and (b) enabling its use in the digital domain by developing a Hindi-Gondi machine translation model, which is compressed by nearly 4 times to enable its deployment on low-resource edge devices and in areas of little to no internet connectivity. We also present preliminary evaluations of using the developed machine translation model to assist volunteers who are collecting more data for the target language. Through these interventions, we not only created a refined and evaluated corpus of 26,240 Hindi-Gondi translations that was used for building the translation model but also engaged nearly 850 community members who can help take Gondi onto the internet.

Submitted to arXiv on 29 Nov. 2022


Results of the summarizing process for the arXiv paper: 2211.16172v1

In this paper, the authors discuss the challenges of developing technologies for low-resource languages, chief among them the lack of representative data. They present a case study on deploying technology-driven data collection methods to create a corpus of over 60,000 translations from Hindi to Gondi, a vulnerable language spoken by around 2.3 million tribal people in south and central India. Through this process, they aim to expand information access in Gondi by creating linguistic resources such as a dictionary, children's stories, and an Interactive Voice Response (IVR) platform. The authors also develop a Hindi-Gondi machine translation model that is compressed for deployment on low-resource edge devices and in areas with little to no internet connectivity. They evaluate the model by using it to assist volunteers who are collecting more data for the target language: annotators accept an average of 3.66 out of 5.56 suggested options per sentence translation iteration, which indicates that the model's suggestions are useful. The authors also highlight the importance of disseminating Gondi content through IVR systems to reach a wider audience. They reflect on their approach to building language technologies and emphasize the need for community involvement and for avoiding the commodification of local languages. Overall, this work raises questions about outside intervention in developing language technologies and stresses the importance of community perspectives and inclusivity in standardization efforts. By engaging with local speakers and applying technological interventions effectively, there is potential for a virtuous cycle in which linguistic resources improve language technologies, and vice versa, for low-resource languages like Gondi.
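To make the acceptance figure above concrete, the following is a minimal sketch of the arithmetic, assuming the two numbers quoted in the summary (3.66 accepted out of 5.56 suggested, per sentence) are simple averages. The function name `acceptance_rate` is hypothetical and not taken from the paper, and a ratio of averages is only an approximation of the average per-sentence ratio.

```python
# Figures quoted in the summary; treated here as reported averages.
ACCEPTED_PER_SENTENCE = 3.66   # suggestions accepted per sentence (average)
SUGGESTED_PER_SENTENCE = 5.56  # suggestions shown per sentence (average)


def acceptance_rate(accepted: float, suggested: float) -> float:
    """Return the share of suggested options that were accepted.

    Hypothetical helper for illustration; not part of the paper's tooling.
    """
    if suggested <= 0:
        raise ValueError("suggested must be positive")
    return accepted / suggested


rate = acceptance_rate(ACCEPTED_PER_SENTENCE, SUGGESTED_PER_SENTENCE)
print(f"Approximate acceptance rate: {rate:.1%}")  # prints 65.8%
```

Under these assumptions, annotators accept roughly two out of every three suggestions the model offers.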
Created on 20 Oct. 2025

