YORC: Yoruba Reading Comprehension dataset

AI-generated keywords: YORC, Yorùbá language, reading comprehension, cross-lingual transfer, large language models

AI-generated Key Points

  • Introduction of YORC (Yorùbá Reading Comprehension), a new multi-choice reading comprehension dataset for the Yorùbá language
  • Dataset built from Yorùbá high-school reading comprehension examinations
  • Baseline results from cross-lingual transfer using the English RACE dataset and a pre-trained encoder-only model
  • Evaluation of large language models (LLMs) such as GPT-4
  • GPT-4 achieves the highest accuracy on the YORC data (36.14%), but this is lower than the accuracy AfroXLMR-base and ChatGPT reach on the English test set
  • Challenges LLMs face in the multi-choice QA setting for under-resourced African languages such as Yorùbá
  • Emphasis on the limitations of LLMs for under-resourced African languages
  • Future work: evaluation in few-shot settings and approaches that adapt existing models using only a few examples
  • Acknowledgment of Mr. Daud Olamide Abolade for help with manual text extraction using OCR tools
  • Thanks to OpenAI for API credits, provided through its Researcher Access API program, used to evaluate GPT-3.5 and GPT-4
  • Overall contribution: a new reading comprehension dataset for Yorùbá, plus an account of the challenges and possible future directions for improving LLM performance on under-resourced languages

Authors: Anuoluwapo Aremu, Jesujoba O. Alabi, David Ifeoluwa Adelani

License: CC BY 4.0

Abstract: In this paper, we create YORC: a new multi-choice Yoruba Reading Comprehension dataset based on Yoruba high-school reading comprehension examinations. We provide baseline results by performing cross-lingual transfer from the existing English RACE dataset with a pre-trained encoder-only model. Additionally, we provide results from prompting large language models (LLMs) such as GPT-4.

Submitted to arXiv on 18 Aug. 2023


Summary of the arXiv paper 2308.09768v2

In this paper, the authors introduce YORC (Yorùbá Reading Comprehension), a new reading comprehension dataset for the Yorùbá language, built from Yorùbá high-school reading comprehension examinations. They provide baseline results by performing cross-lingual transfer from the existing English RACE dataset with a pre-trained encoder-only model, and they also evaluate large language models (LLMs) such as GPT-4. GPT-4 achieves the highest accuracy on the YORC data, 36.14%. This is still lower than the accuracy AfroXLMR-base and ChatGPT reach on the English test set, which highlights how difficult it is for pre-trained LLMs to answer questions accurately in a multi-choice QA setting.

The paper concludes by emphasizing the limitations of LLMs for under-resourced African languages such as Yorùbá. As future work, the authors plan to extend the evaluation to few-shot settings and to explore approaches that can effectively adapt existing reading comprehension models using only a few examples.

The authors thank Mr. Daud Olamide Abolade for his help with manual text extraction using OCR tools, and OpenAI for API credits, provided through its Researcher Access API program, used to evaluate the GPT-3.5 and GPT-4 models. Overall, the paper contributes a new reading comprehension dataset for Yorùbá and highlights the challenges and possible future directions for improving LLM performance on under-resourced languages.
Created on 01 Feb. 2024
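To make the multi-choice evaluation described in this summary more concrete, the Python sketch below shows one way an LLM could be scored on such items. It is a minimal illustration under stated assumptions, not the authors' evaluation code. The prompt template, the four-option A to D layout, the answer-parsing rule and the function names `build_prompt`, `parse_answer`, `accuracy` and `chance_accuracy` are all hypothetical, and no model API is called. The sketch turns an item into a prompt, recovers an option letter from a free-text reply, and computes accuracy next to the random-guessing baseline.

```python
import re
from typing import List, Optional

# Hypothetical four-option layout; the real YORC item format is not given in this summary.
OPTION_LETTERS = "ABCD"


def build_prompt(passage: str, question: str, options: List[str]) -> str:
    """Format one multi-choice item as a zero-shot prompt (illustrative template only)."""
    if len(options) > len(OPTION_LETTERS):
        raise ValueError("this sketch supports at most four options")
    lines = [f"Passage:\n{passage}", f"Question: {question}", "Options:"]
    for letter, option in zip(OPTION_LETTERS, options):
        lines.append(f"{letter}. {option}")
    lines.append("Reply with the letter of the correct option only.")
    return "\n".join(lines)


def parse_answer(response: str) -> Optional[str]:
    """Return the first standalone capital letter A-D in a model reply, or None."""
    match = re.search(r"\b([A-D])\b", response)
    return match.group(1) if match else None


def accuracy(predictions: List[Optional[str]], gold: List[str]) -> float:
    """Percentage of items whose parsed prediction equals the gold letter.

    Unparseable replies (None) are counted as wrong.
    """
    if len(predictions) != len(gold) or not gold:
        raise ValueError("predictions and gold must be non-empty and the same length")
    correct = sum(pred == ans for pred, ans in zip(predictions, gold))
    return 100.0 * correct / len(gold)


def chance_accuracy(num_options: int) -> float:
    """Expected accuracy (%) of uniform random guessing over num_options choices."""
    return 100.0 / num_options


if __name__ == "__main__":
    # Toy inputs, not YORC data.
    print(build_prompt("Toy passage.", "Toy question?", ["w", "x", "y", "z"]))
    replies = ["B", "The answer is C", "I am not sure"]
    gold = ["B", "A", "D"]
    preds = [parse_answer(r) for r in replies]
    print(preds)  # ['B', 'C', None]
    print(f"{accuracy(preds, gold):.2f}% vs chance {chance_accuracy(4):.2f}%")  # 33.33% vs chance 25.00%
```

The parser only matches capital letters A to D that stand alone, so words like "Answer" or a lowercase article "a" are not mistaken for an option. Counting unparseable replies as wrong is also a design choice of this sketch. Other parsing rules would change the reported accuracy, so any real comparison should fix the rule in advance.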
