K-UniMorph: Korean Universal Morphology and its Feature Schema

AI-generated keywords: K-UniMorph, Sejong corpus, Korean language, morphological schema, verb inflection

AI-generated Key Points

  • Authors propose a new Universal Morphology dataset for the Korean language
  • Introduce the K-UniMorph dataset to address underrepresentation of Korean in morphological paradigms
  • Adopt a morphological feature schema from previous works by Sylak-Glassman et al. (2015) and Sylak-Glassman (2016)
  • Extract inflected verb forms from the Sejong morphologically analyzed corpus
  • Focus on annotating morphological data for verbs, separating postpositions from substantive elements
  • Detailed explanations on how to extract inflected verbal forms and grammatical criteria
  • Morphological schema includes four types of verbal endings: sentence final ending (ef), non-final ending (ep), conjunctive ending (ec), and modifier ending (etm)
  • Discuss two grammatical categories: evidentiality and interrogativity, reflected in sentence final endings denoting declarative or interrogative forms
  • Conclude by discussing future perspectives on Korean morphological paradigms and the dataset created

Authors: Eunkyul Leah Jo, Kyuwon Kim, Xihan Wu, KyungTae Lim, Jungyeul Park, Chulwoo Park

Findings of the Association for Computational Linguistics: ACL 2023 (Camera-ready)
License: CC BY 4.0

Abstract: We present in this work a new Universal Morphology dataset for Korean. Previously, the Korean language has been underrepresented in the field of morphological paradigms amongst hundreds of diverse world languages. Hence, we propose this Universal Morphological paradigms for the Korean language that preserve its distinct characteristics. For our K-UniMorph dataset, we outline each grammatical criterion in detail for the verbal endings, clarify how to extract inflected forms, and demonstrate how we generate the morphological schemata. This dataset adopts morphological feature schema from Sylak-Glassman et al. (2015) and Sylak-Glassman (2016) for the Korean language as we extract inflected verb forms from the Sejong morphologically analyzed corpus that is one of the largest annotated corpora for Korean. During the data creation, our methodology also includes investigating the correctness of the conversion from the Sejong corpus. Furthermore, we carry out the inflection task using three different Korean word forms: letters, syllables and morphemes. Finally, we discuss and describe future perspectives on Korean morphological paradigms and the dataset.
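
The abstract mentions an inflection task over three Korean word forms: letters, syllables and morphemes. As a minimal sketch of the first two, and not the authors' preprocessing, the Python below splits a word into syllables and into letters (jamo) using the standard Unicode arithmetic for precomposed Hangul syllables. The function names `to_syllables` and `to_letters` are assumptions made for this illustration. The morpheme level is not shown because it requires morphological analysis, such as the annotation in the Sejong corpus.

```python
# Sketch only: Hangul syllable -> letter (jamo) decomposition via Unicode arithmetic.
# Function names are hypothetical; this is not the paper's own preprocessing code.

S_BASE = 0xAC00                      # first precomposed Hangul syllable
N_VOWELS, N_TAILS = 21, 28           # vowel count, final-consonant slots (incl. none)
N_SYLLABLES = 19 * N_VOWELS * N_TAILS

LEADS = [chr(0x1100 + i) for i in range(19)]         # initial consonants
VOWELS = [chr(0x1161 + i) for i in range(N_VOWELS)]  # medial vowels
TAILS = [""] + [chr(0x11A8 + i) for i in range(27)]  # final consonants, "" = none


def to_syllables(word: str) -> list[str]:
    """Each precomposed Hangul character is already one syllable block."""
    return list(word)


def to_letters(word: str) -> list[str]:
    """Decompose every Hangul syllable into its conjoining jamo letters."""
    letters = []
    for ch in word:
        index = ord(ch) - S_BASE
        if 0 <= index < N_SYLLABLES:
            lead, rest = divmod(index, N_VOWELS * N_TAILS)
            vowel, tail = divmod(rest, N_TAILS)
            letters.append(LEADS[lead])
            letters.append(VOWELS[vowel])
            if tail:                 # skip the empty final-consonant slot
                letters.append(TAILS[tail])
        else:
            letters.append(ch)       # leave non-Hangul characters untouched
    return letters


if __name__ == "__main__":
    word = "\uAC14\uB2E4"            # illustrative two-syllable verb form
    print(to_syllables(word))
    print(to_letters(word))
```

The letter-level view produces longer sequences over a much smaller symbol inventory than the syllable-level view, which is the kind of trade-off such a comparison of input representations would examine.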

Submitted to arXiv on 10 May. 2023


Results of the summarizing process for the arXiv paper: 2305.06335v3

In this paper, the authors propose K-UniMorph, a new Universal Morphology dataset for Korean. Korean has been underrepresented in morphological paradigm resources compared to other languages, and the dataset is designed to preserve the language's distinct characteristics. The authors adopt the morphological feature schema of Sylak-Glassman et al. (2015) and Sylak-Glassman (2016) and extract inflected verb forms from the Sejong morphologically analyzed corpus, one of the largest annotated corpora for Korean, with over 0.6 million sentences and 9.5 million words. During data creation they also check the correctness of the conversion from the Sejong corpus.

The annotation focuses on verbs (V) and separates postpositions from substantive elements such as noun phrases. The paper explains how inflected verbal forms are extracted and details each grammatical criterion for the verbal endings. The resulting schema covers four types of verbal endings: sentence final ending (ef), non-final ending (ep), conjunctive ending (ec), and modifier ending (etm). It also discusses two grammatical categories: evidentiality, which marks the source of the information conveyed in a proposition, and interrogativity, which indicates whether a statement or a question is being expressed. Sentence final endings distinguish declarative from interrogative forms.

The authors additionally carry out an inflection task using three Korean word forms (letters, syllables and morphemes) and conclude by discussing future perspectives on Korean morphological paradigms and the dataset.

Overall, the paper presents a detailed approach to building a Universal Morphology dataset for Korean and points to directions for further research in this area.
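
To make the schema description concrete, here is a small sketch in Python. The four ending tags and their names come from the summary above. The tab-separated lemma / form / feature-bundle layout with `;`-separated features follows the general UniMorph convention. The two sample lines are hypothetical illustrations and are not taken from the K-UniMorph dataset; the helper names `parse_unimorph_line` and `is_interrogative` are likewise assumptions.

```python
# Sketch only: verbal ending types from the summary plus a UniMorph-style line parser.
# Sample entries are hypothetical and NOT drawn from the K-UniMorph dataset.
from typing import NamedTuple

# The four verbal ending types named in the summary.
ENDING_TYPES = {
    "ef": "sentence final ending",
    "ep": "non-final ending",
    "ec": "conjunctive ending",
    "etm": "modifier ending",
}


class Entry(NamedTuple):
    lemma: str
    form: str
    features: tuple[str, ...]


def parse_unimorph_line(line: str) -> Entry:
    """Parse one 'lemma<TAB>form<TAB>FEAT;FEAT;...' line."""
    lemma, form, feats = line.rstrip("\n").split("\t")
    return Entry(lemma, form, tuple(feats.split(";")))


def is_interrogative(entry: Entry) -> bool:
    """True if the feature bundle carries the interrogative label INT."""
    return "INT" in entry.features


if __name__ == "__main__":
    # Hypothetical entries: a declarative and an interrogative past form.
    samples = [
        "\uAC00\uB2E4\t\uAC14\uB2E4\tV;DECL;PST",
        "\uAC00\uB2E4\t\uAC14\uB2C8\tV;INT;PST",
    ]
    for line in samples:
        entry = parse_unimorph_line(line)
        kind = "interrogative" if is_interrogative(entry) else "declarative"
        print(entry.form, entry.features, kind)
```

Under this reading, the declarative versus interrogative contrast carried by sentence final endings (ef) surfaces as a difference in the feature bundle rather than in the lemma.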
Created on 01 Sep. 2023
