The SOFC-Exp Corpus and Neural Approaches to Information Extraction in the Materials Science Domain

AI-generated keywords: Information Extraction

AI-generated Key Points

  • Introduction of a novel information extraction task in materials science related to solid oxide fuel cells (SOFCs)
  • Development of an annotation scheme for marking information on materials and measurement conditions in scholarly articles
  • Release of the SOFC-Exp corpus consisting of 45 annotated open-access papers
  • Demonstration of complexity in named entity recognition and slot filling tasks, with high-quality annotations
  • Presentation of strong neural network models based on the new data set, showing performance improvements with BERT embeddings
  • Indication that adding a recurrent neural network on top of BERT embeddings helps as task complexity increases
  • Proposal of competitive baselines for future research in the field using the developed models
  • Promotion of research on challenging information extraction tasks beyond SOFCs by applying findings to other experimental domains
  • Transferability of best model configurations demonstrated through achieving state-of-the-art results on a related corpus from a previous study
  • Contribution to developing an annotation scheme for marking experimental information in materials science publications and providing a publicly available corpus on SOFC-related experiments
  • Identification of sub-tasks for extracting experiment information and offering competitive neural network baselines
  • Situating the work within the broader context of information extraction in scientific publications and materials science, referencing related studies focusing on knowledge base construction and synthesis procedures
  • Discussion of advances in neural entity tagging and slot filling models that inform the authors' methodology

Authors: Annemarie Friedrich, Heike Adel, Federico Tomazic, Johannes Hingerl, Renou Benteau, Anika Marusczyk, Lukas Lange

Accepted for publication at ACL 2020
License: CC BY 4.0

Abstract: This paper presents a new challenging information extraction task in the domain of materials science. We develop an annotation scheme for marking information on experiments related to solid oxide fuel cells in scientific publications, such as involved materials and measurement conditions. With this paper, we publish our annotation guidelines, as well as our SOFC-Exp corpus consisting of 45 open-access scholarly articles annotated by domain experts. A corpus and an inter-annotator agreement study demonstrate the complexity of the suggested named entity recognition and slot filling tasks as well as high annotation quality. We also present strong neural-network based models for a variety of tasks that can be addressed on the basis of our new data set. On all tasks, using BERT embeddings leads to large performance gains, but with increasing task complexity, adding a recurrent neural network on top seems beneficial. Our models will serve as competitive baselines in future work, and analysis of their performance highlights difficult cases when modeling the data and suggests promising research directions.
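The abstract mentions an inter-annotator agreement study but this page does not say which agreement measure was used. As a minimal sketch, assuming a chance-corrected measure such as Cohen's kappa over per-item labels from two annotators, agreement could be computed as follows. The function name and the toy labels are illustrative and not taken from the paper.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items.

    Assumption: each annotator assigns exactly one label per item.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement if both annotators labelled at random
    # according to their own label distributions.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Toy example: does a sentence describe an experiment (1) or not (0)?
annotator_1 = [1, 1, 0, 0]
annotator_2 = [1, 0, 0, 0]
print(cohen_kappa(annotator_1, annotator_2))
```

A value near 1 indicates agreement well above chance, while a value near 0 indicates agreement no better than chance.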

Submitted to arXiv on 04 Jun. 2020


Results of the summarizing process for the arXiv paper: 2006.03039v1

This paper introduces a novel information extraction task in the field of materials science, focusing on experiments related to solid oxide fuel cells (SOFCs) in scientific publications. The authors develop an annotation scheme for marking information on materials and measurement conditions in scholarly articles, and release the SOFC-Exp corpus consisting of 45 annotated open-access papers. Through a corpus study and an inter-annotator agreement study, they demonstrate the complexity of the named entity recognition and slot filling tasks, as well as the high quality of their annotations.

The paper presents strong neural network models for various tasks based on the new data set, showing large performance gains when using BERT embeddings. The authors also find that adding a recurrent neural network on top appears beneficial as task complexity increases. These models are proposed as competitive baselines for future research in the field.

The authors further aim to promote research on challenging information extraction tasks in experimental domains beyond SOFCs. They showcase the transferability of their best model configurations by achieving state-of-the-art results on a related corpus from a previous study.

In summary, the paper contributes an annotation scheme for marking experimental information in materials science publications and a publicly available corpus of 45 annotated papers on SOFC-related experiments. It also identifies sub-tasks for extracting experiment information, offers competitive neural network baselines, and demonstrates that the findings apply to another materials science corpus.

The work is situated within the broader context of information extraction from scientific publications, and from materials science texts specifically. The authors reference related studies that focus on knowledge base construction and synthesis procedures in materials science texts, highlighting similarities and differences with their own approach. They also discuss advances in neural entity tagging and slot filling models that inform their methodology. Overall, the paper lays a foundation for future research on information extraction in materials science and other experimental domains, and shows the potential of neural network models to improve task performance.
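Named entity recognition and slot filling are commonly framed as token-level sequence labelling. The sketch below assumes a BIO tagging scheme and shows how per-token tags predicted by any tagger could be turned back into slot spans. The slot names `POWER_DENSITY` and `WORKING_TEMPERATURE` and the example sentence are hypothetical; this page does not list the corpus's actual label set or confirm that the authors use this exact encoding.

```python
def bio_to_spans(tokens, tags):
    """Convert BIO tags into a list of (label, text) spans.

    Assumption: tags look like "B-LABEL", "I-LABEL", or "O".
    An "I-" tag that does not continue the current span starts a new one.
    """
    spans = []
    start, label = None, None
    # A trailing "O" sentinel flushes a span that ends at the last token.
    for i, tag in enumerate(list(tags) + ["O"]):
        continues = tag.startswith("I-") and label == tag[2:]
        if not continues:
            if label is not None:
                spans.append((label, " ".join(tokens[start:i])))
                start, label = None, None
            if tag.startswith("B-") or tag.startswith("I-"):
                start, label = i, tag[2:]
    return spans


# Hypothetical sentence and slot labels, for illustration only.
tokens = ["The", "cell", "reached", "0.5", "W/cm2", "at", "1073", "K"]
tags = ["O", "O", "O", "B-POWER_DENSITY", "I-POWER_DENSITY",
        "O", "B-WORKING_TEMPERATURE", "I-WORKING_TEMPERATURE"]
print(bio_to_spans(tokens, tags))
```

In a setup like the one summarized above, the tags would come from a model such as BERT embeddings followed by a recurrent layer and a per-token classifier; the decoding step itself is independent of the model.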
Created on 18 Nov. 2024
