A New Data Representation Based on Training Data Characteristics to Extract Drug Named-Entity in Medical Text

AI-generated keywords: Information Extraction, Medical Texts, Drug Name Recognition, Data Representation Techniques, Performance Improvement

AI-generated Key Points

The license of the paper does not allow us to build upon its content and the key points are generated using the paper metadata rather than the full article.

  • Drug name recognition is a crucial task in information extraction from medical texts
  • Challenges in medical text include unstructured nature, rapid introduction of new terms, and wide variations in drug names
  • A lack of labeled datasets and external knowledge sources, together with drug names that span multiple tokens, exacerbates these challenges
  • Many existing approaches still perform poorly on some of these problems, with F-scores below 0.75
  • A new study introduces innovative data representation techniques to overcome these challenges
  • Three distinct techniques are proposed, based on word distribution characteristics and word similarities derived from word embedding training:
      • The first is evaluated with a standard neural network, Multi-Layer Perceptrons (MLP)
      • The second uses two deep network classifiers, Deep Belief Networks (DBN) and Stacked Denoising Encoders (SAE)
      • The third represents each sentence as a sequence and is evaluated with a Long Short Term Memory (LSTM) recurrent neural network
  • The third technique utilizing LSTM achieves the best F-score performance with an average of 0.8645
  • This research provides valuable insights into enhancing drug name recognition in medical text mining applications and sets a new benchmark for performance in this domain

Authors: Sadikin Mujiono, Mohamad Ivan Fanany, Chan Basaruddin

Computational Intelligence and Neuroscience, Volume 2016 (2016), Article ID 3483528, 24 pages. Hindawi Publishing.
Received 27 May 2016; Revised 8 August 2016; Accepted 18 September 2016. Special Issue on "Smart Data: Where the Big Data Meets the Semantics". Academic Editor: Trong H. Duong

Abstract: One essential task in information extraction from the medical corpus is drug name recognition. Compared with text from other domains, medical text is special and has unique characteristics. In addition, medical text mining poses more challenges, e.g., more unstructured text, the rapid addition of new terms, and a wide range of name variations for the same drug. The mining is even more challenging due to the lack of labeled dataset sources and external knowledge, as well as multiple-token representations for a single drug name, which are more common in real application settings. Although many approaches have been proposed to tackle the task, some problems remain, with poor F-score performance (less than 0.75). This paper presents a new treatment of data representation techniques to overcome some of those challenges. We propose three data representation techniques based on the characteristics of word distribution and word similarities obtained from word embedding training. The first technique is evaluated with the standard NN model, i.e., MLP (Multi-Layer Perceptrons). The second technique involves two deep network classifiers, i.e., DBN (Deep Belief Networks) and SAE (Stacked Denoising Encoders). The third technique represents the sentence as a sequence that is evaluated with a recurrent NN model, i.e., LSTM (Long Short Term Memory). In extracting the drug name entities, the third technique gives the best F-score performance compared to the state of the art, with an average F-score of 0.8645.
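For readers unfamiliar with the metric quoted in the abstract, the sketch below shows how an F-score can be computed from token-level predictions. It assumes the balanced F1 variant (the harmonic mean of precision and recall) and binary labels where 1 marks a drug-name token; the function name `f1_score` and the toy label lists are illustrative assumptions, not taken from the paper.

```python
def f1_score(gold, predicted):
    """Return (precision, recall, F1) for binary token labels (1 = drug token)."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    # Count true positives, false positives, and false negatives.
    tp = sum(1 for g, p in zip(gold, predicted) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, predicted) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, predicted) if g == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Toy example (made-up labels): 2 true positives, 1 false positive, 1 false negative.
gold = [0, 1, 1, 0, 0, 1, 0, 0]
pred = [0, 1, 0, 0, 1, 1, 0, 0]
precision, recall, f1 = f1_score(gold, pred)
print(f"precision={precision:.4f} recall={recall:.4f} f1={f1:.4f}")
```

Whether the paper scores individual tokens or whole entities cannot be determined from the metadata, so the token-level choice here is only one possible convention.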

Submitted to arXiv on 06 Oct. 2016

Results of the summarizing process for the arXiv paper: 1610.01891v1

This paper's license does not allow us to build upon its content, so the summary below was generated from the paper's metadata rather than the full article.

In the field of information extraction from medical texts, one crucial task is drug name recognition. Medical text presents unique challenges compared to other domains due to its unstructured nature and the rapid introduction of new terms. A wide range of name variations for the same drug further complicates the task. These challenges are exacerbated by the lack of labeled datasets and external knowledge sources, and by drug names that span multiple tokens. While various approaches have been proposed to address these obstacles, some problems remain on which F-score performance is poor (below 0.75).

To tackle this issue, the study introduces new data representation techniques. Three distinct techniques are proposed, based on word distribution characteristics and word similarities derived from word embedding training. The first technique is evaluated with a standard neural network model, Multi-Layer Perceptrons (MLP). The second technique uses two deep network classifiers: Deep Belief Networks (DBN) and Stacked Denoising Encoders (SAE). The third technique represents each sentence as a sequence and is evaluated with a recurrent neural network model, Long Short Term Memory (LSTM).

Of the three, the third technique achieves the best F-score compared to existing methods, with an average F-score of 0.8645 in extracting drug name entities from medical texts. This research contributes valuable insights into enhancing drug name recognition in medical text mining applications and sets a new benchmark for performance in this challenging domain.
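To make the idea of "representing a sentence as a sequence" more concrete, here is a minimal, generic sketch of how a tokenized sentence can be turned into a fixed-length sequence of word vectors with one label per token. This is an illustration under stated assumptions, not the paper's representation: the 3-dimensional vectors in `TOY_EMBEDDINGS` are made up, and the helper `encode_sentence`, the zero-vector padding, and the 0/1 labeling scheme are hypothetical choices.

```python
from typing import Dict, List, Set, Tuple

# Made-up 3-dimensional vectors; in practice these would come from
# word embedding training on a medical corpus.
TOY_EMBEDDINGS: Dict[str, List[float]] = {
    "take": [0.1, 0.0, 0.2],
    "acetylsalicylic": [0.9, 0.8, 0.1],
    "acid": [0.8, 0.7, 0.2],
    "daily": [0.0, 0.1, 0.3],
}
ZERO = [0.0, 0.0, 0.0]  # used for unknown words and for padding (an assumption)


def encode_sentence(
    tokens: List[str], drug_tokens: Set[str], max_len: int
) -> Tuple[List[List[float]], List[int]]:
    """Map tokens to vectors and 0/1 labels, truncated or padded to max_len."""
    vectors = [TOY_EMBEDDINGS.get(t.lower(), ZERO) for t in tokens][:max_len]
    # Every token of a multi-token drug name receives the label 1.
    labels = [1 if t.lower() in drug_tokens else 0 for t in tokens][:max_len]
    pad = max_len - len(vectors)
    vectors = vectors + [ZERO] * pad
    labels = labels + [0] * pad
    return vectors, labels


tokens = ["Take", "acetylsalicylic", "acid", "daily"]
vectors, labels = encode_sentence(tokens, {"acetylsalicylic", "acid"}, max_len=6)
print(labels)
```

A sequence model such as an LSTM would consume `vectors` one time step at a time and predict a label per step, which is one way a two-token name like "acetylsalicylic acid" can be tagged as a single entity.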
Created on 06 Mar. 2025
