A New Data Representation Based on Training Data Characteristics to Extract Drug Named-Entity in Medical Text
AI-generated Key Points
⚠ The license of the paper does not allow us to build upon its content, so the key points are generated from the paper metadata rather than the full article.
- Drug name recognition is a crucial task in information extraction from medical texts
- Challenges in medical text include its unstructured nature, the rapid introduction of new terms, and wide variation in the names used for the same drug
- These challenges are compounded by the lack of labeled datasets and external knowledge sources, and by single drug names that span multiple tokens
- Although many approaches have been proposed, some problems still yield poor F-scores (below 0.75)
- The paper introduces new data representation techniques to overcome some of these challenges
- Three data representation techniques are proposed, based on word distribution characteristics and on word similarities derived from word embedding training:
  - The first technique is evaluated with a standard neural network, a Multi-Layer Perceptron (MLP)
  - The second technique is evaluated with two deep network classifiers, Deep Belief Networks (DBN) and Stacked Denoising Autoencoders (SAE)
  - The third technique represents each sentence as a sequence and is evaluated with a Long Short-Term Memory (LSTM) recurrent neural network
- The third technique (evaluated with LSTM) gives the best F-score performance compared with the state of the art, with an average F-score of 0.8645
- These results suggest that the choice of data representation can substantially improve drug name recognition in medical text mining
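The F-scores quoted in these key points are the harmonic mean of precision and recall. The abstract does not state the exact evaluation protocol, so the following is only a generic sketch that assumes entity-level, exact-match scoring. Each entity is represented as a hypothetical `(sentence_id, start, end, type)` tuple, and the function name `entity_prf1` is illustrative, not taken from the paper.

```python
def entity_prf1(gold, predicted):
    """Entity-level precision, recall, and F1 with exact span matching.

    gold, predicted: sets of hypothetical (sentence_id, start, end, type)
    tuples; a prediction counts only if every field matches a gold entity.
    """
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    gold = {(0, 1, 3, "drug"), (1, 0, 1, "drug"), (2, 4, 5, "drug")}
    predicted = {(0, 1, 3, "drug"), (1, 0, 2, "drug"),
                 (2, 4, 5, "drug"), (3, 0, 1, "drug")}
    print(entity_prf1(gold, predicted))
```

Under exact matching, a predicted span whose boundaries differ from the gold span (here `(1, 0, 2)` versus `(1, 0, 1)`) counts as both a false positive and a false negative, which is one reason multi-token drug names make the task harder.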
Authors: Sadikin Mujiono, Mohamad Ivan Fanany, Chan Basaruddin
Abstract: One essential task in information extraction from the medical corpus is drug name recognition. Compared with text from other domains, medical text is special and has unique characteristics. Medical text mining also poses more challenges, e.g., more unstructured text, the rapid addition of new terms, and a wide range of name variations for the same drug. The mining is even more challenging due to the lack of labeled datasets and external knowledge sources, as well as multiple token representations for a single drug name, which are more common in real application settings. Although many approaches have been proposed to address the task, some problems remain with poor F-score performance (less than 0.75). This paper presents a new treatment of data representation techniques to overcome some of those challenges. We propose three data representation techniques based on the characteristics of word distribution and word similarities resulting from word embedding training. The first technique is evaluated with a standard NN model, i.e., MLP (Multi-Layer Perceptrons). The second technique involves two deep network classifiers, i.e., DBN (Deep Belief Networks) and SAE (Stacked Denoising Autoencoders). The third technique represents the sentence as a sequence that is evaluated with a recurrent NN model, i.e., LSTM (Long Short-Term Memory). In extracting drug name entities, the third technique gives the best F-score performance compared to the state of the art, with an average F-score of 0.8645.
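The abstract does not spell out how the sentence-as-sequence representation or the word-similarity features are built. As a rough illustration only, and not the authors' method, the sketch below maps each token to a word embedding vector, pads the sentence to a fixed length, and pairs it with per-token labels, the kind of input a sequence model such as an LSTM consumes. It also computes cosine similarity, a common measure of word similarity in embedding space. Assumptions: the 3-dimensional `EMBEDDINGS` table is a hypothetical toy standing in for trained embeddings, the BIO-style tags (`B-DRUG`, `I-DRUG`, `O`) are a hypothetical labeling scheme, and unknown words and padding positions get zero vectors.

```python
import math

# Hypothetical toy embeddings; real vectors would come from training a
# word-embedding model on the medical corpus.
EMBEDDINGS = {
    "aspirin": [0.9, 0.1, 0.0],
    "ibuprofen": [0.8, 0.2, 0.1],
    "sodium": [0.1, 0.9, 0.2],
    "chloride": [0.2, 0.8, 0.3],
    "take": [0.0, 0.1, 0.9],
}
DIM = 3
PAD = "<pad>"  # hypothetical label for padding positions


def cosine_similarity(u, v):
    """Cosine similarity between two vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar(word, k=2):
    """Return the k vocabulary words closest to `word` by cosine similarity."""
    target = EMBEDDINGS[word]
    scored = [(other, cosine_similarity(target, vec))
              for other, vec in EMBEDDINGS.items() if other != word]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [w for w, _ in scored[:k]]


def sentence_to_sequence(tokens, labels, max_len):
    """Map a tokenized sentence to a fixed-length sequence of embedding
    vectors plus per-token labels, truncating or padding to max_len.
    Unknown words and padding positions get a zero vector."""
    zero = [0.0] * DIM
    # Copy each vector so callers cannot mutate the shared embedding table.
    vectors = [list(EMBEDDINGS.get(t.lower(), zero)) for t in tokens[:max_len]]
    tags = list(labels[:max_len])
    while len(vectors) < max_len:
        vectors.append([0.0] * DIM)
        tags.append(PAD)
    return vectors, tags
```

For example, with the tokens `["Take", "sodium", "chloride", "daily"]` and labels `["O", "B-DRUG", "I-DRUG", "O"]`, the two-token drug name occupies two consecutive positions, so a sequence model can predict one tag per position and recover the whole name rather than classifying each word in isolation.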