MicroRNAs (miRNAs) are small non-coding RNAs that play a crucial role in post-transcriptional gene regulation. However, determining the sequence and structure of miRNAs through experimental methods is both expensive and time-consuming. To overcome these limitations, researchers have turned to computational and machine learning-based approaches for predicting novel miRNAs. With the integration of data science and machine learning in biology, numerous studies have sought to identify miRNAs using different computational methods and miRNA features. This systematic review focuses specifically on machine learning methods developed for identifying miRNAs in plants. By examining the learning algorithms employed, features considered, datasets used, and evaluation criteria applied, this review provides a comprehensive overview of past research efforts. The aim is to help researchers gain a detailed understanding of previous studies and identify new avenues for addressing the limitations encountered in those studies. The findings of this review emphasize the need for plant-specific computational methods for miRNA identification, which can contribute to advancements in miRNA research in plants and pave the way for improved post-transcriptional gene regulation studies. Furthermore, this review highlights how such refined understanding can enable researchers to develop more accurate and efficient techniques tailored specifically to plant species.
- MicroRNAs (miRNAs) are small non-coding RNAs that regulate genes in biology
- Experimental methods for determining miRNA sequence and structure are expensive and time-consuming
- Computational and machine learning-based approaches have been used to predict novel miRNAs
- Numerous studies have focused on identifying miRNAs in plants using data science and machine learning
- The review examines different approaches, learning algorithms, features, datasets, and evaluation criteria used in past research efforts
- The aim is to help researchers understand previous studies and find new ways to address limitations encountered
- Plant-specific computational methods for miRNA identification are needed for advancements in miRNA research in plants
- Refined understanding can lead to more accurate and efficient techniques tailored to plant species
MicroRNAs (miRNAs) are tiny molecules that control genes in living things. Scientists use expensive and time-consuming methods to study miRNA sequence and structure. They also use computer programs and machine learning to predict new miRNAs. Many studies have focused on finding miRNAs in plants using data science and machine learning. This review looks at different approaches, algorithms, features, datasets, and evaluation criteria used in past research efforts. The goal is to help researchers understand previous studies and find better ways to overcome challenges. Plant-specific computational methods are needed for studying miRNAs in plants. Having a better understanding can lead to more accurate and efficient techniques specifically designed for plant species.
Definitions
- MicroRNAs (miRNAs): Small non-coding RNAs that regulate genes.
- Experimental methods: Techniques used in scientific experiments.
- Computational: Relating to computers or computer-based systems.
- Machine learning: A type of artificial intelligence where machines learn from data without being explicitly programmed.
- Predict: To estimate something from available data before it has been confirmed, for example by experiment.
- Novel: New or original.
- Data science: The study of extracting knowledge or insights from data.
- Evaluation criteria: Standards or measures used to assess something's quality or effectiveness.
- Advancements: Improvements or progress made in a particular field.
- Refined understanding: A deeper or more detailed comprehension of something.
Exploring Machine Learning Methods for miRNA Identification in Plants
MicroRNAs (miRNAs) are small non-coding RNAs that play a crucial role in post-transcriptional gene regulation. In the field of biology, they have been extensively studied to understand their role in various biological processes. However, determining the sequence and structure of miRNAs through experimental methods is both expensive and time-consuming. To overcome these limitations, researchers have turned to computational and machine learning-based approaches for predicting novel miRNAs. With the integration of data science and machine learning in biology, numerous studies have been conducted to identify miRNAs using different computational methods and miRNA features.
This systematic review focuses specifically on machine learning methods developed for identifying miRNAs in plants. By examining various approaches, including the learning algorithms employed, features considered, datasets used, and evaluation criteria applied, this review provides a comprehensive overview of past research efforts. The aim is to help researchers gain a detailed understanding of previous studies and identify new avenues for addressing the limitations encountered in those studies.
Learning Algorithms Employed
Most existing studies use supervised machine learning algorithms to identify plant miRNAs from genomic sequences or from related data sources such as expression profiles and secondary structures. These algorithms include support vector machines (SVMs), random forests (RFs), artificial neural networks (ANNs), k-nearest neighbor classifiers (KNNs), logistic regression models (LRs), and decision trees (DTs). For example, one study trained an SVM on nucleotide composition features extracted from plant genomic sequences to predict potential pre-miRNA hairpins with high accuracy [1]. Another combined ANNs with evolutionary information derived from the genomes of multiple species to classify known plant miRNA precursors [2]. Several other studies have employed RFs [3], KNNs [4], and LRs [5], together with dimensionality reduction or feature selection techniques such as principal component analysis (PCA) or mutual information-based feature selection (MIFS), to predict novel plant miRNAs from genomic sequences or expression profiles [6].
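To make the idea of classifying sequences by nucleotide composition concrete, the following is a minimal sketch using only the Python standard library. It computes dinucleotide frequencies and applies a plain k-nearest neighbor vote. The function names, the labels, and the toy sequences are hypothetical and chosen for illustration only; they are not taken from any of the reviewed studies, and a real pipeline would use curated precursor datasets and a tuned classifier.

```python
from collections import Counter
from itertools import product
import math

ALPHABET = "ACGU"


def kmer_frequencies(seq, k=2):
    """Return the normalized k-mer frequency vector of an RNA sequence."""
    seq = seq.upper().replace("T", "U")  # accept DNA-style input as well
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # Only k-mers over ACGU are counted; "or 1" avoids division by zero.
    total = sum(counts[km] for km in kmers) or 1
    return [counts[km] / total for km in kmers]


def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k training vectors closest to x (Euclidean)."""
    ranked = sorted(
        (math.dist(tx, x), label) for tx, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data: GC-rich strings stand in for precursors and
# AU-rich strings for background. Real precursors are not this simple.
train_seqs = [
    "GCGCGCGGCCGCGC", "GGCCGGCCGCGCGG", "CGCGGCGCCGCGCC",
    "AUAUAUUAAUAUAU", "UUAAUUAAUAUAUU", "AUUAUAUAAUUAUA",
]
train_y = ["pre-miRNA"] * 3 + ["background"] * 3
train_X = [kmer_frequencies(s) for s in train_seqs]

query = "GCGGCCGCGCGC"
print(query, "->", knn_predict(train_X, train_y, kmer_frequencies(query)))
```

KNN is used here only because it fits in a few lines without external libraries; the same feature vectors could be passed to an SVM or RF implementation instead.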
Features Considered
Nucleotide composition features are used by most existing approaches for predicting pre-miRNA hairpins from genomic sequences. Some recent studies add further features when training predictive models on datasets of known plant miRNA precursors: secondary structure information derived from RNA folding tools such as the ViennaRNA Package or the mfold web server; evolutionary conservation scores obtained with phylogenetic tree construction tools such as PhyML; thermodynamic stability scores calculated with UNAFold; sequence motif patterns identified by the MEME Suite; and gene ontology annotations retrieved with Blast2GO. For instance, one study proposed an ensemble of SVM models trained on different feature types, including nucleotide composition features and secondary structure features derived from RNA folding algorithms [7]. Similarly, another study applied PCA and MIFS and then trained an RF model on the selected motif-based features, extracted from a dataset of known Arabidopsis thaliana miRNA precursors, to predict novel A. thaliana miRNA precursors [8].
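As an illustration of how secondary structure information can be turned into numeric features, the sketch below derives a few simple descriptors from a sequence and its structure in dot-bracket notation. It assumes the structure string has already been produced by a folding tool; no folding is performed here. The function name `structure_features` and the particular descriptors are hypothetical choices for illustration, not the feature set of any reviewed study.

```python
def structure_features(seq, dot_bracket):
    """Combine simple composition and secondary-structure descriptors.

    `dot_bracket` is assumed to come from an external RNA folding tool:
    '(' and ')' mark paired bases, '.' marks unpaired bases.
    """
    if not seq or len(seq) != len(dot_bracket):
        raise ValueError("sequence and structure must be non-empty and equal in length")
    seq = seq.upper().replace("T", "U")

    # Match brackets with a stack to recover the base-pair list.
    stack, pairs = [], []
    for i, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            pairs.append((stack.pop(), i))
    if stack:
        raise ValueError("unbalanced structure")

    # Longest run of consecutive '(' as a crude proxy for stem length.
    longest = run = 0
    for ch in dot_bracket:
        run = run + 1 if ch == "(" else 0
        longest = max(longest, run)

    n = len(seq)
    return {
        "length": n,
        "gc_content": (seq.count("G") + seq.count("C")) / n,
        "n_base_pairs": len(pairs),
        "paired_fraction": 2 * len(pairs) / n,
        "longest_stem_run": longest,
        "gu_wobble_pairs": sum(
            1 for i, j in pairs if {seq[i], seq[j]} == {"G", "U"}
        ),
    }


# Hypothetical 10-nt hairpin, far shorter than a real pre-miRNA.
features = structure_features("GGGAAAUCCC", "(((....)))")
print(features)
```

Descriptors like these can be concatenated with composition features to form the mixed feature vectors that the ensemble and feature-selection approaches described above operate on.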
Datasets Used
Most existing studies use publicly available datasets of experimentally verified plant miRNA precursors, collected either manually or through automated curation of large-scale sequencing experiments across different species, e.g., the Plant MicroRNA Database (PMRD) [9] and Plant Small Regulatory RNAdb (PSRdb) [10]. Some recent works also use expression profile datasets generated with high-throughput technologies, such as the Illumina HiSeq 2000 sequencing platform [11] or Affymetrix GeneChip arrays [12], to train their predictive models alongside known plant miRNA precursor datasets.
Evaluation Criteria Applied
To evaluate the performance of their predictive models, most existing works employ standard metrics such as sensitivity (also called recall), specificity, precision, F1 score, and Matthews correlation coefficient (MCC). Some recent works also use receiver operating characteristic (ROC) curves, together with the area under the ROC curve (AUC), to assess how well their systems separate miRNAs from non-miRNAs.
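The metrics listed above all follow from the confusion matrix of a binary classifier, and the AUC can be computed from ranked scores. The following minimal sketch shows the standard definitions using only the Python standard library. The function names and the toy labels and scores are hypothetical; label 1 is assumed to mean "miRNA" and 0 "non-miRNA".

```python
import math


def classification_metrics(y_true, y_pred):
    """Standard binary metrics from true and predicted 0/1 labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    def ratio(num, den):
        # Return 0.0 when a metric is undefined (zero denominator).
        return num / den if den else 0.0

    sensitivity = ratio(tp, tp + fn)  # identical to recall
    specificity = ratio(tn, tn + fp)
    precision = ratio(tp, tp + fp)
    f1 = ratio(2 * precision * sensitivity, precision + sensitivity)
    mcc = ratio(
        tp * tn - fp * fn,
        math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)),
    )
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
        "mcc": mcc,
    }


def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random
    negative, with ties counted as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical predictions for eight candidate sequences.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.1, 0.6]

print(classification_metrics(y_true, y_pred))
print("AUC:", roc_auc(y_true, scores))
```

MCC is often preferred when positive and negative classes are imbalanced, because it uses all four cells of the confusion matrix, whereas the AUC summarizes ranking quality across all decision thresholds rather than at a single cutoff.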
Findings & Implications
The findings of this review emphasize the need for more accurate and efficient techniques tailored specifically to plants, which can contribute significantly to advances in research on post-transcriptional gene regulation in plants. The review also highlights how the understanding gained from prior work can help researchers develop more effective plant-specific computational methods, paving the way for improved identification of novel plant miRNAs.