Machine Learning Methods for Cancer Classification Using Gene Expression Data: A Review

AI-generated keywords: Cancer Gene expression analysis Machine learning Deep learning RNA-Seq

AI-generated Key Points

  • Cancer is a group of diseases characterized by abnormal cell growth that can spread to different parts of the body. According to the World Health Organization (WHO), it is the second leading cause of death after cardiovascular diseases.
  • Gene expression analysis is crucial for early cancer detection by providing insights into the biochemical processes and genetic characteristics of cells and tissues.
  • DNA microarrays and RNA-sequencing methods are essential tools for quantifying gene expression levels and generating valuable data for computational analysis.
  • Recent advancements in gene expression analysis for cancer classification involve machine learning techniques, with a focus on deep learning models due to their ability to identify unique gene patterns associated with various types of cancers.
  • The review covers works that use popular deep neural network architectures: multi-layer perceptrons, convolutional networks, recurrent networks, graph networks, and transformer networks.
  • Data collection methods for gene expression analysis and key datasets commonly used for supervised machine learning in this domain are discussed.
  • Important techniques for feature engineering and data preprocessing to address the high dimensionality of gene expression data are highlighted.
  • The study provides novel insights on graph neural networks (GNN) and transformer neural networks (TNN) for gene expression analysis not covered in prior works.
  • The review contributes valuable insights into both conventional machine learning methods and recent deep learning approaches for gene expression analysis in cancer classification.

Authors: Fadi Alharbi, Aleksandar Vakanski

Bioengineering 2023, 10(2), 173
29 pages, 1 figure, 11 tables
License: CC BY 4.0

Abstract: Cancer is a term that denotes a group of diseases caused by abnormal growth of cells that can spread in different parts of the body. According to the World Health Organization (WHO), cancer is the second major cause of death after cardiovascular diseases. Gene expression can play a fundamental role in the early detection of cancer, as it is indicative of the biochemical processes in tissue and cells, as well as the genetic characteristics of an organism. Deoxyribonucleic Acid (DNA) microarrays and Ribonucleic Acid (RNA)-sequencing methods for gene expression data allow quantifying the expression levels of genes and produce valuable data for computational analysis. This study reviews recent progress in gene expression analysis for cancer classification using machine learning methods. Both conventional and deep learning-based approaches are reviewed, with an emphasis on the application of deep learning models due to their comparative advantages for identifying gene patterns that are distinctive for various types of cancers. Relevant works that employ the most commonly used deep neural network architectures are covered, including multi-layer perceptrons, convolutional, recurrent, graph, and transformer networks. This survey also presents an overview of the data collection methods for gene expression analysis and lists important datasets that are commonly used for supervised machine learning for this task. Furthermore, reviewed are pertinent techniques for feature engineering and data preprocessing that are typically used to handle the high dimensionality of gene expression data, caused by a large number of genes present in data samples. The paper concludes with a discussion of future research directions for machine learning-based gene expression analysis for cancer classification.

Submitted to arXiv on 28 Jan. 2023

Results of the summarizing process for the arXiv paper: 2301.12222v1

Cancer is a complex group of diseases characterized by abnormal cell growth that can spread to different parts of the body, making it the second leading cause of death globally. Gene expression analysis plays a crucial role in early cancer detection by providing insights into the biochemical processes and genetic characteristics of cells and tissues. Deoxyribonucleic Acid (DNA) microarrays and Ribonucleic Acid (RNA)-sequencing methods are essential tools for quantifying gene expression levels and generating valuable data for computational analysis. This study explores recent advancements in gene expression analysis for cancer classification using machine learning techniques. Both traditional and deep learning-based approaches are reviewed, with a focus on the application of deep learning models due to their ability to identify unique gene patterns associated with various types of cancers. The survey covers relevant works utilizing popular deep neural network architectures such as multi-layer perceptrons, convolutional networks, recurrent networks, graph networks, and transformer networks. Additionally, the paper provides an overview of data collection methods for gene expression analysis and highlights key datasets commonly used for supervised machine learning in this domain. It also discusses important techniques for feature engineering and data preprocessing to address the high dimensionality of gene expression data resulting from a large number of genes present in samples. Furthermore, the authors list previous review papers related to gene expression analysis, offering comparative information on conventional machine learning approaches, feature engineering techniques, deep learning approaches like recurrent neural networks (RNN) and convolutional neural networks (CNN), as well as the use of microarray versus RNA-Seq data. 
The study distinguishes itself from prior reviews by discussing graph neural networks (GNN) and transformer neural networks (TNN) for gene expression analysis, which earlier works did not cover. It also addresses the modeling of RNA-Seq data, which has become the dominant format in recent years, along with related feature engineering techniques that receive limited attention in the existing literature. Overall, the review covers both conventional machine learning methods and recent deep learning approaches for gene expression analysis in cancer classification. The paper concludes with suggestions for future research directions in machine learning-based gene expression analysis for cancer classification.
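
To make the high-dimensionality point concrete, the following is a minimal sketch of one common preprocessing pattern for expression matrices: a log transform, variance-based gene filtering, and per-gene standardization. It is an illustration under stated assumptions, not a method taken from the paper. The function name `preprocess_expression`, the parameter `n_top_genes`, and the synthetic Poisson counts are all hypothetical, and the sketch uses only NumPy.

```python
import numpy as np


def preprocess_expression(counts, n_top_genes=100):
    """Reduce a (samples x genes) count matrix to its most variable genes.

    Hypothetical helper for illustration only; the reviewed paper does not
    prescribe this exact pipeline.
    """
    # Log-transform to compress the dynamic range of raw counts.
    log_expr = np.log2(counts + 1.0)

    # Rank genes by variance across samples and keep the top n_top_genes.
    variances = log_expr.var(axis=0)
    top_idx = np.sort(np.argsort(variances)[::-1][:n_top_genes])
    selected = log_expr[:, top_idx]

    # Standardize each retained gene to zero mean and unit variance.
    mean = selected.mean(axis=0)
    std = selected.std(axis=0)
    std[std == 0] = 1.0  # guard against constant genes
    return (selected - mean) / std, top_idx


# Synthetic data (an assumption): 20 samples, 1000 genes.
rng = np.random.default_rng(0)
counts = rng.poisson(lam=5.0, size=(20, 1000)).astype(float)

reduced, idx = preprocess_expression(counts, n_top_genes=50)
print(reduced.shape)
```

The reduced matrix could then be passed to any of the classifiers the review discusses. In practice, the selection step should be fit on training samples only, so that no information from the test samples leaks into the choice of genes.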
Created on 03 Sep. 2024
