Accelerating high-throughput virtual screening through molecular pool-based active learning

AI-generated keywords: Drug Discovery, Virtual Screening, Bayesian Optimization, Machine Learning, Surrogate Models

AI-generated Key Points

The license of the paper does not allow us to build upon its content and the key points are generated using the paper metadata rather than the full article.

  • Structure-based virtual screening is a crucial tool for identifying potential drug candidates in drug discovery.
  • Virtual libraries now exceed 100 million molecules, making exhaustive virtual screening campaigns increasingly resource-intensive.
  • Researchers have turned to Bayesian optimization techniques that leverage surrogate structure-property relationship models to reduce computational costs.
  • In a recent study, various surrogate model architectures, acquisition functions and acquisition batch sizes were assessed on several protein-ligand docking datasets.
  • Testing only 2.4% of a 100 million member library allowed researchers to identify 87.9% of the top 50 thousand ligands.
  • Model-guided searches not only mitigate increasing computational costs but also have applications beyond docking and could accelerate high-throughput virtual screening campaigns in other areas of drug discovery.
  • Leveraging machine learning techniques in early stage drug discovery efforts can improve efficiency and reduce costs while maintaining accuracy and reliability in identifying promising compounds for further development.
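
To put the headline figures from the key points in absolute terms, the short calculation below converts the reported percentages into molecule counts. The comparison against uniform random sampling is our own illustrative baseline, not a number taken from the paper, and the variable names are hypothetical.

```python
# Figures quoted in the key points above.
library_size = 100_000_000   # 100M member library
fraction_tested = 0.024      # 2.4% of the library was evaluated
top_k = 50_000               # size of the "top ligands" set
fraction_found = 0.879       # 87.9% of the top-k set was recovered

molecules_tested = round(fraction_tested * library_size)
top_ligands_found = round(fraction_found * top_k)

# Illustrative baseline (an assumption, not from the paper): a uniform
# random sample of the same size is expected to contain the same
# fraction of the top-k set as the fraction of the library it covers.
expected_random_hits = round(fraction_tested * top_k)
enrichment_over_random = fraction_found / fraction_tested

print(f"molecules evaluated:      {molecules_tested:,}")
print(f"top ligands recovered:    {top_ligands_found:,}")
print(f"expected by random draw:  {expected_random_hits:,}")
print(f"enrichment over random:   {enrichment_over_random:.1f}x")
```

In other words, about 2.4 million evaluations recover roughly 44 thousand of the 50 thousand best-scoring ligands, where a random sample of the same size would be expected to contain about 1,200 of them.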

Authors: David E. Graff, Eugene I. Shakhnovich, Connor W. Coley

arXiv: 2012.07127v1 (q-bio.QM)

Abstract: Structure-based virtual screening is an important tool in early stage drug discovery that scores the interactions between a target protein and candidate ligands. As virtual libraries continue to grow (in excess of $10^8$ molecules), so too do the resources necessary to conduct exhaustive virtual screening campaigns on these libraries. However, Bayesian optimization techniques can aid in their exploration: a surrogate structure-property relationship model trained on the predicted affinities of a subset of the library can be applied to the remaining library members, allowing the least promising compounds to be excluded from evaluation. In this study, we assess various surrogate model architectures, acquisition functions, and acquisition batch sizes as applied to several protein-ligand docking datasets and observe significant reductions in computational costs, even when using a greedy acquisition strategy; for example, 87.9% of the top-50000 ligands can be found after testing only 2.4% of a 100M member library. Such model-guided searches mitigate the increasing computational costs of screening increasingly large virtual libraries and can accelerate high-throughput virtual screening campaigns with applications beyond docking.
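
The loop described in the abstract (score a subset, train a surrogate, rank the rest of the pool, evaluate the most promising batch, repeat) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the ridge-regression surrogate, the random feature vectors, the synthetic `toy_score` objective, and every function and parameter name here are hypothetical stand-ins, and lower scores are assumed to be better, as with docking scores.

```python
import numpy as np

def greedy_pool_search(features, score_fn, init_size, batch_size, n_iters, seed=0):
    """Pool-based active learning with greedy acquisition (lower score = better).

    features : (n_molecules, n_features) array describing the pool (assumed given)
    score_fn : callable mapping a pool index to its expensive objective value
    Returns a dict {pool index: evaluated score}.
    """
    rng = np.random.default_rng(seed)
    n_pool, n_feat = features.shape

    # Step 1: evaluate a random initial subset of the pool.
    initial = rng.choice(n_pool, size=init_size, replace=False)
    scores = {int(i): score_fn(int(i)) for i in initial}

    for _ in range(n_iters):
        idx = np.array(list(scores.keys()))
        X, y = features[idx], np.array([scores[int(i)] for i in idx])

        # Step 2: fit the surrogate. Ridge regression is a simple stand-in
        # for whichever surrogate architecture is being assessed.
        A = X.T @ X + 1e-3 * np.eye(n_feat)
        w = np.linalg.solve(A, X.T @ y)

        # Step 3: predict scores for the whole pool, masking evaluated members.
        pred = features @ w
        pred[idx] = np.inf

        # Step 4: greedy acquisition, i.e. take the best predicted scores.
        batch = np.argsort(pred)[:batch_size]
        for i in batch:
            scores[int(i)] = score_fn(int(i))

    return scores

# Toy demonstration on synthetic data (not a docking dataset).
rng = np.random.default_rng(42)
pool = rng.normal(size=(2000, 8))
true_w = rng.normal(size=8)
true_scores = pool @ true_w + 0.1 * rng.normal(size=2000)

def toy_score(i):
    # Hypothetical stand-in for an expensive docking calculation.
    return float(true_scores[i])

found = greedy_pool_search(pool, toy_score, init_size=100, batch_size=100, n_iters=3)
top_50 = set(np.argsort(true_scores)[:50].tolist())
fraction_recovered = len(top_50 & set(found)) / 50
print(f"evaluated {len(found)} of {len(pool)} molecules")
print(f"fraction of true top-50 recovered: {fraction_recovered:.2f}")
```

Greedy acquisition ignores predictive uncertainty entirely; acquisition functions that use uncertainty estimates would replace only Step 4 of this sketch, which is why the abstract singles out the fact that even the greedy strategy yields large savings.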

Submitted to arXiv on 13 Dec. 2020


Results of the summarizing process for the arXiv paper: 2012.07127v1


In the field of drug discovery, structure-based virtual screening is a crucial tool for identifying potential drug candidates. However, as virtual libraries continue to expand and contain over 100 million molecules, conducting exhaustive virtual screening campaigns becomes increasingly resource-intensive. To address this challenge, researchers have turned to Bayesian optimization techniques that leverage surrogate structure-property relationship models trained on predicted affinities of a subset of the library. By applying these models to the remaining library members, less promising compounds can be excluded from evaluation, significantly reducing computational costs.

In a recent study by David E. Graff, Eugene I. Shakhnovich, and Connor W. Coley titled "Accelerating high-throughput virtual screening through molecular pool-based active learning," various surrogate model architectures, acquisition functions and acquisition batch sizes were assessed on several protein-ligand docking datasets. The results showed significant reductions in computational costs even when using a greedy acquisition strategy. For example, testing only 2.4% of a 100 million member library allowed researchers to identify 87.9% of the top 50 thousand ligands.

The authors note that such model-guided searches not only mitigate the increasing computational costs associated with screening large virtual libraries but also have applications beyond docking; this approach could accelerate high-throughput virtual screening campaigns in other areas of drug discovery and lead to more efficient identification of potential drug candidates. Overall, this study highlights the importance of leveraging machine learning techniques in early stage drug discovery efforts to improve efficiency and reduce costs while maintaining accuracy and reliability in identifying promising compounds for further development.
Created on 22 Jun. 2023

