Accelerating high-throughput virtual screening through molecular pool-based active learning

AI-generated keywords: Drug Discovery Virtual Screening Bayesian Optimization Machine Learning Surrogate Models

AI-generated Key Points

⚠The license of the paper does not allow us to build upon its content and the key points are generated using the paper metadata rather than the full article.

Structure-based virtual screening is a crucial tool for identifying potential drug candidates in drug discovery.
Virtual libraries now exceed 100 million (10^8) molecules, making exhaustive virtual screening campaigns resource-intensive.
Researchers have turned to Bayesian optimization techniques that leverage surrogate structure-property relationship models to reduce computational costs.
In a recent study, various surrogate model architectures, acquisition functions and acquisition batch sizes were assessed on several protein-ligand docking datasets.
Testing only 2.4% of a 100 million member library allowed researchers to identify 87.9% of the top 50 thousand ligands.
Model-guided searches not only mitigate increasing computational costs but also have applications beyond docking and could accelerate high-throughput virtual screening campaigns in other areas of drug discovery.
Leveraging machine learning techniques in early stage drug discovery efforts can improve efficiency and reduce costs while maintaining accuracy and reliability in identifying promising compounds for further development.


Authors: David E. Graff, Eugene I. Shakhnovich, Connor W. Coley

arXiv: 2012.07127v1 - DOI (q-bio.QM)

License: NONEXCLUSIVE-DISTRIB 1.0

Abstract: Structure-based virtual screening is an important tool in early stage drug discovery that scores the interactions between a target protein and candidate ligands. As virtual libraries continue to grow (in excess of $10^8$ molecules), so too do the resources necessary to conduct exhaustive virtual screening campaigns on these libraries. However, Bayesian optimization techniques can aid in their exploration: a surrogate structure-property relationship model trained on the predicted affinities of a subset of the library can be applied to the remaining library members, allowing the least promising compounds to be excluded from evaluation. In this study, we assess various surrogate model architectures, acquisition functions, and acquisition batch sizes as applied to several protein-ligand docking datasets and observe significant reductions in computational costs, even when using a greedy acquisition strategy; for example, 87.9% of the top-50000 ligands can be found after testing only 2.4% of a 100M member library. Such model-guided searches mitigate the increasing computational costs of screening increasingly large virtual libraries and can accelerate high-throughput virtual screening campaigns with applications beyond docking.

Submitted to arXiv on 13 Dec. 2020

Results of the summarizing process for the arXiv paper: 2012.07127v1

Comprehensive Summary

In the field of drug discovery, structure-based virtual screening is a crucial tool for identifying potential drug candidates. However, as virtual libraries continue to expand and contain over 100 million molecules, conducting exhaustive virtual screening campaigns becomes increasingly resource-intensive. To address this challenge, researchers have turned to Bayesian optimization techniques that leverage surrogate structure-property relationship models trained on predicted affinities of a subset of the library. By applying these models to the remaining library members, less promising compounds can be excluded from evaluation, significantly reducing computational costs. In a recent study by David E. Graff, Eugene I. Shakhnovich, and Connor W. Coley titled "Accelerating high-throughput virtual screening through molecular pool-based active learning," various surrogate model architectures, acquisition functions and acquisition batch sizes were assessed on several protein-ligand docking datasets. The results showed significant reductions in computational costs even when using a greedy acquisition strategy. For example, testing only 2.4% of a 100 million member library allowed researchers to identify 87.9% of the top 50 thousand ligands. The authors note that such model-guided searches not only mitigate the increasing computational costs associated with screening large virtual libraries but also have applications beyond docking; this approach could accelerate high-throughput virtual screening campaigns in other areas of drug discovery and lead to more efficient identification of potential drug candidates. Overall, this study highlights the importance of leveraging machine learning techniques in early stage drug discovery efforts to improve efficiency and reduce costs while maintaining accuracy and reliability in identifying promising compounds for further development.
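The search loop described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: `toy_docking_score` is a hypothetical stand-in for an expensive docking run, each "molecule" is reduced to a single number, and the surrogate is a toy nearest-neighbour average rather than any model from the paper. Only the structure of the loop (score a small batch, refit the surrogate, greedily pick the next batch from the remaining pool) reflects the approach summarized here.

```python
import random

def toy_docking_score(x):
    """Hypothetical stand-in for an expensive docking run (lower is better)."""
    return (x - 0.7) ** 2

def knn_predict(x, labelled, k=3):
    """Toy surrogate: mean score of the k nearest already-scored molecules."""
    nearest = sorted(labelled, key=lambda xi: abs(xi - x))[:k]
    return sum(labelled[xi] for xi in nearest) / len(nearest)

def greedy_pool_search(pool, batch_size, n_rounds, seed=0):
    """Score a random initial batch, then repeatedly score the candidates
    that the surrogate predicts to be best (greedy acquisition)."""
    rng = random.Random(seed)
    labelled = {x: toy_docking_score(x) for x in rng.sample(pool, batch_size)}
    for _ in range(n_rounds):
        unlabelled = [x for x in pool if x not in labelled]
        # Rank the unscored pool by predicted score, best (lowest) first.
        unlabelled.sort(key=lambda x: knn_predict(x, labelled))
        for x in unlabelled[:batch_size]:
            labelled[x] = toy_docking_score(x)  # the only "expensive" calls
    return labelled

pool_rng = random.Random(1)
pool = [pool_rng.random() for _ in range(2000)]  # toy library of 2000 "molecules"
scored = greedy_pool_search(pool, batch_size=20, n_rounds=5)

# Compare against the true top 50, which a real campaign would not know.
top_k = set(sorted(pool, key=toy_docking_score)[:50])
recovered = len(top_k & set(scored)) / len(top_k)
print(f"scored {len(scored)} of {len(pool)}; recovered {recovered:.0%} of the top 50")
```

In a real campaign the true top-k is unknown; it is computed here only because the toy objective is cheap enough to evaluate exhaustively, which is also how retrospective benchmarks on fully docked libraries measure recovery.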

Layman's Summary

Scientists use computers to find new medicines. They have a lot of molecules to look at, but it takes a long time to check them all. So, they use special techniques to make it faster and easier. They tested different ways of doing this and found one that works really well. By using this method, they were able to find almost all the best molecules without having to check every single one. This helps scientists save time and money when looking for new medicines.

Definitions

- Structure-based virtual screening: using computers to search for potential drug candidates based on their molecular structure
- Virtual libraries: collections of millions of molecules stored in computer databases
- Bayesian optimization techniques: statistical methods used to optimize the efficiency of virtual screening campaigns
- Surrogate structure-property relationship models: computer models that predict how certain properties (such as binding affinity) are related to molecular structure
- Protein-ligand docking datasets: collections of protein and ligand molecules used in virtual screening experiments
- Machine learning techniques: algorithms that allow computers to learn from data and improve their performance over time

Accelerating High-Throughput Virtual Screening Through Molecular Pool-Based Active Learning

Drug discovery is a complex and resource-intensive process, with virtual screening playing an increasingly important role in identifying potential drug candidates. However, as virtual libraries continue to expand and contain over 100 million molecules, conducting exhaustive virtual screening campaigns becomes increasingly difficult. To address this challenge, researchers have turned to Bayesian optimization techniques that leverage surrogate structure-property relationship models trained on predicted affinities of a subset of the library. In a recent study by David E. Graff, Eugene I. Shakhnovich, and Connor W. Coley titled "Accelerating high-throughput virtual screening through molecular pool-based active learning," various surrogate model architectures, acquisition functions and acquisition batch sizes were assessed on several protein-ligand docking datasets. The results showed significant reductions in computational costs even when using a greedy acquisition strategy.

The Study

In their study, Graff et al. used molecular pool-based active learning to search a fixed library of candidate ligands. A surrogate model is trained on the docking scores of the molecules evaluated so far and then predicts scores for the rest of the library from ligand structure alone, so that only the most promising candidates are docked in the next round. The authors compared three surrogate model architectures: random forest (RF), feed-forward neural network (NN), and message-passing neural network (MPN). For each they evaluated several acquisition functions: greedy, upper confidence bound (UCB), Thompson sampling (TS), expected improvement (EI), and probability of improvement (PI). They also varied the acquisition batch size, set as a small fraction of the library per round, across several protein-ligand docking datasets.
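To make the acquisition functions named above concrete, the sketch below gives their standard textbook forms for a surrogate that returns a predicted mean `mu` and uncertainty `sigma` per molecule. It is an illustration, not code from the paper: the maximization convention (docking scores, where lower is better, would be negated first), the `beta=2.0` default, and the two example candidates are all assumptions chosen for the example.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used for its cdf and pdf

def greedy(mu, sigma, best):
    """Rank by predicted mean only; uncertainty is ignored."""
    return mu

def ucb(mu, sigma, best, beta=2.0):
    """Upper confidence bound: reward high mean and high uncertainty."""
    return mu + beta * sigma

def probability_of_improvement(mu, sigma, best):
    """Probability that the true value exceeds the best seen so far."""
    if sigma == 0:
        return float(mu > best)
    return STD_NORMAL.cdf((mu - best) / sigma)

def expected_improvement(mu, sigma, best):
    """Expected amount by which the true value exceeds the best so far."""
    if sigma == 0:
        return max(mu - best, 0.0)
    z = (mu - best) / sigma
    return (mu - best) * STD_NORMAL.cdf(z) + sigma * STD_NORMAL.pdf(z)

def thompson_sample(mu, sigma, best, rng=random):
    """Thompson sampling: rank by one random draw from the prediction."""
    return rng.gauss(mu, sigma)

# Two hypothetical candidates as (mu, sigma): A is a confident, good prediction;
# B has a lower mean but is much more uncertain.
candidates = {"A": (9.0, 0.1), "B": (8.5, 1.0)}
best_so_far = 9.2

def pick(acquisition):
    """Return the candidate name with the highest acquisition value."""
    return max(candidates, key=lambda name: acquisition(*candidates[name], best_so_far))

for fn in (greedy, ucb, probability_of_improvement, expected_improvement):
    print(f"{fn.__name__}: picks {pick(fn)}")
```

The example is built so that greedy prefers the confident candidate while the uncertainty-aware functions prefer the uncertain one, which is the exploitation versus exploration trade-off these functions encode.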

Results

The results showed that pool-based active learning substantially reduced the number of docking calculations needed to find the best-scoring compounds, even with a simple greedy acquisition strategy. For example, docking only 2.4% of a 100 million member library recovered 87.9% of the top 50,000 ligands, whereas an exhaustive screen would require docking every member of the library. The abstract does not rank the individual surrogate models, acquisition functions, or batch sizes, so detailed comparisons between them should be taken from the full paper.
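The headline figures can be turned into absolute numbers with simple arithmetic. The snippet below uses only the values quoted above (library size, fraction tested, top-k size, fraction recovered); the "enrichment over random" ratio is a derived illustration rather than a metric taken from the paper, and the calculation ignores the cost of training the surrogate model.

```python
library_size = 100_000_000   # "100 million member library"
fraction_tested = 0.024      # 2.4% of the library was docked
top_k = 50_000               # size of the top-scoring set of interest
fraction_recovered = 0.879   # 87.9% of that set was found

n_docked = round(library_size * fraction_tested)
n_avoided = library_size - n_docked
n_found = round(top_k * fraction_recovered)

# Baseline: docking the same number of molecules chosen uniformly at random
# would be expected to recover the same fraction of the top-k as was tested.
n_found_random = top_k * fraction_tested
enrichment = fraction_recovered / fraction_tested

print(f"molecules docked:        {n_docked:,}")
print(f"docking runs avoided:    {n_avoided:,}")
print(f"top-{top_k:,} ligands found: {n_found:,}")
print(f"expected by random pick: {n_found_random:,.0f}")
print(f"enrichment over random:  {enrichment:.1f}x")
```

In other words, about 2.4 million docking runs recover roughly 43,950 of the top 50,000 ligands, where a random selection of the same size would be expected to find about 1,200.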

Conclusion

Overall, this study highlights the value of machine learning in early stage drug discovery: model-guided searches can cut the cost of screening while still identifying most of the promising compounds for further development. Such searches mitigate the increasing computational costs associated with screening large virtual libraries and have applications beyond docking. The same approach could accelerate high-throughput virtual screening campaigns in other areas of drug discovery, for example lead optimization or ADME/Tox prediction, leading to more efficient identification of potential drug candidates.

Created on 22 Jun. 2023

Disclaimer: The AI-based summarization tool and virtual assistant provided on this website may not always provide accurate and complete summaries or responses. We encourage you to carefully review and evaluate the generated content to ensure its quality and relevance to your needs.