Accelerating high-throughput virtual screening through molecular pool-based active learning
AI-generated Key Points
⚠The license of the paper does not allow us to build upon its content and the key points are generated using the paper metadata rather than the full article.
- Structure-based virtual screening is an important tool in early-stage drug discovery: it scores the interactions between a target protein and candidate ligands.
- Virtual libraries now exceed 100 million molecules and continue to grow, making exhaustive virtual screening campaigns increasingly resource-intensive.
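- Bayesian optimization can reduce this cost: a surrogate structure-property relationship model trained on the predicted affinities of a subset of the library is applied to the remaining members, so the least promising compounds can be excluded from evaluation.
- The study assesses various surrogate model architectures, acquisition functions, and acquisition batch sizes on several protein-ligand docking datasets, and reports significant reductions in computational cost even with a greedy acquisition strategy.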
- For example, 87.9% of the top 50,000 ligands were found after testing only 2.4% of a 100-million-member library.
- Model-guided searches mitigate the rising computational cost of screening ever-larger virtual libraries and can accelerate high-throughput virtual screening campaigns, with applications beyond docking.
- Applied to early-stage drug discovery, this approach can cut screening cost while still recovering most of the top-scoring compounds.
Authors: David E. Graff, Eugene I. Shakhnovich, Connor W. Coley
Abstract: Structure-based virtual screening is an important tool in early stage drug discovery that scores the interactions between a target protein and candidate ligands. As virtual libraries continue to grow (in excess of $10^8$ molecules), so too do the resources necessary to conduct exhaustive virtual screening campaigns on these libraries. However, Bayesian optimization techniques can aid in their exploration: a surrogate structure-property relationship model trained on the predicted affinities of a subset of the library can be applied to the remaining library members, allowing the least promising compounds to be excluded from evaluation. In this study, we assess various surrogate model architectures, acquisition functions, and acquisition batch sizes as applied to several protein-ligand docking datasets and observe significant reductions in computational costs, even when using a greedy acquisition strategy; for example, 87.9% of the top-50000 ligands can be found after testing only 2.4% of a 100M member library. Such model-guided searches mitigate the increasing computational costs of screening increasingly large virtual libraries and can accelerate high-throughput virtual screening campaigns with applications beyond docking.
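The pool-based loop described in the abstract (train a surrogate on a scored subset, predict over the rest of the library, greedily acquire the best-predicted batch, repeat) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the ridge-regression surrogate, the synthetic feature matrix, the linear "docking score", and every function and variable name here are hypothetical stand-ins, and lower scores are taken to be better.

```python
import numpy as np


def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression (an assumed stand-in surrogate model)."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(d), X.T @ y)


def greedy_pool_search(X, objective, init_size, batch_size, n_iters, seed=0):
    """Run a greedy pool-based search; return {pool index: objective value}.

    `objective(i)` plays the role of the expensive evaluation (e.g. docking)
    of pool member i. Lower values are treated as better.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]

    # Score a random initial subset of the pool.
    initial = rng.choice(n, size=init_size, replace=False)
    scores = {int(i): objective(int(i)) for i in initial}

    for _ in range(n_iters):
        idx = np.fromiter(scores.keys(), dtype=int)
        y = np.fromiter(scores.values(), dtype=float)

        # Retrain the surrogate on everything evaluated so far.
        w = fit_ridge(X[idx], y)

        # Predict over the whole pool and mask already-evaluated members.
        pred = X @ w
        pred[idx] = np.inf

        # Greedy acquisition: take the batch with the best predicted scores.
        batch = np.argpartition(pred, batch_size)[:batch_size]
        for i in batch:
            scores[int(i)] = objective(int(i))

    return scores


# Synthetic pool (assumption): random features and a noisy linear score.
rng = np.random.default_rng(42)
n_pool, n_feat, top_k = 20_000, 16, 200
X_pool = rng.normal(size=(n_pool, n_feat))
true_scores = X_pool @ rng.normal(size=n_feat) + 0.1 * rng.normal(size=n_pool)

scores = greedy_pool_search(
    X_pool,
    objective=lambda i: float(true_scores[i]),
    init_size=200,
    batch_size=200,
    n_iters=4,
)

# Fraction of the true top-k that the search evaluated.
true_top = set(np.argsort(true_scores)[:top_k].tolist())
recovered = len(true_top & set(scores)) / top_k
print(f"evaluated {len(scores) / n_pool:.1%} of the pool, "
      f"recovered {recovered:.1%} of the top {top_k}")
```

The toy objective is linear in the features, so the ridge surrogate fits it almost perfectly and the recovery rate printed here says nothing about real docking data. The sketch only shows the shape of the loop. In practice the surrogate, the acquisition function, and the batch size are exactly the choices the study compares.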