Unlocking the power of machine learning for small molecule drug discovery

Posted: 30 June 2020 | Victoria Rees (Drug Target Review)

Rick Wagner of ZebiAI and Patrick Riley of Google Accelerated Science (GAS) discuss the development and benefits of a new machine learning drug discovery platform.

A collaborative study between ZebiAI, Google Accelerated Science (GAS) and X-Chem has used the power of machine learning to improve the drug discovery process.

The paper, published in the Journal of Medicinal Chemistry, describes a machine learning platform that can accelerate drug discovery using DNA-encoded small molecule library (DEL) selection data. According to the researchers, their findings demonstrate that the platform can predict highly potent small molecule inhibitors within a virtual library of compounds across three diverse protein targets.

“We envision artificial intelligence (AI) and machine learning will be a leading source of novel, small molecule drug candidates. These technologies will become indispensable as a means for leveraging large datasets to understand disease biology and identify the best candidates to address intractable diseases,” said Founder and Director of ZebiAI, Rick Wagner, when speaking to Drug Target Review.

How the platform works

According to the researchers, every small molecule in the library has a unique DNA barcode attached to it, allowing the molecules to be easily catalogued. To find which small molecules bind to proteins of interest, the DEL molecules are mixed with the proteins. DNA sequencing is then used to read the barcodes of the molecules bound to the protein target, thereby identifying the compounds.
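The barcode read-out described above can be illustrated with a minimal sketch. It is not the researchers' pipeline: the four-letter barcodes, the compound names, the no-target control selection and the enrichment threshold are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical barcode-to-compound lookup (illustrative only; real DEL
# barcodes are much longer and encode the building blocks of each molecule).
BARCODE_TO_COMPOUND = {
    "ACGT": "compound_A",
    "TTGA": "compound_B",
    "GGCC": "compound_C",
}


def count_compounds(reads, lookup):
    """Count how often each known compound's barcode appears in the reads."""
    counts = Counter()
    for read in reads:
        compound = lookup.get(read)
        if compound is not None:  # skip reads that match no known barcode
            counts[compound] += 1
    return counts


def enriched(selected, control, min_ratio=2.0, pseudocount=1):
    """Return compounds seen at least min_ratio times more often with the target.

    The control selection and the ratio threshold are assumptions; a
    pseudocount avoids division by zero for compounds absent from the control.
    """
    hits = []
    for compound in sorted(selected):
        ratio = (selected[compound] + pseudocount) / (control[compound] + pseudocount)
        if ratio >= min_ratio:
            hits.append(compound)
    return hits


# Toy sequencing reads from a selection with the target and from a control.
target_reads = ["ACGT", "ACGT", "ACGT", "TTGA", "ACGT", "NNNN", "GGCC"]
control_reads = ["ACGT", "TTGA", "GGCC", "GGCC"]

target_counts = count_compounds(target_reads, BARCODE_TO_COMPOUND)
control_counts = count_compounds(control_reads, BARCODE_TO_COMPOUND)
hits = enriched(target_counts, control_counts)
print(hits)
```

In this toy data only the compound whose barcode is over-represented in the target selection is reported as a putative binder.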

Data on the thousands of molecules that bind to a protein target in a DEL screen provide a chemical imprint of the target. This makes it possible to train a machine learning model that predicts which compounds in virtual libraries are likely to be active against the protein of interest, extending the search into chemical space well beyond the physical library.
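As a rough illustration of this idea, the sketch below trains a very simple log-odds scoring model (a simplified naive Bayes) on labelled DEL-style data and uses it to rank a virtual library. It does not reproduce the models in the paper; the substructure features, the binder labels and the virtual compounds are all hypothetical.

```python
import math
from collections import Counter


def train(examples, smoothing=1.0):
    """Fit per-feature log-odds weights from (features, is_binder) pairs."""
    pos, neg = Counter(), Counter()
    n_pos = n_neg = 0
    for features, is_binder in examples:
        if is_binder:
            pos.update(features)
            n_pos += 1
        else:
            neg.update(features)
            n_neg += 1
    weights = {}
    for feature in set(pos) | set(neg):
        # Smoothed fraction of binders / non-binders that carry this feature.
        p_pos = (pos[feature] + smoothing) / (n_pos + 2 * smoothing)
        p_neg = (neg[feature] + smoothing) / (n_neg + 2 * smoothing)
        weights[feature] = math.log(p_pos / p_neg)
    return weights


def score(features, weights):
    """Sum the log-odds of the features present; unseen features add nothing."""
    return sum(weights.get(feature, 0.0) for feature in features)


# Hypothetical DEL selection data: substructure features and a binder label.
del_data = [
    ({"amide", "pyridine", "sulfonamide"}, True),
    ({"amide", "pyridine"}, True),
    ({"pyridine", "sulfonamide"}, True),
    ({"amide", "ester"}, False),
    ({"ester", "phenol"}, False),
    ({"phenol", "amide"}, False),
]

# Hypothetical virtual compounds that were never physically screened.
virtual_library = {
    "virtual_1": {"pyridine", "phenol"},
    "virtual_2": {"ester", "phenol"},
    "virtual_3": {"pyridine", "sulfonamide"},
}

weights = train(del_data)
ranked = sorted(
    virtual_library,
    key=lambda name: score(virtual_library[name], weights),
    reverse=True,
)
print(ranked)
```

The highest-ranked virtual compounds would be the ones prioritised for purchase or synthesis and experimental testing; a production model would use far richer molecular representations than the hand-picked features assumed here.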

The researchers highlight that currently, there are not enough small molecule probes available for drug discovery, with only an estimated four percent of the human proteome having a usable probe. Most screening methods are limited by the scope of chemical space to which they provide access. However, DELs combined with machine learning present a new solution.

The researchers suggest that broader and deeper study of the biology of intractable diseases using this approach could accelerate the discovery of novel therapeutics, ultimately improving human health.

The new paper also details the identification of active compounds outside the DEL that are structurally different from the molecules used in training. The researchers say these results indicate that, at least for certain targets, machine learning applied to DEL data enables access to unlimited chemical space in a time- and cost-effective manner.
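One common way to quantify how structurally different a predicted compound is from the training molecules is the Tanimoto similarity between their feature sets. Whether the paper used this exact measure is not stated in this article, so the sketch below is an assumption, and the feature sets in it are hypothetical.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity of two feature sets: |a & b| / |a | b|."""
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty sets
    return len(a & b) / len(union)


def max_similarity_to_training(candidate, training_set):
    """Similarity of a candidate to its nearest neighbour in the training set."""
    return max(tanimoto(candidate, molecule) for molecule in training_set)


# Hypothetical substructure feature sets for training molecules and candidates.
training_set = [
    {"amide", "pyridine", "sulfonamide"},
    {"amide", "pyridine"},
]
close_candidate = {"amide", "pyridine", "ester"}
distant_candidate = {"indole", "ester", "pyridine"}

close_similarity = max_similarity_to_training(close_candidate, training_set)
distant_similarity = max_similarity_to_training(distant_candidate, training_set)
print(round(close_similarity, 3), round(distant_similarity, 3))
```

A low nearest-neighbour similarity suggests that a predicted hit is more than a trivial variation on a molecule the model has already seen.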

The benefits of the model

Speaking to Drug Target Review, Principal Software Engineer at Google, Patrick Riley, said: “What we have shown is that the combination of physical screening data from a high quality DEL allows you to build a surprisingly effective virtual screening model. This allows a much more cost-effective way to search through chemical space. When combined with ever increasing low costs and on demand chemical libraries, you have a much cheaper way to find hits across a larger chemical space. Those hits are great starting points for chemical probes or further drug discovery efforts.”

“Our machine learning approach allows for the discovery of complex patterns that would be difficult to impossible for a scientist to detect by direct examination of hundreds of millions of data points derived from DEL selection data. By generating models of molecules that bind targets of interest, our technology can extrapolate data significantly beyond the chemistry in the DEL and provide insight into molecules that have specific properties, are easily synthesised or are procured at little expense,” added Wagner.

The Chemome Initiative

With this process established, ZebiAI and GAS have formed the ‘Chemome Initiative’, a programme through which they will apply their platform.

According to the companies, they will develop chemical probe molecules for the academic community across thousands of novel targets, driving deeper understanding of the biology of intractable diseases.

“The Chemome Initiative will transform our understanding of biology by rapidly providing chemical probes to academic researchers exploring new targets. Access to high quality probes will allow scientists to test new hypotheses in the biological system of their choice. We expect that these initial results will, in some cases, lead to therapeutic hypotheses that will drive new drug discovery programmes,” said Wagner.

“We are excited about this work not just because of the technical interest, but because of the Chemome Initiative. Working with ZebiAI to use this technology for chemical probes means that we can have a broad impact on the early biological research process and we are looking forward to seeing that through,” commented Riley.

However, the researchers highlight that expanding the programme to cover the broadest possible scope of target proteins remains a challenge. They say the key issue will be establishing biological test systems across, ultimately, thousands of protein targets.

“We continue to reach out to new academic institutions and foundations to build our network to access a broadening range of protein targets, assay capabilities and expertise. We have learned a great deal from our partnership with the Structural Genomics Consortium and continue to refine our approach to projects and enhance the technology to drive the impact of the Chemome efforts,” commented Wagner.

Conclusion

“Many in the field are talking about the use of AI in drug discovery and we think this is a great trend. It is important that we focus on high quality evaluations of new AI approaches in drug discovery like we did in this work. As the community utilises this kind of evaluation and learns where AI makes a meaningful difference, I believe we will see further adoption of AI in the right places,” said Riley.

“We see the expanded value of these models in providing predictions for hit-to-lead and lead optimisation. We believe these predictive models in combination with traditional medicinal chemistry and computational approaches will accelerate the drug discovery process to find safer, more effective therapeutics,” summarised Wagner.

Related organisations
Google Accelerated Science (GAS), X-Chem, ZebiAI

Related people
Patrick Riley, Rick Wagner
