Bioactivity information of one million molecules compiled

Using deep machine learning, researchers have completed the activity profiles, from chemistry to clinical level, for one million molecules.

Informatics for bioactivity

Researchers from the Institute for Research in Biomedicine (IRB Barcelona), Spain, have completed the collection of bioactivity information for a million molecules using deep machine-learning computational models. The team also developed a tool to predict the biological activity of any molecule, even when no experimental data are available.

This new methodology is based on the Chemical Checker, the largest database of bioactivity profiles for pseudo pharmaceuticals to date, developed by the same laboratory and published in 2020. The Chemical Checker collects information from 25 spaces of bioactivity for each molecule. These spaces are linked to the chemical structure of the molecule, the targets with which it interacts or the changes it induces at the clinical or cellular level. However, this highly detailed information about the mechanism of action is incomplete for most molecules: a given molecule may have data for one or two spaces of bioactivity but not for all 25.
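To make the gap concrete, the sparse profile described above can be pictured as a table with 25 slots per molecule, most of them empty. The following is a minimal Python sketch of that idea only, not the Chemical Checker's actual data model or API: the space labels (`A1` to `E5`), the `Profile` type, the helper functions and the example values are all hypothetical and chosen purely for illustration.

```python
from typing import Dict, List, Optional

# 25 hypothetical space labels (5 levels x 5 sublevels).
# The naming scheme is an assumption made for this illustration.
SPACES: List[str] = [f"{level}{i}" for level in "ABCDE" for i in range(1, 6)]

# A profile maps each space to a numeric descriptor, or None when no
# experimental data exist for that space.
Profile = Dict[str, Optional[List[float]]]


def empty_profile() -> Profile:
    """Return a profile with all 25 spaces marked as missing."""
    return {space: None for space in SPACES}


def missing_spaces(profile: Profile) -> List[str]:
    """List the spaces for which the molecule has no data."""
    return [space for space in SPACES if profile.get(space) is None]


# A molecule with experimental data in only two spaces (made-up values).
profile = empty_profile()
profile["A1"] = [0.2, 0.7]
profile["B4"] = [0.9, 0.1]

gaps = missing_spaces(profile)
print(f"{len(SPACES) - len(gaps)} of {len(SPACES)} spaces observed")
```

In this picture, the entries returned by `missing_spaces` are the ones that a predictive model would need to fill in.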

With this new development, the researchers integrated all the available experimental information with deep machine-learning methods, so that the activity profiles, from chemistry to clinical level, can be completed for every molecule.

“The new tool also allows us to forecast the bioactivity spaces of new molecules, and this is crucial in the drug discovery process as we can select the most suitable candidates and discard those that, for one reason or another, would not work,” said Dr Patrick Aloy, who led the research team. 

The software library is freely accessible to the scientific community and will be regularly updated by the researchers as more biological activity data become available. With each update of experimental data in the Chemical Checker, the artificial neural networks will also be revised to refine the estimates.

According to the team, the reliability of the bioactivity data predicted by the model varies depending on several factors, including the volume of experimental data available and the characteristics of the molecule. However, the system provides a measure of the reliability of the prediction for each molecule.

To validate the tool, the researchers searched a library of compounds for good drug candidates to modulate the activity of a cancer-related transcription factor (SNAIL1), which is considered an ‘undruggable’ target because its activity is almost impossible to modulate. Of a first set of 17,000 compounds, the deep machine-learning models predicted that 131 had characteristics (in their dynamics and their interactions with target cells and proteins) that fit the target.

The ability of these compounds to degrade SNAIL1 has been confirmed experimentally and, for a high percentage of them, this degradation capacity is consistent with what the models had predicted, validating the system.

The study is published in Nature Communications.