
RNA and machine learning: rational design for multidimensional biomarkers

Modern-day oncology therapies have seen significant innovation in the last decade. It is high time we committed to biomarkers driven by rational design and the latest computational methods.

In an earlier age of medicine, new therapies were often discovered by ‘accident’. There was little technical knowledge of structure or function to guide the development of curative treatments. Trial and error dictated progress, resulting in slow and unpredictable successes. As our knowledge of small molecules, proteins and their structural relationships grew, we entered the era of rational drug design. Rational drug design has made a significant impact in oncology, where we have built a deep knowledge of ligand binding and biochemical pathways. Modern drug strategies apply frameworks of rational design, driven by computational experimentation, to accelerate the identification of potential therapies.

In the early 2000s, for example, there was an unexpected race to find a small-molecule inhibitor of the type I TGFβ receptor (TβRI) kinase. Two groups, one led by Scott Sawyer at Eli Lilly and the other by Juswinder Singh at Biogen-Idec, discovered an identical molecule through separate efforts.1,2 The Lilly team used conventional high-throughput screening (HTS) enzyme and cell assays, which were costly and time consuming. Independently, Singh’s team streamlined the discovery by employing computational methods to perform a ‘virtual screening’. This approach was faster and less costly, and it enabled Biogen-Idec to gain an edge over Lilly. It was an early demonstration that computationally guided design could prioritise, or even replace, expensive chemical and biological assays, reducing both cost and time to market. Since then, databases of results from both low- and high-throughput studies have continued to grow explosively, further enhancing our ability to rationally develop not only monotherapies, but also bispecific and combination therapies.
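For readers curious what such a screen looks like in practice, the sketch below is a minimal, ligand-based analogue of the idea, not the shape-based method Singh’s team actually used: it ranks a toy candidate library by fingerprint similarity to a known active compound using the open-source RDKit toolkit. The structures, names and library are placeholders invented for illustration.

```python
# Minimal ligand-based virtual screen: rank candidate molecules by
# fingerprint similarity to a known active compound.
# Illustrative only -- a simplified stand-in for the shape-based
# screening described in the text; all SMILES strings are placeholders.
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem

# Known active (placeholder structure) and a toy candidate "library"
active = Chem.MolFromSmiles("c1ccc2c(c1)ncc(n2)-c1ccncc1")
library = {
    "cand_1": "c1ccc2c(c1)nccn2",
    "cand_2": "c1ccncc1",
    "cand_3": "c1ccc2c(c1)ncc(n2)-c1ccccc1",
}

# Morgan (ECFP-like) fingerprints encode local substructure
fp_active = AllChem.GetMorganFingerprintAsBitVect(active, 2, nBits=2048)

hits = []
for name, smiles in library.items():
    mol = Chem.MolFromSmiles(smiles)
    fp = AllChem.GetMorganFingerprintAsBitVect(mol, 2, nBits=2048)
    hits.append((name, DataStructs.TanimotoSimilarity(fp_active, fp)))

# The highest-similarity candidates are prioritised for wet-lab assays
for name, sim in sorted(hits, key=lambda x: x[1], reverse=True):
    print(f"{name}: Tanimoto similarity = {sim:.2f}")
```

In effect, cheap computation triages the library so that only the most promising molecules ever reach an expensive enzyme or cell assay.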

The case for predictive biomarkers

“…to achieve the goal of precision medicine, we need continued investment in rational biomarker design”

The evolution of biomarker design is not so different from the evolution of drug design. Even with the most efficacious therapies, not all patients respond. Furthermore, when the process of matching patients with certain therapies goes wrong, adverse events can be costly and even deadly. For some time, the industry has worked to find biomarkers that provide predictive insight for matching patients to the right treatments. Historically, this meant identifying specific patient populations that should receive, or not receive, a therapy.

Early on, macroscale pathological characteristics were used to make treatment decisions, including for cancer patients. Tumour grade, size and location were documented, and statistics from the clinical outcomes of many patients were used to generalise who should receive a therapy and who should not. Histology, once available, provided additional insight, taking us one step closer to a molecular-level understanding of why certain patients respond and others do not. However, the world of medicine changed drastically with the completion of the human genome project and the advent of genomic medicine.

The era of genomic medicine

The outcome of the human genome project was not simply a static reference sequence, as is often cited. Rather, the advances made during that milestone effort, and shortly after its completion, gave birth to genomic medicine. Genomic medicine represents a major breakthrough and a significant driver towards what we know as precision medicine, often defined as the right patient receiving the right treatment at the right time. Since the completion of the human genome project, high-throughput sequencing, also known as next-generation sequencing (NGS), has generated trillions of genomic sequences from cancer patients’ tumour tissue.

Unfortunately, early attempts at using these data for rational biomarker design were not as effective as hoped. The field has relied heavily on DNA data, and linking observations in DNA to their downstream biological implications, through epigenetics and transcriptional and translational modification, has proven challenging. Enormous datasets have been mined to identify both drug targets and biomarkers in DNA, but the utility of single, static mutations has fallen short. There are a number of mutations whose biological implications we understand, such as BRAF V600E, but the presence of these mutations is not as accurate as we would hope in predicting response to certain drugs.3

Advancements in both the molecular and computational tools used to generate and analyse high-throughput RNA data have created a new and promising avenue for biomarker discovery. RNA is one step closer to the downstream biology occurring at the protein level, yet it can be measured with the same technologies developed for high-throughput DNA sequencing, so it provides a rich and dynamic view of a patient’s molecular profile. High-throughput RNA sequencing has been used to confirm the expression of a mutation or fusion transcript, which affords significant clinical value. However, these single-analyte biomarkers, such as gene fusions or mutations at the RNA level, limit analysis in much the same way DNA does: they seek to predict patient response based on a single facet of biology, which is highly oversimplified.

Multidimensional biomarkers

Taking a page out of the rational drug design book, the logical next step for rational biomarker design is to increase dimensionality. Just as bispecific and combination therapies have evolved to target multiple disease points, biomarkers should also seek to capture and utilise as much information about molecular profiles as possible. Early efforts to accomplish this in colorectal cancer resulted in a new system of molecular subtyping.4 Researchers also began to build “signatures” of RNA, ranked gene lists used to better classify patients.5 Today, by leveraging machine-learning tools, researchers can filter out vast levels of noise and identify only the most useful data signals to build what are known as RNA models, as sketched below.
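As a minimal sketch of what building such a signature involves, the example below ranks genes by a simple univariate statistic on a synthetic expression matrix using scikit-learn. The gene names, the data and the cutoff of ten genes are all invented for illustration; real signature-building pipelines are considerably more elaborate.

```python
# Minimal sketch of building a ranked gene "signature" from RNA
# expression data, then keeping only the top genes as model input.
# The expression matrix and response labels are synthetic placeholders.
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif

rng = np.random.default_rng(0)
n_patients, n_genes = 100, 500
genes = [f"GENE_{i}" for i in range(n_genes)]

# Synthetic log-expression matrix; responders get a shift in 10 genes
X = rng.normal(size=(n_patients, n_genes))
y = rng.integers(0, 2, size=n_patients)   # 1 = responder
X[y == 1, :10] += 1.5                     # informative "signal" genes

# Rank genes by ANOVA F-statistic and keep the top 10 as the signature
selector = SelectKBest(f_classif, k=10).fit(X, y)
ranked = sorted(zip(genes, selector.scores_), key=lambda g: -g[1])
signature = [name for name, _ in ranked[:10]]
print("Signature:", signature)

X_signature = selector.transform(X)       # input for a downstream model
```

The ranking step is where the noise filtering happens: hundreds of uninformative genes score poorly and never reach the model.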

Multidimensional RNA models built by machine learning are superior to single-analyte biomarkers as predictors. Beyond the molecular advantages described above, machine learning provides a rational, data-driven method for building these models, and the output is the optimal combination of signals.
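A toy comparison makes the point concrete. On simulated data (the cohort size, effect sizes and gene counts below are assumptions, not results from any study), a model allowed to combine many weakly informative genes outperforms any single gene used alone:

```python
# Sketch: compare a single-analyte biomarker against a multidimensional
# RNA model on the same synthetic cohort. Data are simulated; in a real
# study X would be patient expression profiles and y clinical response.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
n_patients, n_genes = 200, 50
y = rng.integers(0, 2, size=n_patients)
X = rng.normal(size=(n_patients, n_genes))
# Each informative gene carries only a weak signal on its own
X[y == 1, :8] += 0.6

model = LogisticRegression(max_iter=1000)

# Single-analyte: classify on one gene's expression alone
auc_single = cross_val_score(model, X[:, :1], y, cv=5,
                             scoring="roc_auc").mean()
# Multidimensional: let the model combine all measured genes
auc_multi = cross_val_score(model, X, y, cv=5,
                            scoring="roc_auc").mean()

print(f"single-gene AUC:      {auc_single:.2f}")
print(f"multidimensional AUC: {auc_multi:.2f}")
```

Real cohorts are rarely this clean, but the principle, that many weak signals combine into a strong predictor, is exactly what machine-learnt RNA models exploit.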

This approach requires researchers to put patient profiles at the centre, capturing the myriad signals that represent disease, immune response and therapy response. Even in the early stages of this new era of model-based biomarkers, the resulting biomarkers have shown impressive improvements in predictive accuracy over single-analyte approaches in exploratory studies.

What’s next?

In this narrative on modelling disease, one might assume we are talking about either the disease or the cancer cells themselves. However, it has become increasingly apparent that factors such as the immune response to disease can be highly predictive of patient survival, of response to traditional therapies and, of course, of response to some of the most advanced therapies currently available, such as immunotherapies. Multidimensional predictive biomarker models of the immune system are built using an approach that has been described as Predictive Immune Modelling. These models serve to capture biological complexity and use data to predict patient response. In the future, comprehensive biomarkers will require curating highly standardised databases of multidimensional biomarkers, complete with metadata, clinical data and outcomes data, so that researchers can begin to draw conclusions from what they already know to be true.

How will today’s biomarkers, built to stratify one patient population, be able to inform another patient population with similar molecular profiles or select for a therapy with a similar mechanism of action? Without a doubt, to achieve the goal of precision medicine, we will need continued investment in rational biomarker design using the most informative molecular and computational tools available today, including RNA sequencing and machine-learning tools.

About the author

Dr Jarret Glasscock is a geneticist and computational biologist. He is the founder and CEO of Cofactor Genomics. Prior to founding the company, Jarret was faculty in the Department of Genetics at Washington University and part of The Genome Institute.

References

  1. Sawyer J, Anderson B, Beight D, Campbell R, Jones M, Herron D, et al. Synthesis and activity of new aryl- and heteroaryl-substituted pyrazole inhibitors of the transforming growth factor-β type I receptor kinase domain. Journal of Medicinal Chemistry. 2003;46(19):3953-3956. Available from: https://pubs.acs.org/doi/10.1021/jm0205705
  2. Singh J, Chuaqui C, Boriack-Sjodin P, Lee W, Pontz T, Corbley M, et al. Successful shape-based virtual screening: the discovery of a potent inhibitor of the type I TGFβ receptor kinase (TβRI). Bioorganic & Medicinal Chemistry Letters. 2003;13(24):4355-4359. Available from: https://www.sciencedirect.com/science/article/pii/S0960894X03009946?via%3Dihub
  3. Bonanno L, Zulato E, Attili I, Pavan A, Del Bianco P, Nardo G, et al. Liquid biopsy as tool to monitor and predict clinical benefit from chemotherapy (CT) and immunotherapy (IT) in advanced non-small cell lung cancer (aNSCLC): a prospective study. Annals of Oncology. 2018;29(suppl_8).
  4. Menter D, Davis J, Broom B, Overman M, Morris J, Kopetz S. Back to the colorectal cancer consensus molecular subtype future. Current Gastroenterology Reports. 2019;21(2).
  5. Alexander E, Kennedy G, Baloch Z, Cibas E, Chudova D, Diggans J, et al. Preoperative diagnosis of benign thyroid nodules with indeterminate cytology. New England Journal of Medicine. 2012;367(8):705-715.