canSAR: a portal to Big Data for drug discovery

Posted: 9 February 2016 | Dr Bissan Al-Lazikani and Elizabeth Coker, The Institute of Cancer Research, London

Dr Bissan Al-Lazikani and Elizabeth Coker discuss canSAR: the first public domain example of Big Data for drug discovery…

The need for new cancer drug targets is increasing: there is growing evidence of tumour heterogeneity and of acquired resistance to existing targeted therapies.

As genome sequencing becomes more routine, the potential for drugs tailored to specific patient subpopulations has also become increasingly within reach [1]. The urgent need for innovation in drug discovery, while keeping development risk under control, has been widely discussed [2]. Yet translating this into real, actionable targets and drugs can be easier said than done. A new update to canSAR [3], a freely available, public knowledgebase for drug discovery, aims to facilitate this work.

Big Data refers not only to the volume of data sets, but also to their diversity. Integrating orthogonal large datasets from different disciplines, and harnessing maximal value from them, is a great challenge. canSAR [3] is a Cancer Research UK-funded resource developed by a team led by Dr Bissan Al-Lazikani at The Institute of Cancer Research, London, and is the first public domain example of Big Data for drug discovery. canSAR aims to provide researchers with concise, connected information on cancer genes, patient tissue, drugs or disease details in a single location. It integrates billions of quality-controlled experimental datapoints from a diversity of public resources as well as data unique to canSAR. Importantly, the data within canSAR are then analysed to produce novel predictions for drug discovery using specially developed data mining and artificial intelligence methodologies. These include predictions for target druggability as well as a variety of tools for interpreting large datasets. Although canSAR was primarily designed to meet the needs of cancer drug discovery projects, it contains data and predictions for the entire human proteome and is therefore of value to many other areas of human disease.

Key new features in 2016 [3] focus on analysing >2,100,000 cavities on the 3D structures of all proteins in the Protein Data Bank. Through this analysis, the team identified >94,000 ‘druggable’ cavities, potentially suitable for the development of small molecule drugs. Another new dataset in the updated canSAR is a mapping of the interactome and network-based druggabilities of >13,000 human proteins. In another study, the canSAR team developed novel machine learning techniques to identify key druggable nodes in cellular networks using the ‘social interaction’ behaviour of proteins, the first such analysis in the public domain [4].

As well as >2,100,000 cavities from over 114,000 3D protein structures, the new canSAR contains more than 1.1 million experimentally validated bioactive small molecule drugs and compounds; >10 million pharmacological activities; genetic data from >10,000 patient samples, including more than 209 million gene expression data points; and summary data from nearly 200,000 cancer clinical trials.
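To put these headline figures in proportion, the short Python sketch below works out two rough ratios from the numbers quoted above: the share of analysed cavities flagged as ‘druggable’, and the average number of cavities per 3D structure. The variable names and the `ratio` helper are illustrative assumptions, not part of canSAR, and because the published counts are lower bounds or rounded values the results are only indicative.

```python
# Approximate counts quoted in the article (lower bounds or rounded values).
cavities = 2_100_000          # cavities analysed on 3D structures
druggable_cavities = 94_000   # cavities identified as 'druggable'
structures = 114_000          # 3D protein structures


def ratio(numerator: int, denominator: int) -> float:
    """Return numerator / denominator, rejecting a non-positive denominator."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return numerator / denominator


# Share of analysed cavities that were flagged as druggable.
druggable_fraction = ratio(druggable_cavities, cavities)

# Average number of analysed cavities per 3D structure.
cavities_per_structure = ratio(cavities, structures)

print(f"Druggable share of cavities: {druggable_fraction:.1%}")
print(f"Cavities per structure: {cavities_per_structure:.1f}")
```

On these rounded inputs, roughly one cavity in 22 is flagged as druggable, which illustrates how strongly the analysis narrows the search space.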

The research team behind canSAR successfully applies this technology to the drug discovery portfolio of the world-leading Cancer Research UK Cancer Therapeutics Unit. However, as no single group can battle cancer alone, the team makes canSAR available to enable cancer translational research worldwide. canSAR currently has more than 150,000 unique users from 179 countries and is used by both academia and industry. The team has published illustrations of how canSAR can uncover novel targets, applied to cancer genes from pan-cancer analysis [1, 5] and to cancer processes such as DNA damage repair [6]. The results of these analyses are all available through canSAR for the drug discovery community.

canSAR enables users to get a sense of the bigger picture around their target, whilst maintaining depth of information and traceability. It provides a portal through which drug discovery researchers can access the wealth of public Big Data, and by doing so is enabling drug discovery throughout the world.

canSAR is available at cansar.icr.ac.uk

References

1. Workman P, Al-Lazikani B. Drugging cancer genomes. Nat Rev Drug Discov 2013;12:889-90.
2. Berggren R, Moller M, Moss R, Poda P, Smietana K. Outlook for the next 5 years in drug innovation. Nat Rev Drug Discov 2012;11:435-6.
3. Tym JE, Mitsopoulos C, Coker EA, Razaz P, Schierz AC, Antolin AA, et al. canSAR: an updated cancer research and drug discovery knowledgebase. Nucleic Acids Res 2016;44:D938-43.
4. Mitsopoulos C, Schierz AC, Workman P, Al-Lazikani B. Distinctive Behaviors of Druggable Proteins in Cellular Networks. PLoS Comput Biol 2015;11:e1004597.
5. Patel MN, Halling-Brown MD, Tym JE, Workman P, Al-Lazikani B. Objective assessment of cancer genes for drug discovery. Nat Rev Drug Discov 2013;12:35-50.
6. Pearl LH, Schierz AC, Ward SE, Al-Lazikani B, Pearl FM. Therapeutic opportunities within the DNA damage response. Nat Rev Cancer 2015;15:166-80.

Biographies

Dr Bissan Al-Lazikani, Team Leader, The Institute of Cancer Research, London

Bissan is a computational biologist and data scientist at The Institute of Cancer Research, London. She is formally trained in molecular biology and computer science and has both academic and industrial experience in integrative data analysis and machine learning for drug discovery and therapeutic application. Her research team at the ICR applies predictive technologies to select novel targets for cancer drug discovery, drug repurposing and predicting drug resistance and effective combinations. Her team developed canSAR.icr.ac.uk, the world’s largest public drug discovery knowledgebase. Her recent focus is on the application of artificial intelligence technologies towards individualised adaptive therapy.

Elizabeth Coker, PhD Student, The Institute of Cancer Research, London

Elizabeth is a final-year PhD student jointly supervised by Dr Bissan Al-Lazikani and Professor Paul Workman at The Institute of Cancer Research, London. Her computational biology PhD focuses on modelling and predicting tumour behaviour in response to targeted drugs and drug combinations. Prior to her PhD she studied genetics and systems biology. Elizabeth works with the rest of the canSAR team to design and develop new features for canSAR and regularly provides training in how to use the database.
