canSAR: a portal to Big Data for drug discovery

Posted: 9 February 2016 | Dr Bissan Al-Lazikani and Elizabeth Coker, The Institute of Cancer Research, London

Dr Bissan Al-Lazikani and Elizabeth Coker discuss canSAR: the first public domain example of Big Data for drug discovery…

The need for new cancer drug targets is growing: there is mounting evidence of tumour heterogeneity and of acquired resistance to existing targeted therapies.

As genome sequencing becomes more routine, the potential for drugs tailored to specific patient subpopulations has come increasingly within reach [1]. The urgent need for innovation in drug discovery, while keeping development risk under control, has been widely discussed [2]. Yet translating this into real, actionable targets and drugs can be easier said than done. A new update to canSAR [3], a freely available, public knowledgebase for drug discovery, aims to facilitate this work.

Big Data refers not only to the volume of datasets, but also to their diversity. Integrating large, orthogonal datasets from different disciplines and extracting maximal value from them is a major challenge. canSAR [3] is a Cancer Research UK-funded resource developed by a team led by Dr Bissan Al-Lazikani at The Institute of Cancer Research in London, and is the first public domain example of Big Data for drug discovery. canSAR aims to provide researchers with concise, connected information on cancer genes, patient tissue, drugs or disease details in a single location. It integrates billions of quality-controlled experimental datapoints from a diversity of public resources, as well as data unique to canSAR. Importantly, the data within canSAR are then analysed to produce novel predictions for drug discovery using specially developed data mining and artificial intelligence methodologies. These include predictions of target druggability as well as a variety of tools for interpreting large datasets. Although canSAR was primarily designed to meet the needs of cancer drug discovery projects, it contains data and predictions for the entire human proteome and is therefore of value to many other areas of human disease.

Key new features in 2016 [3] focus on the analysis of >2,100,000 cavities on the 3D structures of all proteins in the Protein Data Bank. Through this analysis, the team identified >94,000 ‘druggable’ cavities, potentially suitable for the development of small molecule drugs. Another dataset new to this release of canSAR is a mapping of the interactome and network-based druggabilities of >13,000 human proteins. In another study, the canSAR team developed novel machine learning techniques to identify key druggable nodes in cellular networks using the ‘social interaction’ behaviour of proteins, the first such analysis in the public domain [4].

As well as >2,100,000 cavities from over 114,000 3D protein structures, the new canSAR contains more than 1.1 million experimentally validated bioactive small molecule drugs and compounds, >10 million pharmacological activities, genetic data from >10,000 patient samples (including more than 209 million gene expression data points), and summary data from nearly 200,000 cancer clinical trials.

The research team behind canSAR successfully applies this technology to the drug discovery portfolio of the world-leading Cancer Research UK Cancer Therapeutics Unit. However, as no single group can battle cancer alone, the team makes canSAR available to enable translational cancer research worldwide. canSAR currently has more than 150,000 unique users from 179 countries and is used by both academia and industry. The team has published illustrations of how canSAR can uncover novel targets, applied to cancer genes from pan-cancer analysis [1, 5] and to cancer processes such as DNA damage repair [6]. The results of these analyses are all available through canSAR for the drug discovery community.

canSAR enables users to get a sense of the bigger picture around their target, whilst maintaining depth of information and traceability. canSAR provides a portal through which drug discovery researchers can access the wealth of public Big Data, and by doing so is enabling drug discovery throughout the world.

canSAR is available at


  1. Workman P, Al-Lazikani B. Drugging cancer genomes. Nat Rev Drug Discov 2013;12:889-90.
  2. Berggren R, Moller M, Moss R, Poda P, Smietana K. Outlook for the next 5 years in drug innovation. Nat Rev Drug Discov 2012;11:435-6.
  3. Tym JE, Mitsopoulos C, Coker EA, Razaz P, Schierz AC, Antolin AA, et al. canSAR: an updated cancer research and drug discovery knowledgebase. Nucleic Acids Res 2016;44:D938-43.
  4. Mitsopoulos C, Schierz AC, Workman P, Al-Lazikani B. Distinctive Behaviors of Druggable Proteins in Cellular Networks. PLoS Comput Biol 2015;11:e1004597.
  5. Patel MN, Halling-Brown MD, Tym JE, Workman P, Al-Lazikani B. Objective assessment of cancer genes for drug discovery. Nat Rev Drug Discov 2013;12:35-50.
  6. Pearl LH, Schierz AC, Ward SE, Al-Lazikani B, Pearl FM. Therapeutic opportunities within the DNA damage response. Nat Rev Cancer 2015;15:166-80.


Dr Bissan Al-Lazikani, Team Leader, The Institute of Cancer Research, London

Bissan is a computational biologist and data scientist at The Institute of Cancer Research, London. She is formally trained in molecular biology and computer science and has both academic and industrial experience in integrative data analysis and machine learning for drug discovery and therapeutic application. Her research team at the ICR applies predictive technologies to select novel targets for cancer drug discovery, to repurpose drugs, and to predict drug resistance and effective combinations. Her team developed canSAR, the world’s largest public drug discovery knowledgebase. Her recent focus is on the application of artificial intelligence technologies towards individualised adaptive therapy.

Elizabeth Coker, PhD Student, The Institute of Cancer Research, London

Elizabeth is a final-year PhD student jointly supervised by Dr Bissan Al-Lazikani and Professor Paul Workman at The Institute of Cancer Research, London. Her computational biology PhD focuses on modelling and predicting tumour behaviour in response to targeted drugs and drug combinations. Prior to her PhD she studied genetics and systems biology. Elizabeth works with the rest of the canSAR team to design and develop new features for canSAR and regularly provides training in how to use the database.