
Petabytes of data – how informatics is transforming precision medicine

Advances in informatics have given researchers the ability to process petabytes of human genomics data and translate it into biologically relevant information. However, turning this information into knowledge can prove challenging. Slavé Petrovski, Vice President and Head of Genome Analytics and Bioinformatics for AstraZeneca’s Centre for Genomics Research, spoke to Nikki Withers about how informatics has positively impacted precision medicine and genomics research.


Why is precision medicine so important for drug discovery?

The overarching goal of precision medicine is to transform patients’ lives by personalising their treatment. This can be achieved by identifying the underlying molecular cause or biomarkers of disease in individual patients. By knowing this, we aim to match medicines to those patients who are most likely to benefit from that specific treatment.

If you look at our research pipeline, approximately 90 percent of it now follows a precision medicine approach, compared with about 10 percent in 2009. This approach draws on a broad range of cutting-edge wet-lab and informatics technologies, including tumour tissue diagnostics, molecular tests and point-of-care diagnostics, which make information available to the physician at the point of interaction with the patient.

How has informatics transformed genomics research?

In terms of sequencing technologies, informatics has improved our ability to generate high-quality data from raw samples, and sophisticated algorithms allow us to turn this raw data into useful information. For example, aligning raw genomics data to a reference genome allows us to identify where an individual’s genome deviates from that reference, and so which genetic variants set them apart from the rest of the population. Informatics has also allowed us to perform more sophisticated downstream analyses, such as using machine learning and artificial intelligence (AI) to mine these genetic variants for further biological insight.
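
Comparing an individual’s aligned sequence with the reference, position by position, is the core idea behind identifying variants. The sketch below is a toy under strong assumptions: the sample is already aligned base for base to the reference, only single-base substitutions are reported, and the function name `find_snvs` is hypothetical. Production pipelines instead align millions of short reads with dedicated aligners and use statistical variant callers that weigh read depth and base quality, which this does not attempt.

```python
def find_snvs(reference: str, sample: str, offset: int = 0) -> list[tuple[int, str, str]]:
    """Toy variant finder: report (position, reference_base, sample_base)
    for every mismatch between two equal-length, pre-aligned sequences.

    `offset` is the 0-based start of the aligned window on the reference;
    reported positions are 1-based genome coordinates.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to equal length")
    variants = []
    for i, (ref_base, alt_base) in enumerate(zip(reference.upper(), sample.upper())):
        # 'N' marks a base that could not be called, so it is not a variant
        if alt_base != ref_base and alt_base != "N":
            variants.append((offset + i + 1, ref_base, alt_base))
    return variants


reference = "ACGTACGTTAGC"
sample = "ACGTACCTTAGN"
print(find_snvs(reference, sample, offset=1000))  # [(1007, 'G', 'C')]
```

The unknown call at the final position is deliberately skipped, so only the G-to-C substitution is reported.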

Something we have been looking at recently is the use of sophisticated analytical frameworks on top of these data to ask further questions and tease out answers. For example: why does it matter that a genetic variant is present in that individual? Does it cause disease? Does it change the way they respond to treatment? At AstraZeneca, we have a cloud-based informatics pipeline that processes all the genomes from our Genomics Initiative; our ambition is to analyse up to two million genomes by 2026. The pipeline is now optimised to the point where we can complete end-to-end analysis of approximately 1,300 sequences an hour. To put that into context, that is a 10-fold increase in efficiency since 2017, driven by optimising the pipeline in the cloud.
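
The throughput figures in this answer can be put in perspective with some simple arithmetic. This sketch assumes that each of the roughly 1,300 sequences per hour corresponds to one genome and that the rate is sustained around the clock; it also assumes, for illustration only, that the "10-fold increase in efficiency" can be read as a ten-fold increase in throughput.

```python
# Figures quoted in the interview
GENOMES_TARGET = 2_000_000      # up to two million genomes by 2026
SEQUENCES_PER_HOUR = 1_300      # current end-to-end throughput

# Assumptions: one sequence = one genome, and throughput is sustained 24/7
hours = GENOMES_TARGET / SEQUENCES_PER_HOUR
days = hours / 24

# Assumption: the "10-fold increase in efficiency" since 2017 is read as
# a ten-fold increase in throughput
rate_2017 = SEQUENCES_PER_HOUR / 10
days_2017 = GENOMES_TARGET / rate_2017 / 24

print(f"today: {hours:,.0f} hours (~{days:.0f} days)")   # today: 1,538 hours (~64 days)
print(f"at 2017 rates: ~{days_2017:.0f} days")           # at 2017 rates: ~641 days
```

On these assumptions, compute alone would take roughly two months for the full cohort today versus well over a year and a half at 2017 rates; real timelines also depend on sequencing capacity and when samples arrive.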

How is informatics aiding advances in precision medicine?


Every one of us has approximately three billion bases in our genome; that is three billion data points to study. Multiply that across two million individuals and you can appreciate how much data that is, and why informatics has become increasingly important. For example, patients in selected clinical trials who have consented to genetic analysis may have their genetic data linked to their clinical outcomes. This allows us to study how variations in their three billion bases correlate with how they respond to or tolerate a treatment, and whether they were the right patient population for that medicine given the underlying cause of disease. By integrating anonymised genomic and clinical data from the hundreds or thousands of participants in our clinical trial programmes, we aim to identify the genetic profiles that can predict disease progression and response to treatment.
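
Linking trial participants’ genotypes to outcomes ultimately comes down to asking whether carriers of a variant respond differently from non-carriers. A minimal sketch, assuming a single variant, a binary responder/non-responder outcome and invented counts, applies Fisher’s exact test to a 2×2 table. This is one standard choice for small tables, not a description of AstraZeneca’s actual analysis, which would also need to account for covariates and for testing many variants at once; the function names are hypothetical.

```python
from math import comb


def table_probability(a: int, row1: int, col1: int, n: int) -> float:
    """Hypergeometric probability that the top-left cell equals `a`,
    given fixed row and column totals of a 2x2 table."""
    return comb(col1, a) * comb(n - col1, row1 - a) / comb(n, row1)


def fisher_exact_two_sided(table: list[list[int]]) -> float:
    """Two-sided Fisher's exact test p-value for a 2x2 table.

    Rows: variant carriers, non-carriers. Columns: responders, non-responders.
    """
    (a, b), (c, d) = table
    row1, col1, n = a + b, a + c, a + b + c + d
    p_observed = table_probability(a, row1, col1, n)
    # Every table with the same margins has its top-left cell in this range
    low, high = max(0, row1 + col1 - n), min(row1, col1)
    p_value = 0.0
    for x in range(low, high + 1):
        p = table_probability(x, row1, col1, n)
        # Sum tables no more probable than the observed one; the small
        # tolerance guards against floating-point ties
        if p <= p_observed * (1 + 1e-7):
            p_value += p
    return min(p_value, 1.0)


# Hypothetical counts, invented for illustration
trial = [[18, 2],    # carriers:     18 responded, 2 did not
         [30, 50]]   # non-carriers: 30 responded, 50 did not
print(f"responder rate, carriers: {18 / 20:.0%}, non-carriers: {30 / 80:.1%}")
print(f"two-sided Fisher's exact p = {fisher_exact_two_sided(trial):.2e}")
```

A small p-value here only flags an association in this one table; in a genome-wide setting it would have to survive correction for the very large number of variants tested.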

What challenges does informatics present to researchers?

The main challenge is extracting the maximum biological insight from the vast amount of data we are generating: we must work out how to translate petabytes of genomics data into biologically relevant information. Translating that information into knowledge is the next step, and one we are pursuing using AI and machine learning.
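
The "petabytes" scale follows directly from the figures quoted earlier in the interview. As a back-of-envelope lower bound, this sketch assumes each base is stored in just 2 bits and counts one copy of the three billion bases per person, with no quality scores, no repeated reads and no compression.

```python
# Figures quoted in the interview
BASES_PER_GENOME = 3_000_000_000   # "approximately three billion bases"
GENOMES = 2_000_000                # ambition: up to two million genomes by 2026

# Assumption: each base packed into 2 bits (A, C, G, T -> 00, 01, 10, 11)
BITS_PER_BASE = 2

total_bases = BASES_PER_GENOME * GENOMES
total_bytes = total_bases * BITS_PER_BASE // 8
petabytes = total_bytes / 10**15   # decimal petabytes

print(f"{total_bases:.1e} bases -> {petabytes:.1f} PB")  # 6.0e+15 bases -> 1.5 PB
```

This is only a floor: raw sequencing output is substantially larger, because each genomic position is read many times over and every base call carries a quality score.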

Another challenge relates to how we incorporate other information, such as transcriptomics or metabolomics data, into the process. At AstraZeneca, we have seen value in investing in a multi-omics strategy, in which we add further layers of data types to gain better insight, for example at the protein level, into what the outcome of a genetic mutation might be. This is a huge challenge, but also an opportunity, and one we are actively pursuing.
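
At its simplest, adding an omics layer means lining up measurements taken from the same individuals and asking whether variant carriers look different at that layer. The sketch assumes one variant, one protein measurement per sample and invented values; the helper `join_layers` and the sample IDs are hypothetical, and real multi-omics integration relies on far more sophisticated statistical models.

```python
from statistics import mean

# Hypothetical per-sample layers keyed by sample ID (values are invented)
genotype = {"S1": 1, "S2": 0, "S3": 1, "S4": 0, "S5": 0}        # 1 = carries the variant
protein = {"S1": 2.1, "S2": 5.0, "S3": 1.9, "S4": 4.6, "S6": 4.8}  # arbitrary units


def join_layers(*layers: dict) -> dict:
    """Inner-join omics layers on sample ID; samples missing from any
    layer are dropped. Returns {sample_id: (value_from_each_layer, ...)}."""
    shared = set(layers[0]).intersection(*layers[1:])
    return {s: tuple(layer[s] for layer in layers) for s in sorted(shared)}


joined = join_layers(genotype, protein)   # S5 and S6 lack one layer each
carriers = [p for g, p in joined.values() if g == 1]
non_carriers = [p for g, p in joined.values() if g == 0]
print(f"carriers: {mean(carriers):.2f}, non-carriers: {mean(non_carriers):.2f}")
# carriers: 2.00, non-carriers: 4.80
```

The inner join is a deliberate choice: only individuals measured in every layer contribute, which keeps the comparison clean at the cost of discarding partially profiled samples.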

What are your thoughts on collaboration in this area of research?

We know the best science doesn’t happen in isolation, which is why we collaborate with world-leading institutions, companies and individuals who share our passion for redefining medical science. For example, we are currently mining exome sequence data from 300,000 participants in a large UK Biobank project that AstraZeneca is part of. In collaboration with other pharma partners, we hope to generate both exome and whole-genome sequence data for half a million participants, which will be a remarkable medical research resource. Through this genetic research we hope not only to identify new drug targets, particularly in diseases with unmet clinical need, but also to support precision medicine programmes. Having access to such a large sequenced population, which could reveal why some patients respond to treatments while others do not, helps in designing new trials and identifying new drug targets.

For this collaboration, it was clear to all the partners in this pre-competitive consortium that the costs were too high for any of us to do it alone. The obvious conclusion was to work together to generate these data. This paradigm shift from, "This is my silo of data," to, "Let us build an immense medical research resource that we could all – industry and academia – benefit from," had to happen; otherwise our progress would have been limited.

What developments do you expect to see in the next five years?

It is very exciting to see how recent progress in informatics and technology enables large genomic studies to be conducted at scale. Five years ago, I could not have imagined analysing the exomes of 300,000 individuals in any bioinformatics environment. Those capabilities did not exist, partly because we did not have genomics data at that scale, so there was no need to push the boundaries of technology. As we have seen in other fields, it is often the data that drives the need to build innovative IT architecture.

I am excited to see what happens with quantum computing. I think this is an area to keep a finger on the pulse of, although it is probably still a few years away from maturity.

Moving to analytics, we now have access to hundreds, thousands and, before we know it, millions of genomes. I am excited to see what we can extract from these; from studying individual variants with large effects on clinical outcomes to looking at combinations of variants to polygenic risk scores – all with the aim of getting the right treatment to the right patient at the right time.
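
A polygenic risk score, the last of the approaches mentioned here, is most commonly computed as a weighted sum over many variants: each person’s count of effect alleles at a variant is multiplied by that variant’s effect size, and the products are added. A minimal sketch, assuming the effect sizes are already available from a published association study; the variant IDs, weights and the function name are hypothetical.

```python
def polygenic_risk_score(dosages: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted sum of effect-allele dosages: PRS = sum_i beta_i * dosage_i.

    `dosages` maps variant ID -> number of effect alleles carried (0 to 2;
    imputed dosages may be fractional). `weights` maps variant ID -> effect
    size, e.g. a log odds ratio from an association study.
    """
    score = 0.0
    for variant_id, beta in weights.items():
        dosage = dosages.get(variant_id)
        if dosage is None:
            # Missing genotype: skipped in this sketch; some scoring tools
            # substitute the population mean dosage instead
            continue
        if not 0 <= dosage <= 2:
            raise ValueError(f"dosage {dosage} for {variant_id} is outside 0-2")
        score += beta * dosage
    return score


# Hypothetical variant IDs and effect sizes, invented for illustration
weights = {"var_A": 0.12, "var_B": -0.05, "var_C": 0.30}
patient = {"var_A": 2, "var_B": 1}   # var_C was not genotyped
print(f"PRS = {polygenic_risk_score(patient, weights):.2f}")  # PRS = 0.19
```

A raw score like this is usually interpreted relative to its distribution in a reference population (for example as a percentile) rather than as an absolute risk.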

Slavé Petrovski is the Vice President and Head of Genome Analytics and Bioinformatics for AstraZeneca’s Centre for Genomics Research (CGR). The role involves designing and co-ordinating the human genomics studies of the CGR and the company’s broader Genomics Initiative. Slavé has an extensive academic background in human genomics, population genetics and precision medicine, and in leading large-scale human genomics studies.