How next-generation sequencing came to be: a brief history

DNA sequencing technologies have come on in leaps and bounds since genetics pioneers James Watson and Francis Crick described the double-helical structure of DNA in 1953. This discovery paved the way for the development, years later, of next-generation sequencing (NGS), a high-throughput technology that enables scientists to produce huge quantities of DNA sequence data (typically millions or billions of sequence reads) more quickly and cheaply than ever before.

NGS has revolutionised the fields of genomics and molecular biology and continues to do so as incremental improvements make the process ever faster, cheaper and more efficient. This article aims to give an introductory overview of NGS: the history leading up to its conception, its many applications and some of today’s technologies.

One of the first forms of nucleotide sequencing involved RNA. Biochemist Robert Holley was awarded a Nobel Prize after he and his colleagues developed sequencing methods for transfer RNA (tRNA) in 1964. He went on to determine the complete sequence of the 77 ribonucleotides in alanine tRNA, the molecule responsible for incorporating alanine into proteins. His technique was to use two ribonucleases to split the tRNA into pieces and then to piece together the ensuing ‘puzzle’. This was the first nucleotide sequence of a ribonucleic acid ever determined.
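The idea of cutting one molecule with two different enzymes can be illustrated with a small Python sketch. It is only a toy: the sequence is made up (it is not alanine tRNA), and the cleavage rules used here (ribonuclease T1 cutting after G, pancreatic ribonuclease cutting after C or U) are assumptions added for illustration, since the text above does not say which bases each enzyme targets.

```python
# Toy illustration of digesting one RNA with two ribonucleases.
# Assumed cleavage rules (not stated in the article):
#   RNase T1 cuts after G; pancreatic RNase cuts after C or U.

def digest(rna, cut_after):
    """Split rna into fragments, cutting immediately after any base in cut_after."""
    fragments, current = [], ""
    for base in rna:
        current += base
        if base in cut_after:
            fragments.append(current)
            current = ""
    if current:  # keep any trailing piece that did not end at a cut site
        fragments.append(current)
    return fragments

rna = "GGGCGUGUAGCUCAG"  # hypothetical sequence for demonstration only

t1_fragments = digest(rna, "G")
pancreatic_fragments = digest(rna, "CU")

print(t1_fragments)
print(pancreatic_fragments)
```

Because the two enzymes cut at different positions, fragments from one digest span the cut sites of the other, and those overlaps are what allow the pieces to be placed in order.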

Just a few years later, in 1972, Paul Berg was credited with developing the first recombinant DNA molecule. He used a technology that enabled the isolation of DNA fragments so that individual genes could then be inserted into mammalian cells or into rapidly growing organisms such as bacteria. Scientists Frederick Sanger and Walter Gilbert then developed rapid sequencing methods for ‘long’ DNA in 1977, which for the first time made it possible to read the nucleotide sequence of entire genes (1,000 to 30,000 bases long). Like Watson and Crick, these renowned scientists went on to receive Nobel Prizes for their efforts.

The first entire DNA genome to be sequenced was that of bacteriophage ΦX174. Frederick Sanger and his team sequenced it in 1977, and so the Sanger sequencing method was born. This paved the way for bigger genomics breakthroughs; less than 10 years later, in 1984, scientists at the Medical Research Council deciphered the complete DNA sequence of the Epstein-Barr virus, which was found to be over 170,000 base pairs long. While progress was being made in DNA sequencing, a DNA amplification technique known as the polymerase chain reaction (PCR) was developed in 1983, and this could also be applied to DNA sequencing technologies.

All of these incremental genetic achievements set the stage for what is arguably the most famous scientific achievement of all time: the deciphering of the human genome. The Human Genome Project, launched in 1990, was a worldwide scientific endeavour that enlisted several countries (the UK, US, France, Germany, Japan, China and India). Ten years later, in 2000, the result the scientific community had been waiting for was declared: a ‘working draft’ of the human genome had been sequenced, comprising 85% of the genome. Cooperation between the participating countries and advances in sequence analysis and information technology then led, three years later, to the complete genome being announced. This US government-funded project, the world’s largest collaborative project of all time, revealed that the human genome is made up of 3.3 billion base pairs and around 23,000 genes. Computational representations of our DNA showed that we could be read like books, and with this wealth of information we could start to discover the ways in which genes and families of genes function and occasionally malfunction.

A new wave of sequencing methods

As soon as the draft of the human genome was out (and even before this), companies had started in earnest to invent and bring to market more sophisticated sequencing technologies and the associated instruments.

Original Sanger sequencing, which employed the ‘chain-termination’ method, was highly accurate, useful for many applications and credited with cracking the human genome. It was also a costly process, however, and was therefore deemed impractical for larger sequencing projects. Figures published in 2013 by the National Human Genome Research Institute's Genome Sequencing Program showed that sequencing a genome had cost $100 million in 2001; a decade later, NGS technologies reaching the market had brought this figure down to $10,000. Sanger sequencing was the go-to method right up until the mid-2000s, but it had had its day and new methods that reduced cost were needed.
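To put the two cost figures quoted above into perspective, the short Python calculation below works out the fold reduction and the implied halving time. It assumes, purely for illustration, that the cost fell at a constant exponential rate between 2001 and 2011, which is a simplification rather than a description of how prices actually moved.

```python
import math

cost_2001 = 100_000_000  # USD, figure quoted in the text
cost_2011 = 10_000       # USD, figure quoted in the text
years = 2011 - 2001

# How many times cheaper sequencing became over the decade.
fold_reduction = cost_2001 / cost_2011

# Number of times the cost would have had to halve to achieve that reduction.
halvings = math.log2(fold_reduction)

# Implied halving time, under the constant-rate assumption stated above.
halving_time_months = 12 * years / halvings

print(f"{fold_reduction:,.0f}-fold cheaper")
print(f"cost halved roughly every {halving_time_months:.1f} months")
```

Under that assumption, a 10,000-fold drop in ten years corresponds to the cost halving roughly every nine months.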

The ‘first generation’ automated Sanger method used to sequence the genome started to make way for newer next-generation methods. Driven by the demand for low-cost sequencing, these NGS technologies worked by parallelising the sequencing process, thus producing huge volumes of data (sequences) concurrently. The newer instruments on the market, in particular, could sequence genomes quickly and accurately.

The first NGS technology to be commercialised was known as ‘sequencing by synthesis’ (SBS); this technique evolved from a method known as ‘massively parallel signature sequencing’ (MPSS), which Lynx Therapeutics first developed in the 1990s. MPSS was a bead-based method employing a complex approach of adapter ligation followed by adapter decoding, reading the sequence in increments of four nucleotides. In 2004, Lynx Therapeutics merged with Solexa (which was itself later acquired by Illumina), which led to the conception of the much simpler SBS method, though the basic principles of MPSS would remain important.

Marking the second of a new wave of sequencing technologies, in 2004, 454 Life Sciences (now owned by Swiss giant Roche) marketed its parallelised version of pyrosequencing, which reduced sequencing costs dramatically compared to automated Sanger sequencing. The benefit of pyrosequencing was that it provided intermediate read lengths. Roche went on to improve the technology further: the 454 GS 20 sequencing platform, introduced in 2005-2006, was able to produce 20 million bases (20 Mbp). The firm’s next model, the GS FLX, launched in 2007, could produce over 100 Mbp of sequence in four hours, and by 2008 this had risen to 400 Mbp. The 454 GS-FLX+ Titanium sequencing platform now available can produce over 600 Mbp of data in a single run, with Sanger-like read lengths of up to 1,000 bp.

The Solexa system came next. Solexa released the Genome Analyzer in 2005, before the company was purchased by Illumina in 2007. In 2005, scientists at Solexa used SBS technology to sequence the complete genome of the same bacteriophage that Sanger had first sequenced. This method, based on reversible dye-terminator technology and engineered polymerases, yielded significantly more sequence data than the Sanger method, with over 3 million bases produced from a single run. Like Roche, Illumina went on to develop its own series of instruments, which vary in output, run time, paired-end read capability, maximum read length and cluster generation.

Another NGS advance was ‘Sequencing by Oligonucleotide Ligation and Detection’, or SOLiD. This technology has been available since 2006 and is capable of generating hundreds of millions to billions of short sequence reads at one time. The system involves labelling a pool of all possible oligonucleotides of a fixed length according to the sequenced position. The technology is said to be 99.94% accurate thanks to its two-base encoding method, and SOLiD has been applied to whole genome cluster analysis.
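Two-base encoding can be sketched in a few lines of Python. The mapping used here (each base given a two-bit code, and each adjacent pair of bases turned into one of four ‘colours’ by XOR) is a common way of describing the scheme, but it is an assumption added for illustration rather than something spelled out in the text above; the function names are likewise hypothetical.

```python
BASES = "ACGT"
CODE = {base: i for i, base in enumerate(BASES)}  # A=0, C=1, G=2, T=3 (assumed mapping)

def encode(seq):
    """Return the first base plus one colour (0-3) per adjacent pair of bases."""
    colours = [CODE[a] ^ CODE[b] for a, b in zip(seq, seq[1:])]
    return seq[0], colours

def decode(first_base, colours):
    """Rebuild the base sequence from a known first base and the colour calls."""
    bases = [first_base]
    for colour in colours:
        bases.append(BASES[CODE[bases[-1]] ^ colour])
    return "".join(bases)

def colour_differences(seq_a, seq_b):
    """Count positions where the colour calls of two equal-length sequences differ."""
    return sum(x != y for x, y in zip(encode(seq_a)[1], encode(seq_b)[1]))

first, colours = encode("ATGGCA")
print(first, colours)
print(decode(first, colours))

# A single-base change in the middle of the sequence alters two adjacent colours.
print(colour_differences("ATGGCA", "ATGTCA"))
```

Because every base contributes to two neighbouring colours, a genuine single-base variant shows up as two adjacent colour changes, whereas an isolated colour change is more likely to be a measurement error. This redundancy is the usual explanation for the accuracy attributed to the method.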

More recent NGS systems include Life Technologies’ Ion Torrent sequencer, based on the detection of hydrogen ions released during DNA polymerisation (as opposed to the optical methods employed in other systems); DNA nanoball sequencing, a whole-genome sequencing technique that uses ‘rolling circle replication’ to amplify small fragments of genomic DNA into DNA nanoballs; and HeliScope sequencing, a method of single-molecule sequencing developed by Helicos Biosciences.

There still appears to be scope for improving DNA sequencing technology, and several methods are currently in development. Nanopore DNA sequencing is one such promising technology: it involves reading the sequence as a DNA strand travels through a nanopore. Microscopy-based techniques that can identify the positions of individual nucleotides within long DNA fragments are also in development, while ‘third generation technologies’ promise increased throughput, decreased time to result and lower cost by removing the need for excessive reagents and harnessing the processivity of DNA polymerase.

NGS: potential that is as vast as the data it provides

As can be expected from a technology that has caught the attention of so many scientists and research institutions worldwide, the potential for NGS in many disciplines of biology and medicine is huge. While it has shown particular promise in plant virology and in discovering some of the viruses that affect plant crops, its potential in human virology is also noteworthy. Monitoring population diversity in HIV was the first application of NGS, and it has also proved vital in controlling infection in countries which have a high prevalence of communicable diseases. For example, a novel arenavirus was discovered via NGS of infected blood serum in September 2008 during an outbreak of unexplained haemorrhagic fever in South Africa. NGS can enable pathogens to be discovered rapidly in order to monitor such outbreaks.

Within the field of medical virology, NGS also enables viral variability and evolution to be monitored, unknown viral pathogens to be discovered and even tumour viruses to be identified. Drug resistance profiles have also been analysed through NGS, and viral vaccines have been subject to NGS-based quality control.

Meanwhile, transcriptomics studies that measure mRNA can offer new insights into genome expression, helping us to understand how it changes in health and disease.

Future perspectives for Next-Generation Sequencing

So what can we expect in the future? The focus of companies specialising in NGS has been the invention of faster instruments, but nanotechnology is now set to be the ‘next big thing’, with instruments likely to become smaller. NGS has become the premier tool for the geneticist and promises a greater understanding of the basis of diseases such as genetic disorders and cancer, opening avenues for more personalised therapies and screening as well. The question that arises is how we deal with all the data resulting from NGS and ensure that NGS workflows are standardised. This will likely become a more pertinent issue over the coming years, as whole and partial-genome sequencing with such advanced technologies increases.