How next-generation sequencing came to be: a brief history

Posted: 31 January 2015 | Caroline Richards (Drug Target Review)

DNA sequencing technologies have come on in leaps and bounds since the double-helical structure of DNA was first described in 1953 by genetics pioneers James Watson and Francis Crick. This discovery paved the way for the development, years later, of next-generation sequencing (NGS), a high-throughput technology that enables scientists to produce huge quantities of DNA sequence data (typically millions or billions of reads) more quickly and cheaply than ever before.

NGS has revolutionised the fields of genomics and molecular biology and continues to do so as incremental improvements make the process ever faster, less costly and more efficient. This article aims to give an introductory overview of NGS: the history leading up to its conception, its many applications and some of today’s technologies.

Some of the earliest nucleotide sequencing was carried out on RNA. Biochemist Robert Holley was awarded a Nobel Prize after he developed, with colleagues, sequencing methods for transfer RNA (tRNA) in 1964. He went on to determine the complete sequence of the 77 ribonucleotides in alanine tRNA, the molecule that is responsible for incorporating alanine into proteins. His technique was to use two ribonucleases to split the tRNA into pieces, and then to piece together the ensuing ‘puzzle’. This was the first nucleotide sequence of a ribonucleic acid ever determined.
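The ‘puzzle’ step can be loosely illustrated in code. The sketch below is not Holley’s actual procedure: it assumes a made-up 17-base RNA, made-up cut positions for the two digests, and a simple greedy overlap merge (the names `overlap` and `greedy_assemble` are hypothetical). It only shows why fragments produced by two different cutting patterns overlap one another and can therefore be ordered.

```python
def overlap(a, b, min_len=2):
    """Length of the longest suffix of `a` that equals a prefix of `b`."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(fragments, min_len=2):
    """Repeatedly merge the pair of fragments with the longest overlap."""
    frags = list(dict.fromkeys(fragments))  # drop exact duplicates, keep order
    # Drop fragments that sit entirely inside another fragment.
    frags = [f for f in frags if not any(f != g and f in g for g in frags)]
    while len(frags) > 1:
        best = (0, None, None)
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best[0]:
                        best = (n, i, j)
        n, i, j = best
        if n == 0:  # nothing left to join
            break
        merged = frags[i] + frags[j][n:]
        frags = [f for k, f in enumerate(frags) if k not in (i, j)] + [merged]
    return frags


# Toy data: the same hypothetical RNA cut at different (made-up) positions.
digest_a = ["GCAUGG", "ACUUAC", "GGAUC"]
digest_b = ["GCAU", "GGACUU", "ACGGAUC"]
assembled = greedy_assemble(digest_a + digest_b)
print(assembled)
```

Neither digest alone reveals the order of its own pieces; it is the overlaps between the two sets that fix the order. Real fragments are far more ambiguous than this toy case, which is part of why the original work took years.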

Just a few years later, in 1972, Paul Berg was credited with developing the first recombinant DNA molecule. He used a technology that enabled the isolation of DNA fragments so that individual genes could then be inserted into mammalian cells or into rapidly growing organisms such as bacteria. Scientists Frederick Sanger and Walter Gilbert then developed rapid sequencing methods for ‘long’ DNA in 1977, which for the first time made it possible to read the nucleotide sequence of entire genes (1,000 to 30,000 bases long). Like Watson and Crick, these renowned scientists went on to receive Nobel Prizes for their efforts.

The first entire DNA genome to be sequenced was that of bacteriophage ΦX174. Frederick Sanger and his team sequenced this in 1977, and so the Sanger sequencing method was born. This paved the way for bigger genomics breakthroughs; less than 10 years later, in 1984, scientists at the Medical Research Council deciphered the complete DNA sequence of the Epstein-Barr virus, which was found to be over 170,000 base pairs long. While progress was being made in DNA sequencing, a DNA amplification technique known as the polymerase chain reaction (PCR) was developed in 1983, and this could also be applied to DNA sequencing technologies.

All of these incremental genetic achievements set the stage for what is arguably the most famous scientific achievement of all time: the deciphering of the human genome. The Human Genome Project, launched in 1990, was a worldwide scientific endeavour that enlisted several countries (the UK, US, France, Germany, Japan and China). Ten years later, in 2000, the result the scientific community had been waiting for was announced: a ‘working draft’ of the human genome had been sequenced, comprising 85% of the genome. Cooperation between the participating countries and advancements in the fields of sequence analysis and information technology then led, three years later, to the complete genome being announced. This largely US government-funded project – the world’s largest collaborative biological project – revealed that the human genome is made up of 3.3 billion base pairs and around 23,000 genes. Computational representations of our DNA showed that we could literally be read like books, and now, with this wealth of information, we could start to discover the ways in which genes and families of genes function and occasionally malfunction.

A new wave of sequencing methods

As soon as the draft of the human genome was out (and even before this), companies had started in earnest to invent and bring to market more sophisticated sequencing technologies and the associated instruments.

Although it was highly accurate, useful for many applications and credited with cracking the human genome, original Sanger sequencing, which employed the ‘chain-termination’ method, was a costly process and therefore deemed impractical for larger sequencing projects. In 2013, the National Human Genome Research Institute reported that sequencing a human genome had cost around $100 million in 2001; a decade later, in 2011, NGS technologies reaching the market had brought this figure down to around $10,000. Sanger sequencing was the go-to sequencing method right up until the mid-2000s, but it had had its time and new methods that reduced cost were needed.

The ‘first generation’ automated Sanger method used to sequence the genome started to make way for newer next-generation methods. Driven by the demand for low-cost sequencing, these NGS technologies worked by parallelising the sequencing process, thus producing huge volumes of sequence data concurrently. The newer instruments on the market, in particular, could sequence genomes quickly and accurately.

The first NGS technology to be commercialised was known as ‘sequencing by synthesis’ (SBS); this technique evolved from a method known as ‘massively parallel signature sequencing’ (MPSS), which Lynx Therapeutics first developed in the 1990s. MPSS was a bead-based method employing a complex approach of adapter ligation followed by adapter decoding, reading the sequence in increments of four nucleotides. In 2004, Lynx Therapeutics merged with Solexa (which was itself later acquired by Illumina), which led to the conception of the much simpler SBS method, though the basic principles of MPSS would remain important.

Marking the second of a new wave of sequencing technologies, in 2004, 454 Life Sciences (now owned by Swiss giant Roche) marketed its parallelised version of pyrosequencing, which reduced sequencing costs dramatically compared to automated Sanger sequencing. The benefit of pyrosequencing was that it provided intermediate read lengths. Roche went on to improve the technology further: the 454 GS 20 Roche sequencing platform, introduced in 2005-2006, was able to produce 20 million bases (20 Mbp). The firm’s next model, in 2007 (GS FLX), could produce over 100 Mbp of sequence in four hours, and in 2008 it could provide 400 Mbp. The 454 GS-FLX+ Titanium sequencing platform now available can produce over 600 Mbp of data in a single run with Sanger-like read lengths of up to 1,000 bp.

The Solexa system came next. The company behind it, Solexa, released the Genome Analyzer in 2005, before being purchased by Illumina in 2007. That year, scientists at Solexa used SBS technology to sequence the complete genome of the same bacteriophage Sanger had first sequenced. However, this method, based on reversible dye-terminator technology and engineered polymerases, yielded significantly more sequence data than the Sanger method, with over 3 million bases produced from a single run. Like Roche, Illumina went on to develop its own series of instruments, with varying outputs, run times, paired-end reads, maximum read lengths and cluster generation.

Another NGS advancement was that of oligonucleotide ligation detection (‘Sequencing by Oligonucleotide Ligation and Detection’, or SOLiD). This technology has been available since 2006 and is capable of generating hundreds of millions to billions of small sequence reads at one time. The system involves labelling a pool of all possible oligonucleotides of a fixed length according to the sequenced position. The technology is said to be 99.94% accurate due to its two-base encoding method, and SOLiD has been applied to whole genome cluster analysis.
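To give a feel for why two-base encoding helps accuracy, here is a minimal sketch. It assumes the commonly described ‘colour space’ scheme, in which each pair of adjacent bases is reported as one of four colours; the 2-bit base codes and the function names are illustrative assumptions rather than anything taken from the instrument’s software.

```python
# Assumed 2-bit codes for the bases; the colour for a pair of adjacent
# bases is the XOR of their codes (0 = same base twice, 1-3 = different pairs).
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
CODE_BASE = {code: base for base, code in BASE_CODE.items()}


def encode_colours(seq):
    """Translate a base sequence into its list of colours (one per adjacent pair)."""
    return [BASE_CODE[a] ^ BASE_CODE[b] for a, b in zip(seq, seq[1:])]


def decode_colours(first_base, colours):
    """Rebuild the base sequence from a known first base plus the colours."""
    bases = [first_base]
    for colour in colours:
        bases.append(CODE_BASE[BASE_CODE[bases[-1]] ^ colour])
    return "".join(bases)


def changed_positions(ref_colours, read_colours):
    """Indices at which two colour lists differ."""
    return [i for i, (r, q) in enumerate(zip(ref_colours, read_colours)) if r != q]


reference = encode_colours("ATGGC")
variant = encode_colours("ATCGC")  # one base differs from the reference
print(reference, variant, changed_positions(reference, variant))
```

Because every base contributes to two neighbouring colours, a genuine single-base difference changes two adjacent colours, whereas a lone colour change is more likely to be a measurement error. That redundancy is the general idea behind the accuracy claim.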

More recent NGS systems include Life Technologies’ Ion Torrent sequencer, based on the detection of hydrogen ions released during DNA polymerisation (as opposed to the optical methods employed in other systems); DNA nanoball sequencing, a whole-genome sequencing technique that uses ‘rolling circle replication’ methods to amplify small fragments of genomic DNA into DNA nanoballs; and heliscope sequencing, a method of single-molecule sequencing developed by Helicos Biosciences.

It appears there is still scope for improving DNA sequencing technology as well, with several methods currently in development to reflect this. Nanopore DNA sequencing is one such promising technology – this involves reading the sequence as a DNA strand travels through nanopores. Microscopy-based techniques that can identify the positions of individual nucleotides within long DNA fragments are also in development, while ‘third generation technologies’ promise increased throughput, decreased time to result and lowered cost by removing the need for excessive reagents and harnessing the processivity of DNA polymerase.

NGS: potential that is as vast as the data it provides

As can be expected from a technology that has caught the attention of so many scientists and research institutions worldwide, the potential for NGS in many disciplines of biology and medicine is huge. While it has shown particular promise in plant virology and in discovering some of the viruses that affect plant crops, its potential in human virology is also noteworthy. Monitoring population diversity in HIV was the first application of NGS, and it has also proved vital in controlling infection in countries which have a high prevalence of communicable diseases. For example, a novel arenavirus was discovered via NGS of infected blood serum in September 2008 during an outbreak of unexplained haemorrhagic fever in South Africa. NGS can enable pathogens to be discovered rapidly in order to monitor such outbreaks.

Within the field of medical virology, NGS also enables viral variability and evolution to be monitored, unknown viral pathogens to be discovered and even tumour viruses to be identified. Drug resistance profiles have also been analysed through NGS, and viral vaccines have been subject to NGS-based quality control.

Meanwhile, new insights into genome expression can be gleaned by transcriptomics studies for measurements of mRNA, which enable us to gain an understanding of how genomes can change in health and disease.

Future perspectives for Next-Generation Sequencing

So what can we expect in the future? The focus of companies specialising in NGS has been the invention of faster instruments, but now nanotechnology is also set to be the ‘next big thing’, with instruments likely to become smaller in size. NGS has become the premier tool for the geneticist and promises a greater understanding of the basis of diseases such as genetic disorders and cancer, opening avenues for more personalised therapies and screening as well. The question that arises is how we deal with all the data resulting from NGS and ensure standardisation of NGS workflows. This will likely become a more pertinent issue over the coming years, as whole- and partial-genome sequencing with such advanced technologies increases.
