NGS: hunting mysterious ‘Dark Matter Genome’ towards rewriting the rules of human genetic diseases

Next generation sequencing (NGS) has revolutionised genomics research, providing a wealth of genetic information of immense value to researchers. NGS technologies have evolved over the last decade, leading to substantial improvements in our understanding of different biological systems from broader and deeper perspectives.1

Contemporary advances in high-throughput DNA sequencing technologies have enabled investigations into both the entire genome and specific regions of interest, providing accurate, in-depth genomic information and biological insight into unexpected DNA changes.2 In recent years, several NGS platforms with different sequencing chemistries have been developed that can sequence millions of small DNA fragments in parallel. With the advent of NGS technology, human genomes can now be studied systematically in their entirety, much faster and at lower cost. Sequencing of the human genome provides many benefits, including more accurate diagnosis, prognosis and classification of diseases. It also offers attractive ways to identify ‘druggable’ causal mutations and other genetic variations, such as substitutions, insertions, deletions, inversions and translocations, that could be an underlying cause of many human genetic diseases.3
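As a rough illustration of the simplest variant types listed above, the following Python sketch labels a variant as a substitution, insertion or deletion by comparing the lengths of its reference and alternate alleles. The function name and the allele-string representation are assumptions made for this example and are not taken from any particular variant-calling tool. Inversions and translocations are structural rearrangements that cannot be recognised from allele lengths alone, so they are deliberately left out.

```python
def classify_variant(ref: str, alt: str) -> str:
    """Label a simple variant from its reference and alternate alleles.

    Hypothetical helper for illustration only: it handles substitutions,
    insertions and deletions, not inversions or translocations.
    """
    ref, alt = ref.upper(), alt.upper()
    if not ref or not alt:
        raise ValueError("both alleles must be non-empty strings")
    if ref == alt:
        return "no change"
    if len(ref) == len(alt):
        # Same length, different bases: one or more bases were swapped
        return "substitution"
    if len(ref) < len(alt):
        # The alternate allele is longer: bases were added
        return "insertion"
    # The alternate allele is shorter: bases were lost
    return "deletion"


if __name__ == "__main__":
    for ref, alt in [("A", "G"), ("A", "ATT"), ("ACGT", "A")]:
        print(ref, alt, classify_variant(ref, alt))
```

In practice, variant callers report far richer information (quality scores, genotypes, structural variant breakpoints); the sketch only shows how the basic categories differ.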

Deciphering the genetic information encoded by the human genome is paramount in biomedical research. Extensive investigation into the genetic mutations that cause human diseases has uncovered disease-causing mutations within the 1-2% of the genome that consists of protein-coding DNA, called ‘exons’. The remaining 98% of the genome constitutes ‘dark matter’, whose function remains unclear. Interpreting the effect of DNA changes within this genomic dark matter is very difficult.4

Recent advances in NGS technology have made it possible to annotate genomic dark matter and assign biological functions to many different regions of the human genome, such as promoters, enhancers, repressors, transcription factor binding sites and other regulatory elements. NGS technology has also accelerated the identification of disease-causing DNA abnormalities through comprehensive genome-wide scanning for non-coding mutations with regulatory potential. Detailed investigation of the types of genetic alterations related to human disease may lend insight into the interpretation of similar DNA abnormalities in other genetic disorders.5 Thus, uncovering non-protein-coding DNA sequences with regulatory potential using NGS-based approaches could help us move towards targeting the entire genome for clinical purposes. This article outlines the opportunities presented by NGS technologies to study genomic dark matter for a better understanding of health and disease, opening doors for the development of novel treatments for human genetic disorders.

Whole Genome Sequencing (WGS)

WGS is a high-throughput sequencing approach that allows the human genome to be sequenced in its totality. Sequencing whole genomes enables a comprehensive look at the genomic landscape of pathogenic mutations and helps determine genetic predisposition to certain diseases. WGS is increasingly promoted as a platform for studying genetic variations associated with common and complex diseases,6 and has enabled researchers to identify larger deletions, insertions, copy number variations and other DNA changes within the genome. WGS has the potential to capture complete information on non-coding regions such as promoters, enhancers, introns, untranslated regions (UTRs) and other regulatory elements in the genome. Notably, this method has been used in Genome-Wide Association Studies (GWAS), which have identified a number of genetic variants associated with complex diseases.7 In recent years, WGS has expanded its diagnostic utility and improved the detection of rare genetic disorders in humans that can be prevented or treated through early diagnosis and intervention.8,9 Thus, WGS serves as a powerful tool to detect genetic abnormalities associated with disease onset in humans, paving the way for the discovery of novel methods to monitor and treat rare and genetic diseases.

DNase-sequencing (DNase-seq)

DNase-seq is a high-throughput sequencing method used for mapping active gene regulatory elements across the genome of mammalian cells. It serves as an ideal tool for identifying all types of gene regulatory elements in a single assay and can be performed on any cell type from any species with a sequenced genome.10 DNase-seq utilises an endonuclease called ‘DNase I’ that selectively digests nucleosome-depleted DNA in the genome. The DNase I-digested fragments are then captured and sequenced by high-throughput NGS. Mapping DNase I hypersensitive (HS) sites has proven valuable in exploring different types of regulatory elements in the genome, including promoters, silencers, enhancers, insulators and locus control regions.11 However, this method does not directly reveal the biological function of the regulatory elements; follow-up studies such as ChIP-seq and other functional assays are required to define the precise biological functions and activities associated with regulatory elements in the human genome.
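The idea of mapping hypersensitive sites from sequenced cut fragments can be sketched in a few lines of Python. This is a toy example under stated assumptions, not the analysis used in the cited protocols: it assumes each mapped read has already been reduced to a (chromosome, position) cut site, bins cut sites into fixed-size windows, and flags windows whose cut count reaches a threshold. The function name, the 200 bp window and the threshold of five cuts are all illustrative choices.

```python
from collections import Counter


def hypersensitive_windows(cut_sites, window=200, min_cuts=5):
    """Return (chromosome, window_start) pairs with many DNase I cut sites.

    cut_sites: iterable of (chromosome, position) tuples, one per mapped read.
    window and min_cuts are illustrative values, not recommended settings.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    # Assign every cut site to the fixed-size window that contains it
    counts = Counter(
        (chrom, (pos // window) * window) for chrom, pos in cut_sites
    )
    # Keep only windows whose cut count reaches the threshold
    return sorted(key for key, n in counts.items() if n >= min_cuts)


if __name__ == "__main__":
    # Six cuts cluster in one window on chr1; a single cut lies elsewhere
    cuts = [("chr1", p) for p in (1005, 1010, 1050, 1100, 1150, 1199)]
    cuts.append(("chr1", 5000))
    print(hypersensitive_windows(cuts))
```

Real analyses typically model background cleavage and sequencing depth rather than applying a fixed count threshold; the sketch only conveys why clusters of cut sites point to nucleosome-depleted DNA.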

Chromatin Immunoprecipitation Sequencing (ChIP-seq)

ChIP-seq is an ultra-high-throughput DNA sequencing method that allows comprehensive mapping of in vivo protein-DNA interactions across entire genomes. In ChIP assays, an antibody specific to a DNA-binding factor is used to enrich the target DNA sequences that are physically bound by that factor in the living cell. The immunoprecipitated target DNA regions are then sequenced by NGS, and the resulting data can be used to discover novel binding-site motifs by computational methods.12 ChIP-seq has great potential to identify mutations in binding-site sequences and thus provides direct evidence for any observed changes in protein binding and gene regulation. The development of the genome-wide ChIP approach has made it more practical to study cis-regulation and transcription factor function. In recent years, ChIP-seq has been used to explore an array of genome-wide transcription factor binding sites, active promoters and enhancers in non-protein-coding regions of the genome that mediate gene regulation. Genome-wide ChIP studies have also linked histone methylation and acetylation marks in the genome to active transcription and repression.13 Furthermore, the transcription factor requirements for specific gene regulation can be mapped by integrating ChIP-seq with RNA-seq.14 More importantly, ChIP-seq can be used to identify single nucleotide polymorphisms (SNPs) in promoters, enhancers and transcription factor binding sites that could affect the normal transcriptional regulation of the associated gene. Thus, ChIP-seq represents an important approach for characterising the putative functions of genomic dark matter that regulates gene expression in humans.
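Finding SNPs that fall within binding sites, as described above, comes down to an interval lookup once peaks have been called. The Python sketch below is a minimal illustration under stated assumptions: peaks are given as half-open (chromosome, start, end) intervals that do not overlap within a chromosome, SNPs as (chromosome, position, identifier) tuples, and the function name and identifiers are hypothetical.

```python
from bisect import bisect_right


def snps_in_peaks(peaks, snps):
    """Return identifiers of SNPs located inside any peak.

    peaks: list of (chrom, start, end), half-open [start, end) coordinates,
           assumed not to overlap within a chromosome.
    snps:  list of (chrom, position, snp_id).
    """
    # Group peak intervals by chromosome and sort them by start coordinate
    by_chrom = {}
    for chrom, start, end in peaks:
        by_chrom.setdefault(chrom, []).append((start, end))
    for intervals in by_chrom.values():
        intervals.sort()

    hits = []
    for chrom, pos, snp_id in snps:
        intervals = by_chrom.get(chrom, [])
        # Index of the last peak whose start is <= pos (or -1 if none)
        i = bisect_right(intervals, (pos, float("inf"))) - 1
        if i >= 0 and intervals[i][0] <= pos < intervals[i][1]:
            hits.append(snp_id)
    return hits


if __name__ == "__main__":
    peaks = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 50, 80)]
    snps = [("chr1", 150, "snp_a"), ("chr1", 200, "snp_b"),
            ("chr1", 599, "snp_c"), ("chr2", 10, "snp_d")]
    print(snps_in_peaks(peaks, snps))
```

A SNP inside a peak is only a candidate regulatory variant; showing that it alters protein binding or gene expression still requires the functional follow-up described in the text.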

Assay for transposase-accessible chromatin sequencing (ATAC-seq) 

ATAC-seq is a rapid and sensitive sequencing method that can be used to assess genome-wide chromatin accessibility. It builds on a process known as ‘tagmentation’, in which a mutant hyperactive Tn5 transposase simultaneously fragments accessible chromatin regions genome-wide and tags the fragmented DNA with sequencing adaptors. These adaptors then allow the fragmented DNA library to be sequenced using NGS. ATAC-seq is designed mainly for NGS-based platforms and has been widely adopted as an effective approach to explore open chromatin genome-wide. It is emerging as an information-rich, genome-wide analysis approach for understanding the epigenetic structure of genomes, revealing transcription factor binding sites and the positions of canonical nucleosomes, and providing access to chromatin-accessible regulatory elements such as promoters, enhancers and insulators.15 ATAC-seq can be performed on a wide range of cell types and is highly compatible with other biological methods such as cell sorting and the preparation of single-cell suspensions from tissues. Thus, ATAC-seq serves as a valuable approach for swiftly screening biologically active regions in the human genome and revealing the activities of disease-associated DNA elements in distinct human structures.
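One way ATAC-seq data can hint at nucleosome positions is through fragment length: short fragments tend to come from nucleosome-free DNA, while fragments of roughly one nucleosome's length suggest DNA that was wrapped around a single nucleosome. The Python sketch below illustrates that reasoning only. The function names and the length cut-offs (under 100 bp, and 180 to 247 bp) are assumptions chosen for this example, not values stated in this article, and should be tuned to the data at hand.

```python
from collections import Counter


def classify_fragment(length, nfr_max=100, mono_min=180, mono_max=247):
    """Label an ATAC-seq fragment by its length in base pairs.

    The default cut-offs are illustrative assumptions, not fixed standards.
    """
    if length <= 0:
        raise ValueError("fragment length must be positive")
    if length < nfr_max:
        return "nucleosome-free"
    if mono_min <= length <= mono_max:
        return "mono-nucleosome"
    return "other"


def summarise_fragments(lengths):
    """Count how many fragments fall into each length class."""
    return Counter(classify_fragment(n) for n in lengths)


if __name__ == "__main__":
    print(summarise_fragments([45, 80, 150, 190, 200, 400]))
```

The share of fragments in each class gives a quick impression of library quality, while precise nucleosome positioning requires dedicated tools.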

RNA Capture Long Sequencing

Long non-coding RNAs (lncRNAs), among the largest groups of RNA, are a poorly understood part of genomic dark matter and have been linked to a variety of human genetic disorders. In recent years, researchers have developed a new method named ‘RNA Capture Long Seq (CLS)’ that focuses specifically on the non-coding regions of the human genome. This method has allowed researchers to identify, map and characterise ~3,500 lncRNAs in humans, which should help to clarify the biological functions and roles of lncRNAs in disease.16 This sequencing approach has also improved GENCODE, a major genomic database that serves as a worldwide reference for all genes encoded in the human and mouse genomes. Thus, the CLS approach represents an important step in characterising the genomic features of lncRNAs and improving the lncRNA catalogue, which will benefit researchers seeking to understand the role of poorly characterised lncRNAs in health and disease.


To conclude, NGS is among the most powerful approaches for better understanding the genetic architecture of human diseases. With its ultra-high-throughput scalability and power, NGS enables researchers to unravel the secrets of genomic dark matter, providing clinically valuable insights into the common genetic variants and regulatory sequences vital for life. Most human diseases have been found to be associated with regulatory sequences. Understanding the complex relationship between the regulatory sequences in dark matter and the expression of associated genes using NGS-based approaches would therefore illuminate the biological significance of genomic dark matter. These methods could also reveal new therapeutic targets to rewrite the rules of complex genetic diseases. As we look to the future, human genomic research is where NGS is likely to have the most impact, in diagnosing and treating incurable human genetic disorders.


PUSHPANATHAN MUTHUIRULAN is currently a Research Associate at Harvard University, studying the developmental and genetic basis of human height variation using functional genomics approaches. Previously, he worked as a Postdoctoral Researcher at the National Institutes of Health, where his research focused on developing state-of-the-art technologies using CRISPR-Cas9 and super-resolution microscopy to map the neural circuits involved in visual motion information processing in Drosophila. His expertise lies in omics technologies, drug discovery, neuroscience, and developmental and evolutionary genetics.


  1. Muthuirulan P, Sharma P. NGS: empowering infectious disease research beyond reality. (2017). Drug Target Review 4(3), 45-50.
  2. Behjati S, Tarpey PS. What is next generation sequencing? (2013). Archives of Disease in Childhood - Education and Practice 98(6), 236-238.
  3. Venter JC, Adams MD, Myers EW, Li PW, Mural RJ, Sutton GG, et al. The sequence of the human genome. (2001). Science 291(5507), 1304-1351.
  4. Leucci E. Cancer development and therapy resistance: spotlights on the dark side of the genome. (2018). Pharmacology & Therapeutics.
  5. Kiser DP, Rivero O, Lesch KP. Annual Research Review: The (epi)genetics of neurodevelopmental disorders in the era of whole-genome sequencing – unveiling the dark matter. (2015). Journal of Child Psychology and Psychiatry 56(3), 278-295.
  6. Wray NR, Gratten J. Sizing up whole-genome sequencing studies of common diseases. (2018). Nature Genetics 50(5), 635.
  7. Welter D, MacArthur J, Morales J, Burdett T, Hall P, Junkins H, et al. The NHGRI GWAS Catalog, a curated resource of SNP-trait associations. (2013). Nucleic Acids Research 42(D1), D1001-D1006.
  8. Saunders CJ, Miller NA, Soden SE, Dinwiddie DL, Noll A, Alnadi NA, et al. Rapid whole-genome sequencing for genetic disease diagnosis in neonatal intensive care units. (2012). Science Translational Medicine 4(154), 154ra135.
  9. Weinhold N, Jacobsen A, Schultz N, Sander C, Lee W. Genome-wide analysis of noncoding regulatory mutations in cancer. (2014). Nature Genetics 46(11), 1160.
  10. Sabo PJ, Kuehn MS, Thurman R, Johnson BE, Johnson EM, Cao H, et al. Genome-scale mapping of DNase I sensitivity in vivo using tiling DNA microarrays. (2006). Nature Methods 3(7), 511.
  11. Song L, Crawford GE. DNase-seq: a high-resolution technique for mapping active gene regulatory elements across the genome from mammalian cells. Cold Spring Harbor Protocols 2010(2), pdb-prot5384.
  12. Johnson DS, Mortazavi A, Myers RM, Wold B. Genome-wide mapping of in vivo protein-DNA interactions. (2007). Science 316(5830), 1497-1502.
  13. Milne TA, Zhao K, Hess JL. Chromatin immunoprecipitation (ChIP) for analysis of histone modifications and chromatin-associated proteins. (2009). In Leukemia (pp. 409-423). Humana Press.
  14. Angelini C, Costa V. Understanding gene regulatory mechanisms by integrating ChIP-seq and RNA-seq data: statistical solutions to biological problems. (2014). Frontiers in Cell and Developmental Biology 2, 51.
  15. Buenrostro JD, Wu B, Chang HY, Greenleaf WJ. ATAC-seq: a method for assaying chromatin accessibility genome-wide. (2015). Current Protocols in Molecular Biology 109(1), 21-29.
  16. Lagarde J, Uszczynska-Ratajczak B, Carbonell S, Perez-Lluch S, Abad A, Davis C, et al. High-throughput annotation of full-length long noncoding RNAs with capture long-read sequencing. (2017). Nature Genetics 49(12), 1731.
