A comparison of experimental assays and analytical methods for genome-wide identification of active enhancers

Abstract

Mounting evidence supports the idea that transcriptional patterns serve as more specific identifiers of active enhancers than histone marks; however, the optimal strategy to identify active enhancers both experimentally and computationally has not been determined. Here, we compared 13 genome-wide RNA sequencing (RNA-seq) assays in K562 cells and show that nuclear run-on followed by cap-selection assay (GRO/PRO-cap) has advantages in enhancer RNA detection and active enhancer identification. We also introduce a tool, peak identifier for nascent transcript starts (PINTS), to identify active promoters and enhancers genome wide and pinpoint the precise location of 5′ transcription start sites. Finally, we compiled a comprehensive enhancer candidate compendium based on the detected enhancer RNA (eRNA) transcription start sites (TSSs) available in 120 cell and tissue types, which can be accessed at https://pints.yulab.org. With knowledge of the best available assays and pipelines, this large-scale annotation of candidate enhancers will pave the way for selection and characterization of their functions in a time- and labor-efficient manner.

Data availability

Processed TRE calls are publicly accessible via our web portal (https://pints.yulab.org). Data that support the findings of this study are available within the paper and its Supplementary information files. All sequencing data analyzed in this study were retrieved from public databases (NCBI GEO and ENCODE portal); lists of accessions are available in Supplementary Tables 1 and 4. Source data are provided with this paper.

Code availability

The source code of PINTS is publicly available at https://github.com/hyulab/PINTS; scripts and pipelines used to generate results reported in this study can be retrieved from https://github.com/hyulab/PINTS_analysis.

References

  1. Heintzman, N. D. et al. Distinct and predictive chromatin signatures of transcriptional promoters and enhancers in the human genome. Nat. Genet. 39, 311–318 (2007).

  2. Calo, E. & Wysocka, J. Modification of enhancer chromatin: what, how, and why? Mol. Cell 49, 825–837 (2013).

  3. Kim, T.-K. et al. Widespread transcription at neuronal activity-regulated enhancers. Nature 465, 182–187 (2010).

  4. Descostes, N. et al. Tyrosine phosphorylation of RNA polymerase II CTD is associated with antisense promoter transcription and active enhancers in mammalian cells. eLife 3, e02105 (2014).

  5. Andersson, R. et al. An atlas of active enhancers across human cell types and tissues. Nature 507, 455–461 (2014).

  6. Tippens, N. D. et al. Transcription imparts architecture, function and logic to enhancer units. Nat. Genet. 52, 1067–1075 (2020).

  7. Core, L. J. et al. Analysis of nascent RNA identifies a unified architecture of initiation regions at mammalian promoters and enhancers. Nat. Genet. 46, 1311–1320 (2014).

  8. Tome, J. M., Tippens, N. D. & Lis, J. T. Single-molecule nascent RNA sequencing identifies regulatory domain architecture at promoters and enhancers. Nat. Genet. 50, 1533–1541 (2018).

  9. Kruesi, W. S., Core, L. J., Waters, C. T., Lis, J. T. & Meyer, B. J. Condensin controls recruitment of RNA polymerase II to achieve nematode X-chromosome dosage compensation. eLife 2, e00808 (2013).

  10. Kwak, H., Fuda, N. J., Core, L. J. & Lis, J. T. Precise maps of RNA polymerase reveal how promoters direct initiation and pausing. Science 339, 950–953 (2013).

  11. Henriques, T. et al. Widespread transcriptional pausing and elongation control at enhancers. Genes Dev. 32, 26–41 (2018).

  12. Kodzius, R. et al. CAGE: cap analysis of gene expression. Nat. Methods 3, 211–222 (2006).

  13. Batut, P., Dobin, A., Plessy, C., Carninci, P. & Gingeras, T. R. High-fidelity promoter profiling reveals widespread alternative promoter usage and transposon-driven developmental gene expression. Genome Res. 23, 169–180 (2013).

  14. Hirabayashi, S. et al. NET-CAGE characterizes the dynamics and topology of human transcribed cis-regulatory elements. Nat. Genet. 51, 1369–1379 (2019).

  15. Duttke, S. H., Chang, M. W., Heinz, S. & Benner, C. Identification and dynamic quantification of regulatory elements using total RNA. Genome Res. 29, 1836–1846 (2019).

  16. Policastro, R. A., Raborn, R. T., Brendel, V. P. & Zentner, G. E. Simple and efficient profiling of transcription initiation and transcript levels with STRIPE-seq. Genome Res. 30, 910–923 (2020).

  17. Core, L. J., Waterfall, J. J. & Lis, J. T. Nascent RNA sequencing reveals widespread pausing and divergent initiation at human promoters. Science 322, 1845–1848 (2008).

  18. Nojima, T. et al. Mammalian NET-seq reveals genome-wide nascent transcription coupled to RNA processing. Cell 161, 526–540 (2015).

  19. Paulsen, M. T. et al. Coordinated regulation of synthesis and stability of RNA during the acute TNF-induced proinflammatory response. Proc. Natl Acad. Sci. USA 110, 2240–2245 (2013).

  20. Magnuson, B. et al. Identifying transcription start sites and active enhancer elements using BruUV-seq. Sci. Rep. 5, 17978 (2015).

  21. Chen, H. et al. A pan-cancer analysis of enhancer expression in nearly 9000 patient samples. Cell 173, 386–399 (2018).

  22. Zhang, Z. et al. Transcriptional landscape and clinical utility of enhancer RNAs for eRNA-targeted therapy in cancer. Nat. Commun. 10, 4562 (2019).

  23. Azofeifa, J. G. & Dowell, R. D. A generative model for the behavior of RNA polymerase. Bioinformatics 33, 227–234 (2017).

  24. Danko, C. G. et al. Identification of active transcriptional regulatory elements from GRO-seq data. Nat. Methods 12, 433–438 (2015).

  25. Wang, Z., Chu, T., Choate, L. A. & Danko, C. G. Identification of regulatory elements from nascent transcription using dREG. Genome Res. 29, 293–303 (2019).

  26. Chu, T. et al. Chromatin run-on and sequencing maps the transcriptional regulatory landscape of glioblastoma multiforme. Nat. Genet. 50, 1553–1564 (2018).

  27. Adiconis, X. et al. Comprehensive comparative analysis of 5′-end RNA-sequencing methods. Nat. Methods 15, 505–511 (2018).

  28. Frith, M. C. et al. A code for transcription initiation in mammalian genomes. Genome Res. 18, 1–12 (2008).

  29. Thakore, P. I. et al. Highly specific epigenome editing by CRISPR-Cas9 repressors for silencing of distal regulatory elements. Nat. Methods 12, 1143–1149 (2015).

  30. Fulco, C. P. et al. Systematic mapping of functional enhancer-promoter connections with CRISPR interference. Science 354, 769–773 (2016).

  31. Wakabayashi, A. et al. Insight into GATA1 transcriptional activity through interrogation of cis elements disrupted in human erythroid disorders. Proc. Natl Acad. Sci. USA 113, 4434–4439 (2016).

  32. Klann, T. S. et al. CRISPR-Cas9 epigenome editing enables high-throughput screening for functional regulatory elements in the human genome. Nat. Biotechnol. 35, 561–568 (2017).

  33. Xie, S., Duan, J., Li, B., Zhou, P. & Hon, G. C. Multiplexed engineering and analysis of combinatorial enhancer activity in single cells. Mol. Cell 66, 285–299 (2017).

  34. Gasperini, M. et al. A genome-wide framework for mapping gene regulation via cellular genetic screens. Cell 176, 377–390 (2019).

  35. Fulco, C. P. et al. Activity-by-contact model of enhancer-promoter regulation from thousands of CRISPR perturbations. Nat. Genet. 51, 1664–1669 (2019).

  36. Xie, S., Armendariz, D., Zhou, P., Duan, J. & Hon, G. C. Global analysis of enhancer targets reveals convergent enhancer-driven regulatory modules. Cell Rep. 29, 2570–2578 (2019).

  37. Schraivogel, D. et al. Targeted Perturb-seq enables genome-scale genetic screens in single cells. Nat. Methods 17, 629–635 (2020).

  38. Kheradpour, P. et al. Systematic dissection of regulatory motifs in 2000 predicted human enhancers using a massively parallel reporter assay. Genome Res. 23, 800–811 (2013).

  39. Kwasnieski, J. C., Fiore, C., Chaudhari, H. G. & Cohen, B. A. High-throughput functional testing of ENCODE segmentation predictions. Genome Res. 24, 1595–1602 (2014).

  40. Ulirsch, J. C. et al. Systematic functional dissection of common genetic variation affecting red blood cell traits. Cell 165, 1530–1545 (2016).

  41. Ernst, J. et al. Genome-scale high-resolution mapping of activating and repressive nucleotides in regulatory regions. Nat. Biotechnol. 34, 1180–1190 (2016).

  42. Maricque, B. B., Chaudhari, H. G. & Cohen, B. A. A massively parallel reporter assay dissects the influence of chromatin structure on cis-regulatory activity. Nat. Biotechnol. 37, 90–95 (2019).

  43. Rathert, P. et al. Transcriptional plasticity promotes primary and acquired resistance to BET inhibition. Nature 525, 543–547 (2015).

  44. Dao, L. T. M. et al. Genome-wide characterization of mammalian promoters with distal enhancer functions. Nat. Genet. 49, 1073–1081 (2017).

  45. Lee, D. et al. STARRPeaker: uniform processing and accurate identification of STARR-seq active regions. Genome Biol. 21, 298 (2020).

  46. Wang, X. et al. High-resolution genome-wide functional dissection of transcriptional regulatory regions and nucleotides in human. Nat. Commun. 9, 5380 (2018).

  47. Schwalb, B. et al. TT-seq maps the human transient transcriptome. Science 352, 1225–1228 (2016).

  48. Core, L. J. et al. Defining the status of RNA polymerase at promoters. Cell Rep. 2, 1025–1035 (2012).

  49. Mchaourab, Z. F., Perreault, A. A. & Venters, B. J. ChIP-seq and ChIP-exo profiling of Pol II, H2A.Z, and H3K4me3 in human K562 cells. Sci. Data 5, 180030 (2018).

  50. Harrow, J. et al. GENCODE: the reference human genome annotation for The ENCODE Project. Genome Res. 22, 1760–1774 (2012).

  51. O’Leary, N. A. et al. Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation. Nucleic Acids Res. 44, D733–D745 (2016).

  52. Jurka, J. Repbase update: a database and an electronic journal of repetitive elements. Trends Genet. 16, 418–420 (2000).

  53. Djebali, S. et al. Landscape of transcription in human cells. Nature 489, 101–108 (2012).

  54. Field, A. & Adelman, K. Evaluating enhancer function and transcription. Annu. Rev. Biochem. 89, 213–234 (2020).

  55. Andersson, R. & Sandelin, A. Determinants of enhancer and promoter activities of regulatory elements. Nat. Rev. Genet. 21, 71–87 (2020).

  56. Palazzo, A. F. & Koonin, E. V. Functional long non-coding RNAs evolve from junk transcripts. Cell 183, 1151–1161 (2020).

  57. ENCODE Project Consortium et al. Expanded encyclopaedias of DNA elements in the human and mouse genomes. Nature 583, 699–710 (2020).

  58. Wang, D. et al. Reprogramming transcription by distinct classes of enhancers functionally defined by eRNA. Nature 474, 390–394 (2011).

  59. Chae, M., Danko, C. G. & Kraus, W. L. groHMM: a computational tool for identifying unannotated and cell type-specific transcription units from global run-on sequencing data. BMC Bioinformatics 16, 222 (2015).

  60. Zhang, Y. et al. Model-based analysis of ChIP-Seq (MACS). Genome Biol. 9, R137 (2008).

  61. Schneider, V. A. et al. Evaluation of GRCh38 and de novo haploid genome assemblies demonstrates the enduring quality of the reference assembly. Genome Res. 27, 849–864 (2017).

  62. Roadmap Epigenomics Consortium et al. Integrative analysis of 111 reference human epigenomes. Nature 518, 317–330 (2015).

  63. Pennacchio, L. A., Bickmore, W., Dean, A., Nobrega, M. A. & Bejerano, G. Enhancers: five essential questions. Nat. Rev. Genet. 14, 288–295 (2013).

  64. Vo Ngoc, L., Huang, C. Y., Cassidy, C. J., Medrano, C. & Kadonaga, J. T. Identification of the human DPR core promoter element using machine learning. Nature 585, 459–463 (2020).

  65. Fornes, O. et al. JASPAR 2020: update of the open-access database of transcription factor binding profiles. Nucleic Acids Res. 48, D87–D92 (2020).

  66. Landrum, M. J. et al. ClinVar: improvements to accessing data. Nucleic Acids Res. 48, D835–D844 (2020).

  67. Vahrenkamp, J. M. et al. FFPEcap-seq: a method for sequencing capped RNAs in formalin-fixed paraffin-embedded samples. Genome Res. 29, 1826–1835 (2019).

  68. Yao, L., Wang, H., Song, Y. & Sui, G. BioQueue: a novel pipeline framework to accelerate bioinformatics analysis. Bioinformatics 33, 3286–3288 (2017).

  69. Chen, S., Zhou, Y., Chen, Y. & Gu, J. fastp: An ultra-fast all-in-one FASTQ preprocessor. Bioinformatics 34, i884–i890 (2018).

  70. Dobin, A. et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21 (2013).

  71. Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25, 1754–1760 (2009).

  72. Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).

  73. Ramírez, F. et al. deepTools2: a next generation web server for deep-sequencing data analysis. Nucleic Acids Res. 44, W160–W165 (2016).

  74. Harris, C. R. et al. Array programming with NumPy. Nature 585, 357–362 (2020).

  75. Virtanen, P. et al. SciPy 1.0: fundamental algorithms for scientific computing in Python. Nat. Methods 17, 261–272 (2020).

  76. Seabold, S. & Perktold, J. Statsmodels: econometric and statistical modeling with Python. https://doi.org/10.25080/majora-92bf1922-011 (2010).

  77. Dale, R. K., Pedersen, B. S. & Quinlan, A. R. Pybedtools: a flexible Python library for manipulating genomic datasets and annotations. Bioinformatics 27, 3423–3424 (2011).

  78. Cock, P. J. A. et al. Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics 25, 1422–1423 (2009).

  79. Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010).

  80. Sherry, S. T. et al. dbSNP: the NCBI database of genetic variation. Nucleic Acids Res. 29, 308–311 (2001).

  81. Preker, P. et al. RNA exosome depletion reveals transcription upstream of active human promoters. Science 322, 1851–1854 (2008).

  82. van Arensbergen, J. et al. Genome-wide mapping of autonomous promoter activity in human cells. Nat. Biotechnol. 35, 145–153 (2017).

  83. Shivram, H. & Iyer, V. R. Identification and removal of sequencing artifacts produced by mispriming during reverse transcription in multiple RNA-seq technologies. RNA 24, 1266–1274 (2018).

  84. Bedi, K., Paulsen, M. T., Wilson, T. E. & Ljungman, M. Characterization of novel primary miRNA transcription units in human cells using Bru-seq nascent RNA sequencing. NAR Genom. Bioinform. 2, lqz014 (2020).

  85. Zacher, B. et al. Accurate promoter and enhancer identification in 127 ENCODE and roadmap epigenomics cell types and tissues by GenoSTAN. PLoS ONE 12, e0169249 (2017).

Acknowledgements

Computation was performed on a cluster administered by the Biotechnology Resource Center at Cornell University. We thank members of the Yu and Lis laboratories and the ENCODE Consortium (specifically A. Mortazavi, M. Ljungman and J. E. Moore) for helpful discussions and guidance; and H. Zhu for her suggestions on concept visualization. This work was supported by grants from the National Institutes of Health (no. UM1HG009393 to J.T.L. and H.Y. and nos. R01DK115398, R01DK127778 and R01HD082568 to H.Y.). L.Y. was supported by the Cornell Presidential Life Sciences Fellowship.

Author information

Affiliations

  1. Department of Computational Biology, Cornell University, Ithaca, NY, USA

    Li Yao, Alden King-Yung Leung, John T. Lis & Haiyuan Yu

  2. Weill Institute for Cell and Molecular Biology, Cornell University, Ithaca, NY, USA

    Li Yao, Jin Liang, Alden King-Yung Leung & Haiyuan Yu

  3. Department of Molecular Biology and Genetics, Cornell University, Ithaca, NY, USA

    Abdullah Ozer & John T. Lis

Contributions

Conceptualization was performed by L.Y., J.T.L. and H.Y. Methodology was carried out by L.Y. Software was the responsibility of L.Y. L.Y. carried out formal analysis. J.L. performed investigations. Data curation was carried out by L.Y., J.L. and A.K.-Y.L. L.Y. and J.L. wrote the original draft. Writing, review and editing were performed by J.L., A.O., J.T.L. and H.Y. Visualization was the responsibility of L.Y., J.L., A.O. and H.Y. J.T.L. and H.Y. supervised the study.

Corresponding authors

Correspondence to
John T. Lis or Haiyuan Yu.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Biotechnology thanks Leng Han and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 An extended evaluation of eRNA detection sensitivity of different assays.

a and c are the extended versions of Fig. 2a,b, respectively. a and b show the capability of different assays to capture previously identified enhancers. The color of the stacked bars indicates whether eRNAs were detected from one or both strands of the enhancer loci. The transparency level shows the number of reads required for an enhancer locus to be considered covered. In a, the top track is derived from the CRISPR- or CRISPRi-based reference set (n = 803), and the bottom track is derived from consensus loci validated by STARR-seq and MPRA (n = 550). b, Sensitivity evaluated in another cell line, GM12878, with orientation-independent enhancers identified in previous studies (n = 3,544; refs. 6,46). c, Differences in read coverage between stable (n = 13,861) and unstable (n = 6,380) transcripts. The error bars in the top track show the extrema of effect sizes (n = 5,000). The center dots, box limits, and whiskers in the bottom track of c denote the median, upper and lower quartiles, and 1.5× interquartile range, respectively.

Extended Data Fig. 2 Effect of technical artifacts on eRNA capture.

a, A new strategy for evaluating strand specificity without interference from promoter-upstream transcripts (PROMPTs; ref. 81). Red and blue colors indicate the mapping direction of reads; the highlighted (yellow) region indicates a previously validated PROMPT (ref. 82). Only the first exon in green was used for evaluation. b, Strand specificities of three stranded and unstranded RNA-seq libraries with our strategy. The p-value was estimated by a two-sided t test. c, Strand specificity for all libraries evaluated with our strategy. Values and error bars represent the mean and SD. n = 2 (GRO-cap, CoPRO, csRNA-seq, PRO-seq, GRO-seq, mNET-seq), n = 3 (STRIPE-seq), n = 4 (CAGE and RAMPAGE), n = 8 (BruUV-seq, total RNA-seq), n = 9 (Bru-seq). d, Distribution of 3-mers at flush end sites (ref. 83) for RIP-seq and TGIRT-seq. The dashed red lines indicate the frequency of RT3-mers (sequences identical to the last three nucleotides of the RT primer [for RIP-seq] or the 3′ adapter [for TGIRT-seq]) in the genome. e, Log odds ratios (LORs) of observed RT3-mers at flush end sites versus in the genome (top) and internal priming rates (bottom) of assays for which internal priming could be detected from the sequencing data. f, Enhancers covered in the RppH library (capped + uncapped, ‘C + U’) that are also covered in the capped library (C). The x-axis shows the minimum number of reads required for an enhancer locus to be considered covered. g, Difference of log-transformed read counts between the capped (C) and RppH (C + U) libraries. The effect size was measured by Cohen’s d. In the box plot, the center dots, box limits, and whiskers denote the median, upper and lower quartiles, and 1.5× interquartile range, respectively. h, Pearson’s r between log-transformed read counts from PRO-seq and POLR2A ChIP-exo at promoters of expressed transcripts (TPM > 5). n = 4,747 (low), n = 9,058 (medium), and n = 2,470 (high).
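
The effect size in g is reported as Cohen’s d on log-transformed read counts. A minimal sketch of that calculation follows, assuming the pooled-standard-deviation form of Cohen’s d and a log2(x + 1) transform. The pseudocount, the function names, and the example counts are illustrative assumptions and are not taken from the study.

```python
import math
from statistics import mean, variance


def log_counts(counts, pseudocount=1.0):
    """Log2-transform raw read counts; the pseudocount is an assumption."""
    return [math.log2(c + pseudocount) for c in counts]


def cohens_d(x, y):
    """Cohen's d using a pooled standard deviation from sample variances."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / (nx + ny - 2)
    return (mean(x) - mean(y)) / math.sqrt(pooled_var)


# Hypothetical per-enhancer read counts in the two libraries.
capped = [3, 7, 15, 31]
rpph = [1, 3, 7, 15]
d = cohens_d(log_counts(capped), log_counts(rpph))
```

A positive d here means the first library has higher log-scale coverage than the second; the sign simply follows the argument order.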

Extended Data Fig. 3 Analyses of factors affecting assays’ sensitivity in detecting eRNAs.

a is the extended version of Fig. 3a. b, An example showing that divergent transcripts detected by NT-assays can originate from two overlapping genes (MMP23B and SLC35E2B) instead of from a regulatory element. Sequencing reads were RPM-normalized. c, Proportion of mappable reads from different assays originating from various abundant RNA families. d, Effects of rRNA depletion on eRNA enrichment. For each category, three downsampled libraries were included. BruUV-seq libraries from a previously published study (ref. 84) were used for this analysis. The p-value for rRNA percentage was calculated by a two-proportion z test (two-sided, p-value: 0); the p-value for true enhancer coverage was calculated by McNemar’s test (two-sided, p-value: 2.1 × 10⁻²⁵). Values and error bars represent the mean and SD. e, The distribution of sequencing reads (in RPM) around GENCODE-annotated splicing junction sites. The shaded area indicates the 95% confidence interval of mean values estimated via bootstrap.
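
The two tests named in d can be sketched with the Python standard library alone. This is an illustration under stated assumptions: the z test uses the pooled-variance form, and McNemar’s test uses the chi-square approximation without continuity correction. The caption does not say which variants were used, and the function names are hypothetical.

```python
import math


def two_proportion_z_test(success_a, total_a, success_b, total_b):
    """Two-sided z test for equal proportions (pooled variance)."""
    p_a, p_b = success_a / total_a, success_b / total_b
    pooled = (success_a + success_b) / (total_a + total_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    if se == 0:  # both proportions are exactly 0 or exactly 1
        return 0.0, 1.0
    z = (p_a - p_b) / se
    # Two-sided normal tail probability: erfc(|z| / sqrt(2)).
    return z, math.erfc(abs(z) / math.sqrt(2))


def mcnemar_test(only_a, only_b):
    """Two-sided McNemar test on the discordant pair counts.

    only_a: items detected in condition A but not B; only_b: the reverse.
    Uses the chi-square approximation with 1 degree of freedom.
    """
    if only_a + only_b == 0:
        return 0.0, 1.0
    stat = (only_a - only_b) ** 2 / (only_a + only_b)
    # Survival function of chi-square with 1 d.f.: erfc(sqrt(stat / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))
```

For very large read counts the z statistic becomes so extreme that the p-value can underflow to 0.0 in double-precision arithmetic.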

Extended Data Fig. 4 Extended evaluations of assays’ specificity.

a, Epigenomic and transcription factor binding profiles for the enhancer and non-enhancer sets. For H3K27ac and CTCF, the profiles are presented as fold changes over control; for DHS, the profile is shown as normalized sequencing depth. Solid lines represent mean densities, and shaded areas depict the 95% confidence interval of mean values estimated via bootstrap. KE: known enhancers; NE: non-enhancers. b, Signal-to-noise ratios evaluated in K562. n = 803 (known enhancers) and n = 6,777 (non-enhancers). c, Signal-to-noise ratios evaluated in GM12878. n = 3,544 (known enhancers) and n = 153,809 (non-enhancers). For b and c, 10,000 bootstrapped samples were used for calculating the fold enrichment (FE). The center dots, box limits, and whiskers in b and c denote the median, upper and lower quartiles, and 1.5× interquartile range, respectively. d, False discovery rates estimated by the overlap between the top 5,000, 10,000, 20,000, and 100,000 genomic bins and the true and non-enhancer sets. Downsampled libraries were used (n = 3); values and error bars represent the mean and SD.
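
The bootstrapped fold enrichment in b and c can be illustrated as follows. This sketch assumes FE is the ratio of mean read coverage in known enhancers to mean read coverage in non-enhancers, with both sets resampled with replacement. The caption does not give the exact definition, so the ratio, the pseudocount, and all names here are assumptions.

```python
import random
from statistics import mean


def bootstrap_fold_enrichment(signal, noise, n_boot=10_000, seed=0, pseudo=1e-9):
    """Return n_boot bootstrap estimates of mean(signal) / mean(noise).

    signal: per-locus coverage in known enhancers.
    noise:  per-locus coverage in non-enhancers.
    pseudo: small constant guarding against division by zero (assumption).
    """
    rng = random.Random(seed)  # fixed seed for reproducibility
    estimates = []
    for _ in range(n_boot):
        s = rng.choices(signal, k=len(signal))  # resample with replacement
        n = rng.choices(noise, k=len(noise))
        estimates.append(mean(s) / (mean(n) + pseudo))
    return estimates
```

The median and quartiles of the returned list are what a box plot of FE would display.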

Extended Data Fig. 5 Assessments of transcript unit prediction and schematic illustration of PINTS.

a, Consistency varies greatly between transcription units annotated in GENCODE (Annot.) and those predicted by different tools (Pred.; refs. 58,59,85). Lines in the violin plot indicate the 25th, 50th, and 75th percentiles. b, Schematic plot of PINTS. i, Improvement of TSS identification resolution by focusing only on read ends and using zero-inflated Poisson (ZIP) models to fit the local background, which addresses the substantially increased sparsity of signals. The thin grey lines indicate sequencing reads, with the 5′ ends highlighted in red. ii, The existence of other potential true peaks (pink) elevates the estimate of read density in the local background. iii, A schematic plot showing how IQR-ZIP works. The blue box shows the read density distribution of the local background; the purple dot shows the density of the peak to be tested; the pink dot shows the density of a potential true peak close to the peak to be tested, whose read density is a clear outlier and is thus excluded from local background estimation.
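
The IQR-ZIP idea in b can be sketched in a few steps: drop outlying densities from the local background, fit a zero-inflated Poisson to what remains, and ask how surprising the candidate peak is under that fit. The code below is a simplified illustration, not the PINTS implementation (which is available in the repository cited under Code availability). The outlier rule Q3 + 1.5 × IQR, the EM fitting procedure, and all function names are assumptions.

```python
import math
from statistics import quantiles


def iqr_filter(densities, k=1.5):
    """Keep background values at or below Q3 + k * IQR."""
    q1, _, q3 = quantiles(densities, n=4)
    upper = q3 + k * (q3 - q1)
    return [d for d in densities if d <= upper]


def fit_zip(counts, n_iter=500, tol=1e-10):
    """Fit a zero-inflated Poisson by EM; returns (pi, lam).

    pi is the probability of a structural zero, lam the Poisson mean.
    """
    n = len(counts)
    n_zero = sum(1 for c in counts if c == 0)
    total = sum(counts)
    if total == 0:  # all-zero background: no Poisson component to estimate
        return 1.0, 0.0
    pi, lam = 0.5 * n_zero / n, total / (n - n_zero)
    for _ in range(n_iter):
        # E-step: expected number of zeros that are structural.
        p_zero = pi + (1 - pi) * math.exp(-lam)
        structural = n_zero * pi / p_zero
        # M-step: update the mixing weight and the Poisson mean.
        new_pi, new_lam = structural / n, total / (n - structural)
        converged = abs(new_pi - pi) < tol and abs(new_lam - lam) < tol
        pi, lam = new_pi, new_lam
        if converged:
            break
    return pi, lam


def zip_upper_tail(x, pi, lam):
    """P(X >= x) under the fitted ZIP model for an integer count x."""
    if x <= 0:
        return 1.0
    if lam == 0:
        return 0.0
    cdf = sum(math.exp(k * math.log(lam) - lam - math.lgamma(k + 1)) for k in range(x))
    # Clamp at 0 because 1 - cdf is limited by double precision.
    return (1 - pi) * max(0.0, 1.0 - cdf)


# Hypothetical local background containing one nearby true peak (40).
background = [0] * 15 + [1, 2, 1, 3, 2] + [40]
kept = iqr_filter(background)
pi, lam = fit_zip(kept)
p_value = zip_upper_tail(40, pi, lam)
```

Without the IQR step, the nearby peak of 40 would inflate the estimated background mean and make the tested peak look less significant, which is the problem panel ii describes.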

Extended Data Fig. 6 Profiles of peak calls generated by different peak callers for various assays.

a, Aggregated profiles of epigenomic marks, transcription factor binding sites, and chromatin accessibility in true enhancer regions and distal TREs identified by different peak callers for TSS- and NT-assays. The shaded area indicates the 95% confidence interval of mean values estimated via bootstrap. b, An example demonstrating why MACS2 is not suitable for identifying TREs. c, Distribution of element sizes identified from 12 assays by all applicable peak callers. In the box plot, the center lines, box limits, and whiskers denote the median, upper and lower quartiles, and 1.5× interquartile range, respectively; points show observations outside the range of the quartiles ± 1.5 × (Q3 − Q1). A table of sample sizes is available in Supplementary Table 5.

Extended Data Fig. 7 Extended analyses on the robustness of element predictions.

a, A previous study showed that the hg19 and hg38 assemblies are very similar: hg38 has 0.09% more ungapped non-centromeric sequence than hg19, and only 0.17% of ungapped hg19 sequence is absent from hg38 (ref. 61). Here we show the distribution of sequencing reads in the genome. The read counts of each assay were summarized against their frequency on a log scale, with hg19 as blue lines and hg38 as orange lines. The p-values were calculated by two-sided Student’s t tests. b, Robustness (Jaccard index) of different peak callers when applied to experimental data with technical and biological replicates. Correlations between alignments (Sample cor.) were calculated as Pearson’s r of log-transformed read counts among genomic bins (500 bp).
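
A Jaccard index between two peak sets, as used for the robustness comparison in b, can be computed at base-pair resolution as the length of the intersection divided by the length of the union. The sketch below assumes that base-pair definition and BED-like half-open (chrom, start, end) tuples; the study may define overlap at the element level instead, and the function names are hypothetical.

```python
from collections import defaultdict


def merge(intervals):
    """Merge overlapping or touching half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def covered(intervals):
    """Total number of bases covered by non-overlapping intervals."""
    return sum(end - start for start, end in intervals)


def jaccard(peaks_a, peaks_b):
    """Base-pair Jaccard index of two peak sets of (chrom, start, end)."""
    by_a, by_b = defaultdict(list), defaultdict(list)
    for chrom, start, end in peaks_a:
        by_a[chrom].append((start, end))
    for chrom, start, end in peaks_b:
        by_b[chrom].append((start, end))
    inter = union = 0
    for chrom in set(by_a) | set(by_b):
        a, b = merge(by_a[chrom]), merge(by_b[chrom])
        u = covered(merge(a + b))
        union += u
        # Inclusion-exclusion: |A ∩ B| = |A| + |B| - |A ∪ B|.
        inter += covered(a) + covered(b) - u
    return inter / union if union else 0.0


# Hypothetical peak calls from two replicates.
rep1 = [("chr1", 100, 200), ("chr1", 300, 400)]
rep2 = [("chr1", 150, 250), ("chr2", 0, 50)]
score = jaccard(rep1, rep2)  # 50 shared bases out of 300 in the union
```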

Extended Data Fig. 8 Performance evaluation of peak callers under different sequencing depths.

a, Epigenomic patterns of the true positive (enhancers, promoters) and true negative (non-enhancers) sets used for ROC calculation for peak calling from GRO-cap. b–d, Sensitivity and specificity of different peak callers when analyzing TSS-libraries (n = 7) downsampled to 18.9 (b), 15 (c), and 10 (d) million mappable reads. The corresponding shaded areas show the 95% confidence interval of the means (via bootstrap). For tools where ROCs cannot be calculated, solid dots represent their performance with default parameters. Values and error bars show mean and SD.

Extended Data Fig. 9 Profiles of unique distal elements identified by different tools.

a, Comparison of the epigenomic signals (fold change over control) in elements uniquely identified by PINTS and other tools. b, Enrichment (measured as log odds ratios) of TF-binding motifs in PINTS unique TREs compared to other tools. The circles indicate the corresponding p-values (−log2 p, two-sided z tests), and the error bars indicate the 90% confidence interval.
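
The enrichment statistic in b (a log odds ratio with a two-sided z test and a 90% confidence interval) can be sketched from a 2 × 2 table of motif counts. This illustration assumes the standard Wald approximation for the standard error of a log odds ratio and requires all four cells to be nonzero. The cell layout and function name are assumptions, not details given in the caption.

```python
import math
from statistics import NormalDist


def log_odds_ratio(a, b, c, d, confidence=0.90):
    """Log odds ratio with Wald confidence interval and two-sided z test.

    a, b: elements with / without the motif in the first set.
    c, d: elements with / without the motif in the second set.
    All four counts must be positive.
    """
    lor = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z_crit = NormalDist().inv_cdf(0.5 + confidence / 2)
    z = lor / se
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return lor, (lor - z_crit * se, lor + z_crit * se), p_value


# Hypothetical counts: motif present in 20 of 30 unique TREs from one tool
# versus 10 of 30 from another.
lor, (low, high), p_value = log_odds_ratio(20, 10, 10, 20)
```

A confidence interval that excludes 0 corresponds to an odds ratio that differs from 1 at the chosen level.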

Extended Data Fig. 10 A summary of the computational tools compared in this study.

The features of different algorithms are summarized and grouped by their roles in the peak calling procedure (colored blocks). Features utilized by each tool to call peaks from nascent transcript sequencing data are indicated.

Supplementary information

Supplementary Tables 1–5.

Supplementary Table 1: Summaries of sequencing libraries analyzed in this study. Supplementary Table 2: Known enhancer sets. Supplementary Table 3: Non-enhancer set. Supplementary Table 4: Datasets integrated in the PINTS web server. Supplementary Table 5: Sample size for TREs and each tool predicted in different assays.

About this article

Cite this article

Yao, L., Liang, J., Ozer, A. et al. A comparison of experimental assays and analytical methods for genome-wide identification of active enhancers.
Nat Biotechnol (2022). https://doi.org/10.1038/s41587-022-01211-7
