Differential abundance testing on single-cell data using k-nearest neighbor graphs

Abstract

Current computational workflows for comparative analyses of single-cell datasets typically use discrete clusters as input when testing for differential abundance among experimental conditions. However, clusters do not always provide the appropriate resolution and cannot capture continuous trajectories. Here we present Milo, a scalable statistical framework that performs differential abundance testing by assigning cells to partially overlapping neighborhoods on a k-nearest neighbor graph. Using simulations and single-cell RNA sequencing (scRNA-seq) data, we show that Milo can identify perturbations that are obscured by discretizing cells into clusters, that it maintains false discovery rate control across batch effects and that it outperforms alternative differential abundance testing strategies. Milo identifies the decline of a fate-biased epithelial precursor in the aging mouse thymus and identifies perturbations to multiple lineages in human cirrhotic liver. As Milo is based on a cell–cell similarity structure, it might also be applicable to single-cell data other than scRNA-seq. Milo is provided as an open-source R software package at https://github.com/MarioniLab/miloR
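The idea of partially overlapping neighborhoods on a k-nearest neighbor graph can be illustrated with a toy sketch. This is not the miloR implementation: the function names, the brute-force neighbor search, the hand-picked index cells and the sample labels below are all assumptions made for illustration, and the statistical test that Milo applies to the resulting counts is not reproduced here.

```python
import math


def knn_graph(points, k):
    """For each point, return the indices of its k nearest neighbors (Euclidean)."""
    neighbors = []
    for i, p in enumerate(points):
        # Brute-force search; ties are broken by the lower index.
        dists = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        neighbors.append([j for _, j in dists[:k]])
    return neighbors


def make_neighborhoods(neighbors, index_cells):
    """A neighborhood is an index cell plus its k nearest neighbors; sets may overlap."""
    return {i: {i, *neighbors[i]} for i in index_cells}


def count_cells(neighborhoods, sample_of_cell):
    """Count, per neighborhood, how many member cells come from each sample."""
    counts = {}
    for idx, members in neighborhoods.items():
        row = {}
        for cell in members:
            sample = sample_of_cell[cell]
            row[sample] = row.get(sample, 0) + 1
        counts[idx] = row
    return counts


# Toy data (hypothetical): six cells on a line, drawn from two samples.
points = [(0.0,), (1.0,), (2.0,), (3.0,), (4.0,), (5.0,)]
samples = ["A", "A", "B", "B", "B", "A"]

nbrs = knn_graph(points, k=2)
nhoods = make_neighborhoods(nbrs, index_cells=[1, 3])
counts = count_cells(nhoods, samples)

print(nhoods)  # the two neighborhoods share cell 2
print(counts)  # per-neighborhood cell counts by sample
```

The per-neighborhood, per-sample count table is the kind of input a differential abundance test would then operate on; because one cell can belong to several neighborhoods, the tests are not independent, which is why multiple-testing correction needs care.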

Code availability

Milo is implemented as an open-source package in R (https://github.com/MarioniLab/miloR) and is installable from Bioconductor (≥3.13; http://www.bioconductor.org/packages/release/bioc/html/miloR.html). Code used to generate figures and perform analyses can be found at https://github.com/MarioniLab/milo_analysis_2020


Acknowledgements

We thank S. Ghazanfar for feedback on the method; N. Kumasaka for comments on the manuscript; C. Suo, V. Kedlian, R. Elmentaite, J. P. Pett, K. Tuong and B. Stewart for feedback on the software package; and D. Burkhardt, M. Luecken and W. Lewis for discussions on benchmarking. J.C.M. acknowledges core funding from the European Molecular Biology Laboratory and core funding from Cancer Research UK (C9545/A29580), which supports M.D.M. E.D. and S.A.T. acknowledge Wellcome Sanger core funding (WT206194). N.C.H. is supported by a Wellcome Trust Senior Research Fellowship in Clinical Science (ref. 219542/Z/19/Z), the Medical Research Council and a Chan Zuckerberg Initiative Seed Network Grant.

Author information

Affiliations

  1. Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge, UK

    Emma Dann, Sarah A. Teichmann & John C. Marioni

  2. Centre for Inflammation Research, The Queen’s Medical Research Institute, University of Edinburgh, Edinburgh, UK

    Neil C. Henderson

  3. MRC Human Genetics Unit, Institute of Genetics and Molecular Medicine, University of Edinburgh, Edinburgh, UK

    Neil C. Henderson

  4. Theory of Condensed Matter Group, The Cavendish Laboratory, University of Cambridge, Cambridge, UK

    Sarah A. Teichmann

  5. European Molecular Biology Laboratory, European Bioinformatics Institute, Hinxton, Cambridge, UK

    Michael D. Morgan & John C. Marioni

  6. Cancer Research UK Cambridge Institute, Li Ka Shing Centre, University of Cambridge, Cambridge, UK

    Michael D. Morgan & John C. Marioni

Contributions

E.D., M.D.M. and J.C.M. conceived the method idea. E.D. and M.D.M. developed the method, wrote the code and performed analyses. E.D., M.D.M., S.A.T. and N.C.H. interpreted the results. E.D., M.D.M., S.A.T., N.C.H. and J.C.M. wrote and approved the manuscript. M.D.M. and J.C.M. oversaw the project.

Corresponding authors

Correspondence to Michael D. Morgan or John C. Marioni.

Ethics declarations

Competing interests

In the past three years, S.A.T. has received remuneration for consulting and Scientific Advisory Board membership from Genentech, Roche, Biogen, ForesiteLabs and Qiagen. All other authors have no competing interests to declare.

Additional information

Peer review information Nature Biotechnology thanks Dana Pe’er, Michael Love and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Benchmarking DA methods on simulated data.

DA analysis performance on KNN graphs from simulated datasets of different topologies: (a) discrete clusters (2700 cells, 3 populations); (b) 1-D linear trajectory (7500 cells, 7 populations); (c) branching trajectory (7500 cells, 10 populations). Boxplots show the median with interquartile ranges (25–75%); whiskers extend to the largest value no further than 1.5x the interquartile range from the box, with outlier data points shown beyond this range.
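The boxplot convention described in this caption (median, 25–75% box, whiskers limited to 1.5x the interquartile range, outliers beyond) can be made concrete with a small sketch. The function name and the example values are hypothetical, and the quartile rule used here (Python's "inclusive" method) is an assumption: plotting libraries differ slightly in how they compute quartiles.

```python
import statistics


def boxplot_stats(values, whisker=1.5):
    """Summarize values the way a standard boxplot draws them."""
    data = sorted(values)
    # Quartiles by linear interpolation over the observed data (assumed convention).
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - whisker * iqr
    high_fence = q3 + whisker * iqr
    # Whiskers end at the most extreme observations still inside the fences.
    inside = [v for v in data if low_fence <= v <= high_fence]
    outliers = [v for v in data if v < low_fence or v > high_fence]
    return {
        "median": median,
        "q1": q1,
        "q3": q3,
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "outliers": outliers,
    }


# Hypothetical values: nine regular observations and one extreme one.
stats = boxplot_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])
print(stats)
```

In this example the value 30 lies more than 1.5x the interquartile range above the upper quartile, so it is drawn as an outlier point and the upper whisker stops at 9.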

Extended Data Fig. 2 Sensitivity of DA methods to low fold change in abundance.

(a) True positive rate (TPR, top) and false positive rate (FPR, bottom) of DA methods calculated on cells in different bins of P(C1) used to generate condition labels (bin size = 0.05; the number on the x-axis indicates the lower value in the bin). The results for 36 simulations on 2 representative populations (colors) are shown. The filled points indicate the mean of each P(C1) bin. (b) Variability in Milo power is explained by the fraction of true positive cells close to the DA threshold for definition of ground truth. Example distributions of P(C1) for cells detected as true positives (TP) or false negatives (FN) by Milo. Examples for simulations on 2 populations (rows) and 3 simulated fold changes (columns) are shown. (c, d) True positive rate (TPR) of DA detection for simulated DA regions of increasing size centred at the same centroid (Erythroid2 (c) and Caudal neuroectoderm (d)). Results for 3 condition simulations per population and fold change are shown.

Extended Data Fig. 3 Comparison of Milo and MELD for abundance fold change estimation.

(a–d) Scatter plots of the true fold change at the neighbourhood index against the fold change estimated by Milo (a, c) and MELD (b, d), without batch effect (a, b) and with batch effect (magnitude = 0.5) (c, d), where LFC = log(pc’/(1 – pc’)). The neighbourhoods overlapping true DA cells (pc’ greater than the 75% quantile of P(C1) in the mouse gastrulation dataset) are highlighted in red. (e, f) Mean squared error (MSE) comparison for MELD and Milo for true negative neighbourhoods (e) and true positive neighbourhoods (f), with increasing simulated log-fold change and magnitude of batch effect. Each boxplot summarizes the results for n = [missing] simulations. Boxplots show the median with interquartile ranges (25–75%); whiskers extend to the largest value no further than 1.5x the interquartile range from the box, with outlier data points shown beyond this range.
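The two quantities named in this caption, the log-fold change computed from a condition probability and the mean squared error between true and estimated values, can be written out as a short sketch. The caption does not state the base of the logarithm, so the natural log is assumed here; the function names and the estimated values are hypothetical and are not taken from the paper's analysis code.

```python
import math


def log_fold_change(p):
    """LFC = log(p / (1 - p)) for a condition probability p strictly between 0 and 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return math.log(p / (1.0 - p))  # natural log (assumption)


def mean_squared_error(true_values, estimates):
    """Average squared difference between paired true and estimated values."""
    if not true_values or len(true_values) != len(estimates):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - e) ** 2 for t, e in zip(true_values, estimates)) / len(true_values)


# Hypothetical example: three neighbourhoods with known condition probabilities.
true_lfc = [log_fold_change(p) for p in (0.5, 0.75, 0.9)]
estimated_lfc = [0.1, 1.0, 2.0]  # made-up estimates, for illustration only
mse = mean_squared_error(true_lfc, estimated_lfc)
print(true_lfc, mse)
```

A probability of 0.5 gives an LFC of zero (no change in abundance), and the LFC grows without bound as the probability approaches 1, which is why the function rejects the endpoints.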

Extended Data Fig. 4 Controlling for batch effects in differential abundance analysis.

(a) In silico batch correction enhances the performance of DA methods in the presence of batch effects: comparison of performance of DA methods with no batch effect, with batch effects of increasing magnitude corrected with MNN, and with uncorrected batch effects. Each boxplot summarizes results from simulations on n = 9 populations. (b) True positive rate (TPR, left) and false discovery rate (FDR, right) for recovery of cells in simulated DA regions for DA populations with increasing batch effect magnitude on the mouse gastrulation dataset. For each boxplot, results from 8 populations and 3 condition simulations per population are shown (n = 24 simulations). Each panel represents a different DA method and a different simulated log-fold change. (c) Comparison of Milo performance with (~ batch + condition) or without (~ condition) accounting for the simulated batch in the NB-GLM. For each boxplot, results from 8 populations, simulated fold change > 1.5 and 3 condition simulations per population and fold change are shown (72 simulations per boxplot). In all panels, boxplots show the median with interquartile ranges (25–75%); whiskers extend to the largest value no further than 1.5x the interquartile range from the box, with outlier data points shown beyond this range.
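The benchmarking metrics reported in these captions (TPR, FPR and FDR) follow from comparing ground-truth DA labels with each method's calls. The sketch below uses the standard textbook definitions; the function name and the example labels are hypothetical, and the paper's own evaluation code may differ in detail, for example in how ground truth is thresholded.

```python
def da_metrics(is_true_da, is_called_da):
    """Compute TPR, FPR and FDR from paired boolean labels (one pair per cell)."""
    tp = fp = fn = tn = 0
    for truth, called in zip(is_true_da, is_called_da):
        if truth and called:
            tp += 1
        elif truth and not called:
            fn += 1
        elif called:
            fp += 1
        else:
            tn += 1
    nan = float("nan")  # returned when a rate's denominator is zero
    return {
        "TPR": tp / (tp + fn) if tp + fn else nan,  # fraction of true DA cells recovered
        "FPR": fp / (fp + tn) if fp + tn else nan,  # fraction of non-DA cells called DA
        "FDR": fp / (tp + fp) if tp + fp else nan,  # fraction of DA calls that are wrong
    }


# Hypothetical labels for eight cells.
truth = [True, True, True, False, False, False, False, False]
called = [True, True, False, True, False, False, False, False]
metrics = da_metrics(truth, called)
print(metrics)
```

TPR and FPR are both normalized by the ground truth, whereas FDR is normalized by the number of calls, so a method can have a low FPR and still a high FDR when true DA cells are rare.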

About this article

Cite this article

Dann, E., Henderson, N.C., Teichmann, S.A. et al. Differential abundance testing on single-cell data using k-nearest neighbor graphs.
Nat Biotechnol (2021). https://doi.org/10.1038/s41587-021-01033-z
