Nanopore Sequencing and Assembly of a Human Genome With Ultra-long Reads

Nat Biotechnol. 2018 Apr;36(4):338-345.

doi: 10.1038/nbt.4060. Epub 2018 Jan 29.

Miten Jain 1, Sergey Koren 2, Karen H Miga 1, Josh Quick 3, Arthur C Rand 1, Thomas A Sasani 4 5, John R Tyson 6, Andrew D Beggs 7, Alexander T Dilthey 2, Ian T Fiddes 1, Sunir Malla 8, Hannah Marriott 8, Tom Nieto 7, Justin O'Grady 9, Hugh E Olsen 1, Brent S Pedersen 4 5, Arang Rhie 2, Hollian Richardson 9, Aaron R Quinlan 4 5 10, Terrance P Snutch 6, Louise Tee 7, Benedict Paten 1, Adam M Phillippy 2, Jared T Simpson 11 12, Nicholas J Loman 3, Matthew Loose 8

  • PMID: 29431738
  • PMCID: PMC5889714
  • DOI: 10.1038/nbt.4060

Free PMC article

Miten Jain et al. Nat Biotechnol. 2018 Apr.

Abstract

We report the sequencing and assembly of a reference genome for the human GM12878 Utah/Ceph cell line using the MinION (Oxford Nanopore Technologies) nanopore sequencer. 91.2 Gb of sequence data, representing ∼30× theoretical coverage, were produced. Reference-based alignment enabled detection of large structural variants and epigenetic modifications. De novo assembly of nanopore reads alone yielded a contiguous assembly (NG50 ∼3 Mb). We developed a protocol to generate ultra-long reads (N50 > 100 kb, read lengths up to 882 kb). Incorporating an additional 5× coverage of these ultra-long reads more than doubled the assembly contiguity (NG50 ∼6.4 Mb). The final assembled genome was 2,867 million bases in size, covering 85.8% of the reference. Assembly accuracy, after incorporating complementary short-read sequencing data, exceeded 99.8%. Ultra-long reads enabled assembly and phasing of the 4-Mb major histocompatibility complex (MHC) locus in its entirety, measurement of telomere repeat length, and closure of gaps in the reference human genome assembly GRCh38.
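The abstract leans on three summary statistics: read N50, assembly NG50, and theoretical coverage. The following is a minimal sketch of how these are conventionally computed, not the authors' code. The function names, the toy contig lengths, and the 3.1 Gb genome size are assumptions chosen for illustration.

```python
def n50(lengths, total=None):
    """Length L such that sequences of length >= L cover at least half of `total`.

    With total=None the sum of `lengths` is used, which gives N50.
    Passing an (assumed) genome size as `total` gives NG50.
    Returns 0 if the sequences never reach half of `total`.
    """
    if total is None:
        total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0


def theoretical_coverage(total_bases, genome_size):
    """Sequenced bases divided by genome size."""
    return total_bases / genome_size


# Toy contig lengths in Mb (illustrative values, not taken from the paper).
contigs = [8, 6, 4, 3, 2, 1]
print(n50(contigs))            # N50 relative to the assembly's own total
print(n50(contigs, total=40))  # NG50 relative to an assumed genome size of 40 Mb

# 91.2 Gb of data over an assumed 3.1 Gb genome gives roughly 30x.
print(round(theoretical_coverage(91.2e9, 3.1e9), 1))
```

NG50 is never larger than N50 when the assembly is smaller than the genome, which is why it is the more conservative figure to quote for an assembly that covers only part of the reference.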

Conflict of interest statement

M.L., N.L., J.O.G., J.T.S., J.R.T., and T.P.S. were members of the MinION access program (MAP) and have received free-of-charge flow cells and kits for nanopore sequencing for this and other studies, and travel and accommodation expenses to speak at Oxford Nanopore Technologies conferences. N.J.L. has received an honorarium to speak at an Oxford Nanopore company meeting. S.K., A.T.D., J.Q., and T.A.S. have received travel and accommodation expenses to speak at Oxford Nanopore Technologies conferences. J.T.S., J.O.G., and M.L. receive research funding from Oxford Nanopore Technologies.

Figures

Figure 1
Figure 1. Summary of data set.

(a) Read length N50s by flow cell, colored by sequencing center. Cells: DNA extracted directly from cell culture. DNA: pre-extracted DNA purchased from Coriell. UoB, Univ. Birmingham; UEA, Univ. East Anglia; UoN, Univ. Nottingham; UBC, Univ. British Columbia; UCSC, Univ. California, Santa Cruz. (b) Total yield per flow cell grouped as in a. (c) Coverage (black line) of GRCh38 reference compared to a Poisson distribution. The depth of coverage of each reference position was tabulated using samtools depth and compared with a Poisson distribution with lambda = 27.4 (dashed red line). (d) Alignment identity compared to alignment length. No length bias was observed, with long alignments having the same identity as short ones. (e) Correlation between 5-mer counts in reads compared to expected counts in the chromosome 20 reference. (f) Chromosome 20 homopolymer length versus median homopolymer base-call length measured from individual Illumina and nanopore reads (Scrappie and Metrichor). Metrichor fails to produce homopolymer runs longer than ∼5 bp. Scrappie shows better correlation for longer homopolymer runs, but tends to overcall short homopolymers (between 5 and 15 bp) and undercall long homopolymers (>15 bp). Plot noise for longer homopolymers is due to fewer samples available at that length.
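The comparison in panel (c) can be sketched as follows: tabulate the fraction of reference positions at each depth and set it beside the Poisson probability for lambda = 27.4. This is an illustrative sketch only. It assumes the three-column, tab-separated output of samtools depth for a single alignment file (reference name, position, depth), and the helper names and toy input lines are hypothetical.

```python
import math
from collections import Counter

LAMBDA = 27.4  # Poisson mean quoted in the caption


def parse_depth_lines(lines):
    """Extract the depth column from 'chrom<TAB>pos<TAB>depth' lines."""
    return [int(line.split("\t")[2]) for line in lines if line.strip()]


def depth_histogram(depths):
    """Fraction of positions observed at each depth value."""
    counts = Counter(depths)
    n = len(depths)
    return {d: c / n for d, c in sorted(counts.items())}


def poisson_pmf(k, lam):
    """P(X = k) for a Poisson variable, computed in log space for stability."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


# Toy input (illustrative values, not real data).
toy_lines = [
    "chr20\t1\t25", "chr20\t2\t27", "chr20\t3\t27", "chr20\t4\t28",
    "chr20\t5\t30", "chr20\t6\t27", "chr20\t7\t26", "chr20\t8\t29",
]
observed = depth_histogram(parse_depth_lines(toy_lines))
for depth, fraction in observed.items():
    print(depth, round(fraction, 3), round(poisson_pmf(depth, LAMBDA), 4))
```

On real data, an excess of positions at very low or very high depth relative to the Poisson curve would point to regions that are hard to map or to collapsed repeats.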

Figure 2
Figure 2. Structural variation and SNP genotyping.

(a) Structural variant genotyping sensitivity using Oxford Nanopore Technologies (ONT) reads. Genotypes (GTs) were inferred for a set of 2,414 SVs using both Oxford Nanopore and Platinum Genomes (Illumina) alignments. Using alignments randomly subsampled to a given sequencing depth (n = 3), sensitivity was calculated as the proportion of ONT-derived genotypes that were concordant with Illumina-derived genotypes. (b) Confusion matrix for genotype-calling evaluation. Each cell contains the number of 1000 Genome sites for a particular nanopolish/platinum genotype combination.
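The sensitivity measure in panel (a) and the confusion matrix in panel (b) both reduce to comparing two genotype calls per site. A minimal sketch under that reading is shown here. The function names and the toy genotype lists are hypothetical, and the subsampling to a given depth is not modeled.

```python
from collections import Counter


def concordance(ont_calls, illumina_calls):
    """Proportion of sites where the ONT genotype matches the Illumina genotype."""
    if len(ont_calls) != len(illumina_calls):
        raise ValueError("call lists must describe the same sites")
    if not ont_calls:
        raise ValueError("at least one site is required")
    matches = sum(a == b for a, b in zip(ont_calls, illumina_calls))
    return matches / len(ont_calls)


def confusion_matrix(ont_calls, illumina_calls):
    """Count sites for each (ONT genotype, Illumina genotype) combination."""
    return Counter(zip(ont_calls, illumina_calls))


# Toy genotypes for five sites (illustrative values, not from the paper).
ont = ["0/1", "1/1", "0/0", "0/1", "1/1"]
illumina = ["0/1", "1/1", "0/1", "0/1", "0/1"]

print(concordance(ont, illumina))
for (ont_gt, ilmn_gt), count in sorted(confusion_matrix(ont, illumina).items()):
    print(ont_gt, ilmn_gt, count)
```

Off-diagonal cells show which kind of disagreement dominates, for example heterozygous sites called homozygous, which a single concordance number hides.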

Figure 3
Figure 3. Methylation detection using signal-based methods.

(a) SignalAlign methylation probabilities compared to bisulfite sequencing frequencies at all called sites. (b) Nanopolish methylation frequencies compared to bisulfite sequencing at all called sites. (c) SignalAlign methylation probabilities compared to bisulfite sequencing frequencies at sites covered by at least 10 reads in the nanopore and bisulfite data sets; reads were not filtered for quality. (d) Nanopolish methylation frequencies compared to bisulfite sequencing at sites covered by at least 10 reads in the nanopore and bisulfite data sets. A minimum log-likelihood threshold of 2.5 was applied to remove ambiguous reads. N = sample size, r = Pearson correlation coefficient.
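Panels (c) and (d) restrict the comparison to sites covered by at least 10 reads in both data sets and then report a Pearson correlation. A rough sketch of that filter-then-correlate step follows. The record layout (dictionary keys) and the toy sites are assumptions for illustration, and the per-read log-likelihood filtering is not shown.

```python
import math


def filter_sites(sites, min_reads=10):
    """Keep sites covered by at least `min_reads` reads in both data sets."""
    return [
        s for s in sites
        if s["nanopore_reads"] >= min_reads and s["bisulfite_reads"] >= min_reads
    ]


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


# Toy sites (illustrative values, not real data).
sites = [
    {"nanopore_freq": 0.90, "bisulfite_freq": 0.95, "nanopore_reads": 22, "bisulfite_reads": 30},
    {"nanopore_freq": 0.10, "bisulfite_freq": 0.05, "nanopore_reads": 15, "bisulfite_reads": 12},
    {"nanopore_freq": 0.50, "bisulfite_freq": 0.60, "nanopore_reads": 11, "bisulfite_reads": 18},
    {"nanopore_freq": 0.70, "bisulfite_freq": 0.20, "nanopore_reads": 4, "bisulfite_reads": 25},
]
kept = filter_sites(sites)
r = pearson_r([s["nanopore_freq"] for s in kept], [s["bisulfite_freq"] for s in kept])
print(len(kept), round(r, 3))
```

The low-coverage site dropped by the filter is the one where the two frequencies disagree most, which mirrors why a coverage threshold tends to raise the reported correlation.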

Figure 4
Figure 4. Repeat modeling and assembly.

(a) A model of expected NG50 contig size when correctly resolving human repeats of a given length and identity. The y axis shows the expected NG50 contig size when repeats of a certain length (x axis) or sequence identity (colored lines) can be consistently resolved. Nanopore assembly contiguity (GM12878 20×, 30×, 35×) is currently limited by low coverage of long reads and a high error rate, making repeat resolution difficult. These assemblies approximately follow the predicted assembly contiguity. The projected assembly contiguity using 30× of ultra-long reads (GM12878 30× ultra) exceeds 30 Mbp. A recent assembly of 65× PacBio P6 data with an NG50 of 26 Mbp is shown for comparison (CHM1 P6). (b) Yield by read length (log10) for ligation, rapid and ultra-long rapid library preparations. (c) Chromosomes plot illustrating the contiguity of the nanopore assembly boosted with ultra-long reads. Contig and alignment boundaries, not cytogenetic bands, are represented by a color switch, so regions of continuous color indicate regions of contiguous sequence. White areas indicate unmapped sequence, usually caused by N's in the reference genome. Regions of interest, including the 12 50+ kb gaps in GRCh38 closed by our assembly as well as the MHC (16 Mbp), are outlined in red.

Figure 5
Figure 5. Ultra-long reads, assembly, and telomeres.

(a) A 16-Mbp ultra-long read contig and associated haplotigs are shown spanning the full MHC region. MHC class I and II regions are annotated along with various HLA genes. Below this contig, the MHC region is enlarged, showing haplotype A and B coverage tracks for the phased nanopore reads. Nanopore reads were aligned back to the polished Canu contig, with colored lines indicating a high fraction of single-nucleotide discrepancies in the read pileups (as displayed by the IGV browser). The many disagreements indicate the contig is a mosaic of both haplotypes. The haplotig A and B tracks show the result of assembling each haplotype read set independently. Below this, the MHC class II region is enlarged, with haplotype A and B raw reads aligned to their respective, unpolished haplotigs. The few consensus disagreements between raw reads and haplotigs indicate successful partitioning of the reads into haplotypes. (b) An unresolved, 50-kb bridged scaffold gap on Xq24 remains in the GRCh38 assembly (adjacent to scaffolds AC008162.3 and AL670379.17, shown in green). This gap spans a ∼4.6-kb tandem repeat containing cancer/testis gene family 47 (CT47). This gap is closed by assembly (contig: tig00002632) and has eight tandem copies of the repeat, validated by alignment of 100 kb+ ultra-long reads also containing eight copies of the repeat (light blue with read name identifiers). One read has only six repeats, suggesting the tandem repeated units are variable between homologous chromosomes. (c) Ultra-long reads can predict telomere length. Two 100 kb+ reads that map to the subtelomeric region of the chromosome 21 q-arm, each containing 4.9–9.1 kb of the telomeric (TTAGGG)n repeat. (d) Telomere length estimates showing variable lengths between non-homologous chromosomes.
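As a rough illustration of panel (c), the length of telomeric sequence in a read can be approximated by finding the longest uninterrupted run of the TTAGGG unit. This is a deliberately simplified sketch and not the method used in the paper: it assumes exact matches, whereas real nanopore reads contain errors that would break such runs. The function name and the toy read are hypothetical.

```python
import re

TELOMERE_UNIT = "TTAGGG"  # reads from the opposite strand carry CCCTAA instead


def longest_tandem_run(seq, unit=TELOMERE_UNIT, min_copies=2):
    """Length in bases of the longest exact tandem run of `unit` in `seq`.

    Returns 0 when no run of at least `min_copies` copies is present.
    """
    pattern = re.compile("(?:%s){%d,}" % (re.escape(unit), min_copies))
    best = 0
    for match in pattern.finditer(seq.upper()):
        best = max(best, match.end() - match.start())
    return best


# Toy read: subtelomeric sequence followed by telomeric repeats (illustrative only).
read = "ACGTACGT" + "TTAGGG" * 5 + "CC" + "TTAGGG" * 2
print(longest_tandem_run(read))
print(longest_tandem_run(read) // len(TELOMERE_UNIT))  # copies in the longest run
```

A practical version would tolerate mismatches and indels, for example by scoring windows for repeat density rather than requiring perfect copies.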

Similar articles

  • De novo genome assembly of a Han Chinese male and genome-wide detection of structural variants using Oxford Nanopore sequencing.

    Cai R, Dong Y, Fang K, Guo C, Ma X. Mol Genet Genomics. 2020 Jul;295(4):871-876. doi: 10.1007/s00438-020-01672-y. Epub 2020 Apr 9. PMID: 32274588

  • Genome assembly using Nanopore-guided long and error-free DNA reads.

    Madoui MA, Engelen S, Cruaud C, Belser C, Bertrand L, Alberti A, Lemainque A, Wincker P, Aury JM. BMC Genomics. 2015 Apr 20;16(1):327. doi: 10.1186/s12864-015-1519-z. PMID: 25927464 Free PMC article.

  • Nanopore sequencing and the Shasta toolkit enable efficient de novo assembly of eleven human genomes.

    Shafin K, Pesout T, Lorig-Roach R, Haukness M, Olsen HE, Bosworth C, Armstrong J, Tigyi K, Maurer N, Koren S, Sedlazeck FJ, Marschall T, Mayes S, Costa V, Zook JM, Liu KJ, Kilburn D, Sorensen M, Munson KM, Vollger MR, Monlong J, Garrison E, Eichler EE, Salama S, Haussler D, Green RE, Akeson M, Phillippy A, Miga KH, Carnevali P, Jain M, Paten B. Nat Biotechnol. 2020 Sep;38(9):1044-1053. doi: 10.1038/s41587-020-0503-6. Epub 2020 May 4. PMID: 32686750 Free PMC article.

  • Oxford Nanopore MinION Sequencing and Genome Assembly.

    Lu H, Giordano F, Ning Z. Genomics Proteomics Bioinformatics. 2016 Oct;14(5):265-279. doi: 10.1016/j.gpb.2016.05.004. Epub 2016 Sep 17. PMID: 27646134 Free PMC article. Review.

  • Long-read sequencing in deciphering human genetics to a greater depth.

    Midha MK, Wu M, Chiu KP. Hum Genet. 2019 Dec;138(11-12):1201-1215. doi: 10.1007/s00439-019-02064-y. Epub 2019 Sep 19. PMID: 31538236 Review.

Cited by 513 articles

  • PacBio long-read amplicon sequencing enables scalable high-resolution population allele typing of the complex CYP2D6 locus.

    Charnaud S, Munro JE, Semenec L, Mazhari R, Brewster J, Bourke C, Ruybal-Pesántez S, James R, Lautu-Gumal D, Karunajeewa H, Mueller I, Bahlo M. Commun Biol. 2022 Feb 25;5(1):168. doi: 10.1038/s42003-022-03102-8. PMID: 35217695

  • Genomic variations and epigenomic landscape of the Medaka Inbred Kiyosu-Karlsruhe (MIKK) panel.

    Leger A, Brettell I, Monahan J, Barton C, Wolf N, Kusminski N, Herder C, Aadepu N, Becker C, Gierten J, Hammouda OT, Hasel E, Lischik C, Lust K, Sokolova N, Suzuki R, Tavhelidse T, Thumberger T, Tsingos E, Watson P, Welz B, Naruse K, Loosli F, Wittbrodt J, Birney E, Fitzgerald T. Genome Biol. 2022 Feb 21;23(1):58. doi: 10.1186/s13059-022-02602-4. PMID: 35189951 Free PMC article.

  • Genetic variation at mouse and human ribosomal DNA influences associated epigenetic states.

    Rodriguez-Algarra F, Seaborne RAE, Danson AF, Yildizoglu S, Yoshikawa H, Law PP, Ahmad Z, Maudsley VA, Mash A, Holmes N, Ochôa M, Hodgkinson A, Marzi SJ, Pradeepa MM, Loose M, Holland ML, Rakyan VK. Genome Biol. 2022 Feb 14;23(1):54. doi: 10.1186/s13059-022-02617-x. PMID: 35164830 Free PMC article.

  • Long-read sequencing of the zebrafish genome reorganizes genomic architecture.

    Chernyavskaya Y, Zhang X, Liu J, Blackburn J. BMC Genomics. 2022 Feb 10;23(1):116. doi: 10.1186/s12864-022-08349-3. PMID: 35144548 Free PMC article.

  • Identifying Balanced Chromosomal Translocations in Human Embryos by Oxford Nanopore Sequencing and Breakpoints Region Analysis.

    Pei Z, Deng K, Lei C, Du D, Yu G, Sun X, Xu C, Zhang S. Front Genet. 2022 Jan 18;12:810900. doi: 10.3389/fgene.2021.810900. eCollection 2021. PMID: 35116057 Free PMC article.

Source: https://pubmed.ncbi.nlm.nih.gov/29431738/
