Plant genomes are often highly repetitive and polyploid, which makes them challenging to assemble. The introduction of short-read technologies 10 years ago significantly increased the number of available plant genomes. However, these assemblies are generally incomplete and fragmented, and only a few reach chromosome scale.
Recently, Pacific Biosciences and Oxford Nanopore commercialized sequencing technologies that can read long DNA fragments (on the order of kilobases to megabases). Combined with efficient algorithms, these reads yield high-quality assemblies in terms of both contiguity and completeness of repetitive regions. However, even though long-read assemblies exhibit high contig N50s (>1 Mb), these methods are still insufficient to resolve genome organization at the chromosome level.
Here we describe a strategy based on long reads (MinION and PromethION sequencers) and optical maps (Saphyr system) that can produce chromosome-level assemblies. We demonstrate its applicability by generating high-quality genome sequences for two new dicotyledon cultivars (Brassica rapa Z1 and Brassica oleracea HDEM) and one new monocotyledon (Musa schizocarpa). All three assemblies show contig N50s > 5 Mb and contain scaffolds that represent entire chromosomes or chromosome arms.
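Contig N50 is the length L such that contigs of length L or longer together cover at least half of the total assembly length. A minimal Python sketch of this standard definition follows; the function name `n50` and the contig lengths are hypothetical illustrations, not values from the assemblies described here.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    N50 is the length L such that contigs of length >= L together
    account for at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("no contigs given")
    total = sum(lengths)
    running = 0
    # Walk contigs from longest to shortest until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical contig lengths in Mb (illustrative only).
contigs_mb = [12, 9, 7, 5, 3, 2, 1, 1]
print(n50(contigs_mb))  # 9: contigs of 12 + 9 Mb already cover 21 of 40 Mb
```

Because N50 is weighted by length, a few very long contigs can raise it sharply even when many short contigs remain, which is why it is the usual headline measure of contiguity.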
In addition, we report a new long-read assembly of the Darmor-bzh genome (Brassica napus), generated by combining long-read sequencing data with optical and genetic maps. Using the PromethION device and six flowcells, we generated about 16M long reads. These represent 93X coverage and, more importantly, 6X coverage from reads longer than 100 kb. This ultralong-read dataset allowed us to generate one of the most contiguous and complete assemblies of a Brassica genome to date (contig N50 > 10 Mb).
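Sequencing coverage (depth) is the total number of sequenced bases divided by the genome size. A figure such as "6X with reads longer than 100 kb" counts only reads above a length cutoff in the numerator. The Python sketch below assumes this standard definition. The function name, toy read lengths and genome size are all hypothetical, not taken from the Darmor-bzh dataset.

```python
def coverage(read_lengths, genome_size, longer_than=0):
    """Mean sequencing depth contributed by reads strictly longer than `longer_than` bases."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    # Sum the bases of the reads that pass the length cutoff.
    kept_bases = sum(length for length in read_lengths if length > longer_than)
    return kept_bases / genome_size


# Hypothetical toy data: six reads sequenced from a 100 kb "genome".
reads = [250_000, 150_000, 120_000, 80_000, 40_000, 10_000]
genome_size = 100_000

print(coverage(reads, genome_size))                       # 6.5 (all reads)
print(coverage(reads, genome_size, longer_than=100_000))  # 5.2 (reads > 100 kb only)
```

The same relation can be run in reverse: multiplying a reported coverage by the genome size gives the total number of sequenced bases.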