##########################################################################
###########                                               ################
###########       FAQ OF ONCORHYNCHUS MYKISS GENOME       ################
###########                                               ################
##########################################################################

####### ASSEMBLY SECTION
1/ READ SEQUENCE ASSEMBLY
2/ CONSTRUCTION OF CHROMOSOME SEQUENCES
3/ MEANING OF Ns IN ASSEMBLY SEQUENCES
#######

####### ASSEMBLY SECTION
#
# 1/ READ SEQUENCE ASSEMBLY
#
Sanger BAC-end, Roche 454 and Illumina HiSeq reads were assembled.
This procedure gave 2 kinds of sequences:

* scaffolds, which are composed of at least two contigs, or of a single
  contig of at least 2000 bp.
    Assembly size= 1877544617
    numberOfSequences= 79941
    minSize= 1955
    maxSize= 5466130
    averageSize= 23487

* remaining contigs, which are not scaffolded and are at least 500 bp and
  at most 1999 bp long.
    Assembly size= 226802420
    numberOfSequences= 223457
    minSize= 500
    maxSize= 1999
    averageSize= 1015

All of these sequences are used to build chromosome sequences. In total:
    Assembly size= 2104347037
    numberOfSequences= 303398
    minSize= 500
    maxSize= 5466130
    averageSize= 6936

#
# 2/ CONSTRUCTION OF CHROMOSOME SEQUENCES
#
To construct chromosome sequences, we mapped several sources of information
onto the 303398 assembly sequences. These identify the chromosome each
sequence belongs to and give a more or less precise location on it:
* genetic map
* physical map
* RADTag map

We obtained 3 kinds of results:
* chr_XX are composed of ordered and oriented assembled sequences.
* chrUn_XX are composed of "globally" ordered but not oriented assembled
  sequences. For example, the order of two adjacent sequences might be
  ambiguous, but the position of the pair relative to the previous and next
  sequences is not ambiguous.
  At the end of these chromosomes there are some sequences that are not
  ordered at all, because the RADTag map information used for them contained
  too many colocalized markers; we added those sequences at the end,
  separated by a gap of 200 Ns.
* chrUn is composed of sequences for which we have no chromosome anchoring
  information.

Finally, the assembly is composed of 2 134 686 837 bp:
* 1 877 544 617 from scaffolds,
* 226 802 420 from un-scaffolded contigs,
* 30 339 800 from supplementary gaps between assembly sequences.

#
# 3/ MEANING OF Ns IN ASSEMBLY SEQUENCES
#
During assembly and chromosome construction, stretches of Ns were added.
* In contigs and scaffolds, Ns indicate gaps: the number of Ns between DNA
  fragments is estimated from the sequencing libraries.
* In chromosome sequences, gaps have a fixed size of 100 or 200 Ns. They
  separate contigs and scaffolds. These gaps have no biological meaning: we
  could not use the genetic or physical maps to estimate gap sizes in base
  pairs.

We added 303369 gaps, i.e. 30339800 Ns, to construct 58 chromosomes.
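
The gap conventions above can be explored with a short script. The following is a minimal Python sketch, not part of the assembly pipeline: the function names n_runs and split_gaps and the toy sequence are hypothetical, and only the base-pair and gap figures are taken from this FAQ. It lists runs of Ns in a sequence, separates runs of exactly 100 or 200 Ns from the others, and checks the totals quoted above.

```python
import re

# Figures copied from this FAQ.
SCAFFOLD_BP = 1_877_544_617
CONTIG_BP = 226_802_420
GAP_BP = 30_339_800
TOTAL_BP = 2_134_686_837
GAP_COUNT = 303_369


def n_runs(sequence):
    """Return (start, length) for each maximal run of N/n (0-based start)."""
    return [(m.start(), m.end() - m.start())
            for m in re.finditer(r"[Nn]+", sequence)]


def split_gaps(sequence, fixed_sizes=(100, 200)):
    """Split N runs into those matching a fixed gap size and all others."""
    fixed, other = [], []
    for start, length in n_runs(sequence):
        (fixed if length in fixed_sizes else other).append((start, length))
    return fixed, other


# The three components add up to the stated assembly size.
assert SCAFFOLD_BP + CONTIG_BP + GAP_BP == TOTAL_BP

# If every gap is 100 or 200 Ns, the stated gap count and N total imply
# how many gaps are 200 Ns long (derived here, not stated in the FAQ).
gaps_of_200 = (GAP_BP - 100 * GAP_COUNT) // 100
gaps_of_100 = GAP_COUNT - gaps_of_200
assert 100 * gaps_of_100 + 200 * gaps_of_200 == GAP_BP

# Toy sequence (hypothetical): one estimated gap of 37 Ns, then one
# fixed gap of 100 Ns and one of 200 Ns.
toy = ("ACGT" * 5 + "N" * 37 + "ACGT" * 5 + "N" * 100
       + "ACGT" * 5 + "N" * 200 + "ACGT")
fixed, other = split_gaps(toy)
print("fixed-size gaps:", fixed)
print("other gaps:", other)
```

Note that run length alone cannot prove where a gap came from: an estimated gap inside a scaffold may happen to be 100 or 200 Ns long, so the split is only a first approximation.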