

Sequencing






  1: What is the DNA sequence?
  2: Why do we sequence DNA?
  3: How do we sequence DNA?
  4: What is the assembly?
  5: Why were Genome Centers created?

As we have just seen, each sequencing manipulation, or read, produces a sequence of only 500 to 1000 bases. It is therefore impossible to read in one go the sequence of the immense DNA molecules, called chromosomes, which contain the hereditary information of an organism. Human chromosomes, for example, are several tens to hundreds of millions of nucleotides long. To reconstitute these gigantic sequences, a large number of reads must be performed, producing a volume of sequence several times larger than the size of the chromosome: these redundant reads make it possible to place the sequences in order based on their overlaps, and to confirm the result of each read.

In practice, one starts by randomly breaking the large DNA molecule to be sequenced in order to obtain sub-fragments several thousand nucleotides long. By reading the ends of a large number of these sub-fragments at random, sequences which partially overlap are obtained.
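
The random sampling described above can be imitated on a computer. The following is a minimal sketch, not the laboratory protocol: the function name `shotgun_reads`, its parameters, and the random test sequence are all hypothetical, and for simplicity both end reads are taken from the same strand (a real read from the far end comes from the opposite strand).

```python
import random

def shotgun_reads(genome, n_fragments, fragment_len, read_len, seed=0):
    """Pick random sub-fragments of `genome` and read both of their ends.

    Simplification: both reads are reported on the same strand.
    """
    rng = random.Random(seed)
    reads = []
    for _ in range(n_fragments):
        # Random break point: every start position is equally likely.
        start = rng.randrange(0, len(genome) - fragment_len + 1)
        fragment = genome[start:start + fragment_len]
        reads.append(fragment[:read_len])    # read from one end
        reads.append(fragment[-read_len:])   # read from the other end
    return reads

# Hypothetical 10,000-base "large molecule" made of random letters.
rng = random.Random(1)
genome = "".join(rng.choice("ACGT") for _ in range(10_000))
reads = shotgun_reads(genome, n_fragments=50, fragment_len=3000, read_len=700)
```

Because the start positions are random, some reads land on the same region and overlap partially, which is what the assembly step relies on.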

Comparing these sequences with each other makes it possible to recognize and align the parts which have been sequenced several times. Thanks to these overlapping sequences, a certain number of reads can be assembled to reconstitute longer chains (called contigs), for instance the entire sequence of the parent fragment. This assembly operation, which is performed by computer programmes, makes it possible to reconstitute sequences of several million to several tens of millions of bases.
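
To make the idea of assembling reads by their overlaps concrete, here is a toy sketch using a simple greedy strategy: repeatedly merge the two reads with the longest exact overlap. This is an illustration under strong assumptions (error-free reads, a single strand, no repeats); it is not the program used by Genoscope, and the names `overlap` and `greedy_assemble` are hypothetical.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 if the longest match is shorter than `min_len`.
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0

def greedy_assemble(reads, min_len=3):
    """Merge reads pairwise by longest overlap until no overlap remains."""
    reads = list(reads)
    while True:
        # Drop exact duplicates and reads fully contained in another read.
        reads = list(dict.fromkeys(reads))
        reads = [r for r in reads if not any(r != s and r in s for s in reads)]
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_len:
                        best_len, best_i, best_j = n, i, j
        if best_len == 0:
            return reads  # the remaining sequences are the contigs
        merged = reads[best_i] + reads[best_j][best_len:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)

# Three overlapping reads taken from the sequence ATGCGTACGTTAGC.
contigs = greedy_assemble(["ATGCGTAC", "GTACGTTA", "CGTTAGC"])
```

In this small case the three reads collapse into a single contig equal to the original sequence. If no read bridged two neighbouring regions, the function would instead return several separate contigs.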

In genomes like the human genome, it is necessary to operate with a redundancy factor of 8 to 10 (8-10X coverage, or depth) to reassemble the sequence of a large DNA fragment. In other words, to sequence such a fragment, it must be reduced to smaller fragments, and then enough reads must be performed that these reads, placed end-to-end, represent about 10 times the length of the sequence of the large fragment. This means that each base in this sequence will be present in 10 reads on average. Even at this level of redundancy, several gaps will remain in the assembly, because the reads result from random sampling: some regions will be represented in more than 10 reads, others in fewer than 10 reads, and some parts will not be covered at all. These gaps may then be “filled” by a targeted effort.
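
The relation between the number of reads and the coverage, and the reason random sampling leaves some bases uncovered, can be sketched numerically. The example below assumes an idealized model in which reads start at uniformly random positions, so that the number of reads covering a given base follows a Poisson distribution with mean equal to the coverage. The fragment size (150,000 bases), read length (750 bases), and function names are hypothetical values chosen for illustration.

```python
import math

def coverage(n_reads, read_len, target_len):
    """Total length of all reads divided by the length of the target."""
    return n_reads * read_len / target_len

def depth_probability(k, c):
    """Poisson probability that a base is covered by exactly k reads
    when the mean coverage is c (idealized random-sampling model)."""
    return c ** k * math.exp(-c) / math.factorial(k)

# 2000 reads of 750 bases over a 150,000-base fragment give 10X coverage.
c = coverage(n_reads=2000, read_len=750, target_len=150_000)

# Probability that a given base is not covered by any read: exp(-c).
p_uncovered = depth_probability(0, c)

# Probability that a base is covered by fewer than 10 reads.
p_below_mean = sum(depth_probability(k, c) for k in range(10))
```

Under this model the uncovered fraction at 10X is small (about 4.5 in 100,000 bases) but not zero. Real data tend to show more gaps than the model predicts, since some regions are sampled less readily than others.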

Another difficulty in the assembly of these “large” genomes is caused by repeated sequences, which exist in more or less identical copies in several parts of the genome. These are particularly abundant in the genomes of mammals, where they represent about 50% of the genome. Because reads from different copies of a repeat can appear to overlap, repeated sequences are “masked” during the assembly phase.
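
One simple way to picture masking is to replace every stretch that occurs more than once with a placeholder letter, so that it cannot contribute to overlaps. The sketch below does this by counting short words of length k (k-mers). It is only an assumed illustration of the principle: the text does not say how masking is actually performed, and the function name `mask_repeats` and the example sequence are hypothetical.

```python
from collections import Counter

def mask_repeats(seq, k, max_count=1):
    """Replace with 'N' every k-mer that occurs more than `max_count`
    times in `seq`. Only exact repeats are detected."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    masked = list(seq)
    for i in range(len(seq) - k + 1):
        if counts[seq[i:i + k]] > max_count:
            masked[i:i + k] = "N" * k
    return "".join(masked)

# The word GATTACA occurs twice in this hypothetical sequence.
masked = mask_repeats("CATGGATTACATCTGATTACAGG", k=5)
```

Both copies of the repeated word are replaced by `N`, while the unique sequence around them is left intact. Repeats that are only "more or less identical" would require approximate matching, which this sketch does not attempt.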

The sequence of the human genome, with its 24 different chromosomes, consists of about 3 billion bases. To determine the complete sequence of human chromosomes at 10X coverage, tens of millions of reads must be performed. However, it is possible to obtain a first draft with a lower level of redundancy. In this case, the reassembled fragments will be rather small. For example, with a level of redundancy of 5X, contigs of about 5000 bases can be obtained for the human genome. The sequence of the genome obtained in this way will therefore be fragmented into several hundred thousand pieces.
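
The orders of magnitude quoted above follow from simple arithmetic, sketched here with the figures given in the text (3 billion bases, reads of 500 to 1000 bases, contigs of about 5000 bases). The function and variable names are hypothetical.

```python
import math

GENOME_SIZE = 3_000_000_000  # about 3 billion bases

def reads_needed(target_len, read_len, coverage):
    """Number of reads whose total length is `coverage` times the target."""
    return math.ceil(target_len * coverage / read_len)

# 10X coverage with the longest and shortest read lengths quoted above.
reads_min = reads_needed(GENOME_SIZE, read_len=1000, coverage=10)
reads_max = reads_needed(GENOME_SIZE, read_len=500, coverage=10)

# A draft made of contigs of about 5000 bases.
pieces_at_5x = GENOME_SIZE // 5000
```

This gives between 30 and 60 million reads at 10X, and about 600,000 pieces for the 5X draft, consistent with "tens of millions" of reads and "several hundred thousand" pieces.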

Last update on 22 January 2008

© Genoscope - Centre National de Séquençage
2 rue Gaston Crémieux CP5706 91057 Evry cedex
Tél:  (+33) 0 1 60 87 25 00
Fax: (+33) 0 1 60 87 25 14
