Human chromosome 14 fully sequenced



The French contribution to the sequencing of the human genome came to a conclusion on January 1st, 2003, with the online publication of the complete sequence of human chromosome 14 in the scientific journal Nature. The exact order of the 87,410,661 nucleotides of the long arm of this chromosome was mainly determined at Genoscope, the French National Sequencing Center, after a 5-year effort. Chromosome 14 is the largest human chromosome for which the finished sequence has been published, and the first to be published with no residual "gaps", which makes it the longest uninterrupted DNA sequence determined to date.

On June 26, 2000, the White House announced the completion of the first draft of the human genome. For the International Consortium responsible for the "Human Genome Project," this was only a first step. In contrast to the genome draft produced by its competitor, Celera Genomics, the Consortium's draft was then subjected to refinement and "finishing." Concretely, this means placing the sequenced fragments from the draft in order and in the proper orientation, and filling in the "gaps." At the time of the White House announcement, the draft covered 90% of the human genome, but the finishing work was no simple formality; in fact, it was the most tedious part of the sequencing effort. It is nevertheless of great importance, because only with a high-quality sequence (99.99% accuracy, or at most one error per 10,000 nucleotides) is it possible to identify human genes without ambiguity and to understand their role in causing disorders.
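To give a sense of scale for the quality standard quoted above, the short sketch below converts "one error per 10,000 nucleotides" into an accuracy percentage and into an upper bound on the number of errors for a sequence the length of the chromosome 14 long arm. It is only arithmetic on the two figures given in the text; the result is a ceiling implied by the standard, not a measured error count for chromosome 14, and the function name is ours.

```python
# Figures taken from the text.
CHR14_LONG_ARM_NT = 87_410_661   # nucleotides in the published sequence
ERROR_RATE = 1 / 10_000          # finishing standard: one error per 10,000 nt


def max_expected_errors(length_nt: int, error_rate: float) -> float:
    """Upper bound on errors if the sequence just meets the standard."""
    return length_nt * error_rate


accuracy = 1 - ERROR_RATE
print(f"Accuracy implied by the standard: {accuracy:.2%}")
print(
    "Errors tolerated over the whole sequence: about "
    f"{round(max_expected_errors(CHR14_LONG_ARM_NT, ERROR_RATE)):,}"
)
```

In other words, even a sequence meeting the standard could still contain several thousand errors over 87 million nucleotides.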

The first sequence of a human chromosome without gaps

Since the announcement in June 2000, each partner of the Consortium has been finishing the chromosome or chromosome region that it had sequenced. Genoscope, the only French partner in the Consortium, has been working since 1998 on chromosome 14 (the human genome consists of 24 types of chromosomes, numbered more or less according to size). Logically, the first chromosomes to be "finished" were the smallest ones: almost complete sequences of chromosomes 21 and 22 were published even before the announcement of the draft, and that of chromosome 20 was published in 2001. Chromosome 14 is thus the fourth human chromosome to be published. With its 87 million nucleotides, it is clearly larger than the three preceding chromosomes. Above all, it is the first to be published with no gaps, in the form of a complete uninterrupted sequence. Only the short arm, the centromere and the telomeres are missing; these chromosome regions are rich in repeated sequences, which are very difficult to sequence, and are practically devoid of genes.

Genoscope is well ahead of the deadline: the sequence of chromosome 14 has been available in public databases since the summer of 2002, whereas the announcement of the "finished" sequence of the complete human genome is projected for April 2003. This is partly due to the sequencing strategy adopted by Genoscope. As in the other centers of the Consortium, the sequence is reconstructed from "reads" of large fragments of genomic DNA, and not from reads of the genome as a whole, as was done at Celera. The difference between Genoscope and the other centers of the Consortium lies in the mapping phase: these centers first ordered large DNA fragments on a physical map of the genome before sequencing them individually ("map first, sequence later"), whereas Genoscope selected large overlapping fragments and constructed the map of chromosome 14 as the sequencing progressed ("map as you go"). This strategy made it possible to choose fragments with minimal overlap and therefore to reduce redundancy, which translates into savings of time and cost.
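The idea of choosing fragments with minimal overlap can be illustrated with a toy sketch. It is not Genoscope's actual procedure: in a real "map as you go" project the overlaps have to be detected experimentally, whereas here hypothetical coordinates (in kilobases) stand in for that information. The clone names, the coordinates and the `min_overlap` parameter are all assumptions made for the illustration.

```python
from typing import NamedTuple


class Clone(NamedTuple):
    name: str
    start: int  # hypothetical position on the chromosome, in kb
    end: int


def walk(clones: list[Clone], seed: Clone, min_overlap: int = 5) -> list[Clone]:
    """Extend a path from `seed`, always taking the least-overlapping clone."""
    path = [seed]
    current = seed
    while True:
        # Keep clones that extend the path and overlap it enough to be joined.
        candidates = [
            c for c in clones
            if c.end > current.end and current.end - c.start >= min_overlap
        ]
        if not candidates:
            return path
        # Smallest overlap with the current clone means least redundant sequencing.
        current = min(candidates, key=lambda c: current.end - c.start)
        path.append(current)


# Invented library of overlapping clones.
library = [
    Clone("A", 0, 150),
    Clone("B", 100, 260),
    Clone("C", 140, 300),
    Clone("D", 200, 380),
    Clone("E", 290, 450),
]

path = walk(library, seed=library[0])
sequenced_kb = sum(c.end - c.start for c in path)
covered_kb = path[-1].end - path[0].start
print([c.name for c in path])     # ['A', 'C', 'E']
print(sequenced_kb / covered_kb)  # redundancy: kb sequenced per kb covered
```

In this invented example, three of the five clones suffice to cover the region, so clones B and D never need to be sequenced.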

The sequencing itself represents only part of Genoscope's work. Scientists at the center have also sought to delimit the genes along the chromosome 14 sequence. This operation, known as annotation, is very delicate because human genes, like those of the majority of animals, are fragmented. The sequences that correspond to instructions, called exons, represent less than 3% of the genome, and are separated within the gene itself by sequences called introns, which carry no such instructions.

A harvest of new genes

For the annotation of chromosome 14, Genoscope scientists developed their own procedure, drawing on their expertise in comparative genomics. The majority of human genes have counterparts in other vertebrates, and comparison with other vertebrate genomes can reveal the structure of these genes, because the sequences of exons are better conserved in the course of evolution than those of the introns and the rest of the genome. This phenomenon is exploited by Exofish, a software tool developed at Genoscope to find exons. By means of Exofish, the genome sequences of the mouse and of the pufferfish Tetraodon nigroviridis, the latter sequenced at Genoscope, have served in the annotation of human chromosome 14.
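The principle that conserved stretches point to exons can be shown with a deliberately simple sketch. This is not Exofish, whose internals are not described here; it assumes two sequences that are already aligned and of equal length, and it flags the regions where a sliding window reaches a chosen level of identity. The sequences, the window size and the threshold are invented for the illustration.

```python
def conserved_regions(seq_a: str, seq_b: str, window: int = 6,
                      min_identity: float = 0.8) -> list[tuple[int, int]]:
    """Return half-open (start, end) regions where aligned sequences agree."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be pre-aligned and of equal length")
    regions: list[tuple[int, int]] = []
    for i in range(len(seq_a) - window + 1):
        # Count identical positions inside the current window.
        matches = sum(a == b for a, b in zip(seq_a[i:i + window],
                                             seq_b[i:i + window]))
        if matches / window < min_identity:
            continue
        if regions and i <= regions[-1][1]:
            # Window touches or overlaps the previous region: extend it.
            regions[-1] = (regions[-1][0], i + window)
        else:
            regions.append((i, i + window))
    return regions


# Two conserved blocks separated by a diverged stretch (invented data).
human = "ATGGCCAAGT" + "TTTTTTTTTT" + "GGCTGAATAA"
other = "ATGGCCAAGT" + "ACGACGACGA" + "GGCTGAATAA"

regions = conserved_regions(human, other, window=6, min_identity=1.0)
print(regions)  # [(0, 10), (20, 30)]
```

The two flagged regions correspond to the blocks that are identical in both sequences; the diverged middle stretch, standing in for an intron, is skipped.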

To the 506 genes already known on chromosome 14, Genoscope has added 344 other genes, either validated or putative. Furthermore, two regions of crucial importance for the immune system have been characterized. Of the genes on this list, 60 are known to be implicated in genetic diseases, including one gene implicated in an early-onset form of Alzheimer's disease. Most of these genes were localized in recent years by different research teams at the cost of years of painstaking genetic work. The job of identifying such genes will now be greatly facilitated by the complete sequence, which has been the major motivation for the "Human Genome Project" from the start. The identification of six genes implicated in genetic diseases, including one form of familial spastic paraplegia, has thus already benefited directly from the complete sequence of chromosome 14, and dozens of others should follow.

R. Heilig et al., Nature 421 (6th February 2003), 601-607.




Scientific contacts: Jean Weissenbach (33 (0) 1 60 87 25 02) - Roland Heilig (33 (0) 1 60 87 25 58) - Gabor Gyapay (33 (0) 1 60 87 25 47)



