1: What is the public project for sequencing the human genome?
2: Has the human genome been completely sequenced?
3: How many genes do humans have?
4: Why is it so difficult to find the genes in a human genome sequence?
5: Where did the sequenced human DNA come from?
6: Is the human genome “freely available”? If not, who owns it?
7: Why was there a Human Genome Project? What is its use?
8: Who were the members of the international consortium? What was the role of each of them?
9: What was the French contribution to the Human Genome Project?
10: How much did the Human Genome Project cost?
11: With the end of the Human Genome Project, are the large sequencing centers still useful?
Far from getting shorter, the list of genomes to be sequenced keeps getting longer. To interpret a genome sequence, it is invaluable to compare it with other genomes. The species compared may be closely related, or they may belong to branches that diverged early in the course of evolution, and the two strategies do not yield the same knowledge. The farther apart two species are in evolutionary terms, the more their genomic sequences will have diverged, which may limit the breadth of the comparison. On the other hand, the parts that have diverged the least, i.e. the genes, will stand out more clearly from the rest of the sequence: these regions, which are “conserved” between the two genomes, serve as landmarks for identifying genes. It is therefore instructive to have at our disposal genomes from a spectrum of species chosen at key points in the evolutionary tree.
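The idea of conserved regions serving as landmarks can be illustrated with a toy sketch. It assumes the two sequences have already been aligned to the same length, and the function name `conserved_windows`, the window size and the identity threshold are all hypothetical choices made for illustration. Real gene finding relies on dedicated alignment tools and statistical models, not on a simple scan like this one.

```python
def conserved_windows(seq_a, seq_b, window=10, min_identity=0.9):
    """Return merged (start, end) spans where two pre-aligned sequences
    agree in at least `min_identity` of the positions of a sliding window.

    Assumption: seq_a and seq_b are already aligned, position by position.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")

    spans = []
    for start in range(len(seq_a) - window + 1):
        end = start + window
        # Count the positions in this window where both sequences agree.
        matches = sum(a == b for a, b in zip(seq_a[start:end], seq_b[start:end]))
        if matches / window >= min_identity:
            if spans and start <= spans[-1][1]:
                # Overlaps or touches the previous span: extend it.
                spans[-1] = (spans[-1][0], end)
            else:
                spans.append((start, end))
    return spans


# Toy example: the first 12 positions are identical, the rest has diverged.
species_a = "ACGTACGTACGT" + "TTTTTTTT"
species_b = "ACGTACGTACGT" + "GGGGGGGG"
print(conserved_windows(species_a, species_b, window=4, min_identity=1.0))
# prints [(0, 12)]
```

In this toy case the conserved span is reported as a candidate landmark, while the diverged tail is ignored. Between distant species, few regions pass the threshold but those that do are more likely to be genes. Between close relatives such as human and chimpanzee, almost everything passes, so the scan says little about where the genes are.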
Take the example of the human genome. The chimpanzee is our closest relative in the animal world, and the sequencing of its genome, which is 99% identical to ours, will provide fascinating information on the genetic changes that took place during the last few million years of evolution of the human branch. The sequencing of the mouse genome, which was finished in 2003, will benefit biomedical research as a whole, because this rodent has long been an animal model for genetics. The sequences of other placental mammals will extend the knowledge provided by the mouse genome. It will also be instructive to sequence a representative of the marsupials, which separated early from the rest of the mammals. The genome of the kangaroo may clarify the earliest steps in the history of the mammals, and offer a good compromise in the search for human genes: a species distant enough for conserved genes to stand out, yet close enough to remain comparable.
Beyond this, representatives of other branches of the vertebrates will facilitate this research, because the vertebrates have, in general, conserved a common set of genes. The vertebrates sequenced to date, or in the process of being sequenced, include a bird (the chicken) and two fish with compact genomes; Genoscope performed half of the sequencing of one of these fish, Tetraodon nigroviridis. In 2000 Genoscope used comparisons between the genomic sequences of Tetraodon and humans to estimate the number of human genes at about 30,000, and this genome continues to be useful for refining the annotation of the human genome. Still further removed, we find the genome of an ascidian, a marine animal which is a close relative of the vertebrates, and those of the worm Caenorhabditis elegans and the fly Drosophila melanogaster. The genomes of very simple multicellular organisms may reveal the changes which accompanied the organization of cells into “cell communities”. Finally, the genome of yeast, a unicellular organism, is useful for discovering elements common to all eukaryotes, the living organisms whose genome is sheltered in a nucleus within the cell, from humans to oak trees to Paramecium. Understanding fundamental eukaryotic mechanisms such as the condensation, recombination and segregation of chromosomes during cell division is of great importance in the study of certain human diseases.
To the above reasons for undertaking new sequencing programmes, more specific ones can be added. The genome sequence of a given organism may be important for economic reasons (a microbe important for the dairy industry, for example) or medical reasons (which group of genes explains the virulence of a bacterium compared to that of a related species?). It is easy to understand the importance of sequencing the genome of rice, the staple food of half of humanity, or the genome of the Anopheles mosquito, the vector of malaria, which kills over a million people every year. A number of pathogens, bacterial or eukaryotic, have already been sequenced, and others will soon follow. Finally, the exploration of the bacterial world as a whole will occupy sequencing centers for many decades: genomic studies of diverse environments (soil, ocean, waste water treatment plants), which have been going on for several years, have revealed a formidable bacterial diversity. We know only about 1% of bacterial species; the others have gone unnoticed because we have not been able to cultivate them. Bacteria exhibit great metabolic inventiveness, and these mysterious species constitute a rich reservoir of genes which may prove very important for industry and the environment. The exploration of the genomes of these bacteria is a task on a scale comparable to the Human Genome Project, and one for which the large sequencing centers are more necessary than ever.