The Human Genome Project






  1: What is the public project for sequencing the human genome?
  2: Has the human genome been completely sequenced?
  3: How many genes do humans have?
  4: Why is it so difficult to find the genes in a human genome sequence?
  5: Where did the sequenced human DNA come from?
  6: Is the human genome “freely available”? If not, who owns it?
  7: Why was there a Human Genome Project? What is its use?
  8: Who were the members of the international consortium? What was the role of each of them?
  9: What was the French contribution to the Human Genome Project?
  10: How much did the Human Genome Project cost?
  11: With the end of the Human Genome Project, are the large sequencing centers still useful?


  1: What is the public project for sequencing the human genome?

  At the beginning of the 1990s the international scientific community laid the groundwork for a project that, because of its importance, was dubbed the “Apollo Project for Biology.” Its objective was to obtain the complete sequence of the human genome (3.2 billion nucleotides or, in a writing analogy, the contents of 2000 books of 500 pages each) by the beginning of the third millennium. Because of the size of this genome, the large sequencing centers formed an international consortium to share the task. Each of the 20 institutions of the “public” consortium (financed by public funds or foundations) sequenced specific chromosomes or chromosomal regions among the 24 human chromosomes (see the list of the members of the consortium and their contributions). Each center agreed to deposit its sequence data in public databases as soon as they were produced.
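As a quick check of the book analogy, the short Python sketch below divides the genome size quoted above by the number of pages. The variable names are illustrative only; the figures are the ones given in the text.

```python
# Figures quoted in the text above
GENOME_NUCLEOTIDES = 3_200_000_000  # 3.2 billion nucleotides
BOOKS = 2000
PAGES_PER_BOOK = 500

# Total pages in the analogy, and letters (nucleotides) each page must hold
total_pages = BOOKS * PAGES_PER_BOOK
letters_per_page = GENOME_NUCLEOTIDES / total_pages

print(f"{total_pages:,} pages, {letters_per_page:,.0f} letters per page")
```

The analogy therefore assumes a little over three thousand letters per page, which is a plausible density for a printed book.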

The first years of the Human Genome Project were devoted to mapping: the establishment of physical maps (covering each chromosome with a set of large genomic fragments arranged in order based on their overlaps) and linkage maps (a set of markers whose relative positions on the chromosomes had been determined). The actual large-scale sequencing efforts did not begin until 1998.

The end of the Human Genome Project was initially planned for 2005, but progress in sequencing technology during the 1990s, as well as renewed financing from the sponsoring institutions, made it possible to finish before this date: a first draft of the sequence of the human genome was celebrated in June 2000 at the White House, and the finishing work was completed in April 2003, two years ahead of schedule. A complete and 99.99% accurate version of the human genome sequence is freely accessible online today, available to scientists all over the world. The identification of human genes is continuing, but most of them have already been located on the sequence and characterized.

The Human Genome Project included additional objectives which were also achieved ahead of time. These included a catalog of positions in the “generic” sequence of the human genome which vary from one individual to another (more than 4 million of these have already been recorded) and the production of a good-quality sequence of the mouse genome. Knowledge of the genome of this mammal, which has been used as an animal model in genetics for almost a century, is of great importance for the interpretation of the human genome sequence.


  2: Has the human genome been completely sequenced?


  The human genome sequence which is accessible in databases today is as complete as current techniques permit. It corresponds essentially to the “sequenceable” portion of the genome, the portion which contains nearly all of the genes. This fraction of the genome, which is called the euchromatin, represents 2.9 billion nucleotides, or 90% of the 3.2 billion nucleotides of the entire human genome. It has been sequenced to 99% completion (the remaining 1% corresponds to several hundred gaps which scientists have been unable to fill).

The portion of the genome which is not included in the Human Genome Project is called heterochromatin. This region contains highly repetitive DNA sequences: it is very monotonous and contains practically no genes. Heterochromatin is found notably at the chromosomal structures called centromeres, as well as at the extremities of the chromosomes, called telomeres. It is extremely difficult to sequence this very repetitive DNA using current techniques, which explains why it was neglected at first. However, these regions can be targeted for study, since they play an important role in chromosome function, and a few genes may be concealed in them.
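The proportions quoted above can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: it assumes that the 99% completion figure applies to the euchromatin nucleotide count, and the variable names are illustrative.

```python
# Figures quoted in the text above
TOTAL_GENOME = 3_200_000_000   # whole human genome, in nucleotides
EUCHROMATIN = 2_900_000_000    # "sequenceable" portion, in nucleotides

# Share of the genome that is euchromatin (the text rounds this to 90%)
euchromatin_share = EUCHROMATIN / TOTAL_GENOME

# Assumption: the unfilled 1% is counted against the euchromatin only
gap_nucleotides = EUCHROMATIN // 100

# What is left outside the euchromatin (mostly heterochromatin)
outside_euchromatin = TOTAL_GENOME - EUCHROMATIN

print(f"euchromatin share: {euchromatin_share:.1%}")
print(f"nucleotides in unfilled gaps: about {gap_nucleotides:,}")
print(f"nucleotides outside the euchromatin: about {outside_euchromatin:,}")
```

Under this assumption, the several hundred remaining gaps together account for a few tens of millions of nucleotides.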


  3: How many genes do humans have?


  The number of human genes has long been the subject of estimates, made by both direct and indirect approaches. However, only the availability of a complete, good-quality sequence of the human genome has made a systematic search for genes possible. This sequence is available today, and the “annotation” work, i.e. searching for genes and characterizing them, is well under way. The human gene count today is approximately 25,000. This figure should not change much in the future: although some genes which have already been listed may be removed because they represent vestiges of genes which are no longer active, new genes remain to be discovered and may compensate for this decrease. In 2000, Genoscope scientists were among the first to suggest a total of about 30,000 human genes, a value well below the estimates which were common at the time (more than a hundred thousand human genes according to some) (see the Press Release and the context). There were even informal bets on the number of human genes in 2000, and a Genoscope scientist was one of the three bettors who came closest to the presently accepted number.


  4: Why is it so difficult to find the genes in a human genome sequence?


  If you feel like going through the three billion letters which constitute the sequence of the human genome, you will have a hard time finding the parts which correspond to the instructions, or genes, in the monotonous succession of A, T, C and G nucleotides. No obvious characteristic makes a gene evident to the naked eye. And there is little chance that reading from a random location in the genome will lead you to a gene rapidly: in humans as in other mammals, the genes occupy less than 30% of the genomic DNA. Furthermore, the genes are fragmented: in plants and animals the biologically significant part of the genes is divided into blocks called exons, separated by intervening sequences called introns. Moreover, the exons represent less than 3% of the human genome and are not easy to delimit. For example, the 24 exons of the gene which encodes neurexin-3 are separated by very long introns, and are dispersed over about 1.5 million nucleotides on human chromosome 14!
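The fragmentation of genes into exons and introns can be illustrated with a toy Python sketch. The sequence and the exon coordinates below are invented for the illustration (they are not real annotation data), and the function name `splice` is hypothetical.

```python
def splice(gene: str, exons: list[tuple[int, int]]) -> str:
    """Join the exon slices (0-based, end-exclusive) in order, dropping the introns."""
    return "".join(gene[start:end] for start, end in exons)


# Invented toy gene: uppercase letters are exons, lowercase letters are an intron
gene = "ATGGCC" + "gtaagtttttag" + "GATTAA"
exons = [(0, 6), (18, 24)]  # hypothetical coordinates of the two exons

mature = splice(gene, exons)
exon_fraction = len(mature) / len(gene)

print(mature)
print(f"exons make up {exon_fraction:.0%} of this toy gene")
```

In this toy case the exons make up half of the gene. In a real human gene such as the neurexin-3 example above, the exons are a far smaller fraction of the gene's total length, which is what makes them hard to find.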

A computer, however, can read the sequence of the human genome much more easily than the human eye. The study of the characteristics statistically associated with genes has led to computer programs for gene-finding, which are useful for large-scale preliminary annotation. These programs may make false predictions about the limits of an exon, however, or may miss an existing exon. They must therefore be complemented by an approach which uses experimental data. This approach consists of looking for similarities between the human genome sequence and various types of sequences: for example, sequences of expressed gene products (messenger RNA and proteins), for which a large quantity of data has been collected in humans and other organisms since the 1990s, and also genomic sequences, which may come from human or other genomes. In the first case, the sequence of a gene is delimited by alignment with the sequence of its own messenger RNA, or the messenger RNA of a related gene. In the case of comparisons between genomic sequences, the genes are identified on the basis of their “coding” parts, which have been more conserved during the course of evolution than the rest of the genome sequence. Genoscope uses regions conserved between the human genome and that of a small fish, Tetraodon nigroviridis, to improve the prediction of human genes.
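The idea of using conserved regions as landmarks can be sketched in a few lines of Python. This is a deliberately simplified illustration, not the method used by Genoscope or by any real gene-finding program: it assumes that the two sequences are already aligned and of equal length, and the sequences, the function name `conserved_windows` and the thresholds are all invented for the example.

```python
def conserved_windows(seq_a: str, seq_b: str, window: int = 9,
                      min_identity: float = 0.8) -> list[int]:
    """Return the start positions of windows where two pre-aligned sequences
    are identical at a fraction of positions >= min_identity."""
    if len(seq_a) != len(seq_b):
        raise ValueError("this toy example expects pre-aligned sequences of equal length")
    hits = []
    for start in range(len(seq_a) - window + 1):
        pairs = zip(seq_a[start:start + window], seq_b[start:start + window])
        matches = sum(a == b for a, b in pairs)
        if matches / window >= min_identity:
            hits.append(start)
    return hits


# Invented sequences: a well-conserved "coding" block flanked by diverged sequence
human_like = "ACGTACGT" + "ATGGCCGAT" + "TTTTAAAA"
fish_like = "TGCATGCA" + "ATGGCTGAT" + "CCCCGGGG"

hits = conserved_windows(human_like, fish_like)
print(hits)
```

Only the window covering the central block passes the identity threshold, so it stands out from the diverged flanks. Real comparisons must first align the two genomes and tolerate insertions and deletions, which this sketch ignores.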


  5: Where did the sequenced human DNA come from?


  The DNA sequences produced by the Human Genome Project did not come from a single donor, but from several anonymous donors recruited in the United States. The procedure adopted guaranteed that the identity of the volunteers would not be revealed. Recruitment was carried out by posting announcements in the area around two laboratories where the DNA “libraries” were being prepared. The donors, who were of diverse origins, were told about the project and gave their informed consent. Their DNA was extracted from blood cells collected by venepuncture. Many precautions were taken to avoid any possibility of revealing the identity of the donor of a sample. Furthermore, five to ten samples were prepared for each sample utilized, such that no donor could be sure that his or her DNA was part of the material sequenced.

Even starting with a single donor, a single version of the sequence would not be obtained. This is because each human being has one set of chromosomes from the mother and the other from the father, and the sequence of a paternal chromosome differs from that of the homologous maternal chromosome at certain positions. Under the hypothesis of a single donor, the large DNA fragments selected by the scientists to construct a “map” of the chromosome could come from one or the other of the two homologous chromosomes. The sequence of each large fragment would be homogeneous, of paternal or maternal origin, but differences would appear in the regions where two large fragments, one from the paternal and one from the maternal chromosome, overlap. Since the sequence of each large fragment is established with a high level of confidence, these divergences can be distinguished from sequencing errors and classified as polymorphisms. This is an advantage of the “clone-by-clone” strategy followed by the consortium. With several donors, the sequences of the large fragments may come not only from the two homologous chromosomes of the same individual, but also from different individuals, and in this way more polymorphisms can be discovered.
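The comparison of overlapping fragments can be sketched as follows in Python. This is a toy illustration under simplifying assumptions: each fragment's position on the chromosome is taken as known, the overlap contains no insertions or deletions, and the sequences, coordinates and the function name `overlap_differences` are invented for the example.

```python
def overlap_differences(clone_a: str, start_a: int,
                        clone_b: str, start_b: int) -> list[tuple[int, str, str]]:
    """Compare two fragments over the chromosome interval they share.

    start_a and start_b are the 0-based chromosome coordinates of the first
    base of each fragment. Returns (position, base_in_a, base_in_b) for every
    position of the overlap where the two fragments disagree.
    """
    lo = max(start_a, start_b)
    hi = min(start_a + len(clone_a), start_b + len(clone_b))
    differences = []
    for pos in range(lo, hi):  # empty range if the fragments do not overlap
        base_a = clone_a[pos - start_a]
        base_b = clone_b[pos - start_b]
        if base_a != base_b:
            differences.append((pos, base_a, base_b))
    return differences


# Invented fragments: one from the "paternal" and one from the "maternal" chromosome
paternal = "ACGTTGCAAC"   # covers positions 100 to 109
maternal = "GCTACGGA"     # covers positions 105 to 112

candidates = overlap_differences(paternal, 100, maternal, 105)
print(candidates)
```

Each reported position is a candidate polymorphism, on the condition stated in the text that both fragment sequences are themselves reliable.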


  6: Is the human genome “freely available”? If not, who owns it?


  The members of the consortium agreed to deposit the sequences they obtained in public databases without delay. If the sequencing of the human genome had been entrusted to genomic companies, there would have been a great risk of the sequences being sequestered in private databases which could only be consulted by paying a high price. It was mainly in this sense that the public project avoided the “appropriation” of the human genome sequence. In divulging the sequence of a gene, the scientists of the consortium were in fact removing the element of novelty necessary for obtaining a patent, thus making patents on the sequence itself impossible. It is still possible, however, to patent an application derived from knowledge of the sequence. Many agree that free access to the genomic sequence is the best way to stimulate biomedical research, and that industrial competition should come downstream of the sequence, in the biological understanding of the function of the genes in the organism.

Nevertheless, human genes have been patented, even before the beginning of the Human Genome Project. For example, sequencing programs for complementary DNA (copies of messenger RNA from gene expression) in the 1990s led to numerous patent applications from both biotechnology companies and public institutions. Furthermore, patents arising from genomic DNA sequencing programs have been granted. For example, Celera Genomics took advantage of its human genome sequencing effort to submit patent applications for an indefinite number of human genes.

Not all of these patent applications will be successful. The criteria for the granting of a patent on a DNA sequence have become stricter, both in the United States and in Europe, as progress in technology has made sequencing a routine process. In order to obtain a patent, the “invention” must fulfill criteria of inventiveness as well as utility (in the United States) or industrial application (in Europe). It has therefore become impossible to patent a “raw” sequence without characterizing the function of the gene and without a non-trivial description of the applications of the sequence, such as diagnostics, gene therapy or the creation of transgenic animal models. Moreover, once the patent is granted, the content of its claims can be contested on these same points. Finally, it is important to remember that a patent is not title to a gene which is present in all of our bodies: it is above all a means of preventing a competitor from commercializing an application derived from knowledge of the gene. Even so, the power to prohibit such use, when the claims are excessive, may have the effect of stifling research, especially if it is linked to an exclusive licensing strategy.

No one knows exactly what portion of the genome and its genes can be freely exploited for commercial purposes. As of the end of 2000, the US Patent and Trademark Office (USPTO) had granted patents on over 6,000 DNA sequences, including 1,000 from humans, and more than 20,000 patent applications were pending. It remains to be determined how many of these will be granted, how many applicants will continue with their application and how many of these patents will hold up.


  7: Why was there a Human Genome Project? What is its use?


  Since we learned to read the sequence of DNA in the 1970s, we have dreamed of knowing our own genome. This dream is almost reality today, even though we are not yet capable of understanding all the instructions contained in the genome sequence.

The interpretation of the sequence of the human genome is on the right track today, and numerous applications are expected in the decades to come. The most important advances will be in the domains of medicine and in fundamental research in biology, but the scientific results themselves will be the source of the large majority of new applications. However, these advances will not happen immediately: several years of research will be necessary. On the other hand, this research could not be undertaken without the genome sequence.

The first application of the sequence of the human genome is in the identification of human genes. This was the only way to complete an exhaustive and precise inventory of human genes. During the 1990s, some scientists placed their hopes in the sequencing of messenger RNAs, which are the products of gene expression: they judged that it would be useless and costly to sequence the 3 billion nucleotides of the human genome, of which only 3% correspond to the “coding” part of the genes (see “Why is it so difficult to find the genes in a human genome sequence?”). The results have confirmed that, without the sequence of the genome, the collections of messenger RNA sequences do not lead to a reliable inventory of human genes. Systematic sequencing of the genome has furthermore proved to be more economical in the long run than a study of human genes on a case-by-case basis, which implies redundant efforts. This is what motivated the launching of the Human Genome Project at the beginning of the 1990s.

The inventory of human genes will first help in the identification of the genes implicated in genetic diseases. Genetic studies often lead to the definition of an “interval” on a chromosome in which the causative gene for a disease is found in its mutated form. The inventory of the genes in this interval (obtained by analysis of the sequence) makes it possible to select those which are most likely to be implicated in the pathology, because of the supposed or known properties of their products, and to begin research on the best candidates. Before the sequence of the human genome was available, geneticists had to explore intervals of several million nucleotides blindly, looking at hundreds of genes in the interval. Thanks to the finished and “annotated” sequence, these groups can save up to several years of painstaking work. In the near future, this should lead to the discovery of several thousand genes responsible for genetic diseases.

Knowledge of a gene in which a mutation provokes a genetic disease can lead to the development of a diagnostic test based on the DNA. The identification of the causative gene also makes it possible to understand the physiologic mechanism leading to the appearance of the disease, and in certain cases, to explore novel therapeutic approaches. It was in this way that a promising treatment for Friedreich’s ataxia was developed by a French group at the Necker Hospital in 1999, directly from a knowledge of the gene and its function.

Finally, the human genome sequence, together with the inventory of positions which are variable from one person to another, will facilitate the identification of genetic factors in susceptibility to common diseases. These diseases, such as diabetes or arteriosclerosis, certainly have a genetic component, but a multitude of factors make small contributions to the pathology and interact with environmental factors in a complex way. Thanks to the degree of resolution attained today by genetic studies, we will begin to unravel this knot and comprehend the molecular mechanisms of these diseases and better understand the role of the environment. This could lead to new treatments on one hand, and to more effective preventive measures on the other.


  8: Who were the members of the international consortium? What was the role of each of them?


  The international consortium for the sequencing of the human genome included 20 sequencing centers in six countries (Germany, China, United States, France, Japan, United Kingdom). Here is the list:

Abbreviation  Center
BCM           Human Genome Sequencing Center / Baylor College of Medicine, Houston (Texas); USA
Beijing       Human Genome Center / Beijing Genomics Institute, Chinese Academy of Sciences, Beijing; China
CSHL          Lita Annenberg Hazen Genome Center / Cold Spring Harbor Laboratory, Cold Spring Harbor (N.Y.); USA
GBF           Gesellschaft für Biotechnologische Forschung mbH, Braunschweig; Germany
GS            Genoscope, Evry; France
GTC           GTC Sequencing Center / Genome Therapeutics Corp., Waltham (Mass.); USA
IMB           Department of Genome Analysis / Institute of Molecular Biotechnology, Jena; Germany
JGI           Joint Genome Institute / U.S. Department of Energy, Walnut Creek (Calif.); USA
Keio          Department of Molecular Biology / Keio University School of Medicine, Tokyo; Japan
MPIMG         Max Planck Institute for Molecular Genetics, Berlin; Germany
MSC           Multimegabase Sequencing Center / The Institute for Systems Biology, Seattle (Wash.); USA
RIKEN         RIKEN Genomic Sciences Center, Yokohama; Japan
SC            The Wellcome Trust Sanger Institute (Sanger Center), Hinxton; UK
SGTC          Stanford Genome Technology Center, Stanford (Calif.); USA
SHGC          Stanford Human Genome Center, Stanford (Calif.); USA
UOACGT        University of Oklahoma / Advanced Center for Genome Technology, Norman (Okla.); USA
UTSW          University of Texas / Southwestern Medical Center, Dallas (Tex.); USA (this center is no longer active)
UWGC          University of Washington Genome Center, Seattle (Wash.); USA
WI            Whitehead Institute / MIT Center for Genome Research, Cambridge (Mass.); USA (now the Broad Institute)
WUGSC         Washington University / Genome Sequencing Center, St Louis (Mo.); USA

Other institutes and sequencing centers, although they were not officially part of the consortium, also contributed substantially to the sequencing effort for the human genome. Here are some of the most important contributors:

CGM           Center for Genetics in Medicine (Perkin Elmer / Washington Univ.), St Louis (Mo.); USA (this center is no longer active)
JST           Japan Science and Technology Corporation (teams under contract to the Japanese Foundation for Cancer Research (JFCR) and to Keio, Kitasato and Tokai Universities); Japan
TIGR          The Institute for Genomic Research, Rockville (Maryland); USA
YMGC          The National Yang Ming University Genome Center, Taipei; Taiwan

Finally, three institutions played a crucial role in the project in terms of bioinformatics:

NCBI          National Center for Biotechnology Information at the National Institutes of Health; USA
EBI           European Bioinformatics Institute, Cambridge; UK
UCSC          University of California at Santa Cruz; USA

With its neighbor, the Sanger Institute, the EBI created the Ensembl (e!) project, which performs an automatic search for genes in the sequence of the human genome and permits “navigation” along the length of this “annotated” sequence. UCSC has developed a similar browser.

The centers which participated in the sequencing of the human genome selected chromosomes or chromosomal regions of different sizes, depending on their capacity. Their respective contributions (measured as percentage of non-redundant finished sequence present in the databases at the beginning of 2003) are shown below:

The contributions of the 6 countries involved in the project are the following:

Country         Contribution
United States   60.8 %
United Kingdom  28.9 %
Japan           4.9 %
France          2.8 %
Germany         1.5 %
China           0.7 %

  9: What was the French contribution to the Human Genome Project?

Genoscope, the only representative of France in the Consortium, chose to sequence the long arm of human chromosome 14 (the portion of the chromosome which is sequenceable and contains the genes; see above), which amounts to about 3% of the human genome. Chromosome 14 is depicted below alongside the 23 other human chromosomes.

In 2002, this sequencing effort resulted in a continuous sequence of 87,410,661 nucleotides which extends from one end of the sequenceable portion of the long arm of chromosome 14 to the other. The results of the analysis of this sequence were published on January 1, 2003 in the journal Nature (see Press Release). This was the first human chromosome sequence to be published with no gaps and, at the moment of its publication, the longest DNA sequence ever determined. To get an idea of the progress accomplished in one decade, the sequencing of the yeast genome, which was finished in 1996, mobilized almost one hundred laboratories for 6 years; yet the yeast genome consists of only 13 million nucleotides, compared with 87 million for human chromosome 14.

The scientists at Genoscope used their expertise in bioinformatics to identify the genes in the sequence of chromosome 14. To the 506 genes already known on this chromosome they added 344 other validated or “putative” genes. Furthermore, two regions which are very important for the immune system were characterized. Almost 60 genes on chromosome 14 have already been implicated in genetic diseases. Since the beginning of the decade, sequencing progress has helped several groups to identify 6 new genes for genetic disorders on this chromosome, thus saving several months’ work (see “What is the use of the Human Genome Project?” and “What is a genetic disease?”). Dozens of other “morbid” genes followed. For this research to succeed, it is important that the genes be correctly delimited and that the inventory of genes be complete. The Genoscope scientists are constantly striving to perfect their annotation, and they also use tools to evaluate and improve the annotation of the entire human genome.


  10: How much did the Human Genome Project cost?

The total cost of the Human Genome Project was about 2.7 billion dollars (1991 fiscal year dollars), whereas it was projected to cost 3 billion dollars at the beginning of the project in 1990. This saving resulted from considerable technological progress and from the acceleration of the project, which was finished two years ahead of schedule. A large portion of this amount was spent on finishing the genomic draft obtained in 2000. The sequencing of chromosome 14 cost about 10 million euros, to which several million euros for analysis and annotation must be added.

Without a doubt, the expected benefits for society as a whole greatly exceed the expenditure for this investment; research based on the genome should lead to a great expansion of the biotechnology industry, new treatments and drugs, and huge progress for human health, for example in the domain of diagnosis.


  11: With the end of the Human Genome Project, are the large sequencing centers still useful?

Far from getting shorter, the list of genomes to be sequenced keeps on getting longer. To interpret the sequence of a genome, it is invaluable to compare it with other genomes. The species compared may be closely related, or they may belong to branches which have diverged early during the course of evolution. The knowledge gained with each strategy will not be the same. The farther apart two species are in terms of evolution, the more their genomic sequences will have diverged, which may limit the breadth of the comparison. Nevertheless, the parts which have diverged the least, i.e. the genes, will be more clearly distinguished from the rest of the sequence: these regions which are “conserved” between two genomes will serve as landmarks for identifying the genes. It is therefore instructive to have genomes from a spectrum of species at our disposal, chosen at key points in the tree of evolution.

Take the example of the human genome. The chimpanzee is our closest relative in the animal world, and the sequencing of its genome, which is 99% identical to ours, will provide fascinating information on the genetic changes that took place during the last few million years of evolution of the human branch. The sequencing of the mouse genome, which was finished in 2003, will benefit biomedical research as a whole, because this rodent has long been an animal model for genetics. The sequences of other placental mammals will extend the knowledge provided by the mouse genome. It will also be instructive to sequence a representative of the marsupials, which separated early from the rest of the mammals. The genome of the kangaroo may clarify the earliest steps in the history of the mammals, and may offer a good compromise species for the search for human genes.

Beyond this, representatives of other branches of the vertebrates will facilitate this research, because in general the vertebrates have conserved a common set of genes. The vertebrates that have been sequenced to date, or are in the process of being sequenced, include a bird (the chicken) and two fish with compact genomes; Genoscope performed half of the sequencing of one of these fish, Tetraodon nigroviridis. In 2000 Genoscope used comparisons between the genomic sequences of Tetraodon and humans to estimate the number of human genes at about 30,000, and this genome continues to be useful for refining the annotation of the human genome. Still further removed, we find the genome of an ascidian, a marine animal that is a close relative of the vertebrates, and those of the worm Caenorhabditis elegans and the fly Drosophila melanogaster. The genomes of very simple multicellular organisms may reveal the changes that accompanied the organization of cells into “cell communities”. Finally, the genome of yeast, a unicellular organism, is useful for discovering the elements common to all eukaryotes, the living organisms whose genome is enclosed in a nucleus within the cell, from humans to oak trees to Paramecium. Understanding fundamental eukaryotic mechanisms such as the condensation, recombination and segregation of chromosomes during cell division is of great importance in the study of certain human diseases.

To the above reasons for undertaking new sequencing programmes, more specific ones can be added. The sequence of the genome of a given organism may be important for economic reasons (a microbe important to the dairy industry, for example) or medical reasons (which group of genes explains the virulence of a bacterium compared to that of a related species?). It is easy to understand the importance of sequencing the genome of rice, the staple food of half of humanity, or the genome of the Anopheles mosquito, the vector of malaria, which kills over a million people every year. A number of pathogens, bacterial or eukaryotic, have already been sequenced, and others will soon follow. Finally, the exploration of the bacterial world as a whole will occupy sequencing centers for many decades: genomic studies of diverse environments (soil, ocean, waste water treatment plants), which have been going on for several years, have revealed a formidable bacterial diversity. We know only about 1% of bacterial species; the others have gone unnoticed because we have not been able to cultivate them. Bacteria exhibit great metabolic inventiveness, and these mysterious species constitute a rich reservoir of genes that may be very important for industry and the environment. The exploration of the genomes of these bacteria is a task comparable in scale to the Human Genome Project, and one for which the large sequencing centers are more necessary than ever.

Last update on 22 January 2008

© Genoscope - Centre National de Séquençage
2 rue Gaston Crémieux CP5706 91057 Evry cedex
Tél:  (+33) 0 1 60 87 25 00
Fax: (+33) 0 1 60 87 25 14
