> Brunet F.G. et al. “Gene loss and evolutionary rates following whole-genome duplication in teleost fishes.” Mol. Biol. Evol. (2006) 23(9), 1808-1816.
> Jaillon O. et al. “Analysis of the Tetraodon nigroviridis genome reveals the protokaryotype of bony vertebrates and its duplication in teleost fish.” Nature (2004) 431, 946-957.
> Volff J.N. et al. “Diversity of retrotransposable elements in compact pufferfish genomes.” Trends Genet. (2003) 19, 674-678.
> Lutfalla G. et al. “Comparative genomic analysis reveals independent expansion of a lineage-specific gene family in vertebrates: the class II cytokine receptors and their ligands in mammals and fish.” BMC Genomics (2003) 4, 29.
> Bouneau L. et al. “An active non-LTR retrotransposon with tandem structure in the compact genome of the pufferfish Tetraodon nigroviridis.” Genome Res. (2003) 13, 1686-1695.
> Grützner F. et al. “Four-hundred million years of conserved synteny of human Xp and Xq genes on three Tetraodon chromosomes.” Genome Res. (2002) 12, 1316-1322.
> Dasilva C. et al. “Remarkable compartmentalization of transposable elements and pseudogenes in the heterochromatin of the Tetraodon nigroviridis genome.” Proc. Natl. Acad. Sci. USA (2002) 99, 13636-13641.
> Roest Crollius H. et al. “Human gene number estimate provided by genome-wide analysis using Tetraodon nigroviridis genomic DNA.” Nature Genet. (2000) 25, 235-238.
> Roest Crollius H. et al. “Characterization and repeat analysis of the compact genome of the freshwater pufferfish Tetraodon nigroviridis.” Genome Res. (2000) 10, 939-949.
The pufferfish Tetraodon nigroviridis has a genome of 350 Mb, the smallest genome known to date among vertebrates. This characteristic makes it very attractive for genomic studies, and it inspired the launch of a sequencing project at Genoscope in 1997, the year the center opened. Genoscope's initial objective was to compare the genomic sequences of this fish with those of humans, in order to help annotate human genes and estimate their number. This strategy relies on the common genetic heritage of vertebrates: from one vertebrate species to another, even between species as distant as a fish and a mammal, most of the same genes are present. In the “compact” genome of Tetraodon, this common complement of genes is contained in a genome eight times smaller than that of humans. Although exons are of similar length in the two species, introns and intergenic sequences are greatly reduced in the fish. Furthermore, unlike the exons, these regions have diverged completely since the separation of the lineages leading to humans and to Tetraodon. The Exofish method, developed at Genoscope, exploits this contrast: the conserved regions identified by comparing the genomic sequences of the two species correspond to coding regions. Using preliminary sequencing results from the Tetraodon genome, Genoscope estimated in 2000 that humans have about 30,000 genes, at a time when much higher estimates were current. Progress in the annotation of the human genome has since supported the Genoscope estimate, with values as low as 22,000 genes and a consensus of around 25,000 genes.
The sequencing of the Tetraodon genome to a depth of about 8X, carried out as a collaboration between Genoscope and the Whitehead Institute Center for Genome Research (now the Broad Institute), was finished in 2002, producing an assembly that covers 90% of the euchromatic part of the fish genome. This made it possible to apply Exofish on a larger scale, in comparisons not only with the human genome but also with the genomes of the two other vertebrates sequenced at the time (Takifugu, a fish closely related to Tetraodon, and the mouse). The conserved regions detected in this way were integrated into the annotation procedure, along with other resources (Tetraodon cDNA sequences and ab initio predictions). Of the 28,000 genes annotated, some families were examined in detail: selenoproteins, and type I cytokines and their receptors. The comparison of the Tetraodon proteome with those of mammals revealed some interesting differences, such as a major diversification of some hormone systems and of collagen molecules in the fish.
A search for transposable elements in the genomic sequences of Tetraodon has also revealed a high diversity (75 types), which contrasts with their scarcity; the small size of the Tetraodon genome is due to the low abundance of these elements, of which some appear to still be active. Another factor in the compactness of the Tetraodon genome, which has been confirmed by annotation, is the reduction in intron size, which approaches a lower limit of 50-60 bp, and which preferentially affects certain genes.
The availability of the genome sequences of humans and mice on the one hand, and of Takifugu and Tetraodon on the other, provides new opportunities for the study of vertebrate evolution. We have shown that the rate of neutral evolution is higher in fish than in mammals, and that fish protein sequences also diverge more quickly than those of mammals. A key mechanism in evolution is gene duplication, which we studied by taking advantage of the fact that most of the assembled sequence is anchored on the chromosomes. The results of this study speak strongly in favor of a whole genome duplication event very early in the lineage of ray-finned fish (Actinopterygians). Even stronger evidence came from synteny studies between the human and Tetraodon genomes. Using a high-resolution synteny map, we reconstructed the genome of the vertebrate that predates this duplication, that is, the last common ancestor of all bony vertebrates (most vertebrates apart from cartilaginous fish and agnathans such as the lamprey). This ancestral karyotype contains 12 chromosomes, and the 21 Tetraodon chromosomes derive from it through the whole genome duplication and a surprisingly small number of interchromosomal rearrangements. By contrast, exchanges between chromosomes have been much more frequent in the lineage leading to humans. All these results are presented in an article published in the 21 October 2004 issue of Nature.
Tetraodon nigroviridis is a small fish (less than 10 centimeters long in captivity) that is popular with tropical fish enthusiasts. In the wild, it is found in the rivers and streams of Southeast Asia (Indonesia, Indochina, Malaysia, the Philippines), as well as in estuaries and mangrove swamps and even occasionally in the sea; it is therefore not strictly a freshwater fish. Tetraodon nigroviridis belongs to the family of “smooth” pufferfish (Tetraodontidae) and, at a higher taxonomic level, to the order Tetraodontiformes, which also includes the diodons or “spiny” pufferfish (Diodontidae), sunfishes (Molidae), boxfishes (Ostraciidae) and triggerfishes (Balistidae), among others. Tetraodon nigroviridis is often confused with a closely related species, Tetraodon fluviatilis; Genoscope has found molecular markers that can differentiate between the two species. Other members of the family Tetraodontidae are the fugus, marine fish of which one species, Takifugu rubripes, was also the target of a sequencing project (see below). The Tetraodon nigroviridis and Takifugu rubripes lineages diverged 18 to 30 million years ago.
Interest in the Tetraodontids was kindled in 1968, when R. Hinegardner discovered the very small size of their genomes. In particular, he showed that the genome of Tetraodon, roughly estimated at 380 Mb by measuring the DNA content of its cells, was the smallest of the 300 teleost fish genomes measured, and therefore the smallest vertebrate genome known. By comparison, the zebrafish Danio rerio, a model organism in genomics and genetics, has a genome of 1.6 billion base pairs, four times larger than that of Tetraodon, and the human genome, at 3.2 billion base pairs, is eight times larger. Following this discovery, Sydney Brenner and his colleagues at Cambridge confirmed the small size of the genome of another pufferfish, the fugu Takifugu rubripes, which they estimated at 400 Mb using a partial random sequencing approach. These early data supported the hypothesis that the pufferfish genome is compact: since most of the functions present in other vertebrates are found in teleost fish, it was reasonable to suppose that fish and mammals possess a broadly comparable repertoire of genes. The small size of the pufferfish genome was therefore expected to result not from a large reduction in gene number compared to mammals, but from a reduction in non-coding sequences. The first sequences obtained by Brenner confirmed this: they revealed the small size of fugu introns and intergenic regions, resulting from the scarcity of repeated sequences. It is in this sense that this exon-dense genome is called “compact”. The structure of the genes themselves appeared conserved: in the few cases studied, the introns were found at similar positions in the Takifugu gene and in its human ortholog. A preliminary analysis of the Tetraodon nigroviridis genome, published in 2000, demonstrated the same rarity of repeated sequences.
How can the compactness of the pufferfish genome be explained? The analysis of partial genomic sequences from Tetraodon nigroviridis and Takifugu rubripes has provided some answers. Transposable elements, which make up the majority of repeated sequences, represent 45% of the human genome sequence, whereas they are very scarce in pufferfish: they represent only 3.8% of the Tetraodon nigroviridis genome assembly and 2.7% of the Takifugu assembly. These are the smallest values known in multicellular eukaryotes, even if the true abundance of transposable elements in these two fish is most likely a little higher: in Tetraodon they are concentrated in heterochromatic regions (Dasilva et al., 2002), which are underrepresented in the assembly. Although rare, transposable elements are highly diverse in Tetraodon. It is possible that transposable elements and other sequences without functional constraints are deleted at a higher rate in these fish, which seems to be confirmed by the fact that pseudogenes are eliminated more rapidly in Tetraodon than in humans. Another factor that could explain the compact genomes of the Tetraodontids is a high resistance to insertions, possibly inherent to this lineage, as suggested by a comparative study with diodons and a sunfish (Neafsey and Palumbi, 2003). The genomes of pufferfish may have grown more slowly than those of other Tetraodontiform lineages or, more probably, may have shrunk during evolution. In any case, these animals are an excellent model for understanding the factors that determine the evolution of genome size.
Sydney Brenner was the first to express an interest in a compact
genome such as that of Takifugu as a tool for the study of other
vertebrate genomes. In 1993 he proposed to sequence all or part of the
genome of fugu in order to access the repertoire of vertebrate genes
at the lowest cost possible. By sequencing a given quantity of DNA in
a pufferfish, there is a much better chance of finding an exon than by
sequencing the same quantity of DNA in another vertebrate, especially
in mammals such as humans. Sequencing at moderate coverage is thus
sufficient to get a good idea of the gene content in a compact
genome. Furthermore, the rarity of repeated sequences makes a random
shotgun sequencing strategy possible, which is rapid and economical but
is not suitable for a genome as vast and complex as the human genome.
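Brenner's cost argument can be made concrete with the classic Lander-Waterman approximation: when random reads totalling c times the genome size are sequenced, the expected fraction of positions read at least once is 1 - e^(-c). The sketch below is our own illustration, not a Genoscope or Brenner calculation. It assumes reads fall uniformly at random and that exons are spread evenly, so that the fraction of exon bases sampled equals the fraction of the genome sampled. The genome sizes are those quoted in this article; the 100 Mb sequencing budget is hypothetical.

```python
import math


def fraction_sampled(budget_bp: float, genome_bp: float) -> float:
    """Expected fraction of a genome read at least once when budget_bp bases
    of random reads are sequenced (Lander-Waterman: 1 - exp(-coverage))."""
    coverage = budget_bp / genome_bp
    return 1.0 - math.exp(-coverage)


# Genome sizes quoted in the text; the sequencing budget is hypothetical.
TETRAODON_BP = 350e6
HUMAN_BP = 3.2e9
BUDGET_BP = 100e6

for name, size in (("Tetraodon", TETRAODON_BP), ("human", HUMAN_BP)):
    print(f"{name}: {BUDGET_BP / size:.2f}X coverage, "
          f"~{fraction_sampled(BUDGET_BP, size):.0%} of exon bases sampled")
```

With the same budget, the compact genome yields roughly 25% of its exon bases, against about 3% for the human genome. This is why moderate coverage already gives a good view of the gene content of a compact genome.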
Despite these arguments, the fugu project did not get off the ground immediately. It had still not obtained financial support by 1997, and its realization seemed uncertain. Genoscope therefore decided to undertake the sequencing of another fish with a compact genome in order to make a “model vertebrate genome” available to the scientific community. Tetraodon nigroviridis was chosen on account of its small size, its ease of maintenance in a freshwater aquarium and its availability among tropical fish enthusiasts, whereas Takifugu rubripes, which is found in the coastal waters of China and the Japanese archipelago, is difficult to import. Furthermore, large seawater tanks are needed to maintain fugu, which may reach several kilograms. Tetraodon nigroviridis is therefore better suited to molecular analyses in which the biological material must be easily and constantly accessible (Crnogorac-Jurcevic et al., 1997).
Genoscope’s first publications on the genomic sequences of Tetraodon, in 2000, demonstrated the value of sequencing a small genome for the analysis of the human genome. It was at this point that the sequencing project for Takifugu rubripes was initiated, leading to the publication of a draft genome sequence in 2002 (45,000 contigs assembled into more than 12,000 scaffolds covering 332.5 Mb, for a total genome size estimated at 365 Mb) (Aparicio et al., 2002). The comparison of the predicted fugu proteins with human proteins revealed strong correspondences for three-quarters of them, which validated the initial hypothesis of a comparable repertoire of genes in all vertebrates.
The Tetraodon and Takifugu projects are less redundant than they appear. Comparing the genomic sequences of two pufferfish separated by 18 to 30 million years improves the annotation of both genomes, which reinforces the utility of these model genomes.
At the start of the project, the principal objective of sequencing the Tetraodon genome was to compare the genomic sequences of this fish with those of humans in order to facilitate the identification of human genes. This comparative genomics approach rests on a simple hypothesis: during the 400 million years since the separation of the Tetraodon and mammalian lineages, the genomic regions constrained by their function, especially coding sequences, have accumulated fewer mutations and thus diverged more slowly than other regions. The evolutionary distance that separates us from this fish is large enough to ensure that introns and intergenic regions have diverged completely, but short enough that most coding sequences have been conserved over at least part of their length. This contrast makes it possible to identify coding sequences by comparisons at the whole genome scale (a different use of the genome from that chosen later in the Takifugu project, in which the human-fish comparison was limited to predicted genes).
For this purpose, Genoscope has developed a comparative genomics tool
called Exofish (for EXOn Finding by Sequence Homology), based on the
BLAST algorithm. A comparison between genomic
sequences of the fish and the target human sequence produces ecores
(Evolutionary COnserved REgions), which are alignments of
Tetraodon sequences with conserved regions in the human
genome. A calibration makes it possible to define the parameters for
optimizing specificity (conditions under which no ecore will be found
in the introns of known human genes) without compromising sensitivity
(the percentage of human exons detected).
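The calibration can be pictured as choosing a score cutoff. The sketch below is a hypothetical reconstruction, not Exofish's actual procedure or parameters. It assumes each fish-human alignment carries a single score (for example a BLAST bit score) and that hits have already been assigned to exons or introns of a reference set of known human genes. It takes the lowest cutoff that rejects every intron hit (full specificity), then reports the resulting sensitivity.

```python
from typing import Dict, List, Tuple


def calibrate_threshold(
    exon_hits: Dict[str, List[float]],
    intron_hit_scores: List[float],
) -> Tuple[float, float]:
    """Return (threshold, sensitivity).

    exon_hits maps each reference exon to the scores of the fish alignments
    that fall in it (possibly none); intron_hit_scores lists the scores of
    alignments falling in introns of the same reference genes.
    """
    # Specificity: a hit is kept only if it scores strictly above the best
    # intron hit, so no intron of a known gene yields an ecore.
    threshold = max(intron_hit_scores, default=float("-inf"))
    # Sensitivity: fraction of reference exons with at least one kept hit.
    detected = sum(
        1 for scores in exon_hits.values() if any(s > threshold for s in scores)
    )
    sensitivity = detected / len(exon_hits) if exon_hits else 0.0
    return threshold, sensitivity


# Hypothetical toy scores, for illustration only.
exons = {"exon1": [45.0, 30.2], "exon2": [28.0], "exon3": [], "exon4": [52.1]}
introns = [22.5, 31.0]
print(calibrate_threshold(exons, introns))
```

On these toy scores the cutoff is 31.0 and half of the exons are detected; a realistic reference set would be much larger.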
Exofish was first used in 2000, when only 33% of the Tetraodon sequence and 42% of the human genome sequence had been determined. The goal of this comparison was to estimate the number of human genes, for which very different values had been proposed at the time. The principle of the estimate was simple: the number of ecores obtained on a partial sequence of the human genome was extrapolated to the whole genome, then divided by the average number of ecores per gene (as measured on a reference collection of human genes). This calculation led to an estimate of 28,000 to 34,000 human genes (see a press release). This result, published in 2000 (Roest Crollius et al., 2000), agreed with another estimate published at the same time but based on a different principle (B. Ewing & P. Green, 2000). However, it was much lower than the estimates current at the time: the human genome had been reported to contain from 50,000 to 90,000 genes, and some researchers believed that our genome contained at least 200,000 genes! Many were surprised by this drop in the estimated number of genes, because it meant that humans had barely twice as many genes as the fruit fly. Since the completion of the human genome sequence in April 2003, Genoscope’s estimate has been confirmed by progress in annotation: automatic annotation procedures such as Ensembl, Twinscan or SGP2 have identified between 22,000 and 43,000 human genes.
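The 2000 estimate reduces to two operations: scale the ecore count up to the whole genome, then convert ecores to genes. A minimal sketch; the function and argument names are hypothetical and the numbers are purely illustrative, not the published counts. We assume the ecores-per-gene ratio is measured with the same partial Tetraodon data set, so that incomplete fish coverage lowers the numerator and the denominator alike.

```python
def estimate_gene_number(ecores_observed: int, fraction_sequenced: float,
                         ecores_per_gene: float) -> float:
    """Extrapolate ecores found on a partial human sequence to the whole
    genome, then divide by the mean number of ecores per reference gene."""
    if not 0.0 < fraction_sequenced <= 1.0:
        raise ValueError("fraction_sequenced must be in (0, 1]")
    if ecores_per_gene <= 0.0:
        raise ValueError("ecores_per_gene must be positive")
    genome_wide_ecores = ecores_observed / fraction_sequenced
    return genome_wide_ecores / ecores_per_gene


# Illustrative values only: 10,000 ecores on half of the genome, 2 ecores/gene.
print(estimate_gene_number(10_000, 0.5, 2.0))
```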
During 2003, when a 99% complete human genome sequence and a Tetraodon assembly covering 90% of the genome were available, Genoscope undertook a new estimate of the number of human genes using Exofish. This time, the average number of ecores per gene was calculated from the ecores present on five “finished” human chromosomes (chromosomes 6, 13, 14, 20 and 22), whose annotation had been validated by human experts. From the number of ecores obtained for the entire human genome sequence, a first estimate of the total number of genes was deduced and then corrected for the fraction of ecores that detect pseudogenes. This calculation led to a human gene number ranging from 22,500 to 29,500, in good agreement with the annotation results above (Jaillon et al., 2004).
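The 2003 refinement adds two ingredients: a ratio measured on expert-annotated chromosomes and a pseudogene correction. The sketch below assumes the ratio is pooled (total ecores over total genes) rather than averaged chromosome by chromosome, and treats the pseudogene share as a single known fraction. Both are our simplifications, and the counts are invented for illustration.

```python
from typing import Iterable, Tuple


def pooled_ecores_per_gene(reference: Iterable[Tuple[int, int]]) -> float:
    """Mean ecores per gene pooled over reference chromosomes, each given as
    (ecores on the chromosome, expert-annotated genes on the chromosome)."""
    total_ecores = total_genes = 0
    for ecores, genes in reference:
        total_ecores += ecores
        total_genes += genes
    if total_genes == 0:
        raise ValueError("reference set contains no genes")
    return total_ecores / total_genes


def corrected_gene_estimate(genome_ecores: int, pseudogene_fraction: float,
                            per_gene: float) -> float:
    """Discard the share of ecores attributed to pseudogenes, then convert
    the remaining ecores to a gene count."""
    return genome_ecores * (1.0 - pseudogene_fraction) / per_gene


# Hypothetical counts for two reference chromosomes (not the published data).
ratio = pooled_ecores_per_gene([(300, 100), (500, 300)])
print(ratio, corrected_gene_estimate(64_000, 0.25, ratio))
```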
The ecores obtained by human-Tetraodon comparisons (accessible with the Comparative Genoscope browser) have been used in the annotation procedure for several of the human chromosomes published to date (beginning of 2004). Despite these efforts, about 15,000 ecores remained outside all annotations (genes or pseudogenes) of the human genome sequence as of spring 2004. Together with the Tetraodon gene models that contain the corresponding ecores, they have been used to define 904 new human gene models. More than 60% of these new putative genes, probably missed previously because of their small size, have been confirmed by expression data.
When Exofish is utilized to compare genome sequences from Tetraodon with those of other mammals such as the rat and the mouse, the same ecores are detected as in the human genome. Conversely, the ecores produced by comparing genomes of humans, mouse and Takifugu with the genome of Tetraodon have helped in the identification of the genes of this little fish (for the annotation procedure, see the Sequencing project page).
The Tetraodon nigroviridis sequencing project started in 1997 at Genoscope. It was an internal project, defined and implemented at the very beginning of the center’s activity. At that time, the only vertebrate for which complete sequencing was planned was the human: the mouse sequencing project was initiated only in 1999, and the sequencing of Takifugu rubripes began at the end of 2000. The choice of Tetraodon nigroviridis as a sequencing target (a fish with a compact genome that was easier than Takifugu to maintain in captivity) was thus initially motivated by the need for a comparative genomics tool for exploring the human genome that could be obtained quickly and cheaply. The sequence data produced in this first random shotgun sequencing phase (0.3 genome equivalents) were successfully used for a large-scale comparison with the human genome in 2000, indicating the existence of about 30,000 genes in humans (see previous section). In parallel with this random sequencing approach, regions of special interest (genes of the immune system) were fully sequenced at Genoscope after selection of the corresponding genomic inserts in BAC clones.
The project was next oriented toward producing a sequence of the Tetraodon genome that would be as complete as possible, pursuing the whole genome shotgun strategy at high depth. The new objectives (a detailed analysis of the Tetraodon proteome, of the organization of its genome and of the regions syntenic with other vertebrate genomes) required a high-quality assembly. In June 2001, the Whitehead Institute Center for Genome Research (WICGR, now the Broad Institute of MIT and Harvard) joined forces with Genoscope in this effort, which considerably accelerated the project. By the end of 2001, six genome equivalents had been produced (2.3 Gb of raw sequence), with each center contributing about half the sequences (see press release). This was enough sequence data to allow a first assembly attempt at Genoscope (70% of the genome covered by about one hundred thousand contigs). Two additional genome equivalents were subsequently produced by Genoscope.
In 2002, the total of 8.3 genome equivalents was assembled. The resulting draft (about 50,000 contigs linked into more than 25,000 supercontigs, or scaffolds) covers 90% of the euchromatic regions of Tetraodon and has better long-range contiguity than the Takifugu genome draft. Using physical mapping data, 39 “ultracontigs”, representing 64% of the sequence assembly, could be anchored on the 21 chromosomes of Tetraodon (see their distribution on the karyotype). This is the first time that an overview of the organization of a fish genome has been available at the chromosome scale. Annotation of the draft sequence at Genoscope has produced 27,918 gene models. For more details on the assembly and annotation, see the Sequencing project page and the article in Nature, October 2004 (Jaillon et al., 2004).
One of the first lessons of the draft genome sequence of Tetraodon was confirmation of the small size of this genome. The 26,000 scaffolds that form the assembly span 342 Mb (gaps included). The 312 Mb of sequence they contain represents 90% of the Tetraodon euchromatin (as measured with a set of random reads), whose size can thus be estimated at 346 Mb.
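The euchromatin estimate follows from a simple ratio. If a set of random reads, sequenced independently of the assembly, shows what fraction of the euchromatin the assembly contains, the assembled bases can be scaled up accordingly. A minimal sketch with a hypothetical function name; the read counts are invented and chosen to reproduce the 90% quoted above.

```python
def estimate_euchromatin_bp(assembled_bp: float, reads_found: int,
                            reads_tested: int) -> float:
    """Scale the assembled sequence length by the fraction of independent
    random reads that are found in the assembly."""
    if reads_found <= 0 or reads_found > reads_tested:
        raise ValueError("need 0 < reads_found <= reads_tested")
    fraction_in_assembly = reads_found / reads_tested
    return assembled_bp / fraction_in_assembly


size = estimate_euchromatin_bp(312e6, reads_found=900, reads_tested=1000)
print(f"{size / 1e6:.1f} Mb")
```

This gives about 346.7 Mb. It is consistent with the 346 Mb quoted above, given that the 90% figure is rounded.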
Independently, the total size of the 21 chromosomes that make up this genome has been estimated by flow cytometry at between 329 and 356 Mb. The heterochromatic regions therefore seem to be limited. This extreme compactness explains why the GC content (GC%) of the Tetraodon genome is higher than that of large mammalian genomes, as this percentage is positively correlated with gene density in vertebrates. However, the distribution of GC% follows the same type of asymmetric bell curve as in humans and mice, indicating a non-homogeneous distribution of genes; it is simply shifted toward higher values. In Takifugu rubripes, by contrast, the distribution of GC% is homogeneous. This striking difference between the genomes of two closely related species might be explained by sequencing or assembly defects in a (G+C)-rich fraction of the Takifugu genome.
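GC% distributions such as those compared here are typically built by cutting the sequence into windows and making a histogram of their GC content. A minimal sketch; the window size and bin width are hypothetical parameters, and gap positions (N) are excluded from each window's denominator.

```python
from collections import Counter
from typing import List


def gc_percent_windows(seq: str, window: int) -> List[float]:
    """GC% of consecutive non-overlapping windows, ignoring non-ACGT bases.
    A final window shorter than `window` is skipped."""
    values = []
    for start in range(0, len(seq) - window + 1, window):
        chunk = seq[start:start + window].upper()
        called = sum(chunk.count(base) for base in "ACGT")
        if called == 0:
            continue  # window consists only of gaps
        gc = chunk.count("G") + chunk.count("C")
        values.append(100.0 * gc / called)
    return values


def gc_histogram(values: List[float], bin_width: float = 2.0) -> Counter:
    """Count windows per GC% bin; keys are the lower bounds of the bins."""
    return Counter((v // bin_width) * bin_width for v in values)


windows = gc_percent_windows("GGCCAATT" + "NNNNNNNN" + "GCGCGCAT", 8)
print(windows, gc_histogram(windows, bin_width=10.0))
```

Computing such histograms for each genome yields the kind of distribution compared in the paragraph above.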
The genome sequence has also made it possible to study the diversity and abundance of transposable elements in Tetraodon in detail. As expected, they are rare in this compact genome: about 4,000 copies were detected in unassembled sequences, representing at least 3.8% of the genome. However, they are also very diverse: 75 consensus sequences (see Table 3) could be reconstructed (53 class I and 22 class II elements, representing a large variety of families), whereas only about 20 different types have been defined from the millions of copies present in the human and mouse genomes. Some families seem to be still active. To explain the record scarcity of transposable elements in Tetraodon, one must postulate that they have undergone an unusually high rate of deletion. Transposable elements are concentrated in heterochromatic regions, in particular on the short arms of the small chromosomes. In euchromatic regions, SINEs (short interspersed elements) are preferentially found in (A+T)-rich regions and LINEs (long interspersed elements) in (G+C)-rich regions, the opposite of what is observed in mammals.
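A percentage such as “at least 3.8% of the genome” counts each annotated base once, so overlapping element annotations must be merged before summing. A minimal sketch, assuming elements are given as hypothetical half-open [start, end) intervals on one sequence. Because elements in heterochromatin are underrepresented in an assembly, a value computed this way is a lower bound.

```python
from typing import List, Tuple


def covered_fraction(intervals: List[Tuple[int, int]], genome_length: int) -> float:
    """Fraction of a sequence covered by the union of half-open [start, end)
    intervals, e.g. transposable-element annotations that may overlap."""
    covered = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # Disjoint from the current merged block: close it, open a new one.
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or adjacent: extend the current merged block.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start
    return covered / genome_length


print(covered_fraction([(0, 10), (5, 20), (30, 40)], 100))
```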
Annotation of the Tetraodon genome sequence has led to the definition of 27,918 protein-coding gene models, a number of the same order as that found in humans. At a genome-wide scale, the results of the Exofish analysis (see above) confirm this result. We compared the genomic sequences of four vertebrates (Takifugu, Tetraodon, mouse and human), using each in turn as the target, and obtained a similar number of evolutionarily conserved regions (ecores) in each case. However, these ecores can correspond to pseudogenes as well as to genes, and pseudogenes are known to be much less frequent in Tetraodon and Takifugu than in mammals. Similar ecore counts therefore imply that these fish possess slightly more protein-coding genes than mammals.
Genes occupy 40% of the Tetraodon genome. There is an average of 6.9 coding exons per gene, and the size distribution of exons is similar to that observed in humans. The size distribution of introns, by contrast, is clearly different. The lower size limit for introns is the same in both vertebrates (between 50 and 60 bp), but many more introns approach this lower limit in Tetraodon than in humans. This phenomenon concerns only a subpopulation of Tetraodon genes, however. It is unclear why some genes have evolved toward smaller introns whereas others do not seem to be affected.
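Intron lengths follow directly from exon coordinates. Computing the share of near-minimal introns gene by gene, rather than pooled over the genome, is what reveals that only a subpopulation of genes is affected. A minimal sketch, assuming 1-based inclusive exon coordinates for one transcript and a hypothetical 60 bp cutoff taken from the 50-60 bp lower limit above.

```python
from typing import List, Tuple


def intron_lengths(exons: List[Tuple[int, int]]) -> List[int]:
    """Introns between consecutive exons given as 1-based, inclusive
    (start, end) coordinates on the same strand."""
    ordered = sorted(exons)
    return [nxt[0] - prev[1] - 1 for prev, nxt in zip(ordered, ordered[1:])]


def minimal_intron_share(exons: List[Tuple[int, int]], cutoff: int = 60) -> float:
    """Fraction of a gene's introns no longer than cutoff bp (0.0 if none)."""
    lengths = intron_lengths(exons)
    if not lengths:
        return 0.0
    return sum(1 for n in lengths if n <= cutoff) / len(lengths)


gene = [(1, 100), (156, 300), (1301, 1400)]
print(intron_lengths(gene), minimal_intron_share(gene))
```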
A special effort was made to identify genes that are difficult to annotate (see the "Sequencing project" page). This is notably the case for genes encoding type I cytokines and their receptors, whose sequences are poorly conserved in vertebrates. None of these genes was initially found in the Takifugu genome, which raised the legitimate question of whether this large gene family, which includes hormones and interleukins, is absent in fish. In the Tetraodon genome, we discovered 39 genes belonging to all the gene families known in vertebrates. Comparison with other vertebrates indicates that, since the last common ancestor of Tetraodon and mammals, diversification has been especially strong for hormone systems (growth hormone and prolactin) in the Actinopterygian (ray-finned fish) lineage, whereas it has operated more strongly on interleukins in the mammalian lineage.
The global comparison between the proteome of Tetraodon and those of other sequenced vertebrates (Takifugu, human, mouse) and of the sea squirt Ciona intestinalis (an invertebrate chordate) has been instructive. Analysis of the protein domains using InterPro revealed only a few differences between fish and mammals, but some of these are of particular interest:
If the frequency of various functional categories is compared based on “Gene Ontology” classifications, again, few differences are found. The two fish and Ciona tend to differ as a group from the ensemble of mammals; they exhibit a higher frequency of enzymatic and transport functions, and a lower frequency of structural proteins and proteins implicated in signal transduction.
The availability of the genomic sequences of humans and mice on one hand, and Takifugu and Tetraodon on the other, provides new opportunities for the study of the evolution of vertebrates. Initially, we studied the rate at which these sequences have diverged in the course of evolution, at the level of DNA with no functional constraints.
To study the rate of neutral evolution between two vertebrate lineages as distant as teleost fish and mammals, it is not possible to rely on ancestral repeated sequences, as can be done for humans and mice. We therefore turned to freely mutable positions in genes inherited in common by the four species. The first step was to define a set of 5,802 quadruplets of orthologous genes. Within these orthologs, we then studied the third positions of “fourfold degenerate” codons: the third base of such a codon can be replaced by any of the three other bases without changing the encoded amino acid, and is therefore “free” to change. This study confirmed the neutral nucleotide substitution rate previously measured between humans and mice using a smaller number of orthologs; it also showed that the level of neutral evolution between the two fish is 2.5 times higher than between the two mammals.
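The fourfold-degenerate comparison can be sketched as follows. This illustrates the general technique, not the exact pipeline used here. It assumes two aligned, in-frame coding sequences, counts a site only when both codons share the same first two bases from one of the eight fourfold-degenerate families of the standard genetic code, and applies a Jukes-Cantor correction for multiple substitutions (our choice of model). Computing this distance for the fish pair and for the mammal pair gives the kind of ratio reported above.

```python
import math
from typing import Tuple

# First two codon bases whose third position is fourfold degenerate in the
# standard genetic code: Ala, Gly, Pro, Thr, Val, Leu (CTN), Arg (CGN), Ser (TCN).
FOURFOLD_PREFIXES = {"GC", "GG", "CC", "AC", "GT", "CT", "CG", "TC"}


def fourfold_distance(seq_a: str, seq_b: str) -> Tuple[int, float]:
    """Return (number of fourfold-degenerate sites compared, Jukes-Cantor
    distance at those sites) for two aligned, in-frame coding sequences."""
    if len(seq_a) != len(seq_b) or len(seq_a) % 3:
        raise ValueError("expected two aligned, in-frame coding sequences")
    sites = diffs = 0
    for i in range(0, len(seq_a), 3):
        codon_a, codon_b = seq_a[i:i + 3].upper(), seq_b[i:i + 3].upper()
        prefix = codon_a[:2]
        # Count the site only if both codons are in the same fourfold family,
        # so any third-position difference is synonymous.
        if (prefix == codon_b[:2] and prefix in FOURFOLD_PREFIXES
                and codon_a[2] in "ACGT" and codon_b[2] in "ACGT"):
            sites += 1
            diffs += codon_a[2] != codon_b[2]
    if sites == 0 or diffs == 0:
        return sites, 0.0
    p = diffs / sites
    if p >= 0.75:
        return sites, math.inf  # saturated: the correction is undefined
    return sites, -0.75 * math.log(1.0 - 4.0 * p / 3.0)


sites, distance = fourfold_distance("GCTGGACCCAAA", "GCCGGACCGAAG")
print(sites, round(distance, 3))
```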
We then studied the divergence of protein sequences by comparing the same set of orthologs. The average similarity between orthologous proteins is lower for Takifugu and Tetraodon than for humans and mice, even though the lineages of the two fish have been separated for a shorter time than those of the two mammals. Thus, protein sequences have diverged faster in the fish, as had already been observed with a smaller sample of genes. A comparison with a close relative of the vertebrates, the sea squirt Ciona intestinalis, confirms this result: the frequency of mutations that cause the substitution of one amino acid for another (Ka) is higher on average between Ciona and Tetraodon than between Ciona and humans. Furthermore, the ratio of non-synonymous (Ka) to synonymous (Ks) substitutions is also higher between the two fish than between the two mammals. Since the last common ancestor of pufferfish and mammals, evolution has thus either accelerated in the first lineage or slowed down in the second.
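The step from “lower similarity despite a shorter separation” to “faster divergence” can be written out. Let K_f and K_m be the protein divergence (substitutions per site) between the two fish and between the two mammals, and T_f and T_m the times since each pair separated. As a standard simplification, assuming both lineages of a pair evolve at the same rate, the per-lineage rate is the divergence divided by twice the separation time:

```latex
r_f = \frac{K_f}{2T_f}, \qquad r_m = \frac{K_m}{2T_m}

K_f > K_m > 0 \ \text{and}\ 0 < T_f < T_m
\;\Longrightarrow\;
r_f = \frac{K_f}{2T_f} > \frac{K_m}{2T_f} > \frac{K_m}{2T_m} = r_m
```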
Recent studies have supported the hypothesis of a duplication of the entire genome in the lineage of ray-finned fishes. An ancient whole genome duplication (WGD), followed by a massive loss of duplicated genes, has already been demonstrated in yeasts, and such events may be a very important mechanism in the evolution of eukaryotic genomes. We therefore looked for traces of a WGD in the Tetraodon genome. The Tetraodon genome sequence offers a unique advantage for this type of research: it is the first fish genome whose assembly is, in large part, anchored on the chromosomes. For the first time, we can study the distribution of duplicated genes at the genome-wide level, in order to determine whether there was a sudden leap to the tetraploid state or a series of discrete duplication events.
We identified more than 1,000 pairs of duplicated genes in Tetraodon, and almost 1,000 in Takifugu. About 75% of these duplication events seem to have occurred before the separation of the Takifugu and Tetraodon lineages. The distribution of these ancient gene duplications is striking: the copies of genes present on a given chromosomal segment are all found on a single other chromosome, and this pattern holds for all chromosomes. The simplest explanation for this finding is that the whole genome was duplicated in a single event. The conservation of this tetraploid organization over the hundreds of millions of years since the duplication could be due to the high gene density of the Tetraodon genome: chromosome rearrangements would be more detrimental than in mammals, because they would be more likely to interfere with gene expression.
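This pattern can be checked by tabulating, for each chromosomal segment, where the other copies of its duplicated genes lie. A minimal sketch, assuming each duplicated pair is given with hypothetical chromosome names and positions, and that segments are fixed-size windows (a simplification of the segment boundaries a real analysis would use). Under a whole genome duplication, most segments should show a single dominant partner chromosome holding a share close to 1.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Pair = Tuple[str, int, str, int]  # (chrom_a, pos_a, chrom_b, pos_b)


def partner_by_segment(pairs: List[Pair],
                       segment_bp: int) -> Dict[Tuple[str, int], Tuple[str, float]]:
    """For each (chromosome, segment index), return the chromosome carrying
    most of the paralogs of its genes and the share of paralogs it holds."""
    table: Dict[Tuple[str, int], Counter] = defaultdict(Counter)
    for chrom_a, pos_a, chrom_b, pos_b in pairs:
        # Record each duplicated pair from both sides.
        table[(chrom_a, pos_a // segment_bp)][chrom_b] += 1
        table[(chrom_b, pos_b // segment_bp)][chrom_a] += 1
    result = {}
    for segment, counts in table.items():
        partner, n = counts.most_common(1)[0]
        result[segment] = (partner, n / sum(counts.values()))
    return result


# Hypothetical paralog pairs and chromosome names, for illustration only.
pairs = [("1", 100, "7", 5_000), ("1", 200, "7", 9_000), ("1", 300, "12", 400)]
for segment, (partner, share) in sorted(partner_by_segment(pairs, 1_000_000).items()):
    print(segment, partner, round(share, 2))
```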
A study of the distribution of orthologous genes between the genome of Tetraodon and those of humans and mice provided definitive evidence in favor of the WGD, ending the controversy on this subject. Again, the anchoring of a large part of the Tetraodon genome sequence on the chromosomes was the key factor: it made it possible to produce the first high-resolution synteny map between fish and mammals. From 6,684 Tetraodon genes of known position with an ortholog in the human genome, we defined 900 synteny groups (groups of at least two contiguous Tetraodon genes whose orthologs lie on the same human chromosome). In the same way, 6,831 Tetraodon genes with mouse orthologs were organized into 1,014 synteny groups. The fact that the synteny groups are more numerous with mouse than with human confirms the higher frequency of chromosome rearrangements in rodents.
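The definition of a synteny group translates into a single pass along each Tetraodon chromosome. A minimal sketch that follows the definition literally (no interruptions tolerated within a group); gene identifiers and ortholog assignments are hypothetical inputs.

```python
from typing import List, Optional, Tuple


def synteny_groups(genes: List[Tuple[str, str]], min_size: int = 2) -> List[List[str]]:
    """genes: Tetraodon genes of one chromosome, in chromosomal order, as
    (gene_id, human chromosome of its ortholog). Returns runs of at least
    min_size contiguous genes whose orthologs lie on the same human chromosome."""
    groups: List[List[str]] = []
    run: List[str] = []
    run_chrom: Optional[str] = None
    for gene_id, human_chrom in genes:
        if human_chrom == run_chrom:
            run.append(gene_id)
        else:
            # The run is broken: keep it if it is long enough, start a new one.
            if len(run) >= min_size:
                groups.append(run)
            run, run_chrom = [gene_id], human_chrom
    if len(run) >= min_size:
        groups.append(run)
    return groups


order = [("g1", "7"), ("g2", "7"), ("g3", "2"), ("g4", "7"),
         ("g5", "12"), ("g6", "12"), ("g7", "12")]
print(synteny_groups(order))
```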
We used these synteny findings to reconstruct the ancestral organization of the genome at the time the ray-finned fish lineage diverged. After duplication of this ancestral genome, the supernumerary copies of genes would have been lost at random from one chromosome or the other of each duplicated pair. We would therefore expect genes derived from the same ancestral chromosome to be found on two different modern chromosomes. The syntenic relations with the human genome indeed confirm this binary distribution: the orthologs of the genes of a given human chromosomal region are found, for the most part, on two different Tetraodon chromosomes. This provides an independent demonstration of the whole genome duplication, and it is also a much more powerful strategy than the study of duplicated genes for delimiting duplicated chromosome segments and reconstructing fusion and fragmentation events. We propose that the ancestral genome had 12 chromosomes. This result is consistent with the modal haploid chromosome number in teleosts, which is 24. The 24 chromosomes formed by the duplication in the teleost lineage underwent 5 fusions, 3 translocations and 2 fragmentations to produce the 21 chromosomes of the Tetraodon genome.
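Two checks from this paragraph can be sketched in a few lines. The first is the share of a human region's orthologs that fall on its two main Tetraodon chromosomes. The second is the chromosome arithmetic of the proposed scenario: 12 × 2 = 24, then 24 - 5 + 2 = 21. The chromosome labels are hypothetical, and treating translocations as count-neutral exchanges of segments is our assumption.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def two_chromosome_share(tetraodon_chroms: Iterable[str]) -> Tuple[List[str], float]:
    """Given the Tetraodon chromosome of each ortholog of the genes in one
    human region, return the two most frequent chromosomes and the share of
    orthologs they hold together (close to 1 under the duplication model)."""
    counts = Counter(tetraodon_chroms)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no orthologs given")
    top = counts.most_common(2)
    return [chrom for chrom, _ in top], sum(n for _, n in top) / total


def chromosome_count(ancestral: int, fusions: int, fragmentations: int) -> int:
    """Chromosome number after a whole genome duplication followed by fusions
    (each removes one chromosome) and fragmentations (each adds one);
    translocations, taken as exchanges of segments, leave the count unchanged."""
    return 2 * ancestral - fusions + fragmentations


print(two_chromosome_share(["2", "2", "3", "3", "3", "9"]))
print(chromosome_count(12, fusions=5, fragmentations=2))
```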
On the evolutionary path leading to humans, interchromosomal rearrangements were more frequent. This reflects the large increase in the number of repeated sequences in this lineage: in the resulting “swollen” genomes, which are less dense in genes, rearrangements could occur more frequently because they were less deleterious. It is even possible to estimate the relative age of these events: genes from two ancestral segments joined long ago would have been mixed and homogenized by intrachromosomal rearrangements along the whole length of the new chromosome, whereas genes from recently joined segments would be more clearly segregated. It is thus possible to detect, for example, the recent fusion that produced human chromosome 2 from two chromosomes that remain distinct in the chimpanzee. In the near future, sequencing the genomes of other vertebrates and of amphioxus, the closest living relative of the vertebrates, will allow further exploration of the history of this chromosomal mosaic.
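The relative-dating argument rests on how interleaved the genes of two ancestral segments are along a modern chromosome. The text does not specify the statistic used. The sketch below uses a hypothetical measure of our own: the fraction of neighboring genes that come from different ancestral segments, with labels assumed to be already assigned from the synteny map.

```python
from typing import List


def mixing_index(ancestral_labels: List[str]) -> float:
    """Share of adjacent gene pairs along a modern chromosome whose genes come
    from different ancestral segments. Low values suggest a recent fusion
    (segments still segregated); high values suggest an old, well-mixed one."""
    if len(ancestral_labels) < 2:
        return 0.0
    switches = sum(
        1 for left, right in zip(ancestral_labels, ancestral_labels[1:])
        if left != right
    )
    return switches / (len(ancestral_labels) - 1)


print(mixing_index(list("AAAABBBB")))  # segregated, as after a recent fusion
print(mixing_index(list("ABBABAAB")))  # interleaved, as after an old fusion
```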
The sequencing of the Tetraodon nigroviridis genome is one of Genoscope’s internal projects. In 2001, the Broad Institute of MIT and Harvard (formerly the Whitehead Institute Center for Genome Research) joined the project and produced a substantial part of the reads. The assembly was also performed in collaboration with the Broad Institute. Several groups have contributed to the analysis and annotation of the Tetraodon nigroviridis genome sequence: