Laboratory of bioinformatics sequence analysis (LABIS)





Vitis vinifera
©M. ADRIAN /INRA
We are in the process of analyzing the sequence of the grapevine genome. A partial version of the assembly has revealed that this species has a proteome enriched in genes implicated in the synthesis of aromatic compounds. Furthermore, this sequence has not undergone numerous rearrangements since the origin of dicotyledons, and in particular has undergone no recent whole genome duplication event of the type that occurred in Arabidopsis thaliana and Populus trichocarpa. This characteristic has turned out to be an advantage, revealing that three different genomes contributed to the structure of the karyotype of the last common ancestor of dicotyledons. The grapevine genome is an excellent but unexpected model for the study of the evolution of flowering plants.

The eukaryotic genome analysis team studies the structure and evolution of eukaryotic genomes using data from sequencing projects, in partnership with laboratories from the Institut Genomique or in collaboration with outside laboratories. These analyses cover three main areas: assembly, annotation and genomic analysis.

Assembly

Starting with collections of random reads from a genome sequencing project, known as WGS (Whole Genome Shotgun), the assembly step has the goal of reconstituting the sequence of the chromosomes of the organism to be studied. The algorithms used are based on identity relationships (similarities) between overlapping reads and on topology information provided by the “clone links” or markers from genetic or physical maps. The result of the assembly, an ensemble of “supercontigs”, is a consensus reconstruction of the original sequence.
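The overlap principle described above can be illustrated with a minimal sketch. It assumes error-free reads and exact suffix-prefix overlaps, and it ignores clone links and map markers entirely; the function names `overlap` and `greedy_assemble` are hypothetical and are not taken from Arachne or from any Genoscope tool.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` equal to a prefix of `b`.

    Returns 0 when no overlap of at least `min_len` bases exists.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the two sequences sharing the longest exact overlap.

    Illustrative only: real assemblers tolerate sequencing errors and use
    paired-read ("clone link") constraints, which are not modelled here.
    """
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_len:
                        best_len, best_i, best_j = k, i, j
        if best_len == 0:
            return contigs  # nothing left to merge
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for idx, c in enumerate(contigs)
                   if idx not in (best_i, best_j)]
        contigs.append(merged)


if __name__ == "__main__":
    print(greedy_assemble(["ATGGCGT", "GCGTACC", "TACCTTA"]))
```

A greedy strategy is used here only because it is short; it can produce wrong joins in repeated regions, which is one reason topology information such as clone links matters in practice.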

The tools and methods used by the group for this activity come either from software developed at the Institut Genomique or from research carried out by other groups working on assembly problems, such as the “Arachne” program developed at the Broad Institute (www.broad.mit.edu).

Annotation

The objective of annotation is to define the structure of the genes in the assembled sequences, i.e. their positions from beginning to end, and their exons. We have chosen an approach that combines various types of evidence, which are not fixed in advance but fall into three broad categories:

1/ Ab initio predictions. For each genome we calibrate and utilize several gene prediction programs which use data on statistical properties of protein-coding genes which are known for the species. Preliminary calibration is carried out using a collection of known genes.

2/ Exploitation of coding sequences. We align the full set of proteins available in public databases, as well as cDNA sequences available for related phyla. We place more statistical weight on cDNA collections from the same species, either from public databases or from Genoscope. Finally, the alignment is carried out using programs that constrain exon junctions to sites compatible with splice boundaries.

3/ Comparative genomics. Depending on the phylum, we add results from genome-to-genome alignments, calibrated beforehand so that coding regions are preferentially retained. This principle relies on coding regions being better conserved than non-coding regions. For this purpose we use Exofish, a tool we originally developed to detect genes in the human sequence using the sequence of the pufferfish, Tetraodon. This work led to the first re-estimation of the number of human genes.
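The conservation principle behind the third category can be sketched as follows. This is not Exofish and does not reproduce its alignment conditions; it is a hypothetical illustration that assumes two sequences already aligned column by column (gaps written as "-"), and the function name `conserved_blocks`, the window size and the identity threshold are all assumptions.

```python
def conserved_blocks(seq_a: str, seq_b: str,
                     window: int = 10,
                     min_identity: float = 0.8) -> list[tuple[int, int]]:
    """Return half-open (start, end) column ranges of an alignment in which
    sliding windows reach `min_identity`. Touching or overlapping windows
    are merged into a single block."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    # 1 where the two columns carry the same base, 0 for mismatches and gaps
    matches = [int(a == b and a != "-") for a, b in zip(seq_a, seq_b)]
    blocks: list[tuple[int, int]] = []
    running = sum(matches[:window])
    for start in range(len(matches) - window + 1):
        if start > 0:
            # slide the window one column to the right
            running += matches[start + window - 1] - matches[start - 1]
        if running / window >= min_identity:
            end = start + window
            if blocks and start <= blocks[-1][1]:
                blocks[-1] = (blocks[-1][0], end)  # extend previous block
            else:
                blocks.append((start, end))
    return blocks


if __name__ == "__main__":
    a = "ACGTACGTAC" + "TTTTTTTTTT" + "GGCCGGCCGG"
    b = "ACGTACGTAC" + "AAAAAAAAAA" + "GGCCGGCCGG"
    print(conserved_blocks(a, b, window=10, min_identity=1.0))
```

In a real setting the threshold would be calibrated on known genes so that conserved blocks coincide with coding regions as often as possible.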

 
 

(Extract of grapevine’s GGB)
Annotation of a locus of K11 of Vitis vinifera.

 
 

These predictions are then “reconciled” in order to retain only one “gene model” per locus. This step is performed with the Gaze program, which integrates a set of weighted evidence and feeds it into an automaton that we adapt. Using dynamic programming, this step guarantees that the set of gene models produced for each sequence has the maximum score and contains no break in reading frame.
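The idea of choosing a maximum-score, mutually compatible set of gene models by dynamic programming can be illustrated with a much simpler problem than the one Gaze solves. The sketch below is weighted interval scheduling: it selects non-overlapping candidate models with the highest total score. It does not model exons, signals or reading frames, and the names `GeneModel` and `reconcile` are hypothetical.

```python
from bisect import bisect_right
from typing import NamedTuple


class GeneModel(NamedTuple):
    start: int    # first base of the model (inclusive)
    end: int      # last base of the model (inclusive)
    score: float  # weight given to the supporting evidence
    source: str   # e.g. "ab_initio", "cdna", "protein"


def reconcile(models: list[GeneModel]) -> list[GeneModel]:
    """Pick non-overlapping models with maximum total score."""
    ordered = sorted(models, key=lambda m: m.end)
    ends = [m.end for m in ordered]
    n = len(ordered)
    best = [0.0] * (n + 1)   # best[i]: best score using the first i models
    take = [False] * (n + 1)
    prev = [0] * (n + 1)
    for i, m in enumerate(ordered, start=1):
        # number of earlier models that end strictly before m starts
        p = bisect_right(ends, m.start - 1, 0, i - 1)
        with_m = best[p] + m.score
        if with_m > best[i - 1]:
            best[i], take[i], prev[i] = with_m, True, p
        else:
            best[i] = best[i - 1]
    # trace back the chosen models
    chosen = []
    i = n
    while i > 0:
        if take[i]:
            chosen.append(ordered[i - 1])
            i = prev[i]
        else:
            i -= 1
    return chosen[::-1]


if __name__ == "__main__":
    candidates = [
        GeneModel(100, 500, 5.0, "ab_initio"),
        GeneModel(400, 900, 8.0, "cdna"),
        GeneModel(950, 1500, 4.0, "protein"),
        GeneModel(1400, 2000, 3.0, "comparative"),
    ]
    for model in reconcile(candidates):
        print(model)
```

The point of the dynamic programming formulation is that the optimum over the whole sequence is built from optima over its prefixes, so the best-scoring combination is found without enumerating every subset of candidates.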

Visualization

The results of the various analyses are stored in a database and are accessible to collaborators via a dedicated interface, a GGB (Generic Genome Browser) navigator.
 
 

Tetraodon nigroviridis

We analyzed the genome sequence of the fish Tetraodon nigroviridis because of its very small size, 8 times smaller than the human genome. The level of conservation of genes between these two species, after 400 million years of evolution since their separation, made it possible for us to estimate the number of human genes in 2000. In 2004, the in silico reconstitution of the Tetraodon chromosomes provided evidence for a whole genome duplication event in this lineage. This duplication, which evolutionary biologists call 3R, had previously been suggested as one of the hypotheses that could explain the success of the teleost bony fish group, with its large number of species adapted to numerous different ecosystems.
 
 
Analysis

For each species we perform a number of analyses relating to structural, functional and/or evolutionary characterization, in collaboration with other laboratories. We have developed savoir-faire in the discovery of ancestral whole genome duplication events (WGD) and other polyploidizations. This type of evolutionary event is thought to be an essential agent in the acquisition of new functions and in the emergence of new species. Major evolutionary lineages such as those of teleost vertebrates or angiosperm plants almost certainly derive from polyploidizations. For these studies, the genome sequences of the fish Tetraodon nigroviridis, the grapevine Vitis vinifera and the ciliate Paramecium tetraurelia are excellent models.
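One common signature of an ancient duplication is a run of paralogous gene pairs that appear in the same order in two genomic regions. The sketch below is a hypothetical illustration of that idea and is not the group's method: it assumes paralog pairs are already known and given as gene ranks in two regions, it only detects runs in the same orientation, and the name `collinear_blocks` and both thresholds are assumptions.

```python
def collinear_blocks(pairs: list[tuple[int, int]],
                     max_gap: int = 2,
                     min_pairs: int = 3) -> list[list[tuple[int, int]]]:
    """Group paralog pairs (rank_in_region_a, rank_in_region_b) into runs
    where both ranks increase by at most `max_gap` from one pair to the
    next. Runs shorter than `min_pairs` are discarded."""
    blocks: list[list[tuple[int, int]]] = []
    current: list[tuple[int, int]] = []
    for pair in sorted(pairs):
        if current:
            step_a = pair[0] - current[-1][0]
            step_b = pair[1] - current[-1][1]
            if 0 < step_a <= max_gap and 0 < step_b <= max_gap:
                current.append(pair)  # the run continues
                continue
            if len(current) >= min_pairs:
                blocks.append(current)
        current = [pair]  # start a new run
    if len(current) >= min_pairs:
        blocks.append(current)
    return blocks


if __name__ == "__main__":
    paralogs = [(1, 11), (2, 12), (4, 13), (9, 40), (10, 3), (11, 4), (12, 5)]
    for block in collinear_blocks(paralogs):
        print(block)
```

Allowing a small gap between consecutive pairs reflects the fact that many duplicated genes are lost after a duplication, so surviving paralogs are interleaved with genes that have no remaining partner.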
 
 

Duplications of the Paramecium’s genome

The sequence of the macronuclear genome of Paramecium spectacularly conserves the trace of at least 3 whole genome duplications during evolution (the outer part of the circle represents more recent events, the inner parts more ancient ones). Although very few duplicated genes usually remain following whole genome duplications (fish, plants, yeast), in this case 24 000 genes, representing 68% of the total, are maintained in 2 copies following the most recent duplication. Furthermore, very few chromosome rearrangements have occurred, since the order of the genes has been preserved. These characteristics, essentially the large number of genes duplicated at three different evolutionary periods, show that the loss of genes is strongly constrained over the short term. In particular, stoichiometric effects on genes implicated in interactions are very strong.

Projects

 Tetraodon nigroviridis (link, GGB)
 Paramecium tetraurelia (link, GGB)
 Vitis vinifera (link, GGB)
 Oikopleura dioica (link, GGB)
 Tuber melanosporum (link, GGB)

Last update on 15 April 2010

© Genoscope - Centre National de Séquençage
2 rue Gaston Crémieux CP5706 91057 Evry cedex
Tel: (+33) 0 1 60 87 25 00
Fax: (+33) 0 1 60 87 25 14
