PkGDB : Prokaryotic Genome DataBase


PkGDB - Prokaryotic Genome DataBase: a database of «clean» and consistent annotations that serves as the source for comparative genomics methodologies. PkGDB fits into the framework of Collaborative Projects. The purpose of this database differs according to the annotation status of the prokaryotic genomes studied.

(NCBI RefSeq)

Integration into PkGDB:
- Data homogeneity.
- «frameshifts» management.

Syntactic reannotation:
- Data completion & correction.

(annotation projects)

Intrinsic analysis results:
- Codon usage (gene models).
- Gene, signal, repeat predictions.

Extrinsic analysis results:
- Syntenies.
- Blast, InterPro, COG, PRIAM (enzymatic functions), ...

The first step is to import the original annotation data into PkGDB and to verify its overall coherence.

We also carefully re-annotate genes containing reading-frame errors or «frameshifts» (a dedicated interface for examining these pseudogenes has been developed for this purpose). This laborious work is all the more important because the sequence databanks have no precise rules for annotating «frameshifts» (premature stop codon linked or not to an authentic «frameshift», insertion sequence, remnant of a formerly functional gene, etc.).
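One of the situations listed above, a premature stop codon inside an annotated gene, can be illustrated with a minimal Python sketch. It is not the PkGDB implementation: the function name `internal_stop_positions` is hypothetical, and the stop codon set assumes the standard bacterial genetic code (TAA, TAG, TGA), which does not hold for every prokaryote.

```python
# Assumption: standard bacterial genetic code; some organisms read TGA as a codon.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def internal_stop_positions(cds: str) -> list[int]:
    """Return 0-based offsets of in-frame stop codons located before the last codon."""
    seq = cds.upper()
    usable = len(seq) - len(seq) % 3      # ignore a trailing partial codon
    last_codon_start = usable - 3         # the final codon is the expected stop
    return [
        i
        for i in range(0, last_codon_start, 3)
        if seq[i:i + 3] in STOP_CODONS
    ]


if __name__ == "__main__":
    # Codons: ATG AAA TAG CCC TAA, so the third codon is a premature stop.
    print(internal_stop_positions("ATGAAATAGCCCTAA"))
```

A hit only flags a candidate; deciding whether it reflects a sequencing error, an authentic «frameshift», or a gene remnant remains the expert's task.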

In a second step, we build gene models for the studied genome, taking into account all the annotated genes (the proportion of putatively non-functional genes can reach 20 to 30% in some pathogens).

A second set of relational tables in PkGDB then stores the results of the AMIGene method. The set of CDSs that AMIGene predicts from the gene models is compared with the genes annotated in the databanks. This identifies unannotated genes (predicted only by AMIGene, for which Blast comparisons are automatically generated) and genes that may correspond to erroneous annotations (sequences annotated only in the databanks).
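The comparison between predicted and annotated genes can be sketched as a set operation. The Python example below is only an illustration under stated assumptions: the `Gene` record and the `compare_gene_sets` function are hypothetical, and two genes are treated as the same when they share strand and stop position, a matching rule chosen for this sketch and not taken from AMIGene or PkGDB.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    start: int   # leftmost coordinate on the sequence
    end: int     # rightmost coordinate on the sequence
    strand: str  # "+" or "-"


def stop_key(gene: Gene) -> tuple[str, int]:
    """Key a gene by strand and stop coordinate (assumed matching rule)."""
    stop = gene.end if gene.strand == "+" else gene.start
    return (gene.strand, stop)


def compare_gene_sets(predicted: list[Gene], annotated: list[Gene]) -> dict:
    """Split genes into predicted-only, annotated-only, and shared stop keys."""
    pred = {stop_key(g): g for g in predicted}
    annot = {stop_key(g): g for g in annotated}
    return {
        # Candidates for genes missed by the original annotation.
        "predicted_only": sorted(pred[k] for k in pred.keys() - annot.keys()),
        # Candidates for erroneous annotations.
        "annotated_only": sorted(annot[k] for k in annot.keys() - pred.keys()),
        "shared": sorted(pred.keys() & annot.keys()),
    }


if __name__ == "__main__":
    predicted = [Gene(1, 300, "+"), Gene(500, 800, "-"), Gene(1000, 1300, "+")]
    annotated = [Gene(40, 300, "+"), Gene(2000, 2300, "+")]
    for label, genes in compare_gene_sets(predicted, annotated).items():
        print(label, genes)
```

Keying on the stop position lets a prediction and an annotation that differ only in their chosen start codon count as the same gene, which is why the first gene in the example is reported as shared.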

These two steps yield a comprehensive annotation of the public bacterial genomes integrated into PkGDB, which is essential data for comparative genomics.

In particular, synteny groups shared with other genomes are then sought, taking into account:
- Gene fragments identified in the chromosome (which are, however, no longer functional in the cell).
- Genes that were not originally annotated by the authors.

For a given genome, the search for synteny groups is performed first against the genome data stored in PkGDB, and second against the proteomes of the other complete genomes in the RefSeq section of GenBank (whose data are then integrated into PkGDB in their original version).
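To make the notion of a synteny group concrete, here is a minimal Python sketch. It is not the method used in PkGDB: the function `synteny_groups`, its `max_gap` and `min_size` parameters, and the input format (pairs of gene ranks for homologous genes in two genomes) are all assumptions made for illustration. Pairs are grouped when they lie close together in both genomes.

```python
def synteny_groups(pairs, max_gap=5, min_size=2):
    """Group homologous gene pairs that are neighbours in both genomes.

    pairs: iterable of (rank_in_genome_a, rank_in_genome_b).
    Two pairs are linked when their ranks differ by at most max_gap
    in each genome; groups are the connected components of these links.
    """
    pairs = sorted(set(pairs))
    parent = list(range(len(pairs)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, (a1, b1) in enumerate(pairs):
        for j in range(i + 1, len(pairs)):
            a2, b2 = pairs[j]
            if a2 - a1 > max_gap:
                break  # pairs are sorted on genome A, later ones are farther
            if abs(b2 - b1) <= max_gap:
                parent[find(j)] = find(i)

    groups = {}
    for i, pair in enumerate(pairs):
        groups.setdefault(find(i), []).append(pair)
    return sorted(g for g in groups.values() if len(g) >= min_size)


if __name__ == "__main__":
    homologs = [(1, 10), (2, 11), (3, 13), (50, 100), (51, 99), (200, 5)]
    for group in synteny_groups(homologs, max_gap=3):
        print(group)
```

Because the distance in the second genome is taken as an absolute value, locally inverted gene order still counts as conserved neighbourhood in this sketch.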

The PkGDB database is the starting point for building our thematic databases for (re-)annotation projects.

Project-specific needs are handled at several levels:
- The choice of the «reference» bacterial genomes to be compared with each other (most often organisms that are phylogenetically related or likely to interact within the habitat of the annotated genome).
- The choice of the model genome(s) (e.g., E. coli, B. subtilis, M. tuberculosis, etc.), for which the available annotation data are particularly well curated and up to date.

When a newly sequenced genome is annotated, our databases are used to manage sequence data that are still at the «finishing» stage of the sequencing process, a stage that remains difficult, especially for genomes that are GC-rich or contain many repeats. The annotation process can be started on a set of independent contigs or, as is often the case, on a «master molecule» built from the contigs, whose order on the final chromosome is generally known (they are separated by hundreds of «N»).
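The construction of a «master molecule» from ordered contigs can be sketched as follows. This Python example is an illustration only: the function name `build_master_molecule`, the input format, and the default spacer length of 200 are assumptions (the text only states that contigs are separated by hundreds of «N»).

```python
def build_master_molecule(contigs, spacer_len=200):
    """Join ordered contigs into one sequence, separated by runs of 'N'.

    contigs: list of (name, sequence) in their presumed chromosomal order.
    Returns the joined sequence and the 0-based offset of each contig,
    so that coordinates can be mapped back to individual contigs.
    """
    spacer = "N" * spacer_len
    offsets = {}
    parts = []
    position = 0
    for index, (name, seq) in enumerate(contigs):
        if index > 0:
            parts.append(spacer)  # spacer only between contigs, not at the ends
            position += spacer_len
        offsets[name] = position
        parts.append(seq)
        position += len(seq)
    return "".join(parts), offsets


if __name__ == "__main__":
    molecule, offsets = build_master_molecule(
        [("contig_1", "ACGT"), ("contig_2", "GG")], spacer_len=3
    )
    print(molecule)
    print(offsets)
```

Keeping the offsets alongside the joined sequence is one way to relate annotations made on the provisional molecule to the underlying contigs.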

Expert annotation can begin from the results of the automatic analyses, even before the «finishing» of the molecule is complete. Then, after the «frameshift» errors detected in the bacterial chromosome have been corrected, the annotations made in the first pass are «transferred» onto the final molecule. The experts then complete the annotation work.

Maintaining the current thematic databases is ongoing work carried out by our team (data updates and analyses) and by our network of biologists (expert annotation).

In addition, organizing within a single structure both public data (including updates which, although infrequent in the sequence databanks, must be integrated into our databases) and «private» data (i.e., restricted to the members of a project until final publication) imposes new constraints on our developments.

2005-2019 Laboratoire d’Analyses Bioinformatiques pour la Genomique et le Metabolisme (LABGeM)