Methods

Sequence annotation and comparative genomics

The first set of potential CoDing Sequences (CDSs) was identified using the AMIGene software (Annotation of MIcrobial Genes; Bocs et al., 2003), trained with a set of CDSs longer than 500 bp from the genomic sequence. The multivariate statistical technique of Factorial Correspondence Analysis (FCA) and clustering methods were then applied to this set of predicted CDSs in order to derive multiple gene models that take into account the compositional diversity of genes within the Acinetobacter ADP1 genome. Three gene models were then used together in the core of AMIGene, with the minimum CDS length set to 60 bp. This second set of putative genes (3683 CDSs) was submitted to functional annotation:

  • to determine homology: BLAST searches (Altschul et al., 1997) against the SWALL databank (Boeckmann et al., 2003)
  • to find protein motifs and domains: InterPro database (Apweiler et al., 2001)
  • to classify genes coding for enzymes: PRIAM software (Claudel-Renard et al., 2003)
  • to identify transmembrane domains: TMHMM v2.0 (Krogh et al., 2001)
  • to predict signal peptide regions: SignalP v2.0 (Nielsen et al., 1999)
  • to find tRNA genes: tRNAscan-SE (Lowe and Eddy, 1997)
  • to detect intrachromosomal repeats: method described in Achaz et al. (2000)
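
As an illustration of the two length thresholds mentioned above (CDSs longer than 500 bp for training, a 60 bp minimum for the final prediction), the following Python sketch enumerates candidate forward-strand CDSs and filters them by length. It is not AMIGene's algorithm: AMIGene also scores candidates against its gene models, which is not attempted here. The function names and the use of GTG and TTG as alternative start codons are assumptions made for this example.

```python
# Hypothetical sketch: length-based selection of candidate CDSs.
# Assumption: ATG, GTG and TTG are accepted as start codons.
STOP_CODONS = {"TAA", "TAG", "TGA"}
START_CODONS = {"ATG", "GTG", "TTG"}


def candidate_cds(seq, min_len=60):
    """Yield (start, end) for forward-strand open reading frames.

    Coordinates are 0-based, end-exclusive, and include the stop codon.
    Only frames of at least `min_len` bp are reported.
    """
    seq = seq.upper()
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon in START_CODONS:
                start = i  # first start codon after the previous stop
            elif start is not None and codon in STOP_CODONS:
                end = i + 3
                if end - start >= min_len:
                    yield (start, end)
                start = None


def training_subset(cds_list, min_len=500):
    """Keep only CDSs strictly longer than `min_len` bp."""
    return [(s, e) for s, e in cds_list if e - s > min_len]
```

A complete gene finder would also scan the reverse strand; that is omitted to keep the sketch short.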

Each predicted gene was assigned a unique numeric identifier prefixed with “ACIAD”. The first CDS from the origin of replication, the putative dnaA gene, was designated ACIAD0001, and each following CDS was numbered consecutively in a clockwise direction. Manual validation of the automatic annotation was performed using the MaGe (Magnifying Genomes) interface. Translational start codons were corrected based on protein homology, proximity of a ribosome-binding site, and position relative to the predicted signal peptide, where one was present. The Artemis sequence viewer (Rutherford et al., 2000) was used for this purpose.
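
The numbering scheme described above can be sketched in a few lines of Python. This is only an illustration: the function name and arguments are hypothetical, the four-digit zero padding is inferred from the single example “ACIAD0001”, and “clockwise” is assumed to mean increasing coordinates on a circular chromosome.

```python
def assign_locus_tags(cds_starts, first_cds_start, genome_length, prefix="ACIAD"):
    """Map each CDS start coordinate to a consecutive identifier.

    Numbering begins at the CDS starting at `first_cds_start` (for example
    the putative dnaA gene) and proceeds around the circular chromosome.
    """
    def clockwise_offset(start):
        # Distance travelled clockwise from the first CDS, wrapping at the end.
        return (start - first_cds_start) % genome_length

    ordered = sorted(cds_starts, key=clockwise_offset)
    return {start: f"{prefix}{n:04d}" for n, start in enumerate(ordered, start=1)}


# Toy coordinates, not taken from the ADP1 genome.
tags = assign_locus_tags([100, 2500, 900, 3400], first_cds_start=900, genome_length=3600)
```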

A total of 148 sequenced genomes were used in the analyses. Orthologs between Acinetobacter ADP1 and these 148 other genomes were defined as genes showing a minimum of 30% identity and an alignment-length ratio of at least 0.8 relative to the shorter protein. Orthology relations were strengthened by synteny detection (i.e., conservation of the chromosomal co-localisation between pairs of orthologous genes from different genomes) using the Syntonizer software. Our method is not restricted to the bi-directional best-hit definition, and thus allows for multiple correspondences (gene fusion/fission, duplication). A “gap” parameter, representing the maximum number of consecutive genes not involved in a synteny group, was set to 5 genes.
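
To make the thresholds above concrete, here is a minimal Python sketch of the two steps: filtering similarity hits with the 30% identity and 0.8 length-ratio criteria, then chaining the retained gene pairs into synteny groups with a gap of 5 genes. It is not the Syntonizer implementation. The `Hit` record, the function names, and the reading of the 0.8 ratio as alignment length divided by the length of the shorter protein are all assumptions made for the example.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Hit:
    gene_a: int      # rank of the gene along genome A
    gene_b: int      # rank of the gene along genome B
    identity: float  # percent identity of the alignment
    align_len: int   # alignment length in residues
    len_a: int       # length of protein A
    len_b: int       # length of protein B


def is_putative_ortholog(hit, min_identity=30.0, min_ratio=0.8):
    """Apply the identity and length-ratio thresholds (assumed interpretation)."""
    shorter = min(hit.len_a, hit.len_b)
    return hit.identity >= min_identity and hit.align_len / shorter >= min_ratio


def synteny_groups(hits, max_gap=5):
    """Group gene pairs that are close to each other on both genomes.

    Two pairs are linked when at most `max_gap` genes separate them on each
    genome. Best hits are not required, so one gene may appear in several pairs.
    """
    pairs = sorted({(h.gene_a, h.gene_b) for h in hits if is_putative_ortholog(h)})
    parent = {p: p for p in pairs}

    def find(p):
        # Union-find lookup with path halving.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for p, q in combinations(pairs, 2):
        if abs(p[0] - q[0]) - 1 <= max_gap and abs(p[1] - q[1]) - 1 <= max_gap:
            parent[find(p)] = find(q)

    groups = {}
    for p in pairs:
        groups.setdefault(find(p), []).append(p)
    # A single isolated pair is not a synteny group.
    return [g for g in groups.values() if len(g) > 1]


# Toy hits, not real data.
hits = [
    Hit(1, 11, 45.0, 290, 300, 310),
    Hit(2, 12, 35.0, 250, 260, 300),
    Hit(8, 14, 60.0, 400, 410, 405),   # 5 genes after gene 2: still within the gap
    Hit(30, 50, 80.0, 200, 210, 205),  # isolated pair
    Hit(3, 13, 25.0, 300, 300, 300),   # rejected: identity below 30%
    Hit(4, 13, 50.0, 100, 300, 320),   # rejected: alignment too short
]
groups = synteny_groups(hits)
```

The pairwise comparison is quadratic in the number of gene pairs, which is acceptable for a toy example but would need a sorted sweep for whole-genome comparisons.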

All of this information (i.e., syntactic and functional annotations, and the results of comparative analysis) is managed in a relational database (MySQL RDBMS). Manual validation of the automatic annotation was performed using our web interface MaGe, which displays graphically both the Acinetobacter annotations and the maps of the synteny groups.
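
As a rough idea of how annotations and comparative results can sit side by side in a relational database, the sketch below builds a two-table layout and joins it. Everything in it is hypothetical: the actual schema is not described here, the table and column names are invented, the row values are placeholders, and Python's built-in sqlite3 module stands in for MySQL so that the example runs without a server.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical two-table layout."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE cds (
            locus_tag TEXT PRIMARY KEY,
            product   TEXT
        );
        CREATE TABLE ortholog (
            locus_tag     TEXT REFERENCES cds(locus_tag),
            other_genome  TEXT NOT NULL,
            other_gene    TEXT NOT NULL,
            synteny_group INTEGER
        );
        """
    )
    # Placeholder rows for illustration only.
    conn.execute("INSERT INTO cds VALUES (?, ?)", ("ACIAD0001", "putative dnaA"))
    conn.execute(
        "INSERT INTO ortholog VALUES (?, ?, ?, ?)",
        ("ACIAD0001", "PSAE", "example_gene_1", 1),
    )
    return conn


def annotation_with_synteny(conn, locus_tag):
    """Return the annotation of one CDS joined with its ortholog records."""
    query = """
        SELECT c.locus_tag, c.product, o.other_genome, o.other_gene, o.synteny_group
        FROM cds AS c
        LEFT JOIN ortholog AS o ON o.locus_tag = c.locus_tag
        WHERE c.locus_tag = ?
        ORDER BY o.other_genome
    """
    return conn.execute(query, (locus_tag,)).fetchall()


rows = annotation_with_synteny(build_demo_db(), "ACIAD0001")
```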

Results

Table: Comparison of general features and functional categories between Acinetobacter ADP1 and selected free-living organisms.


Feature                                                        ADP1   PSAE   PSPU   PSSY  RALTO   ESCO

General features
Size (Mb)                                                       3.6    6.3    6.4    5.8    5.8    4.6
GC%                                                            40.3   66.6   61.6   58.4   66.9   50.8
Nb CDS                                                         3325   5567   5420   5615   5129   4273
% Coding                                                       88.8     89   87.7   86.8   87.3     92
rRNA operon                                                       7      4      7      5      4      7
tRNA                                                             76     63     63     63     58     82

Known and putative proteins (%)                                62.6   65.9   62.0   61.0   44.1   80.5
Conserved Hypothetical Protein (%)                             20.3   22.0   23.0   28.0   41.3   12.3
No homology (%)                                                13.9   11.1   15.0   11.0   14.6    7.2

TIGR categories (%)
Amino acid biosynthesis                                        3.52   1.65   2.16   1.81   1.01   2.09
Biosynthesis of cofactors, prosthetic groups, and carriers     3.61   2.02   2.57   2.17   1.00   1.85
Cell envelope                                                  3.97   2.54   5.58   6.55   6.73   3.16
Cellular processes                                             6.16   3.49   6.14   8.01   2.62   3.47
Central intermediary metabolism                                3.49   2.08   1.34   1.49   1.64   1.34
DNA metabolism                                                 2.16   1.33   2.01   2.27   2.42   1.87
Energy metabolism                                              2.97   5.29   7.81   5.47   5.73   6.78
Fatty acid and phospholipid metabolism                         1.47   1.27   1.91   1.51   0.73   1.23
Protein fate                                                   2.25   2.17   3.07   3.40   2.40   2.13
Protein synthesis                                              4.03   1.94   2.25   2.04   2.34   2.24
Purines, pyrimidines, nucleosides, and nucleotides             0.96   0.95   1.10   0.97   0.58   1.41
Regulatory functions                                           6.97   4.69   9.13   7.45   4.55   3.23
Transcription                                                  2.19   0.73   1.12   0.85   0.65   0.75
Transport and binding proteins                                10.43   4.68  11.20  10.03   3.87   5.79
Other categories                                               5.02   0.15   3.14   6.02   1.38   0.77

PSAE, Pseudomonas aeruginosa; PSPU, Pseudomonas putida; PSSY, Pseudomonas syringae; RALTO, Ralstonia solanacearum; ESCO, Escherichia coli. We used the general distribution of ORF functions from the TIGR role category graph (http://www.tigr.org/tigr-scripts/CMR2/GenePieChart.spl) to allow standardized comparison with other sequenced genomes. Only TIGR categories represented in these six genomes are shown.
Last update on 9 October 2007

© Genoscope - Centre National de Séquençage