Sequence annotation and comparative genomics
The first set of potential CoDing Sequences (CDSs) was identified using the AMIGene software (Annotation of MIcrobial Genes; Bocs et al., 2003), trained with a set of CDSs longer than 500 bp from the genomic sequence. Factorial Correspondence Analysis (FCA), a multivariate statistical technique, and clustering methods were then applied to this set of predicted CDSs to derive multiple models that take into account the compositional diversity of genes within the Acinetobacter ADP1 genome. Three gene models were subsequently used together in the core of AMIGene, with the minimum CDS length set to 60 bp. This second set of putative genes (3683 CDSs) was submitted to functional annotation.
Each predicted gene was assigned a unique numeric identifier prefixed with “ACIAD”. The first CDS from the origin of replication, the putative dnaA gene, was designated ACIAD0001, and each following CDS was numbered consecutively in a clockwise direction. Manual validation of the automatic annotation was performed using the MaGe interface (Magnifying Genomes). Translational start codons were corrected based on protein homology, proximity of a ribosome-binding site, and position relative to a predicted signal peptide, where one was present. The Artemis sequence viewer (Rutherford et al., 2000) was used for this purpose.
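The numbering scheme described above can be sketched as follows. This is only an illustration: the function name, the use of CDS start coordinates, and the reading of "clockwise" as increasing coordinate from the origin on a circular chromosome are assumptions, not details taken from the annotation pipeline.

```python
def assign_locus_tags(cds_starts, origin, genome_length, prefix="ACIAD"):
    """Return {cds_start: tag}, numbering CDSs consecutively from the origin.

    Assumption: "clockwise" means increasing coordinate, wrapping around
    the end of the circular chromosome back to position 0.
    """
    # Distance of each CDS from the origin, measured around the circle.
    ordered = sorted(cds_starts, key=lambda s: (s - origin) % genome_length)
    # Zero-padded four-digit rank, as in ACIAD0001.
    return {start: f"{prefix}{rank:04d}" for rank, start in enumerate(ordered, start=1)}


# Hypothetical coordinates, for illustration only.
tags = assign_locus_tags([500, 100, 3000], origin=400, genome_length=4000)
print(tags[500])
```

The CDS closest to the origin in the chosen direction receives the first identifier, and a CDS located just before the origin receives the last one.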
A total of 148 sequenced genomes were used in the analyses. Orthologs between Acinetobacter ADP1 and the 148 other genomes were defined as genes showing a minimum of 30% identity and a ratio of 0.8 of the length of the smaller protein. Orthology relations were strengthened by synteny detection (i.e., conservation of the chromosomal co-localisation between pairs of orthologous genes from different genomes) using the Syntonizer software. Our method is not restricted to the bi-directional best-hit definition, and thus allows for multiple correspondences (gene fusion/fission, duplication). A “gap” parameter, representing the maximum number of consecutive genes not involved in a synteny group, was set to 5 genes.
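The thresholds above can be illustrated with a minimal sketch. It is not the Syntonizer algorithm: the `Hit` record, the greedy chaining strategy, and the reading of the 0.8 ratio as alignment length divided by the length of the shorter protein are all assumptions made for illustration. Chromosome circularity and strand are ignored.

```python
from dataclasses import dataclass

MIN_IDENTITY = 30.0     # percent identity threshold from the text
MIN_LENGTH_RATIO = 0.8  # assumed: alignment length / shorter protein length
MAX_GAP = 5             # max consecutive genes not involved in a synteny group


@dataclass(frozen=True)
class Hit:
    """Hypothetical record for one pairwise protein match."""
    query_index: int   # gene rank along the query chromosome
    target_index: int  # gene rank along the target chromosome
    identity: float    # percent identity
    align_len: int     # alignment length, in residues
    query_len: int     # query protein length
    target_len: int    # target protein length


def passes_thresholds(hit):
    """Apply the identity and length-ratio criteria to one match."""
    shorter = min(hit.query_len, hit.target_len)
    return hit.identity >= MIN_IDENTITY and hit.align_len / shorter >= MIN_LENGTH_RATIO


def synteny_groups(hits, max_gap=MAX_GAP):
    """Greedily chain matches whose neighbours lie within max_gap genes
    of each other on both chromosomes; keep groups of two or more."""
    kept = sorted(
        (h for h in hits if passes_thresholds(h)),
        key=lambda h: (h.query_index, h.target_index),
    )
    groups = []
    for hit in kept:
        for group in groups:
            last = group[-1]
            query_gap = hit.query_index - last.query_index - 1
            target_gap = abs(hit.target_index - last.target_index) - 1
            if query_gap <= max_gap and target_gap <= max_gap:
                group.append(hit)
                break
        else:
            groups.append([hit])
    return [g for g in groups if len(g) >= 2]
```

Because every match passing the thresholds is kept, not only bi-directional best hits, one gene may take part in several correspondences, which is how duplications or fusion/fission events can surface in this kind of scheme.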
All this information (i.e., syntactic and functional annotations, and the results of the comparative analysis) is managed in a relational database (using the MySQL relational database management system). Manual validation of the automatic annotation was performed using our web interface MaGe, which allows graphical visualization of both the Acinetobacter annotations and the maps of the synteny groups.
|Table: Comparison of general features and functional categories between Acinetobacter ADP1 and selected free-living organisms.|
|Known and putative proteins (%)||62.6||65.9||62.0||61.0||44.1||80.5|
|Conserved Hypothetical Protein (%)||20.3||22.0||23.0||28.0||41.3||12.3|
|No homology (%)||13.9||11.1||15.0||11.0||14.6||7.2|
|TIGR categories (%)|
|Amino acid biosynthesis||3.52||1.65||2.16||1.81||1.01||2.09|
|Biosynthesis of cofactors, prosthetic groups, and carriers||3.61||2.02||2.57||2.17||1.00||1.85|
|Central intermediary metabolism||3.49||2.08||1.34||1.49||1.64||1.34|
|Fatty acid and phospholipid metabolism||1.47||1.27||1.91||1.51||0.73||1.23|
|Purines, pyrimidines, nucleosides, and nucleotides||0.96||0.95||1.10||0.97||0.58||1.41|
|Transport and binding proteins||10.43||4.68||11.20||10.03||3.87||5.79|
|*PSAE, Pseudomonas aeruginosa; *PSPU, Pseudomonas putida; *PSSY, Pseudomonas syringae; *RALTO, Ralstonia solanacearum; *ESCO, Escherichia coli. We used the general distribution of ORF functions from the TIGR role category graph (http://www.tigr.org/tigr-scripts/CMR2/GenePieChart.spl) to allow standardized comparison with other sequenced genomes. Only TIGR categories represented in these six genomes are shown.|