Comparison of the microbial genes annotated in GenBank files with the CDSs predicted by the AMIGA strategy

Species  Length  G+C  Number of CDSs            Annotated GNF (% vs OA)                Potential NG (% vs AP)
code     (Mb)    (%)  OA    AP    CC (%)        Total    Pc     Status        Pc       Total    Pc     Status
                                                (OA-CC)  <0.2   WRONG  SUSPI  >=0.4    (AP-CC)  >=0.4  NEW    AMBIG
AERPE    1.67    56   2694  1721  1545 (57.3)   42.65    42.24  33.96  4.94   0.00     10.23    8.08   2.56   5.35
AQUAE    1.55    43   1522  1713  1511 (99.3)   0.72     0.72   0.13   0.13   0.00     11.79    10.57  5.02   5.55
ARCFU    2.18    49   2436  2459  2360 (96.9)   3.12     2.67   0.78   0.94   0.04     4.03     2.11   0.85   1.22
BORBU    0.91    29   851   830   797 (93.7)    6.35     5.99   3.06   1.53   0.00     3.98     2.53   0.48   1.81
CAMJE    1.64    31   1647  1620  1617 (98.2)   1.82     1.52   0.55   0.49   0.00     0.19     0.19   0.00   0.19
CHLPN    1.23    41   1074  1065  1024 (95.3)   4.66     4.56   0.09   0.00   0.00     3.85     2.25   0.56   1.60
CHLTR    1.04    41   893   909   870 (97.4)    2.58     2.24   0.34   0.56   0.00     4.29     2.09   0.77   1.21
ECOLI    4.63    51   4289  4100  3959 (92.3)   7.69     7.48   1.42   2.12   0.00     3.44     1.80   0.73   1.02
HAEIN    1.83    38   1737  1765  1721 (99.1)   0.92     0.75   0.40   0.17   0.00     2.49     1.30   0.51   0.79
HELPJ    1.64    39   1482  1493  1447 (97.6)   2.36     2.09   0.13   0.20   0.00     3.08     2.08   0.40   1.54
HELPY    1.66    39   1588  1567  1514 (95.3)   4.66     4.28   1.89   0.44   0.00     3.38     1.91   0.70   1.08
METJA    1.66    31   1723  1766  1705 (99.0)   1.04     0.99   0.12   0.70   0.00     3.45     2.38   0.85   1.53
METTH    1.75    50   1869  1841  1793 (95.9)   4.07     4.01   1.82   1.02   0.00     2.61     1.36   0.16   1.20
MYCGE    0.58    32   483   550   474 (98.1)    1.86     1.86   0.41   0.00   0.00     13.82    8.55   6.00   2.18
MYCPN    0.81    40   688   805   664 (96.5)    3.49     3.34   0.29   0.87   0.00     17.52    11.80  4.60   6.96
MYCTU    4.41    66   3913  4096  3746 (95.7)   4.27     3.71   0.64   1.61   0.08     8.54     3.83   1.32   2.34
NEIMA    2.18    52   2063  1908  1802 (87.3)   12.65    11.63  3.64   1.65   0.00     5.56     2.10   0.63   1.31
NEIMB    2.27    52   2128  1960  1810 (85.1)   14.94    14.47  5.83   2.16   0.00     7.65     4.39   2.55   1.63
PYRAB    1.76    45   1764  1856  1706 (96.7)   3.29     3.29   0.00   0.40   0.00     8.08     4.85   1.45   3.39
PYRHO    1.74    42   2059  1813  1643 (79.8)   20.20    19.86  14.67  1.31   0.00     9.38     5.90   2.76   3.09
RICPR    1.10    29   834   886   818 (98.1)    1.92     1.68   0.72   0.60   0.00     7.67     5.76   2.37   3.39
SYNY3    3.57    48   3163  3111  2965 (93.7)   6.26     6.01   0.03   1.20   0.00     4.69     2.06   0.51   1.51
THEMA    1.86    46   1872  1876  1816 (97.0)   2.99     2.78   1.12   0.85   0.00     3.20     1.39   0.48   0.85
TREPA    1.14    53   1040  1034  964 (92.7)    7.31     6.92   3.56   2.40   0.00     6.77     3.00   0.68   2.03
UREPA    0.75    25   612   608   589 (96.2)    3.76     3.59   0.49   1.14   0.00     3.13     1.97   0.00   1.97
VIBCH    4.03    47   3882  3857  3568 (91.9)   8.09     5.51   2.32   2.01   0.00     7.49     3.37   0.21   3.09
See the publication for details on shaded cells. Abbreviations: CDSs, coding sequences; OA, original annotation; AP, AMIGA prediction; CC, CDSs common to both OA and AP; GNF, gene not found; NG, new gene; Pc, average coding probability of a CDS; SUSPI, suspicious; AMBIG, ambiguous. See Methods for species abbreviations.
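
The "Total" columns can be recomputed from the CDS counts, which gives a quick consistency check on any row. The sketch below assumes, from the column labels "(OA-CC)" with "% vs OA" and "(AP-CC)" with "% vs AP", that GNF total = 100 x (OA - CC) / OA, NG total = 100 x (AP - CC) / AP, and that the CC percentage is taken relative to OA; these formulas reproduce the AERPE and AQUAE rows. The function name `overlap_percentages` and the dictionary keys are hypothetical and are not part of AMIGA.

```python
def overlap_percentages(oa: int, ap: int, cc: int) -> dict:
    """Recompute the percentage columns of the table from CDS counts.

    oa: number of CDSs in the original annotation (OA)
    ap: number of CDSs predicted by AMIGA (AP)
    cc: number of CDSs common to both (CC)
    The formulas are an assumption inferred from the column labels.
    """
    return {
        "cc_pct_of_oa": round(100 * cc / oa, 1),          # CC (%)
        "gnf_pct_of_oa": round(100 * (oa - cc) / oa, 2),  # GNF total, % vs OA
        "ng_pct_of_ap": round(100 * (ap - cc) / ap, 2),   # NG total, % vs AP
    }


# CDS counts copied from the AERPE and AQUAE rows of the table.
aerpe = overlap_percentages(oa=2694, ap=1721, cc=1545)
aquae = overlap_percentages(oa=1522, ap=1713, cc=1511)
print("AERPE", aerpe)
print("AQUAE", aquae)
```

For AERPE this yields 57.3, 42.65 and 10.23, matching the CC (%), GNF total and NG total entries in the table.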