WARNING!!!

It is highly recommended to use GenBank/RefSeq annotation files as input!


The Annotated Databank File section lets the user choose the input databank file to analyze. The input can be a native file downloaded directly from the EMBL or GenBank web sites, or an EMBL file generated by the Artemis software.
The parsing step takes into account the heterogeneity of databank annotations, as well as genes annotated as authentic frameshifts (or point mutations), in order to avoid false-positive predictions in regions of the genome that contain already identified frameshifts. Very often, these regions are annotated using the gene feature only (e.g., at the TIGR center) or using /pseudo in the CDS feature (e.g., at the Sanger center).
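The two annotation conventions described above can be illustrated with a minimal Python sketch. It is not AMIGene's actual parser: the function names (parse_features, frameshift_regions) are hypothetical, the feature table is reduced to the key/location and qualifier lines, and the last CDS entry of the sample (100..400 with /pseudo) is invented for illustration only.

```python
import re

# A feature key line such as "gene 1250687..1251130" or
# "FT CDS complement(1252572..1254626)"; the EMBL "FT" prefix is optional.
FEATURE_RE = re.compile(r"^\s*(?:FT\s+)?(gene|CDS)\s+(\S+)\s*$")
# A qualifier line such as '/gene="rbfA"' or '/pseudo'.
QUALIFIER_RE = re.compile(r"^\s*(?:FT\s+)?/(\w+)")


def parse_features(text):
    """Return a list of (key, location, qualifier_names) tuples."""
    features = []
    for line in text.splitlines():
        m = FEATURE_RE.match(line)
        if m:
            features.append((m.group(1), m.group(2), set()))
            continue
        q = QUALIFIER_RE.match(line)
        if q and features:
            # Attach the qualifier name to the most recent feature.
            features[-1][2].add(q.group(1))
    return features


def frameshift_regions(text):
    """Locations annotated as frameshifted under either convention."""
    features = parse_features(text)
    cds_locations = {loc for key, loc, _ in features if key == "CDS"}
    regions = []
    for key, loc, quals in features:
        if key == "gene" and loc not in cds_locations:
            regions.append(loc)   # gene feature with no CDS (gene-only style)
        elif key == "CDS" and "pseudo" in quals:
            regions.append(loc)   # CDS carrying /pseudo
    return regions


sample = """\
gene 1250687..1251130
/gene="rbfA"
CDS 1250687..1251130
/gene="rbfA"
gene 1251130..1252083
/gene="truB"
gene 1252211..1252480
/gene="rpsO"
CDS 1252211..1252480
/gene="rpsO"
CDS 100..400
/pseudo
"""

print(frameshift_regions(sample))
```

Under these assumptions, a predictor could skip any candidate overlapping the returned locations. A real parser would also have to handle the fixed-column layout and multi-line locations of the flat-file formats.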

While annotations in GenBank files are always described with both the gene and CDS features, we have noticed that in the EMBL file format the gene feature is used only in thirteen cases. Consequently, frameshifted genes annotated only with the gene feature are missing from the annotation of the EMBL file and thus appear as new AMIGene predictions. For example, in the Shewanella oneidensis genome, more than 70 genes are thought to be frameshifted.

In the GenBank file (AE014299), these genes are annotated as follows:

	gene 1250687..1251130
	/gene="rbfA"
	/note="SO1205"
	CDS 1250687..1251130
	/gene="rbfA"
	/note="similar to GB:M94547, and PID:189011; identified
	by sequence similarity; putative"
	/codon_start=1
	/transl_table=11
	/product="ribosome-binding factor A"
	/protein_id="AAN54275.1"
	/db_xref="GI:24346873"
	/translation="MAKEFSRTRRIGQQLQQELAVVLQRDMKDPRIGFVTVNDVDVSR
	DLSYAKVFVTFFEEDKDVVQEKLNALIAAAPYIRTLVAGRMKLRVMPEIRFVYDSSLV
	EGMRMSNLVSQVINSDKAKQQQFGSVDDDVIENDIEESDDTEGKV"
	gene 1251130..1252083
	/gene="truB"
	/note="SO1206; This region contains a pseudogene, one or
	more premature stops, and is not the result of a
	sequencing artifact; tRNA pseudouridine synthase B;
	identified by match to TIGR protein family HMM TIGR01465"
	gene 1252211..1252480
	/gene="rpsO"
	/note="SO1207"
	CDS 1252211..1252480
	/gene="rpsO"
	/note="identified by match to PFAM protein family HMM
	PF00312"
	/codon_start=1
	/transl_table=11
	/product="ribosomal protein S15"
	/protein_id="AAN54276.1"
	/db_xref="GI:24346874"
	/translation="MSLSTEAKAKILAEFGRGANDTGSTEVQVALLTAQINHLQDHFK
	EHIHDHHSRRGLLRMVSARRKLLAYLKRTEAVRYNELIQKLGLRR"
In the EMBL file, the corresponding gene (truB) is missing:

	FT CDS 1250687..1251130
	FT /codon_start=1
	FT /db_xref="GOA:Q8EHL4"
	FT /db_xref="Swiss-Prot:Q8EHL4"
	FT /note="similar to GB:M94547, and PID:189011; identified by
	FT sequence similarity; putative"
	FT /transl_table=11
	FT /gene="rbfA"
	FT /product="ribosome-binding factor A"
	FT /protein_id="AAN54275.1"
	FT /translation="MAKEFSRTRRIGQQLQQELAVVLQRDMKDPRIGFVTVNDVDVSRD
	FT LSYAKVFVTFFEEDKDVVQEKLNALIAAAPYIRTLVAGRMKLRVMPEIRFVYDSSLVEG
	FT MRMSNLVSQVINSDKAKQQQFGSVDDDVIENDIEESDDTEGKV"
	FT CDS 1252211..1252480
	FT /codon_start=1
	FT /db_xref="GOA:Q8EHL3"
	FT /db_xref="TrEMBL:Q8EHL3"
	FT /note="identified by match to PFAM protein family HMM
	FT PF00312"
	FT /transl_table=11
	FT /gene="rpsO"
	FT /product="ribosomal protein S15"
	FT /protein_id="AAN54276.1"
	FT /translation="MSLSTEAKAKILAEFGRGANDTGSTEVQVALLTAQINHLQDHFKE
	FT HIHDHHSRRGLLRMVSARRKLLAYLKRTEAVRYNELIQKLGLRR"
	FT CDS complement(1252572..1254626)
	FT /codon_start=1
	FT /db_xref="GOA:Q8EHL2"
	FT /db_xref="TrEMBL:Q8EHL2"
	FT /note="identified by match to PFAM protein family HMM
	FT PF00672"
	FT /transl_table=11
	FT /gene="SO1208"
	FT /product="GGDEF domain protein"
	FT /protein_id="AAN54277.1"
	FT /translation="MTGRFKSLTWKQTNLVVFTALFFAIAIFIVEIALVVVSTKQQLTT
	FT TQQELLDSVEQPAANAVWALDDNLARQTLEGAIKVEHVGSAVIELDDGSMFVSVSNNRA
	FT NNSQTFISLSNQLFDDLKEISRPLYRPFYFEGTQKQQLIGTLTIFYDTQELTNTLFSQL
...
//
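
The difference between the two excerpts above can be checked mechanically. The following Python sketch is an illustration only, not part of AMIGene: the helper names (feature_locations, genes_missing_from_embl) are hypothetical, and the inputs are reduced to the feature key lines of the two excerpts.

```python
import re


def feature_locations(text, key):
    """Locations of every feature with the given key (optional "FT" prefix)."""
    pattern = re.compile(
        r"^[ \t]*(?:FT[ \t]+)?" + re.escape(key) + r"[ \t]+(\S+)[ \t]*$",
        re.MULTILINE,
    )
    return pattern.findall(text)


def genes_missing_from_embl(genbank_text, embl_text):
    """GenBank gene locations with no CDS at the same location in the EMBL text."""
    embl_cds = set(feature_locations(embl_text, "CDS"))
    return [
        loc
        for loc in feature_locations(genbank_text, "gene")
        if loc not in embl_cds
    ]


# Feature key lines taken from the GenBank excerpt (AE014299).
genbank_excerpt = """\
gene 1250687..1251130
CDS 1250687..1251130
gene 1251130..1252083
gene 1252211..1252480
CDS 1252211..1252480
"""

# Feature key lines taken from the EMBL excerpt.
embl_excerpt = """\
FT CDS 1250687..1251130
FT CDS 1252211..1252480
FT CDS complement(1252572..1254626)
"""

print(genes_missing_from_embl(genbank_excerpt, embl_excerpt))
```

On these excerpts the only location reported is 1251130..1252083, the truB region that GenBank annotates with a gene feature alone.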