Computational Molecular Biology

2014, Vol.4, No.4, 34-43 http://cmb.biopublisher.ca

Research Report

Open Access

In silico Proteomic Functional Re-annotation of Escherichia coli K-12 Using Dynamic Biological Data Fusion Strategy

Gopal Ramesh Kumar, Thankaswamy Kosalai Subazini, Chinnasamy Perumal Rajadurai, Kandavel Palani Kannan

Bioinformatics Lab, AU-KBC Research Centre, M.I.T Campus of Anna University, Chromepet, Chennai 600044, India

Corresponding author email: gramesh@au-kbc.org

Computational Molecular Biology, 2014, Vol.4, No.4 doi: 10.5376/cmb.2014.04.0004

Copyright

use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

Escherichia coli, one of the most widely used model organisms, was initially annotated in 1997 and re-annotated in 2007. Despite years of intensive research on the E. coli genome, complete and accurate functional information for this organism is still not available. In E. coli, about 40% of the protein sequences have been annotated as hypothetical proteins because of a lack of information. Such sequences require advanced computational strategies to derive clues about their biological roles. Herein, we have carried out a re-annotation of the complete proteome of E. coli K-12 using the “Dynamic biological data fusion method”, a computational strategy that combines heterogeneous biological data sources to maximize knowledge sharing and generates the intersection of data sets. The functional re-annotation results reported in this paper provide high-quality data on the complete proteome of E. coli K-12. We have updated all the protein-coding genes from previous annotation work and tried to assign new or more precise functions wherever possible. About 29% of the E. coli protein sequences previously annotated as unclear/unknown (hypothetical; without functions) have now been assigned clear/known functions. Further, the analysis also resulted in the revision of protein sequences that were found to be false positives or poorly annotated. Information from this work is made available as a database, “REC-DB”, which will remain a useful repository with accurate and updated functional information.

Availability: REC-DB is publicly available at http://recdb.bioinfo.au-kbc.org.in/recdb/

Keywords

E. coli; Re-annotation; Hypothetical proteins; Confidence level; Phylogenetics; Motif

Background

The field of genomics has been expanding at a rapid pace since the annotated Escherichia coli K-12 genome was published in 1997 (Blattner et al., 1997). This has led to exponential growth of sequence information and related biological databases (Serres et al., 2001). Despite decades of intense research on the E. coli genome, supported by biochemical experimentation, complete and accurate functional information for this model organism is still not available. Globally, several genome projects have been completed and many are ongoing. Many functional annotation results are incomplete because they are based on obsolete data or inappropriate sequence models. Moreover, this information often goes without updates for years, and such poor annotation leads to significant gaps in our genome knowledge (Salzberg, 2007). This incomplete annotation process leaves genomes with a large number of unknown proteins that have not yet been characterized. Traditionally, functional enrichment is carried out by BLAST analysis of the entire genome, and in most cases it is incomplete because functions are not updated frequently. It is therefore necessary to update genome function frequently through re-annotation; otherwise the information provided becomes obsolete. The occurrence in genomes of many uncharacterized proteins, such as hypothetical or conserved hypothetical proteins, and of functions qualified with tentative terms such as "possible" or "probable", leads to uncertainty. In general, for any microbial genome, such proteins account for around 30 to 40%. The functions of unknown proteins like

Preferred citation for this article:

Kumar et al., 2014, In silico Proteomic Functional Re-annotation of Escherichia coli K-12 Using Dynamic Biological Data Fusion Strategy, Computational Molecular Biology, Vol.4, No.4 34-43 (doi: 10.5376/cmb.2014.04.0004)

Received: 15 Apr., 2014 | Accepted: 10 May, 2014 | Published: 05 Jul., 2014
