Computational Molecular Biology
2014, Vol.4, No.4, 34-43 http://cmb.biopublisher.ca
Research Report
Open Access
In silico Proteomic Functional Re-annotation of Escherichia coli K-12 Using
Dynamic Biological Data Fusion Strategy
Gopal Ramesh Kumar, Thankaswamy Kosalai Subazini, Chinnasamy Perumal Rajadurai, Kandavel Palani Kannan
Bioinformatics Lab, AU-KBC Research Centre, M.I.T Campus of Anna University, Chromepet, Chennai 600044, India
Corresponding author email: gramesh@au-kbc.org
Computational Molecular Biology, 2014, Vol.4, No.4 doi: 10.5376/cmb.2014.04.0004
Copyright
© 2014 Kumar et al. This is an open access article published under the terms of the Creative Commons Attribution License, which permits unrestricted
use, distribution, and reproduction in any medium, provided the original work is properly cited.
Abstract
Escherichia coli, one of the favorite model organisms, was initially annotated in 1997 and re-annotated in 2007. Despite
years of intensive research on the E. coli genome, complete and accurate functional information for this organism is
still not available. In E. coli, about 40% of the protein sequences have been annotated as hypothetical proteins because
of a lack of information. Such sequences require advanced computational strategies to derive clues about their biological
role. Herein, we have carried out a re-annotation of the complete proteome of E. coli K-12 using the “Dynamic biological
data fusion method”, a computational strategy that combines heterogeneous biological data sources to maximize knowledge
sharing and generates the intersection of data sets. The functional re-annotation results reported in this paper allow us
to present high-quality data on the complete proteome of E. coli K-12. We have updated all the protein-coding genes from
previous annotation work and tried to assign new or more precise functions wherever possible. About 29% of the E. coli
protein sequences previously annotated as unclear/unknown (hypothetical; without functions) have now been assigned
clear/known functions. Further, the analysis also resulted in the revision of protein sequences found to be false
positives or poorly annotated. Information from this work is made available as a database, “REC-DB”, which will remain a
useful repository with accurate and updated functional information. Availability: REC-DB is publicly available at
http://recdb.bioinfo.au-kbc.org.in/recdb/
Keywords
E. coli; Re-annotation; Hypothetical proteins; Confidence level; Phylogenetics; Motif
Background
The field of genomics has been expanding at a rapid
pace since the annotated Escherichia coli K-12
genome was published in 1997 (Blattner et al., 1997).
This has led to exponential growth of sequence
information and related biological databases (Serres et
al., 2001). Despite decades of intense research on the E.
coli genome, supported by biochemical experimentation,
complete and accurate functional information for this
model organism is still not available. Globally, several
genome projects have been completed and many more are
ongoing. Many functional annotations remain incomplete
because they are based on obsolete data or inappropriate
sequence models.
Moreover, such information often goes without updates
for years, and poor annotation of this kind leads to
significant gaps in our knowledge of genomes (Salzberg,
2007). Incomplete annotation leaves a large number of
proteins in these genomes uncharacterized. Traditional
functional enrichment relies on BLAST analysis carried
out for the entire genome, and in most cases the result is
incomplete because the assigned functions are not
updated frequently. It is therefore necessary to update
genome functions regularly through re-annotation;
otherwise the information provided becomes obsolete.
Genomes contain many uncharacterized proteins, such as
hypothetical or conserved hypothetical proteins, as well
as functions qualified by tentative terms such as
"possible" or "probable", all of which lead to uncertainty.
In general, such proteins account for around 30-40% of
any microbial genome. The functions of unknown proteins like
Preferred citation for this article:
Kumar et al., 2014, In silico Proteomic Functional Re-annotation of Escherichia coli K-12 Using Dynamic Biological Data Fusion Strategy, Computational
Molecular Biology, Vol.4, No.4 34-43 (doi: 10.5376/cmb.2014.04.0004)
Received: 15 Apr., 2014 | Accepted: 10 May, 2014 | Published: 05 Jul., 2014