
In silico Proteomic Functional Re-annotation Escherichia coli K-12 using Dynamic Biological Data Fusion Strategy
such types of proteins can be viewed in REC-DB. The updated functional information for these proteins should, in turn, help researchers gain deeper insight into the underlying molecular systems. The complete re-annotated functional information is available as Supplementary Table S2. From a REC-DB search, we were able to retrieve genes with a clearly annotated function obtained from the re-annotation results (Supplementary Table S2, A). A search can also return hypothetical gene hits, meaning that no functional prediction was obtained for the unknown gene sequence (Supplementary Table S2, B), and hits with only a predicted function, which need further annotation (Supplementary Table S2, C).
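To make the three-way split of Supplementary Table S2 concrete, the following Python sketch assigns each re-annotated gene to section A, B, or C. The keyword lists, the function names `table_s2_category` and `split_by_category`, and the gene names in the example are hypothetical assumptions for illustration only; they are not the criteria actually used to build REC-DB.

```python
from typing import Dict, List, Optional

# Assumed keyword lists (hypothetical): the real criteria behind
# Supplementary Table S2 are not given in the text.
NO_PREDICTION_PREFIXES = ("hypothetical", "uncharacterized")
PREDICTION_WORDS = ("predicted", "putative", "probable")


def table_s2_category(function: Optional[str]) -> str:
    """Return "A", "B" or "C" for one re-annotated gene (hypothetical rule)."""
    if function is None or not function.strip():
        return "B"  # no functional prediction was obtained
    text = function.strip().lower()
    if text.startswith(NO_PREDICTION_PREFIXES):
        return "B"  # still only a hypothetical gene
    if any(word in text for word in PREDICTION_WORDS):
        return "C"  # predicted function only; needs further annotation
    return "A"  # clearly annotated function


def split_by_category(annotations: Dict[str, Optional[str]]) -> Dict[str, List[str]]:
    """Group gene identifiers by Table S2 section, preserving input order."""
    groups: Dict[str, List[str]] = {"A": [], "B": [], "C": []}
    for gene, function in annotations.items():
        groups[table_s2_category(function)].append(gene)
    return groups


if __name__ == "__main__":
    # Toy input with made-up gene names, not REC-DB entries.
    toy = {
        "geneX": "DNA polymerase III subunit alpha",
        "geneY": None,
        "geneZ": "putative transporter",
    }
    print(split_by_category(toy))
```

A plain keyword rule keeps the sketch readable; in practice a curated vocabulary of annotation qualifiers would be needed to separate sections B and C reliably.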
1.4 Inconvenient Outcomes
Although our re-annotation efficiently updated the functional information of the E. coli genome, a few difficulties arose in the results; these are described below.
1.4.1 Transitive Catastrophe
Transitive catastrophe is a phenomenon whereby a functional annotation is transferred from one sequence to another on the basis of sequence similarity searches, even though the original annotation is incorrect (Salzberg, 2007). As more genomes are annotated through BLAST searches, an incorrect functional label can be passed from one protein sequence to the next. Thousands of such transitive errors are known to have propagated through genomic sequence databases. Consequently, if incorrectly annotated entries were present in the sequence databases used for this re-annotation, transitive catastrophes could have produced false-positive functional predictions.
Such errors remained difficult to handle, and we were unable to make definitive decisions about them. We therefore leave these cases open to the scientific community and domain experts for further handling and suggestions.
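The propagation mechanism behind a transitive catastrophe can be illustrated by tracing annotation provenance. The Python sketch below assumes a hypothetical map, `source_of`, recording which sequence each annotation was copied from (`None` for an original annotation), and flags every entry whose chain of transfers reaches an annotation already known to be wrong. It only illustrates the phenomenon: it presumes that provenance is recorded and that the erroneous source is identified, which is exactly the information that was unavailable here, so it is not a procedure used in this re-annotation.

```python
from typing import Dict, List, Optional, Set


def inherits_error(seq: str, source_of: Dict[str, Optional[str]], wrong: Set[str]) -> bool:
    """True if seq's annotation traces back, through similarity-based
    transfers, to an annotation known to be wrong."""
    seen: Set[str] = set()
    current: Optional[str] = seq
    while current is not None and current not in seen:
        if current in wrong:
            return True
        seen.add(current)
        current = source_of.get(current)  # follow the transfer one step back
    return False  # reached an original annotation (or a cycle) without error


def affected_entries(source_of: Dict[str, Optional[str]], wrong: Set[str]) -> List[str]:
    """All sequences whose annotation is wrong or inherited from a wrong one."""
    return sorted(s for s in source_of if inherits_error(s, source_of, wrong))


if __name__ == "__main__":
    # Toy provenance: P2 copied its label from P1, P3 from P2;
    # P4 was annotated independently.
    chain = {"P1": None, "P2": "P1", "P3": "P2", "P4": None}
    print(affected_entries(chain, wrong={"P1"}))
```

The `seen` set guards against circular transfers, which can occur when two database entries were annotated from each other.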
1.5 REC-DB
The outcome of this research has been published online as a public database named “REC-DB – A Re-annotated Escherichia coli Database”. Several enhanced features for searching functions have been incorporated into this database. Users can retrieve the re-annotated E. coli genome data by querying a REC-DB accession number (e.g., ec001), a GenBank identifier (GI no. 90111633), or a Gene ID (948195). While querying, users may encounter the values “Null”, “No GI”, and “No Gene id”: “Null” means that no REC-DB function is available for the entry, “No GI” means that the entry has no GenBank identifier, and “No Gene id” means that no Gene ID is recorded for the entry in REC-DB.
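The three query routes and the meaning of the placeholder values can be summarized in a small Python sketch. The record layout (`RecEntry`), the `RecIndex` class, the helper `present`, and the toy entries are hypothetical; REC-DB's actual schema and interface may differ. In this sketch, placeholder values are treated as missing data and are never indexed as searchable keys.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple

# Placeholder strings that REC-DB shows for missing values.
PLACEHOLDERS = {"Null", "No GI", "No Gene id"}


@dataclass(frozen=True)
class RecEntry:
    # Hypothetical field layout, not REC-DB's published schema.
    accession: str  # REC-DB accession number, e.g. "ec001"
    gi: str         # GenBank GI number or "No GI"
    gene_id: str    # Gene ID or "No Gene id"
    function: str   # re-annotated function or "Null"


def present(value: str) -> Optional[str]:
    """Return None for a REC-DB placeholder, otherwise the value itself."""
    return None if value in PLACEHOLDERS else value


class RecIndex:
    """In-memory index answering the three REC-DB query types."""

    def __init__(self, entries: Iterable[RecEntry]) -> None:
        self._index: Dict[Tuple[str, str], RecEntry] = {}
        for e in entries:
            keys = (("accession", e.accession), ("gi", e.gi), ("gene_id", e.gene_id))
            for field, key in keys:
                if present(key) is not None:  # placeholders are not searchable keys
                    self._index[(field, key)] = e

    def query(self, field: str, key: str) -> Optional[RecEntry]:
        """Look up an entry by "accession", "gi" or "gene_id"."""
        return self._index.get((field, key))


if __name__ == "__main__":
    # Made-up toy entries, not real REC-DB records.
    idx = RecIndex([
        RecEntry("ec_toy1", "1000001", "2000001", "toy function"),
        RecEntry("ec_toy2", "No GI", "No Gene id", "Null"),
    ])
    hit = idx.query("accession", "ec_toy2")
    print(hit, present(hit.function))
```

Keeping each identifier type in its own key namespace avoids collisions between, for example, a GI number and a Gene ID that happen to share the same digits.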
2 Discussion
Although genome projects have the potential to provide a better understanding of organisms, the lack of up-to-date and accurate functional annotation hampers the ability to exploit genome data for further research on the organism. In this in silico functional proteomic re-annotation, we therefore attempted to substantially update the functions of all sequences of E. coli K-12, incorporating the large body of research carried out since the original annotation in 1997, during which much knowledge has been gained about the molecular functions encoded by the E. coli K-12 genome.
Analyzing even a single sequence with a regular BLAST program (http://www.ebi.ac.uk/Tools/BLAST/) generates a large number of hits, each accompanied by parameters such as the E-value, percentage identity, percentage similarity, BLAST score, and sequence length. A hit with the maximum alignment score and an optimal E-value, in the range $1 \times 10^{-6}$ to $1 \times 10^{-52}$, can be taken as the result hit (Gabriel et al., 2008). Interpreting these results and choosing the best positive hit requires considerable human intervention, so analyzing the entire E. coli proteome with a regular BLAST program would be tedious (Aravindhan et al., 2009; Hulo et al., 2004). AIM-BLAST, which presents the results in a well-structured and concise form, greatly helped us perform sequence comparisons for the complete E. coli genome efficiently and in a very short time (Aravindhan et al., 2009).
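The kind of automated hit selection that replaces manual inspection can be sketched in Python on BLAST+ tabular output (`-outfmt 6` with its default twelve columns). For each query, the sketch keeps the hit with the highest bit score among hits whose E-value does not exceed a cutoff. Using 1e-6 (the upper end of the E-value range quoted above) as the default cutoff, the function name `best_hits`, and breaking score ties by the lower E-value are our assumptions; this is not AIM-BLAST's actual implementation.

```python
import csv
from typing import Dict, Iterable, Tuple

# Column order of BLAST+ tabular output (-outfmt 6, default fields).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(lines: Iterable[str], max_evalue: float = 1e-6) -> Dict[str, Tuple[str, float, float]]:
    """Map each query ID to (subject ID, E-value, bit score) of its best hit.

    Hits with an E-value above max_evalue are discarded; among the rest the
    highest bit score wins, with ties broken by the lower E-value."""
    best: Dict[str, Tuple[str, float, float]] = {}
    for row in csv.reader(lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and -outfmt 7 style comments
        rec = dict(zip(FIELDS, row))
        evalue, score = float(rec["evalue"]), float(rec["bitscore"])
        if evalue > max_evalue:
            continue  # not significant enough under the assumed cutoff
        current = best.get(rec["qseqid"])
        if (current is None or score > current[2]
                or (score == current[2] and evalue < current[1])):
            best[rec["qseqid"]] = (rec["sseqid"], evalue, score)
    return best


if __name__ == "__main__":
    # Made-up example rows in -outfmt 6 layout.
    sample = [
        "q1\ts1\t98.5\t300\t4\t0\t1\t300\t1\t300\t1e-52\t550",
        "q1\ts2\t60.0\t280\t100\t5\t1\t280\t3\t282\t1e-20\t200",
        "q2\ts3\t30.0\t100\t70\t2\t1\t100\t1\t100\t0.01\t40",
    ]
    print(best_hits(sample))
```

Ranking by bit score rather than E-value keeps the choice independent of database size, while the E-value cutoff filters out weak matches before ranking.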
Computational Molecular Biology