Computational Molecular Biology
1.1 Assigning functions to uncharacterized proteins
Of the 1,730 protein sequences left unknown by the
previous annotations, 1,239 have now been assigned
clear functions. The re-annotation results also showed
that the functions of about 156 proteins were not
clearly defined. For example (Table 2), re-annotation
of the protein sequence ec2389 gave different
functions from different tools, viz.,
‘metallo-beta lactamase superfamily’ from Pfam,
‘Zinc dependent Hydrolases’ from COG, ‘Probable
Hydrolase’ from ProDom, and no hits from BLAST
and ScanProsite. Of these, COG and ProDom
produced similar functions but Pfam gave a different
function. Note that ProDom gave a function with the
negative term “Probable”, highlighted in Table 2, and
it was hence not considered for the analysis. Similarly,
for protein sequence 4267, Pfam and ScanProsite
produced “Thiolase” as the function, but the other
tools produced “Acetyl Transferases”. For some
proteins, although the tools reported different
functions, a few functional contexts seemed to be
synonymous, while others were not clearly defined.
In such cases it is difficult in practice to decide on
the function. To decide properly among them, more
advanced analysis strategies must be devised or
biochemical experiments must be carried out.
Two sample problem sets of sequences that received
different functions from different tools, identified
by manual annotation, are shown in Table 2.
Table 2 Sample problem sets after re-annotation

Results        Sequence ID 2389 (a)          Sequence ID 4267 (a)
PFAM           Metallo-beta lactamase (b)    Thiolase (b)
COG            Zinc dependent Hydrolases (b) Acetyl CoA Acetyl transferases (b)
SCANPROSITE    No hits                       Thiolase enzymes (b)
BLAST          No hits                       Acetyl CoA Acetyl transferases (b)
PRODOM         Probable Hydrolase (c)        Probable Acetyl Transferase (c)

Note: (a) Sequence ID in Rec-DB database; (b) Different functions from
different tools; (c) Functions predicted with a negative term “Probable”
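
The filtering rule described for Table 2 (ignore "No hits" results, set aside predictions carrying the term "Probable", then check whether the remaining tools agree) can be sketched as follows. This is only an illustrative sketch, not the authors' pipeline: the function names `usable_predictions` and `classify`, the status labels, and the exact-string comparison are assumptions. In particular, exact matching cannot recognise synonymous functions, which is why such cases needed manual inspection.

```python
# Hypothetical sketch of the Table 2 decision rule; names and labels are
# illustrative assumptions, not part of the original study.

NO_HIT = "no hits"
UNCERTAIN_PREFIX = "probable"


def usable_predictions(predictions):
    """Drop 'No hits' results and predictions that start with 'Probable'."""
    kept = {}
    for tool, function in predictions.items():
        text = function.strip().lower()
        if text == NO_HIT or text.startswith(UNCERTAIN_PREFIX):
            continue
        kept[tool] = function.strip()
    return kept


def classify(predictions):
    """Return 'unknown', 'assigned' or 'conflict' for one sequence."""
    kept = usable_predictions(predictions)
    # Case-insensitive exact comparison; synonyms are NOT detected.
    distinct = {function.lower() for function in kept.values()}
    if not distinct:
        return "unknown"
    if len(distinct) == 1:
        return "assigned"
    return "conflict"


# The two sample problem sets from Table 2.
table2 = {
    "2389": {
        "PFAM": "Metallo-beta lactamase",
        "COG": "Zinc dependent Hydrolases",
        "SCANPROSITE": "No Hits",
        "BLAST": "No hits",
        "PRODOM": "Probable Hydrolase",
    },
    "4267": {
        "PFAM": "Thiolase",
        "COG": "Acetyl CoA Acetyl transferases",
        "SCANPROSITE": "Thiolase enzymes",
        "BLAST": "Acetyl CoA Acetyl transferases",
        "PRODOM": "Probable Acetyl Transferase",
    },
}

for sequence_id, predictions in table2.items():
    print(sequence_id, classify(predictions))
```

Both sample sequences come out as conflicts under this rule, which matches their presentation in Table 2 as problem sets that need further analysis.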
For example, the conserved hypothetical protein
encoded by the sequence ec1270 has now been
predicted to be an endoribonuclease; the complete list
of data is available in the REC-DB database. Before
re-annotation there were 1,730 unknown sequences
(40% of all sequences); re-annotation reduced this
number to 491 protein sequences (Figure 2b), so that
only 11% of the proteins were left unknown or
without function. Overall, re-annotation thus resolved
the functions of 29% of the E. coli sequences (the
drop from 40% to 11% unknown).
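
The percentages quoted above can be checked with a short calculation. The sketch below uses only the counts given in the text; the total number of sequences is not stated on this page, so it is estimated here from the reported 40% figure, which is an assumption.

```python
# Counts reported in the text.
unknown_before = 1730           # unknown sequences before re-annotation
newly_assigned = 1239           # sequences assigned clear functions
fraction_unknown_before = 0.40  # reported share of unknown sequences

# Sequences still unknown after re-annotation.
unknown_after = unknown_before - newly_assigned

# Assumption: total sequence count inferred from the reported 40%.
estimated_total = unknown_before / fraction_unknown_before

fraction_unknown_after = unknown_after / estimated_total
reduction_points = (fraction_unknown_before - fraction_unknown_after) * 100

print(f"unknown after re-annotation: {unknown_after}")
print(f"share still unknown: {fraction_unknown_after:.1%}")
print(f"reduction: {reduction_points:.1f} percentage points")
```

Under this assumption the remaining share rounds to the reported 11% and the reduction rounds to the reported 29%.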
1.2 Transfer of functions (Revised functions)
Re-analysis of incompletely annotated sequences
replaced previously assigned functions with new,
more accurate ones. For example
(Supplementary Table S1), protein sequence ec1034
was originally annotated as MEND-MONOMER
MenD (complement (2377281-2375611)) but has now
been reassigned as
2-succinyl-6-hydroxy-2,4-cyclohexadiene-1-carboxylate
synthase. Such functional transfers will be useful to
the scientific community, allowing it to work with
more precise and reliable functions. The complete
list of data is available in the REC-DB database
(http://recdb.bioinfo.au-kbc.org.in/recdb/).
1.3 Updates of protein functions
Re-annotation of E. coli has also updated previously
annotated functions on the basis of the biological
information available in the databases. In the
original annotation, a few proteins were assigned
only a general function that is not adequate for
understanding their actual biological roles. Our
re-annotation has added more detailed functions to
such protein sequences. For example (Supplementary
Table S1), protein sequence ec1915 was previously
given the function ‘reductase’, which is not sufficient
for functional annotation; our re-annotation updated
this function to ‘Oxidoreductase molybdopterin
binding domain’. Full list of data of