Computational Molecular Biology 2014, Vol. 4, No. 7, 1-17
http://cmb.biopublisher.ca
lysosome, nucleus, peroxisome, plasma membrane and vacuole proteins were predicted by WoLF PSORT. Proteins predicted to be located in mitochondria, Golgi apparatus, nucleus, or vacuole that contain one or more transmembrane domains are further classified as membrane proteins of that specific subcellular location.
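The membrane reclassification rule above can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the names `refine_location` and `Location`, the lowercase location labels, and the assumption that the transmembrane (TM) domain count is supplied by a separate predictor are all hypothetical.

```python
from typing import NamedTuple

# Compartments to which the text applies the membrane rule.
# Lowercase labels are an assumption of this sketch, not WoLF PSORT output.
MEMBRANE_ELIGIBLE = {"mitochondria", "golgi apparatus", "nucleus", "vacuole"}


class Location(NamedTuple):
    compartment: str
    membrane: bool  # True if classified as a membrane protein of the compartment


def refine_location(predicted: str, n_tm_domains: int) -> Location:
    """Hypothetical helper: flag a protein as a membrane protein of its predicted
    compartment when the compartment is eligible and it has >= 1 TM domain."""
    compartment = predicted.strip().lower()
    is_membrane = compartment in MEMBRANE_ELIGIBLE and n_tm_domains >= 1
    return Location(compartment, is_membrane)


if __name__ == "__main__":
    print(refine_location("Nucleus", 2))    # Location(compartment='nucleus', membrane=True)
    print(refine_location("Cytoplasm", 1))  # Location(compartment='cytoplasm', membrane=False)
    print(refine_location("Vacuole", 0))    # Location(compartment='vacuole', membrane=False)
```

Proteins predicted in other compartments keep their location unchanged regardless of TM domains, reflecting the restriction of the rule to the four compartments named in the text.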
1.3 Database implementation
The data were stored in a MySQL relational database hosted on a Linux server. The user interface and the modules for accessing the data were implemented in PHP. The BLAST utility and community annotation submission can be accessed from links on the main user interface at http://proteomics.ysu.edu/secretomes/fungi2/index.php. The Supplementary Tables and other data described in this work can be downloaded at
2 Results
2.1 Evaluation of prediction accuracies of protein subcellular locations
The prediction methods described above were chosen based on our previous evaluations of computational tools (Min, 2010; Meinken and Min, 2012; Melhem et al., 2013). To further estimate the prediction accuracy of our methods for each subcellular location in this dataset, we retrieved 14884 proteins with a unique annotated subcellular location from UniProtKB/Swiss-Prot. Proteins with multiple subcellular locations or labeled as “fragment” were excluded. Prediction accuracy was measured as sensitivity, specificity, and the Matthews correlation coefficient (MCC), using the formulas described previously (Min, 2010). The results are shown in Table 1. Accuracies for plasma membrane and lysosome proteins were not included because the numbers of positive proteins were too small (<20). Compared with methods using a single tool, our method, which combines SignalP 4.0, WoLF PSORT, and Phobius for secretory signal peptide prediction, PS-Scan for removing ER proteins, and TMHMM for removing membrane proteins, significantly improved the prediction accuracy for secretomes (Min, 2010; Meinken and Min, 2012).
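The combination of tools described above can be pictured as a sequence of filters. The sketch below is illustrative only: the field names, the voting rule (`min_sp_votes`), and the treatment of TMHMM counts are assumptions, and it does not reproduce how the highly likely, likely, and weakly likely secreted categories are defined in the earlier work (Min, 2010; Meinken and Min, 2012).

```python
from dataclasses import dataclass


@dataclass
class ToolCalls:
    """Per-protein tool results; all field names are hypothetical."""
    signalp: bool      # SignalP 4.0 predicts a secretory signal peptide
    wolf_psort: bool   # WoLF PSORT predicts a secretory (extracellular) location
    phobius: bool      # Phobius predicts a signal peptide
    er_motif: bool     # PS-Scan finds an ER-retention motif
    n_tm_domains: int  # TMHMM helices, assumed to exclude the signal peptide region


def passes_secretome_filters(calls: ToolCalls, min_sp_votes: int = 3) -> bool:
    """Keep a protein if enough tools support a signal peptide (assumed voting
    rule), it has no ER-retention motif, and it has no TM domain."""
    sp_votes = sum([calls.signalp, calls.wolf_psort, calls.phobius])
    if sp_votes < min_sp_votes:
        return False  # not enough evidence for a secretory signal peptide
    if calls.er_motif:
        return False  # removed as a likely ER-resident protein
    return calls.n_tm_domains == 0  # any TM domain: removed as a membrane protein


if __name__ == "__main__":
    print(passes_secretome_filters(ToolCalls(True, True, True, False, 0)))  # True
    print(passes_secretome_filters(ToolCalls(True, True, True, True, 0)))   # False (ER motif)
    print(passes_secretome_filters(ToolCalls(True, True, True, False, 1)))  # False (TM domain)
    print(passes_secretome_filters(ToolCalls(True, False, True, False, 0), min_sp_votes=2))  # True
```

Relaxing `min_sp_votes` admits more proteins at the cost of specificity; whether the published categories correspond to such a vote threshold is not stated here.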
For estimating secretome size in a given species, the predicted set of highly likely secreted proteins should provide a relatively accurate estimate: this set has the highest specificity of the three secretome sets (>0.99) and, interestingly, its number of false negatives (121) is close to its number of false positives (130) in the evaluation dataset, so the two errors largely offset each other. Adding the predicted likely secreted proteins to the secretome only slightly decreased the MCC (from 0.906 to 0.902), as only a small number of proteins fall into this category. However, the predicted set of weakly likely secreted proteins needs to be treated with caution, because including it increased the number of false positives (from 188 to 337) far more than it decreased the number of false negatives (from 84 to 73) (Table 1).
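The arithmetic behind this paragraph can be checked directly from the Table 1 counts. The short Python sketch below (function names are illustrative) compares, for each cumulative secretome set, the predicted size (TP + FP) with the true number of secreted proteins (TP + FN), and reports how many false positives are gained and false negatives recovered as each category is added.

```python
# TP, FP and FN for the three cumulative secretome sets, copied from Table 1.
COUNTS = {
    "HLS": {"tp": 1364, "fp": 130, "fn": 121},
    "HLS+LS": {"tp": 1401, "fp": 188, "fn": 84},
    "HLS+LS+WLS": {"tp": 1412, "fp": 337, "fn": 73},
}


def size_estimate(tp: int, fp: int, fn: int) -> tuple:
    """Return (predicted secretome size, true secretome size)."""
    return tp + fp, tp + fn


def marginal_changes(counts: dict) -> list:
    """For each added category, return (step, extra FPs, FNs recovered)."""
    names = list(counts)
    steps = []
    for prev, cur in zip(names, names[1:]):
        extra_fp = counts[cur]["fp"] - counts[prev]["fp"]
        recovered_fn = counts[prev]["fn"] - counts[cur]["fn"]
        steps.append((f"{prev} -> {cur}", extra_fp, recovered_fn))
    return steps


if __name__ == "__main__":
    for name, c in COUNTS.items():
        predicted, actual = size_estimate(c["tp"], c["fp"], c["fn"])
        print(f"{name}: predicted {predicted}, true {actual}, bias {predicted - actual:+d}")
    for step, extra_fp, recovered_fn in marginal_changes(COUNTS):
        print(f"{step}: +{extra_fp} false positives, -{recovered_fn} false negatives")
    # HLS: predicted 1494, true 1485, bias +9
    # HLS+LS: predicted 1589, true 1485, bias +104
    # HLS+LS+WLS: predicted 1749, true 1485, bias +264
    # HLS -> HLS+LS: +58 false positives, -37 false negatives
    # HLS+LS -> HLS+LS+WLS: +149 false positives, -11 false negatives
```

The small bias of the HLS set (+9 proteins, about 0.6% of 1485) is what makes it suitable for secretome size estimates, whereas adding the WLS set recovers 11 false negatives at the cost of 149 extra false positives.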
Table 1 Evaluation of prediction accuracies of fungal protein subcellular locations

Subcellular location   True positive  False positive  True negative  False negative  Sn     Sp     MCC
HLS                    1364           130             13269          121             0.919  0.990  0.906
HLS+LS                 1401           188             13211          84              0.943  0.986  0.902
HLS+LS+WLS             1412           337             13062          73              0.951  0.975  0.862
Mitochondria           1595           887             12015          387             0.805  0.931  0.671
ER                     19             11              13873          981             0.019  0.999  0.102
Golgi apparatus        5              2               14527          350             0.014  1.000  0.098
Nucleus                4483           2771            6823           807             0.847  0.711  0.535
Vacuole                0              0               14389          495             0.000  1.000  n/a
Peroxisome             9              15              14722          138             0.061  0.999  0.148
Cytoplasm              1293           762             10611          2218            0.368  0.933  0.371
Cytoskeleton           87             234             14055          508             0.146  0.984  0.175

Note: HLS: highly likely secreted; LS: likely secreted; WLS: weakly likely secreted; ER: endoplasmic reticulum; Sn: sensitivity; Sp: specificity; MCC: Matthews correlation coefficient; n/a: not applicable (MCC is undefined because no protein was predicted as vacuolar).
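The Sn, Sp, and MCC columns of Table 1 can be recomputed from the counts. The sketch below assumes the standard definitions Sn = TP/(TP + FN), Sp = TN/(TN + FP), and MCC = (TP*TN - FP*FN)/sqrt((TP + FP)(TP + FN)(TN + FP)(TN + FN)); that these match the formulas of Min (2010) is an assumption, though they reproduce the spot-checked rows (for example HLS: 0.919, 0.990, 0.906). The function names are illustrative.

```python
import math
from typing import Optional

# (TP, FP, TN, FN) for each row of Table 1.
TABLE1 = {
    "HLS": (1364, 130, 13269, 121),
    "HLS+LS": (1401, 188, 13211, 84),
    "HLS+LS+WLS": (1412, 337, 13062, 73),
    "Mitochondria": (1595, 887, 12015, 387),
    "ER": (19, 11, 13873, 981),
    "Golgi apparatus": (5, 2, 14527, 350),
    "Nucleus": (4483, 2771, 6823, 807),
    "Vacuole": (0, 0, 14389, 495),
    "Peroxisome": (9, 15, 14722, 138),
    "Cytoplasm": (1293, 762, 10611, 2218),
    "Cytoskeleton": (87, 234, 14055, 508),
}


def sensitivity(tp: int, fn: int) -> float:
    """Fraction of true members of the location that were predicted there."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Fraction of non-members correctly predicted as not in the location."""
    return tn / (tn + fp)


def mcc(tp: int, fp: int, tn: int, fn: int) -> Optional[float]:
    """Matthews correlation coefficient, or None when the denominator is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return None  # e.g. no positive predictions (TP + FP == 0)
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    for name, (tp, fp, tn, fn) in TABLE1.items():
        m = mcc(tp, fp, tn, fn)
        m_text = "n/a" if m is None else f"{m:.3f}"
        print(f"{name:<16} Sn={sensitivity(tp, fn):.3f} "
              f"Sp={specificity(tn, fp):.3f} MCC={m_text}")
```

Each row sums to the 14884 evaluation proteins, and the vacuole row shows why its MCC is undefined: with no positive predictions, the (TP + FP) factor in the denominator is zero.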