Computational Molecular Biology 2016, Vol.6, No.4, 1-12
Table 1 Prediction accuracy evaluation of protist protein subcellular locations
                           TP    FP    TN    FN  Sn (%)  Sp (%)   MCC
(a) Mitochondrial proteins
TargetP                   136    89  1823   359    27.5    95.3  0.32
WoLF PSORT                278   133  1779   217    56.2    93.0  0.53
TargetP AND WoLF PSORT    118    35  1877   377    23.8    98.2  0.36
TargetP OR WoLF PSORT     296   188  1724   199    59.8    90.2  0.50
(b) Secreted proteins
Secreted                   99    50  2211    47    67.8    97.8  0.65
S + HLS                   130    85  2176    16    89.0    96.2  0.71
S + HLS + LS              137   121  2140     9    93.8    94.6  0.68
S + HLS + LS + WLS        138   280  1981     8    94.5    87.6  0.52
(c) Other subcellular locations
Cytoplasm                 322   167  1714   204    61.2    91.1  0.54
Cytoskeleton               62    13  2180   152    29.0    99.4  0.46
ER                         12    26  2254   115     9.4    98.9  0.15
Golgi                       0     4  2345    58     0.0    99.8  0.00
Lysosome                    0     0  2379    28     0.0   100.0
Nucleus                   466   514  1348    79    85.5    72.4  0.49
Peroxisome                  1     2  2381    23     4.2    99.9  0.11
Plasma membrane            18   149  2046   194     8.5    93.2  0.02
Vacuole                     0     0  2375    32     0.0   100.0
Note: TP: true positives; FP: false positives; TN: true negatives; FN: false negatives; Sn: sensitivity; Sp: specificity; MCC: Matthews
correlation coefficient. Secreted: predicted by all 4 predictors; HLS: highly likely secreted, predicted by 3 out of 4 predictors; LS: likely
secreted, predicted by 2 out of 4 predictors; WLS: weakly likely secreted, predicted by 1 out of 4 predictors.
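For reference, the three measures abbreviated in the note are conventionally computed from the four counts as written below. The formulas are not printed in this section, so these standard definitions are an assumption, although they do reproduce the tabulated values: for the S + HLS row, Sn = 130/146 = 89.0% and Sp = 2176/2261 = 96.2%.

```latex
\begin{align}
\mathrm{Sn}  &= \frac{TP}{TP + FN} \times 100\% \\
\mathrm{Sp}  &= \frac{TN}{TN + FP} \times 100\% \\
\mathrm{MCC} &= \frac{TP \cdot TN - FP \cdot FN}
                     {\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}
\end{align}
```

Under this definition the MCC is undefined when any factor in the denominator is zero. That is the case for the lysosome and vacuole rows, where TP + FP = 0, which may be why no MCC is listed for them.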
3.1.2 Secreted proteins
Our previous evaluation showed that secreted protein prediction accuracy can be improved by removing
transmembrane proteins and ER resident proteins (Min, 2010). As we employed four tools - SignalP, TargetP,
WoLF PSORT, and Phobius - for predicting secreted proteins or secretory signal peptides, we had to determine
which predicted entries should be included in the secretome set. After removing transmembrane proteins and ER
proteins, the proteins predicted to be secreted were divided into four categories: (1) Secreted: predicted by all 4
predictors; (2) Highly likely secreted (HLS): predicted by 3 out of 4 predictors; (3) Likely secreted (LS): predicted by
2 out of 4 predictors; and (4) Weakly likely secreted (WLS): predicted by 1 out of 4 predictors. The dataset consisted of 146
curated secreted proteins as positives and 2,261 proteins located in other subcellular locations as negatives. The
accuracy results are shown in Table 1b.
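The four-category scheme described above can be sketched as a simple vote count. This is a minimal illustration, not the authors' pipeline: the function name, the boolean per-tool calls, and the `is_transmembrane` / `is_er_resident` flags are hypothetical inputs, and how each tool's raw output is converted to a yes/no call is not specified here.

```python
from typing import Dict, Optional

# The four predictors named in the text.
TOOLS = ("SignalP", "TargetP", "WoLF PSORT", "Phobius")

# Number of agreeing predictors -> category label used in the text.
CATEGORY_BY_VOTES = {4: "Secreted", 3: "HLS", 2: "LS", 1: "WLS"}


def secretion_category(
    calls: Dict[str, bool],
    is_transmembrane: bool,
    is_er_resident: bool,
) -> Optional[str]:
    """Return the secretion category, or None if the protein is excluded.

    `calls` maps a tool name to True when that tool predicts secretion
    (hypothetical input format). Transmembrane and ER proteins are removed
    before categorization, as described in the text.
    """
    if is_transmembrane or is_er_resident:
        return None
    votes = sum(bool(calls.get(tool, False)) for tool in TOOLS)
    # Zero votes means no tool predicted secretion: no category.
    return CATEGORY_BY_VOTES.get(votes)


if __name__ == "__main__":
    example = {"SignalP": True, "TargetP": True, "WoLF PSORT": False, "Phobius": True}
    print(secretion_category(example, is_transmembrane=False, is_er_resident=False))
```

Applying the exclusion filter before counting votes mirrors the order given in the text; a protein flagged as transmembrane or ER resident never receives a category, regardless of how many tools call it secreted.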
As expected, when only entries predicted by all four tools were counted as positives, the prediction
specificity was highest but the sensitivity was lowest. Conversely, when all entries
predicted by any of the four tools were counted as positives, the specificity decreased while
the sensitivity increased. Based on the MCC values, the most accurate prediction (MCC = 0.71) for a secretome
includes entries predicted as secreted by at least three of the four predictors, with a specificity of 96.2% and a
sensitivity of 89.0% (Table 1b). Thus, we recommend including only curated secreted proteins and highly likely
secreted proteins for estimating the secretome size for a species. It should be noted that entries predicted by 4
of 4 tools and by 3 of 4 tools were both assigned to the category of highly likely secreted in the database.
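The choice of cutoff can be checked directly from the counts in Table 1b. The sketch below recomputes the MCC for each minimum number of agreeing predictors and picks the largest; the MCC formula is the standard definition (assumed, since it is not printed here), and the variable names are illustrative only.

```python
import math
from typing import Optional

# (TP, FP, TN, FN) from Table 1b, keyed by the minimum number of predictors
# that must agree for an entry to be counted as a predicted positive.
TABLE_1B = {
    4: (99, 50, 2211, 47),    # Secreted
    3: (130, 85, 2176, 16),   # S + HLS
    2: (137, 121, 2140, 9),   # S + HLS + LS
    1: (138, 280, 1981, 8),   # S + HLS + LS + WLS
}


def mcc(tp: int, fp: int, tn: int, fn: int) -> Optional[float]:
    """Matthews correlation coefficient; None when the denominator is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return None
    return (tp * tn - fp * fn) / denom


# MCC for each cutoff, then the cutoff with the highest MCC.
scores = {min_votes: mcc(*counts) for min_votes, counts in TABLE_1B.items()}
best_min_votes = max(scores, key=scores.get)

if __name__ == "__main__":
    for min_votes, score in sorted(scores.items(), reverse=True):
        print(f">= {min_votes} predictors: MCC = {score:.2f}")
    print("best cutoff:", best_min_votes)
```

The recomputed values round to the MCC column of Table 1b, and the maximum falls at the three-of-four cutoff, in line with the recommendation above.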
3.1.3 Proteins in other subcellular locations
The prediction accuracy results for proteins located in cytoplasm, cytoskeleton, ER, Golgi apparatus, lysosome,