
Computational Molecular Biology 2016, Vol.6, No.4, 1-12
also classified as a membrane protein, then it is further classified as a mitochondrial membrane protein.
ER proteins:
Proteins predicted to contain a signal peptide by SignalP 4.0 and an ER target signal (Prosite:
PS00014) by PS-Scan were treated as luminal ER proteins.
Secretomes:
A secretome is the complete set of proteins secreted by a species. Secreted proteins were divided into four
subcategories. “Curated secreted” proteins are proteins annotated as “secreted”, “extracellular” or “cell wall”
in the subcellular location field of “reviewed” UniProtKB/Swiss-Prot entries, together with secreted proteins
manually collected from recent literature by our curators. “Highly likely secreted” proteins are predicted to
have a secretory signal peptide by at least three of the four predictors (SignalP 4.0, Phobius, TargetP and
WoLF PSORT) but are not classified in any of the above categories. “Likely secreted” proteins are predicted to
have a secretory signal peptide by two of the four predictors, and “weakly likely secreted” proteins by one of
the four predictors. We recommend combining the curated and highly likely secreted proteins as the secretome
of a species (see the accuracy evaluation section).
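The voting scheme above can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline used in this work: the function names, the dictionary of per-tool calls, and the tier label "not predicted secreted" are assumptions introduced here, and the sketch neither parses real tool output nor models the exclusion of proteins already assigned to the earlier categories.

```python
from typing import Dict

# Hypothetical keys for the four predictors named in the text.
PREDICTORS = ("SignalP", "Phobius", "TargetP", "WoLF PSORT")


def secretion_tier(calls: Dict[str, bool], curated: bool = False) -> str:
    """Map per-tool signal peptide calls to a secretion subcategory.

    `calls` holds True for each tool that predicted a secretory signal
    peptide; tools missing from the dictionary count as negative calls.
    """
    if curated:
        return "curated"
    votes = sum(1 for name in PREDICTORS if calls.get(name, False))
    if votes >= 3:
        return "highly likely secreted"
    if votes == 2:
        return "likely secreted"
    if votes == 1:
        return "weakly likely secreted"
    return "not predicted secreted"  # label assumed for this sketch


def in_recommended_secretome(tier: str) -> bool:
    """Curated plus highly likely secreted, as recommended in the text."""
    return tier in ("curated", "highly likely secreted")


example = {"SignalP": True, "Phobius": True, "TargetP": False, "WoLF PSORT": True}
print(secretion_tier(example))  # highly likely secreted
```

Counting votes rather than naming specific tool combinations keeps the rule symmetric across the four predictors, which matches the description of the tiers.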
Proteins in other subcellular locations:
Other subcellular locations, including cytosol (cytoplasm), cytoskeleton, Golgi apparatus, lysosome, nucleus,
peroxisome, plasma membrane and vacuole, were predicted by WoLF PSORT. Note that we did not predict plastid
proteins; all entries in this category came from UniProtKB curation.
2.3 Prediction accuracy evaluation of protein subcellular locations
The prediction tools above were chosen based on our previous evaluation (Min, 2010). To further evaluate the
prediction accuracy for each subcellular location in this dataset, we retrieved protein entries with a single
annotated subcellular location from the UniProtKB/Swiss-Prot dataset. Proteins that had multiple subcellular
locations, were labeled as “fragment”, did not start with “M”, or were shorter than 70 amino acids were
excluded, as were proteins whose subcellular location annotation included the term “By similarity”, “Probable”,
or “Potential”. The prediction accuracy for each subcellular location was evaluated using prediction sensitivity
(Equation 1), specificity (Equation 2) and Matthews Correlation Coefficient (MCC) (Equation 3).
\[ \text{Sensitivity (\%)} = \frac{TP}{TP + FN} \times 100 \tag{1} \]
\[ \text{Specificity (\%)} = \frac{TN}{TN + FP} \times 100 \tag{2} \]
\[ \text{MCC (\%)} = \frac{(TP \times TN - FP \times FN) \times 100}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}} \tag{3} \]
TP is the number of true positives, FN is the number of false negatives, FP is the number of false positives, and
TN is the number of true negatives. The MCC takes into account true and false positives and negatives and is
generally regarded as a balanced measure, with +1 representing a perfect prediction and 0 meaning no better than
random chance (Matthews, 1975). The dataset contains a total of 2,407 proteins. For each category, the number of
actual positives equals TP plus FN and the number of actual negatives equals FP plus TN (Table 1).
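As a worked illustration of Equations 1 to 3, the following Python sketch computes the three measures from a confusion matrix. The function name and the example counts are hypothetical and are not taken from Table 1. The sketch returns the MCC on the -1 to +1 scale, which is how the values are quoted in Section 3.1.1; multiplying by 100 gives the percent form of Equation 3. Returning 0.0 when the denominator is zero is a common convention assumed here, not something stated in the text.

```python
import math
from typing import Tuple


def evaluate(tp: int, fn: int, fp: int, tn: int) -> Tuple[float, float, float]:
    """Return (sensitivity %, specificity %, MCC) for one subcellular location."""
    sensitivity = 100.0 * tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = 100.0 * tn / (tn + fp) if (tn + fp) else float("nan")
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumed convention: report 0.0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0
    return sensitivity, specificity, mcc


# Illustrative counts only: 100 actual positives and 100 actual negatives.
sens, spec, mcc = evaluate(tp=80, fn=20, fp=10, tn=90)
print(f"sensitivity={sens:.1f}%  specificity={spec:.1f}%  MCC={mcc:.2f}")
```

With these counts, 80 of 100 actual positives are recovered and 90 of 100 actual negatives are rejected, so sensitivity is 80% and specificity is 90%.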
3 Results
3.1 Prediction accuracy
3.1.1 Mitochondrial proteins
The prediction accuracy results for each subcellular location are shown in Table 1. As both TargetP and WoLF
PSORT can predict mitochondrial proteins, we evaluated the prediction accuracy of these two tools both
individually and combined (Table 1a). When each tool was used alone, WoLF PSORT showed a much higher
sensitivity but a slightly lower specificity than TargetP, giving a higher MCC value for WoLF PSORT (0.53)
than for TargetP (0.32). When only positives predicted by both tools were counted, specificity increased
slightly but sensitivity decreased. In contrast, counting positives predicted by either tool increased
sensitivity but decreased specificity, resulting in a lower MCC value (0.50) than WoLF PSORT alone. We
therefore based our predictions for mitochondrial proteins on WoLF PSORT alone.
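The two ways of combining TargetP and WoLF PSORT described above correspond to a set intersection ("both") and a set union ("either") of their positive calls. The sketch below illustrates this with made-up protein identifiers; the function names and the toy data are assumptions for illustration and do not reproduce the evaluation in Table 1a.

```python
from typing import Set, Tuple


def combine_calls(tool_a: Set[str], tool_b: Set[str], mode: str) -> Set[str]:
    """Combine the positive calls of two predictors."""
    if mode == "both":    # positives predicted by both tools
        return tool_a & tool_b
    if mode == "either":  # positives predicted by at least one tool
        return tool_a | tool_b
    raise ValueError("mode must be 'both' or 'either'")


def confusion_counts(predicted: Set[str], actual: Set[str],
                     universe: Set[str]) -> Tuple[int, int, int, int]:
    """Return (TP, FN, FP, TN); assumes predicted and actual lie within universe."""
    tp = len(predicted & actual)
    fn = len(actual - predicted)
    fp = len(predicted - actual)
    tn = len(universe) - tp - fn - fp
    return tp, fn, fp, tn


# Toy data with hypothetical identifiers.
universe = {"p1", "p2", "p3", "p4", "p5", "p6"}
mitochondrial = {"p1", "p2", "p3"}
targetp_calls = {"p1", "p4"}
wolfpsort_calls = {"p1", "p2", "p5"}

for mode in ("both", "either"):
    calls = combine_calls(targetp_calls, wolfpsort_calls, mode)
    print(mode, confusion_counts(calls, mitochondrial, universe))
```

Because the intersection is a subset of each tool's calls and the union is a superset, requiring both tools can never raise sensitivity and accepting either tool can never raise specificity, which is the direction of the trade-off reported above.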