PlantSecKB: the Plant Secretome and Subcellular Proteome KnowledgeBase
or by us were considered first in assigning a
subcellular location; these assignments are based on
traceable literature with experimental evidence and
are thus fairly reliable. However, the reliability of
subcellular locations assigned by computational
prediction depends on the accuracy of the tools used. We have
evaluated the prediction accuracy of the methods we
used in this study and compared it with the accuracies
of other methods (Table 1) (Min, 2010; Meinken and
Min, 2012). We concluded that the prediction of secreted
proteins is relatively reliable. However, false positives
and false negatives certainly exist. For example, a
number of P450 enzymes were predicted to be secreted
proteins; these are most likely false positives.
We also predicted other subcellular locations,
including mitochondrion, chloroplast, vacuole, nucleus,
and others, based on the predictions of TargetP and
WoLF PSORT. Our evaluation of the prediction
accuracies for these subcellular locations revealed that
the tools we used, even though they are the best among
those available, are still not satisfactory because of
relatively low prediction sensitivities for these
subcellular locations (Table 1)
(Meinken and Min, 2013). With the exception of
mitochondrial and cytosolic proteins, however, the
specificities for the remaining subcellular locations,
including chloroplast, ER, Golgi apparatus, nucleus,
plasma membrane, vacuole and cytoskeleton, are
acceptable (>89%). Thus, proteins predicted in those
subcellular locations are relatively reliable, though
they still need to be confirmed experimentally.
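The distinction between sensitivity and specificity drawn above can be illustrated with a minimal sketch. It assumes the standard confusion-matrix definitions (sensitivity = TP / (TP + FN), specificity = TN / (TN + FP)); the cited evaluations may define these measures differently, and the counts below are hypothetical values chosen only for illustration, not results from Table 1.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Fraction of proteins truly in a location that the tool assigns to it."""
    total = tp + fn
    return tp / total if total else float("nan")


def specificity(tn: int, fp: int) -> float:
    """Fraction of proteins truly outside a location that the tool excludes from it."""
    total = tn + fp
    return tn / total if total else float("nan")


# Hypothetical counts for one subcellular location (illustration only).
counts = {"tp": 40, "fn": 60, "tn": 855, "fp": 45}

sens = sensitivity(counts["tp"], counts["fn"])
spec = specificity(counts["tn"], counts["fp"])
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```

With these made-up counts the tool misses most true members of the location (low sensitivity) while rarely assigning an outside protein to it (high specificity), which is the pattern described for several locations above.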
Recently, several new tools were developed,
including the Cell-PLoc servers (Chou and Shen,
2008), MultiLoc2 (Blum et al., 2009), and others
(Meinken and Min, 2012). These tools and their
related publications can be found at our website
(http://proteomics.ysu.edu/tools/subcell.html) (Meinken
and Min, 2012). We were not able to use them for our
data processing because standalone versions are not
available for some of them, such as Cell-PLoc, and
others, such as MultiLoc2, are too slow for processing
a large data set. However, we suggest that
users apply these tools to obtain a second prediction for
proteins of interest, as our experience showed that
using multiple tools improves prediction specificity.
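One simple way to combine predictions from several tools is a consensus vote, sketched below. This is an illustration under stated assumptions, not the procedure used to build PlantSecKB: the function name `consensus_location`, the `min_agree` threshold, and the normalized location labels are all hypothetical, and each tool's real output would first have to be mapped to a shared vocabulary.

```python
from collections import Counter
from typing import Dict, Optional


def consensus_location(predictions: Dict[str, str], min_agree: int = 2) -> Optional[str]:
    """Return the location called by at least `min_agree` tools, else None.

    `predictions` maps a tool name to its (already normalized) location label.
    A tie for the top vote count is treated as no consensus.
    """
    if not predictions:
        return None
    ranked = Counter(predictions.values()).most_common()
    top_location, top_votes = ranked[0]
    tied = len(ranked) > 1 and ranked[1][1] == top_votes
    if top_votes < min_agree or tied:
        return None
    return top_location


# Hypothetical calls for two proteins (labels are illustrative, not real tool output).
agreeing = {"TargetP": "secreted", "WoLF PSORT": "secreted", "MultiLoc2": "chloroplast"}
conflicting = {"TargetP": "mitochondrial", "WoLF PSORT": "chloroplast"}

print(consensus_location(agreeing))
print(consensus_location(conflicting))
```

Requiring agreement trades sensitivity for specificity: fewer proteins receive an assignment, but those that do are supported by more than one method.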
Several recent large-scale secretome studies
in plants found that non-classical, i.e., leaderless,
secretory proteins (LSPs) account for more than 50%
of the total identified secretome, supporting the
existence of novel secretory mechanisms independent
of the classical ER-Golgi secretory pathway
(Agrawal et al., 2010 for review; Jung et al., 2008;
Cheng and Williamson, 2010; Ding et al., 2012).
Mammalian and bacterial LSPs have been
collected and used to develop the prediction
software SecretomeP for predicting these proteins
(http://www.cbs.dtu.dk/services/SecretomeP/) (Bendtsen
et al., 2004a). Because the tool has not been trained
with plant-specific data and its accuracy in
predicting plant LSPs could not be evaluated, we did
not include it in our data processing.
PlantSecKB aims to serve as a portal for plant
researchers to search plant protein subcellular
locations, with an emphasis on secreted proteins. The
EST sub-database is expected to facilitate the mining
of EST data for secreted proteins,
which is particularly useful for plant species that are
not completely sequenced or that have only a limited
number of cDNA sequences. The collection and
curation of secreted plant proteins, particularly LSPs,
from literature with experimental evidence requires
continuous effort from the plant research community.
We have implemented a curation tool, accessible
through PlantSecKB, that allows the community to
manually curate subcellular locations of plant proteins
that have experimental evidence. PlantSecKB,
together with our recently implemented
Fungal Secretome KnowledgeBase (FunSecKB) (Lum
and Min, 2011b), is anticipated to provide a search,
download, and curation system that will help the plant
community to further understand secretome biology. It
can also be used to explore potential applications of
plant and fungal secreted proteins, and of their
interactions, in plant pathogen control and in
breeding stress-resistant varieties (Kim et al.,
2009).
Authors' contributions
GL and JM implemented the database; JO and SF
manually curated secreted proteins; XJM conceived of
the study and designed the data processing procedure.
Computational Molecular Biology