PlantSecKB: the Plant Secretome and Subcellular Proteome KnowledgeBase

or by us were considered first in assigning a

subcellular location; these assignments are based on

traceable literature with experimental evidence and are

thus fairly reliable. However, the subcellular locations

assigned based on the computational prediction will

depend on the accuracy of the tools used. We have

evaluated the prediction accuracy of the methods we

used in this study and compared it with the accuracies

of other methods (Table 1) (Min, 2010; Meinken and

Min, 2012). We concluded that the prediction of secreted

proteins is relatively reliable. However, false positives

and false negatives certainly exist. For example, a

number of P450 enzymes were predicted to be secreted

proteins, which are most likely false positives.

We also predicted other subcellular locations

including mitochondrial, chloroplast, vacuole, nucleus,

and others based on the predictions of TargetP and

WoLF PSORT. Our evaluation of the prediction

accuracies for these subcellular locations revealed that

the accuracies of the tools we used, even though they

are the best among available tools, are still not

satisfactory due to relatively low prediction

sensitivities for these subcellular locations (Table 1)

(Meinken and Min, 2013). With the exception of

mitochondrial and cytosolic proteins, however, the

specificities for those subcellular locations including

chloroplast, ER, Golgi apparatus, nucleus, plasma

membrane, vacuole and cytoskeleton are acceptable

(>89%). Thus, proteins predicted in those subcellular

locations are relatively reliable, though they still need

to be cautiously examined with experiments.
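To make the distinction between sensitivity and specificity concrete, the following is a minimal sketch that computes both per subcellular location from paired curated and predicted labels. It assumes the conventional definitions (sensitivity = TP / (TP + FN), specificity = TN / (TN + FP)); the evaluations cited above may define these measures differently. The function names and the example labels are hypothetical and are not taken from PlantSecKB.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Counts for one subcellular location treated as the positive class."""
    tp: int = 0
    fp: int = 0
    tn: int = 0
    fn: int = 0


def confusion_for(location, truth, predicted):
    """Tally TP/FP/TN/FN for `location` over paired label lists."""
    if len(truth) != len(predicted):
        raise ValueError("truth and predicted must have the same length")
    c = Confusion()
    for t, p in zip(truth, predicted):
        if t == location and p == location:
            c.tp += 1
        elif t != location and p == location:
            c.fp += 1
        elif t == location and p != location:
            c.fn += 1
        else:
            c.tn += 1
    return c


def sensitivity(c):
    """Fraction of true members of the location that were recovered."""
    denom = c.tp + c.fn
    return c.tp / denom if denom else float("nan")


def specificity(c):
    """Fraction of non-members that were correctly kept out of the location."""
    denom = c.tn + c.fp
    return c.tn / denom if denom else float("nan")


# Hypothetical toy data: curated labels versus tool predictions.
truth = ["chloroplast", "nucleus", "chloroplast", "cytosol"]
predicted = ["chloroplast", "nucleus", "cytosol", "cytosol"]
chloro = confusion_for("chloroplast", truth, predicted)
```

In this toy data, one of the two chloroplast proteins is missed (low sensitivity) while no other protein is wrongly called chloroplast (high specificity), which mirrors the situation described above: a positive call can be trusted even when many true members are missed.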

Recently, several new tools were developed

including the Cell-PLoc servers (Chou and Shen,

2008), MultiLoc2 (Blum et al., 2009), and others

(Meinken and Min, 2012). These tools and their

related publications can be found at our website

(http://proteomics.ysu.edu/tools/subcell.html) (Meinken

and Min, 2012). As standalone tools are not available

for some of them, such as Cell-PLoc, or some

standalone tools are too slow for processing a large

data set, such as MultiLoc2, we were not able to use

them for our data processing. However, we suggest

that users use these tools to obtain a second prediction for

proteins of interest, as our experience has shown that

using multiple tools improves prediction specificity.
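One simple way to combine several tools is a consensus rule: accept a location only when a minimum number of tools agree on it. The sketch below illustrates that idea under stated assumptions. It does not call any of the tools named above; it only takes their already-collected output labels as a dictionary, and the function name, the `min_agree` parameter, and the label strings are hypothetical. It is not the procedure used to build PlantSecKB.

```python
from collections import Counter


def consensus_location(predictions, min_agree=2):
    """Return the location supported by at least `min_agree` tools, else None.

    `predictions` maps a tool name to the location label it reported for one
    protein. A tie between equally supported locations yields None.
    """
    if not predictions:
        return None
    counts = Counter(predictions.values())
    top_votes = max(counts.values())
    # Collect every location that reached the highest vote count.
    leaders = [loc for loc, n in counts.items() if n == top_votes]
    if top_votes >= min_agree and len(leaders) == 1:
        return leaders[0]
    return None


# Hypothetical outputs for a single protein (labels are illustrative only).
calls = {
    "TargetP": "mitochondrial",
    "WoLF PSORT": "mitochondrial",
    "MultiLoc2": "chloroplast",
}
agreed = consensus_location(calls)
```

Requiring agreement discards some correct single-tool predictions, so a rule like this trades sensitivity for specificity; raising `min_agree` makes that trade-off stricter.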

Based on several recent large-scale secretome studies

in plants, non-classical (i.e., leaderless) secretory proteins

(LSPs) were observed to account for more than 50%

of the total identified secretome, supporting the

existence of novel secretory mechanisms independent

of the classical ER-Golgi secretory pathway

(Agrawal et al., 2010 for review; Jung et al., 2008;

Cheng and Williamson, 2010; Ding et al., 2012).

Mammalian and bacterial LSPs have been

collected and used to implement the prediction

software, SecretomeP, for predicting these proteins

(http://www.cbs.dtu.dk/services/SecretomeP/) (Bendtsen

et al., 2004a). Because the tool has not been trained

with plant-specific data and the accuracy for

predicting plant LSPs could not be evaluated, we did

not include this tool in our data processing.

PlantSecKB aims to serve as a portal for plant

researchers to search plant protein subcellular

locations with an emphasis on secreted proteins. The

EST sub-database is expected to facilitate EST data

mining for secreted proteins from expressed data,

which is particularly useful for plant species not

completely sequenced or having only a limited

number of cDNA sequences. The collection and

curation of secreted plant proteins, particularly LSPs,

from literature with experimental evidence requires

continuous efforts from the plant research community.

We have implemented a curation tool accessible

through PlantSecKB for the community to manually

curate subcellular locations of plant proteins having

experimental evidence. The utility described in

PlantSecKB, together with our recently implemented

Fungal Secretome KnowledgeBase (FunSecKB) (Lum

and Min, 2011b), is anticipated to provide a search,

download, and curation system that will help the plant

community to further understand secretome biology. It

can also be used to explore potential applications of

plant and fungal secreted proteins, and of their

interactions, in plant pathogen control and in

breeding stress-resistant varieties (Kim et al.,

2009).

Authors' contributions

GL and JM implemented the database; JO and SF

manually curated secreted proteins; XJM conceived

the study and designed the data processing procedure.

Computational

Molecular Biology