PlantSecKB: the Plant Secretome and Subcellular Proteome KnowledgeBase
plant research community to access the available
information and deposit experimental evidence for
newly characterized proteins. To provide such a
central portal for plant secretome-related
resources, we developed the Plant Secretome and
Subcellular Proteome KnowledgeBase (PlantSecKB)
(http://proteomics.ysu.edu/secretomes/plant.html),
which includes predicted and manually curated protein
subcellular locations from plant proteomes as well as
predicted proteins from EST data in plants. Although
our focus is on plant secretomes, information on
proteins in other subcellular locations is also
provided. A tool supporting manual community
curation of plant protein subcellular locations can be
accessed through the database interface.
1 Methods of Database Construction
1.1 Data collection
PlantSecKB was constructed primarily with the
sequence data obtained from two sources: plant
protein sequences extracted from UniProtKB
(2013-04 Release) (http://www.uniprot.org/) and
protein sequences predicted from assembled EST
data compiled by the PlantGDB project
(http://www.plantgdb.org/prj/ESTCluster/). The proteins
predicted from the recently sequenced sacred lotus
(Nelumbo nucifera Gaertn.) genome were also
integrated into this database (Ming et al., 2013;
Lum et al., 2013). Protein sequences were predicted
from the EST data using the OrfPredictor tool
(http://proteomics.ysu.edu/tools/OrfPredictor.html)
with BLASTX results against the UniProt/Swiss-Prot
database as input, and TargetIdentifier
(http://proteomics.ysu.edu/tools/TargetIdentifier.html)
was used to determine whether an EST was full-length
(Min et al., 2005a, 2005b).
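As a rough illustration of the ORF prediction step, the sketch below translates a nucleotide sequence in a specified reading frame (such as the frame reported by a BLASTX hit) and returns the longest stretch free of stop codons. It is not OrfPredictor's actual algorithm. The function names `translate` and `longest_open_stretch` are hypothetical, and the standard genetic code is assumed.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumed here).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq: str, frame: int) -> str:
    """Translate seq in frame +1..+3 or -1..-3; unknown codons become 'X'."""
    if frame not in (1, 2, 3, -1, -2, -3):
        raise ValueError("frame must be one of +/-1, +/-2, +/-3")
    seq = seq.upper().replace("U", "T")
    if frame < 0:
        seq = reverse_complement(seq)
    start = abs(frame) - 1
    codons = (seq[i:i + 3] for i in range(start, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def longest_open_stretch(protein: str) -> str:
    """Return the longest segment between stop codons ('*')."""
    return max(protein.split("*"), key=len)
```

For an EST with a BLASTX hit, `longest_open_stretch(translate(est, frame))` would give a candidate protein in the hit's frame. For an EST without a hit, one simple option is to compare the six frames and keep the longest result.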
1.2 Computational methods for prediction of
protein subcellular locations
The software tools used in this study include SignalP 3.0
and 4.0, TargetP, Phobius, WoLF PSORT, TMHMM,
PS-Scan, and FragAnchor. Links to these
tools and related references can be found on our website
(http://proteomics.ysu.edu/tools/subcell.html). Except
for FragAnchor, we used standalone tools installed on
a local Linux system for data processing. The
commands for running them can usually be found in
the “readme” file of each downloaded package and
were summarized by Lum and Min (2013). In brief,
SignalP 4.0 was used for secretory signal peptide
prediction (Petersen et al., 2011). However, we also
included prediction information from SignalP 3.0
(Bendtsen et al., 2004b) as it provides more accurate
cleavage site prediction than SignalP 4.0 (Petersen et
al., 2011). Phobius is a combined signal peptide and
transmembrane topology predictor (Käll et al., 2007).
TargetP predicts the presence of an N-terminal
targeting sequence: a signal peptide (SP), a chloroplast
transit peptide (cTP), or a mitochondrial targeting
peptide (mTP) (Emanuelsson et al., 2000; Emanuelsson
et al., 2007). TMHMM uses a hidden Markov model
(HMM) to predict the presence and topology of
transmembrane helices and their orientation to the
membrane (in/out) (Krogh et al., 2001). PS-Scan
was used to scan sequences against the PROSITE database
(http://www.expasy.org/tools/scanprosite/) to identify and
remove proteins carrying an ER targeting signal
(Prosite: PS00014) (de Castro et al., 2006; Sigrist et
al., 2010). FragAnchor was used to identify
glycosylphosphatidylinositol (GPI)-anchored
proteins (GAPs) among the proteins that
were predicted to contain a signal peptide by
SignalP 4.0 (Poisson et al., 2007). WoLF PSORT
predicts multiple subcellular locations including
chloroplast, cytosol, cytoskeleton, ER, extracellular
(secreted), Golgi apparatus, lysosome, mitochondria,
nuclear, peroxisome, plasma membrane, and vacuolar
membrane (Horton et al., 2007). The default
parameters for eukaryotes or plants, if available, were
used for all the programs. Our previous evaluation
found that including WoLF PSORT in plant
secretome prediction reduced accuracy
because of a significant decrease in prediction
sensitivity (Min, 2010). Thus, it was not used for
secretome prediction but only for predicting some
other subcellular locations.
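Two of the filtering steps described above can be sketched roughly as follows. This is a minimal illustration, not the pipeline used to build PlantSecKB. The `ToolResult` record and its fields are hypothetical stand-ins for parsed SignalP 4.0 output, and the regular expression assumes that PROSITE PS00014 corresponds to the C-terminal pattern [KRHQSA]-[DENQ]-E-L, which should be checked against the current PROSITE entry. Results from the other tools are omitted here.

```python
import re
from dataclasses import dataclass

# Assumed regex form of PROSITE PS00014 (ER targeting sequence),
# anchored at the C-terminus of the protein.
ER_TARGET = re.compile(r"[KRHQSA][DENQ]EL$")


@dataclass
class ToolResult:
    """Hypothetical record: one protein and its parsed SignalP 4.0 call."""
    protein_id: str
    sequence: str
    signalp4_signal_peptide: bool


def has_er_target(sequence: str) -> bool:
    """True if the sequence ends with the assumed ER targeting motif."""
    return ER_TARGET.search(sequence.upper().rstrip("*")) is not None


def secreted_candidates(results):
    """Keep proteins with a predicted signal peptide and no ER targeting motif."""
    return [
        r for r in results
        if r.signalp4_signal_peptide and not has_er_target(r.sequence)
    ]


def fraganchor_input(results):
    """Proteins with a predicted signal peptide, to be checked for GPI anchors."""
    return [r for r in results if r.signalp4_signal_peptide]


# Toy records with made-up sequences and predictions.
results = [
    ToolResult("p1", "MKTLLVAAAGSTQ", True),    # signal peptide, no ER motif
    ToolResult("p2", "MKTLLVAAAGSKDEL", True),  # signal peptide, ends in KDEL
    ToolResult("p3", "MSTQQRRAAGG", False),     # no signal peptide
]
candidates = secreted_candidates(results)
```

Anchoring the pattern with `$` matters because the motif only counts at the C-terminus. An internal KDEL-like stretch would otherwise cause a protein to be removed incorrectly.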
When assigning a subcellular location to a
protein, the UniProtKB-annotated subcellular location
and our manual curation take precedence over
computational prediction. Thus, only proteins without
an annotated subcellular location are subjected
to computational assignment of their subcellular
locations. The information produced by all the tools,
Computational
Molecular Biology