Computational Molecular Biology 2014, Vol. 4, No. 7, 1-17
http://cmb.biopublisher.ca
org/tools/scanprosite/) was used to scan endoplasmic
reticulum (ER) targeting sequence (Prosite: PS00014)
(de Castro et al., 2006; Sigrist et al., 2010). For
predicting membrane proteins using TMHMM,
entries with transmembrane domains located outside
the N-terminus (the first 70 amino acids) were treated
as true membrane proteins. Protein sequences
predicted to have a signal peptide by SignalP (version
3) were further processed using the FragAnchor
webserver to identify the glycosylphosphatidylinositol
(GPI) anchors (http://navet.ics.hawaii.edu/~fraganchor/
NNHMM/NNHMM.html) (Poisson et al., 2007). With
the exception of FragAnchor, we used the standalone
tools installed on a local Linux system for data
processing. The commands for running these tools can
usually be found in the “readme” file of each
downloaded package and were summarized by Lum
and Min (2013).
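The TMHMM filtering rule described above can be illustrated with a minimal sketch. It assumes the predicted helices have already been parsed into 1-based (start, end) residue positions, and that a helix counts as lying "within the N-terminus" only when it ends at or before residue 70. The function name and the input format are hypothetical and are not taken from TMHMM itself.

```python
from typing import List, Tuple

# First 70 amino acids, as stated in the text.
N_TERMINAL_WINDOW = 70


def is_membrane_protein(helices: List[Tuple[int, int]],
                        n_terminal_window: int = N_TERMINAL_WINDOW) -> bool:
    """Return True if any predicted helix extends beyond the N-terminal window.

    helices: 1-based (start, end) positions of predicted transmembrane helices.
    Assumption: a helix is "within the N-terminus" only if it ends at or
    before residue `n_terminal_window`; such helices may be signal peptides
    and are ignored here.
    """
    return any(end > n_terminal_window for _start, end in helices)


# A single helix at residues 5-27 is ignored; a second helix at 120-142 counts.
print(is_membrane_protein([(5, 27)]))              # False
print(is_membrane_protein([(5, 27), (120, 142)]))  # True
```

A helix that straddles residue 70 is counted as a membrane domain under this assumption; a stricter reading of the rule would need a different boundary test.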
The categories of fungal protein subcellular locations
include: secreted proteins, mitochondrial (membrane
or non-membrane), ER (membrane or lumen), cytosol
(cytoplasm), cytoskeleton, Golgi apparatus (membrane
or lumen), nuclear (membrane or non-membrane),
vacuolar (membrane or non-membrane), lysosome,
peroxisome, plasma membrane, and other membrane
proteins. To assign a protein subcellular location,
the UniProtKB annotation and our curated subcellular
information were considered before prediction
information. For proteins without annotated
subcellular information, the subcellular location
assignments are based on prediction. Our recent
accuracy evaluation of the computational tools
revealed that the highest prediction accuracy (92.1%
in sensitivity and 98.9% in specificity) for fungal
secretomes was achieved by combining SignalP,
WoLF PSORT, and Phobius for signal peptide
prediction, with TMHMM for eliminating membrane
proteins and PS-Scan for removing ER targeting
proteins (Min, 2010). Thus, the secretome was limited
to manually curated secreted proteins and
proteins predicted to have a signal peptide at their
N-terminus by all three programs and to have neither a
transmembrane domain nor an ER targeting signal. In
this work, SignalP4 is used to replace SignalP3 as
SignalP4 improves the prediction accuracy (Petersen
et al., 2011; Melhem et al., 2013). However, the
information generated by SignalP3 was also included
as it predicts signal peptide cleavage sites more
accurately than SignalP4 (Petersen et al., 2011). The
detailed methods for assigning a protein subcellular
location are described below.
Secreted proteins
Secreted proteins are further divided into curated
secreted proteins, highly likely secreted, likely
secreted, and weakly likely secreted proteins. Curated
secreted proteins include proteins annotated as
“secreted”, “extracellular”, or “cell wall” in
subcellular location in the “reviewed”
UniProtKB/Swiss-Prot data set. They also include
secreted proteins manually collected from recent
literature by our curators. Three predictors consisting
of SignalP4, Phobius, and WoLF PSORT are used for
protein secretory signal peptide or subcellular location
prediction. The highly likely secreted, likely secreted,
and weakly likely secreted proteins are proteins that
are predicted to be secreted or contain a secretory
signal peptide by three, two, or one of the three
predictors, respectively. These proteins do not have a
transmembrane domain or an ER retention signal.
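The tiering of secreted proteins described above can be sketched as a simple vote count. This is only an illustration: it assumes the output of each predictor has already been reduced to a boolean, and the function and argument names are hypothetical. Curated annotation is given precedence over prediction, following the assignment order stated earlier in this section.

```python
from typing import Optional


def secretion_tier(signalp4: bool, phobius: bool, wolf_psort: bool,
                   has_tm_domain: bool, has_er_signal: bool,
                   curated_secreted: bool = False) -> Optional[str]:
    """Assign a secreted-protein category, or None if not secreted.

    Each of the first three arguments is True when that predictor calls the
    protein secreted or signal peptide-containing.
    """
    # Curated annotation is considered before any prediction.
    if curated_secreted:
        return "curated secreted"
    # Predicted secreted proteins must lack a transmembrane domain
    # and an ER retention signal.
    if has_tm_domain or has_er_signal:
        return None
    votes = sum([signalp4, phobius, wolf_psort])
    tiers = {
        3: "highly likely secreted",
        2: "likely secreted",
        1: "weakly likely secreted",
    }
    return tiers.get(votes)  # zero votes gives None


print(secretion_tier(True, True, False, has_tm_domain=False, has_er_signal=False))
```

Under these assumptions the example call corresponds to a protein supported by two of the three predictors.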
ER proteins
ER proteins were predicted by WoLF PSORT and
PS-Scan. Proteins predicted to contain a signal peptide
by SignalP 4.0 and an ER targeting signal (Prosite:
PS00014) by PS-Scan were treated as luminal ER
proteins. Further, if they contain one or more
transmembrane domains, they are classified as ER
membrane proteins.
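The luminal versus membrane distinction for ER proteins can be sketched as follows. The regular expression is an assumed rendering of the Prosite PS00014 pattern ([KRHQSA]-[DENQ]-E-L at the C-terminus) and should be checked against the Prosite entry. The sketch stands in for PS-Scan rather than calling it, leaves out the WoLF PSORT prediction, and uses hypothetical function and argument names.

```python
import re
from typing import Optional

# Assumed regex form of Prosite PS00014: [KRHQSA]-[DENQ]-E-L>,
# where ">" anchors the motif at the C-terminal end of the sequence.
ER_TARGET_RE = re.compile(r"[KRHQSA][DENQ]EL$")


def classify_er(sequence: str, has_signal_peptide: bool,
                n_tm_domains: int) -> Optional[str]:
    """Return "ER lumen", "ER membrane", or None if not an ER protein."""
    has_er_signal = ER_TARGET_RE.search(sequence) is not None
    # Both a signal peptide and the C-terminal ER targeting motif are required.
    if not (has_signal_peptide and has_er_signal):
        return None
    # One or more transmembrane domains moves the protein to the ER membrane class.
    return "ER membrane" if n_tm_domains >= 1 else "ER lumen"


print(classify_er("MKLLAAAKDEL", has_signal_peptide=True, n_tm_domains=0))
```

The example sequence is invented for illustration and ends in the classical KDEL motif.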
GPI-anchored proteins
Signal peptide-containing proteins that were predicted
to have a GPI anchor by FragAnchor were further
classified as GPI-anchored proteins. Proteins
predicted to have both a signal peptide and a GPI anchor
may attach to the outer leaflet of the plasma
membrane or be secreted and become components of the
cell wall.
Proteins in other subcellular locations
Other subcellular locations including mitochondria,
cytosol (cytoplasm), cytoskeleton, Golgi apparatus,