Computational Molecular Biology 2014, Vol. 4, No. 7, 1-17
http://cmb.biopublisher.ca
We also compared the accuracy of mitochondrial protein predictions made by WoLF PSORT and TargetP. The MCC values were 0.67 for WoLF PSORT and 0.56 for TargetP. Requiring both tools to agree increased the specificity of mitochondrial protein prediction from 0.93 (WoLF PSORT alone) to >0.98. However, combining the tools did not improve the MCC, because prediction sensitivity decreased. We therefore selected WoLF PSORT for assigning mitochondrial proteins. Users should nevertheless be aware that when both WoLF PSORT and TargetP predict a protein to be mitochondrial, the prediction is more reliable than a prediction from only one of them.
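The trade-off described above (a consensus of two predictors raises specificity but lowers sensitivity, so MCC need not improve) can be illustrated with a minimal sketch. It assumes the standard definitions sensitivity = TP/(TP+FN), specificity = TN/(TN+FP) and the usual MCC formula; the label vectors `truth`, `tool_a` and `tool_b` are hypothetical toy data standing in for the two tools, not outputs of WoLF PSORT or TargetP and not the paper's benchmark.

```python
from math import sqrt

def confusion(pred, truth):
    """Count TP, FP, TN, FN for binary labels (1 = mitochondrial)."""
    tp = fp = tn = fn = 0
    for p, t in zip(pred, truth):
        if p and t:
            tp += 1
        elif p:
            fp += 1
        elif t:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn

def scores(pred, truth):
    """Return (sensitivity, specificity, MCC) using the standard definitions."""
    tp, fp, tn, fn = confusion(pred, truth)
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sens, spec, mcc

# Hypothetical toy labels for 20 proteins (illustrative only).
truth = [1, 1, 1, 1, 1] + [0] * 15
tool_a = [1, 1, 1, 1, 0] + [1, 1] + [0] * 13  # stands in for the first tool
tool_b = [1, 1, 0, 0, 1] + [1] + [0] * 14     # stands in for the second tool

# Consensus rule: call a protein mitochondrial only if both tools agree.
consensus = [a and b for a, b in zip(tool_a, tool_b)]

for name, pred in [("tool A alone", tool_a), ("both agree", consensus)]:
    sens, spec, mcc = scores(pred, truth)
    print(f"{name}: sensitivity={sens:.2f} specificity={spec:.2f} MCC={mcc:.2f}")
```

Because a consensus call can only remove positive predictions, it can never lower specificity or raise sensitivity relative to either tool alone; in this toy case specificity rises from 0.87 to 0.93 while sensitivity halves from 0.80 to 0.40, and MCC drops.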
The prediction accuracies for the other subcellular locations vary considerably. Prediction of nuclear proteins had a sensitivity of 0.85, a specificity of 0.71, and an MCC of 0.53. The accuracies for the remaining locations, including the ER, Golgi apparatus, vacuole, peroxisome, cytoplasm, and cytoskeleton, were very low in terms of MCC (<0.4) (Table 1). Note, however, that these low accuracies were caused by very low sensitivities; the specificities were in fact relatively high (>0.98). Thus, a substantial number of proteins located in these compartments cannot be predicted, but when a protein is predicted to be located in one of them, the prediction is most likely correct. Nonetheless, the accuracy of predicting these subcellular locations for fungal proteins needs to be improved.
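How a low MCC can coexist with a high specificity is easiest to see with confusion-matrix counts for a rare compartment. The sketch below uses hypothetical counts (20 of 1000 test proteins in the location, 4 recovered, 1 false positive), not values from Table 1, and assumes the standard metric definitions. It also prints precision, TP/(TP+FP), which is the quantity that directly measures how often a positive call is correct and which, unlike specificity, depends on how rare the location is.

```python
from math import sqrt

def scores_from_counts(tp, fp, tn, fn):
    """Sensitivity, specificity, precision and MCC from confusion-matrix counts."""
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    prec = tp / (tp + fp)
    mcc = (tp * tn - fp * fn) / sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return sens, spec, prec, mcc

# Hypothetical rare location: 20 of 1000 test proteins belong there.
# The predictor recovers only 4 of them and makes 1 false-positive call.
tp, fn = 4, 16
fp, tn = 1, 979
sens, spec, prec, mcc = scores_from_counts(tp, fp, tn, fn)
print(f"sensitivity={sens:.2f} specificity={spec:.3f} precision={prec:.2f} MCC={mcc:.2f}")
```

With these counts specificity is about 0.999 and precision 0.80, yet sensitivity is only 0.20 and MCC falls just below 0.4, mirroring the pattern reported for the minor compartments.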
2.2 Overview of subcellular proteome distribution in different species
The database contains predicted subcellular location information for proteins from 16554 fungal species or varieties (strains), 189 of which each have at least 1000 protein entries. The species names, some of which may cover more than one strain or variety, are listed in the user interface, which facilitates species-specific searching and downloading. Species with <1000 protein entries can also be searched using a species name provided by the user.
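Pooling strains under one species name, applying the 1000-entry threshold, and looking up a user-supplied name can be sketched as follows. The record layout `(species, strain, accession)`, the helper names and the toy records are all hypothetical; this is not the database's actual schema or search implementation.

```python
from collections import Counter

MIN_ENTRIES = 1000  # threshold quoted in the text for well-covered species

# Hypothetical (species, strain, accession) records; not real database rows.
records = (
    [("Species alpha", "strain 1", f"A{i}") for i in range(700)]
    + [("Species alpha", "strain 2", f"B{i}") for i in range(500)]
    + [("Species beta", "strain 1", f"C{i}") for i in range(300)]
)

def entries_per_species(rows):
    """Pool all strains/varieties under one species name and count entries."""
    return Counter(species for species, _strain, _acc in rows)

def well_covered(counts, min_entries=MIN_ENTRIES):
    """Species names with at least `min_entries` protein entries."""
    return sorted(name for name, n in counts.items() if n >= min_entries)

def search_species(rows, name):
    """Case-insensitive lookup by a user-supplied species name."""
    wanted = name.strip().lower()
    return [row for row in rows if row[0].lower() == wanted]

counts = entries_per_species(records)
print(well_covered(counts))  # only Species alpha reaches 1000 (700 + 500 pooled)
print(len(search_species(records, "species beta")))  # below threshold, still searchable
```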
The distributions of subcellular proteomes in different fungal species are summarized in Table 2 and … Table 2 includes the following subcellular locations: secreted proteins (four subcategories), mitochondrial membrane and mitochondrial non-membrane, cytoplasm (cytosol), cytoskeleton, nuclear membrane and nuclear non-membrane, plasma membrane, and glycosylphosphatidylinositol (GPI)-anchored proteins. The secreted protein category comprises the subcategories curated secreted, highly likely secreted, likely secreted, and weakly secreted proteins. Information on other subcellular locations, including the endoplasmic reticulum (membrane or lumen), Golgi apparatus (membrane or lumen), lysosome, peroxisome, vacuole (membrane or non-membrane), other membrane, and other curated locations, can be found i
Genome sizes, and thus proteome sizes, vary considerably among fungal species. Note, however, that in the database, as shown in Table 2, the total number of proteins listed for a given species is not necessarily its proteome size but rather the collection of all proteins available for that species. For example, whereas the reference proteome of Saccharomyces cerevisiae compiled in UniProtKB consists of only 6,621 proteins, our database contains a total of 79,093 proteins under the name Saccharomyces cerevisiae, which evidently include proteins obtained from multiple strains. The subcellular distributions of fungal proteins were estimated from the pooled data for each of three phyla: Ascomycota, Basidiomycota and Microsporidia.
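Estimating a phylum-level distribution from pooled data can be sketched as summing the per-species location counts before converting to percentages. The sketch also computes each species' own secretome share, from which a range and a mean can be reported. The counts, location keys and helper names are hypothetical, and whether the averages reported in this section are pooled values or means of per-species shares is an assumption the sketch does not settle; the two generally differ because pooling weights large species more heavily.

```python
from collections import Counter
from statistics import mean

# Hypothetical per-species counts of predicted locations (illustrative only).
species_by_phylum = {
    "Ascomycota": [
        {"nucleus": 400, "mitochondrion": 200, "cytoplasm": 190, "secreted": 50, "other": 160},
        {"nucleus": 3000, "mitochondrion": 1500, "cytoplasm": 1400, "secreted": 300, "other": 1800},
    ],
}

def pooled_distribution(species_list):
    """Sum counts over all species in a phylum, then convert to percentages."""
    totals = Counter()
    for counts in species_list:
        totals.update(counts)  # adds counts location by location
    n = sum(totals.values())
    return {loc: 100 * c / n for loc, c in totals.items()}

def per_species_range(species_list, location="secreted"):
    """Min, max and mean of each species' own percentage for one location."""
    shares = [100 * c[location] / sum(c.values()) for c in species_list]
    return min(shares), max(shares), mean(shares)

for phylum, species_list in species_by_phylum.items():
    pooled = pooled_distribution(species_list)
    low, high, avg = per_species_range(species_list)
    print(f"{phylum}: pooled secreted={pooled['secreted']:.2f}% "
          f"per-species range {low:.2f}-{high:.2f}%, mean {avg:.2f}%")
```

Here the pooled secreted share (350 of 9000 proteins) is lower than the mean of the two per-species shares (5.0% and 3.75%), because the larger species has the smaller share.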
Interestingly, we found that the nucleus is the largest destination compartment for proteins: 39.2% of proteins in Ascomycota, 39.2% in Basidiomycota, and 57.4% in Microsporidia were predicted to be located in the nucleus. Mitochondria are another major targeting compartment, receiving 19.5% of proteins in Ascomycota, 21.1% in Basidiomycota, and 16.7% in Microsporidia. Approximately 18–21% of proteins are located in the cytosol or cytoplasm. The secretome proportions vary from 0.3% to 10.5% (average 4.6%) in Ascomycota, from 1.9% to 7.4% (average 4.4%) in Basidiomycota, and from 0.5% to 1.7% (average 0.9%) in Microsporidia. However, here the secretome includes only curated secreted proteins and highly likely secreted proteins, thus the