Computational Molecular Biology 2016, Vol.6, No.4, 1-12
Note: Abbreviations: HLS: highly likely secreted; LS: likely secreted; Cyt: cytoplasm (or cytosol); Plasm: plasma membrane; Mt mem:
mitochondrial membrane; Mt non-m: mitochondrial non-membrane; Nuc mem: nuclear membrane; Nuc non-m: nuclear
non-membrane; Sec: secretome
For example, D. discoideum had 52 secreted proteins with a DUF3430 domain (unknown function) and 44 secreted
proteins with the carbohydrate-binding domain CBM49, while the other two species had no proteins in these families.
As expected, P. infestans had a large number of secreted Elicitin, RXLR phytopathogen effector, necrosis-inducing
protein (NPP1), phytotoxin PcF and trypsin proteins, etc., which may be related to its lifestyle as a plant
pathogen (Meijer et al., 2014). T. cruzi, not surprisingly for a human parasite, had 345 Mucin-like glycoprotein,
198 BNR repeat-like domain, and 102 Peptidase_M8 (Leishmanolysin) proteins, etc. in its secretome, while the other
two species had none in these categories. These secreted proteins may play an important role in the ability of
T. cruzi to invade and infect humans and cause Chagas' disease (Costa et al., 2016).
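The comparison above amounts to asking which protein families occur in one species' secretome and in neither of the others. A minimal sketch of that check follows, using only the counts quoted in the text. The data structure and the function name `species_specific_families` are hypothetical illustrations, not part of ProtSecKB, and the P. infestans entry is left empty because no per-family counts are given in this passage.

```python
from collections import Counter

# Secreted-protein counts per family, copied from the text above.
# P. infestans families are named in the text but not counted there,
# so its entry is intentionally left empty (an assumption of this sketch).
secretome_families = {
    "D. discoideum": Counter({"DUF3430": 52, "CBM49": 44}),
    "P. infestans": Counter(),
    "T. cruzi": Counter({
        "Mucin-like glycoprotein": 345,
        "BNR repeat-like domain": 198,
        "Peptidase_M8": 102,
    }),
}


def species_specific_families(counts_by_species, species):
    """Return {family: count} for families found only in `species`."""
    # Collect every family that has at least one member in another species.
    seen_elsewhere = set()
    for name, counts in counts_by_species.items():
        if name != species:
            seen_elsewhere.update(f for f, n in counts.items() if n > 0)
    # Keep families present in the target species and absent everywhere else.
    return {
        family: n
        for family, n in counts_by_species[species].items()
        if n > 0 and family not in seen_elsewhere
    }


if __name__ == "__main__":
    for sp in secretome_families:
        print(sp, species_specific_families(secretome_families, sp))
```

With real data, the per-species counters would be filled from the downloaded secretome annotations rather than typed in by hand.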
4 Discussion
We constructed ProtSecKB to provide a resource of curated and predicted subcellular locations of protist
proteins. Because none of the tools we selected was specifically trained for protists, the prediction accuracies
were lower than those obtained for other eukaryotes, including fungi, plants and animals (Lum and Min,
2011; Lum et al., 2014; Meinken et al., 2014; Meinken et al., 2015). However, our evaluation using curated protein
subcellular locations showed that the prediction specificities for nearly all subcellular locations except the nucleus
were > 90%; in particular, prediction of secreted proteins had an MCC value of 0.71 with 89.0% sensitivity
and 96.2% specificity (Table 1). We therefore concluded that the prediction of secreted proteins was relatively reliable.
Other tools are also available as web servers, including the Cell-PLoc servers (Chou and Shen, 2008) and some
others (Meinken and Min, 2012). These tools and their related publications can be found at our website
(Meinken and Min, 2012). Because standalone versions are not available
for some tools, such as Cell-PLoc, or are too slow to process large datasets, we were not able to use them for our data
processing. However, we suggest that users apply these tools to obtain a second prediction for proteins of interest, as our
experience showed that using multiple tools improves prediction specificity.
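For readers who want to reproduce this kind of evaluation, the sketch below shows how sensitivity, specificity and MCC are computed from a confusion matrix, and how requiring two tools to agree can raise specificity at the cost of sensitivity. The function names and the toy labels are hypothetical and purely illustrative; they are not the evaluation data behind Table 1, and the two "tools" are not any of the predictors named above.

```python
from math import sqrt


def confusion_counts(truth, predicted):
    """Count (tp, fp, tn, fn) for two equal-length sequences of 0/1 labels."""
    tp = fp = tn = fn = 0
    for t, p in zip(truth, predicted):
        if t and p:
            tp += 1
        elif not t and p:
            fp += 1
        elif not t and not p:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def binary_metrics(tp, fp, tn, fn):
    """Return (sensitivity, specificity, MCC) for one predicted class."""
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally reported as 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sensitivity, specificity, mcc


def consensus(pred_a, pred_b):
    """Call a protein positive only when both tools call it positive."""
    return [int(a and b) for a, b in zip(pred_a, pred_b)]


# Toy labels (1 = secreted, 0 = not secreted); invented for illustration only.
truth = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
tool_a = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]
tool_b = [1, 1, 0, 1, 0, 1, 1, 0, 0, 0]

if __name__ == "__main__":
    for name, pred in [("tool A", tool_a), ("tool B", tool_b),
                       ("A and B", consensus(tool_a, tool_b))]:
        sens, spec, mcc = binary_metrics(*confusion_counts(truth, pred))
        print(f"{name}: sensitivity={sens:.2f} specificity={spec:.2f} MCC={mcc:.2f}")
```

In this toy case the consensus call removes false positives that only one tool makes, so specificity rises while some true positives are lost. This is the trade-off to keep in mind when using a second tool as confirmation.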
Recently, our research group has made efforts to improve the prediction accuracies of subcellular
locations for plant proteins (Neizer-Ashun et al., 2015), fungal proteins (Munyon et al., 2015), and animal/human
proteins (Khavari, 2016) using various statistical algorithms. The results were mixed, varying with the subcellular
location, the method and the group of eukaryotic proteins. However, some of the algorithms showed
promise in improving prediction accuracy. When enough experimental subcellular location data for protist proteins
become available, a dedicated tool will need to be developed for protist protein subcellular location prediction.
ProtSecKB contains 101 unique protist species, some of which have multiple strains, resulting in a total of
127 organisms with complete proteomes. The database allows each subcellular proteome in each species
to be searched and downloaded for detailed comparative analysis. As an example of the usage of the database,
our analysis of protein families in three species with different lifestyles demonstrated that the secretome of
each species may play an important role in determining its lifestyle (Table 3). We have also implemented a
curation tool, accessible through ProtSecKB, for the community to manually curate subcellular locations of protist
proteins that have experimental evidence. We anticipate that this database resource will help the protist research
community design further experiments to characterize protist proteins and understand protist biology,
particularly that of protist pathogens of plants, humans and animals.
Authors' contributions
XM and CC conceived the work; BP, VA and JM implemented the database; GK curated proteins; XM, BP and FY
analyzed the data; XM, BP, JM and CC prepared the manuscript. All authors read and approved the final
manuscript.