Computational Molecular Biology 2014, Vol. 4, No. 7, 1-17
http://cmb.biopublisher.ca
pathogenic or symbiotic interactions between plants
and fungi (Girard et al., 2013). Saprophytic fungi
secrete a large number of families of hydrolytic
enzymes such as glycoside hydrolases for breaking
down complex biomaterials like lignin and cellulose
(Martinez et al., 2004; Martinez et al., 2009; Murphy
et al., 2011). Recently, as complete genome sequences have become
available for many fungi, the identification and analysis of fungal
secretomes have become an important subject of research, using both
computational and experimental
approaches (Bouws et al., 2008). For example, secretomes have been
reported for the following fungi: Aspergillus niger (Tsang et al., 2009;
Braaksma et al., 2010), Aspergillus fumigatus (Powers-Fletcher et al.,
2011), Candida albicans (Lee et al., 2003; Ene et al., 2012),
Doratomyces stemonitis C8 (Peterson et al., 2011), Fusarium graminearum
(Paper et al., 2007; Brown et al., 2012), Irpex lacteus (Salvachúa et
al., 2013), Magnaporthe oryzae (Jung et al., 2012), Mycosphaerella
graminicola (Morais et al., 2012), Paracoccidioides (a complex of
several phylogenetic species) (Weber et al., 2012), Penicillium
echinulatum (Ribeiro et al., 2012), Phanerochaete chrysosporium
(Wymelenberg et al., 2005), Sclerotinia sclerotiorum (Yajima and Kav,
2006), Trichoderma harzianum (Do Vale et al., 2012), and Ustilago maydis
(Mueller et al., 2008).
Two fungal-specific secretome databases, the Fungal Secretome Database
(FSD) and the Fungal Secretome Knowledgebase (FunSecKB), have been
constructed for the community to search fungal secretome-related
information (Choi et al., 2010; Lum and Min, 2011). FSD was constructed
using a
three-layer hierarchical identification rule based on 9
different programs (Choi et al., 2010). We developed FunSecKB using 6
different tools for predicting secreted proteins from the RefSeq fungal
data set (Lum and Min, 2011). However, since the release of FunSecKB,
the available fungal protein data have increased tremendously. In this
work, we describe FunSecKB2, a fungal protein subcellular location
knowledgebase, also known as the Fungal Secretome and Subcellular
Proteome Knowledgebase (Version 2), which is an expanded, updated, and
improved version of FunSecKB. FunSecKB2 is
constructed with a refined protocol for including
curated subcellular information and predicted
information on secretomes and other subcellular
proteomes of 15 subcellular locations. This improved
fungal protein knowledgebase is expected to serve as a
central portal for providing information on fungal
protein subcellular locations to users in the fungal
research and industrial community who are interested
in exploiting fungi for a global development of the
bioeconomy (Lange et al., 2012).
1 Data Collection and Database Implementation
1.1 Data collection
The protein sequences for all fungi were retrieved from the
UniProtKB/Swiss-Prot and UniProtKB/TrEMBL datasets (release 2013_08)
(http://www.uniprot.org/downloads). The UniProtKB/Swiss-Prot dataset
contains manually annotated, non-redundant protein sequences with
information extracted from the experimental literature and from
curator-evaluated computational analysis (The UniProt Consortium, 2014).
The UniProtKB/TrEMBL dataset contains protein sequences associated with
computationally generated annotation and large-scale functional
characterization. The combined dataset consisted of 1,976,832 fungal
proteins, with 30,859 entries retrieved from UniProtKB/Swiss-Prot and
1,945,973 from UniProtKB/TrEMBL.
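As an illustration of how the per-dataset entry counts reported above could be tallied, the following minimal Python sketch counts FASTA records by UniProtKB section. It assumes the standard UniProtKB FASTA header convention, in which reviewed (Swiss-Prot) records begin with ">sp|" and unreviewed (TrEMBL) records begin with ">tr|". The function name count_by_section and the sample records are hypothetical and are not part of the FunSecKB2 pipeline.

```python
from collections import Counter
from typing import Iterable


def count_by_section(fasta_lines: Iterable[str]) -> Counter:
    """Count FASTA records by UniProtKB section, using the header prefix."""
    counts: Counter = Counter()
    for line in fasta_lines:
        if not line.startswith(">"):
            continue  # sequence line, not a record header
        # Header form assumed: >sp|ACCESSION|ENTRY_NAME description
        prefix = line[1:].split("|", 1)[0]
        if prefix == "sp":
            counts["Swiss-Prot"] += 1
        elif prefix == "tr":
            counts["TrEMBL"] += 1
        else:
            counts["other"] += 1
    return counts


# Hypothetical records for illustration only (made-up accessions and sequences).
SAMPLE_FASTA = """\
>sp|X00001|EXMP1_FUNGI Example reviewed protein
MKLSTAVLLA
>tr|X00002|X00002_FUNGI Example unreviewed protein
MASTRPQLLV
GGHK
>tr|X00003|X00003_FUNGI Another example unreviewed protein
MPPLK
"""

sample_counts = count_by_section(SAMPLE_FASTA.splitlines())

# Consistency check on the totals reported in the text:
# Swiss-Prot entries plus TrEMBL entries should equal the combined total.
reported_total = 30_859 + 1_945_973
```

Applied to a downloaded FASTA file, the same function could be called on the open file handle, since it only requires an iterable of lines.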
1.2 Methods for protein subcellular location assignment
The fungal protein sequences were processed using the following
programs for signal peptide and subcellular location prediction:
SignalP (versions 3.0 and 4.0) (Bendtsen et al., 2004b; Petersen et al.,
2011), Phobius (Käll et al., 2007), WoLF PSORT (Horton et al., 2007),
and TargetP (Emanuelsson et al., 2007). These predictors were
previously evaluated favorably and are widely used by the fungal
secretome research community (Min, 2010).
TMHMM (http://www.cbs.dtu.dk/services/TMHMM)
was used to identify proteins having transmembrane
domains (Krogh et al., 2001) and Scan-Prosite (called
PS-Scan in standalone version) (http://www.expasy.