Computational Molecular Biology 2014, Vol. 4, No. 7, 1-17

http://cmb.biopublisher.ca

pathogenic or symbiotic interactions between plants

and fungi (Girard et al., 2013). Saprophytic fungi

secrete a large number of families of hydrolytic

enzymes such as glycoside hydrolases for breaking

down complex biomaterials like lignin and cellulose

(Martinez et al., 2004; Martinez et al., 2009; Murphy

et al., 2011). Recently, with the complete genome sequencing of many fungi, the identification and analysis of fungal secretomes have become an important subject of research, using both computational and experimental approaches (Bouws et al., 2008). For example, secretomes have been reported for a number of fungi, including

Aspergillus niger (Tsang et al., 2009; Braaksma et al., 2010), Aspergillus fumigatus (Powers-Fletcher et al., 2011), Candida albicans (Lee et al., 2003; Ene et al., 2012), Doratomyces stemonitis C8 (Peterson et al., 2011), Fusarium graminearum (Paper et al., 2007; Brown et al., 2012), Irpex lacteus (Salvachúa et al., 2013), Magnaporthe oryzae (Jung et al., 2012), Mycosphaerella graminicola (Morais et al., 2012), Paracoccidioides (a complex of several phylogenetic species) (Weber et al., 2012), Penicillium echinulatum (Ribeiro et al., 2012), Phanerochaete chrysosporium (Wymelenberg et al., 2005), Sclerotinia sclerotiorum (Yajima and Kav, 2006), Trichoderma harzianum (Do Vale et al., 2012), and Ustilago maydis (Mueller et al., 2008).

Two fungal-specific secretome databases, the Fungal Secretome Database (FSD, http://fsd.snu.ac.kr/) and the Fungal Secretome Knowledgebase (FunSecKB, http://proteomics.ysu.edu/secretomes/fungi.php), have been constructed for the community to search fungal secretome-related information (Choi et al., 2010; Lum and Min, 2011). FSD was constructed using a three-layer hierarchical identification rule based on 9 different programs (Choi et al., 2010). We developed FunSecKB using 6 different tools for predicting secreted proteins from the RefSeq data set of fungi (Lum and Min, 2011). However, since the release of FunSecKB, the available fungal protein data have increased tremendously. In this work, we describe FunSecKB2, a fungal protein subcellular location knowledgebase, also known as the Fungal Secretome and Subcellular Proteome Knowledgebase (Version 2), which is an expanded, updated, and improved version of FunSecKB. FunSecKB2 is constructed with a refined protocol that incorporates curated subcellular location information together with predicted information on secretomes and other subcellular proteomes for 15 subcellular locations. This improved fungal protein knowledgebase is expected to serve as a central portal providing information on fungal protein subcellular locations to users in the fungal research and industrial communities who are interested in exploiting fungi for the global development of the bioeconomy (Lange et al., 2012).

1 Data Collection and Database Implementation

1.1 Data collection

The protein sequences for all fungi were retrieved from the UniProtKB/Swiss-Prot and UniProtKB/TrEMBL datasets (release 2013_08) (http://www.uniprot.org/downloads). The UniProtKB/Swiss-Prot dataset contains manually annotated, non-redundant protein sequences with information extracted from the literature on experimental results and from curator-evaluated computational analysis (The UniProt Consortium, 2014). The UniProtKB/TrEMBL dataset contains protein sequences associated with computationally generated annotation and large-scale functional characterization. The combined dataset consisted of a total of 1,976,832 fungal proteins, with 30,859 and 1,945,973 entries retrieved from the UniProtKB/Swiss-Prot and UniProtKB/TrEMBL datasets, respectively.
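As a rough illustration of how per-dataset entry counts like those reported here might be tallied, the sketch below counts FASTA records by the `sp|` (Swiss-Prot) and `tr|` (TrEMBL) prefixes that UniProtKB FASTA headers carry. The function name `count_by_dataset` and the sample records are hypothetical and are not taken from the FunSecKB2 pipeline.

```python
from collections import Counter
from typing import Iterable

# UniProtKB FASTA headers start with ">sp|" (Swiss-Prot) or ">tr|" (TrEMBL).
PREFIX_TO_DATASET = {"sp": "Swiss-Prot", "tr": "TrEMBL"}


def count_by_dataset(fasta_lines: Iterable[str]) -> Counter:
    """Count FASTA records per UniProtKB dataset, based on the header prefix."""
    counts = Counter()
    for line in fasta_lines:
        if not line.startswith(">"):
            continue  # sequence line, not a header
        prefix = line[1:].split("|", 1)[0]
        counts[PREFIX_TO_DATASET.get(prefix, "other")] += 1
    return counts


# Hypothetical sample records (accessions, names, and sequences are made up).
sample = [
    ">sp|X00001|EXAMPLE1_FUNGUS Example protein 1",
    "MKLSTV",
    ">tr|X00002|X00002_FUNGUS Example protein 2",
    "MRFSAA",
    ">tr|X00003|X00003_FUNGUS Example protein 3",
    "MQQLTT",
]
counts = count_by_dataset(sample)
print(counts["Swiss-Prot"], counts["TrEMBL"], sum(counts.values()))
```

For the figures reported in this section, 30,859 Swiss-Prot entries plus 1,945,973 TrEMBL entries give the stated total of 1,976,832.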

1.2 Methods for protein subcellular location assignment

The fungal protein sequences were processed using the following programs for signal peptide and subcellular location prediction: SignalP (versions 3.0 and 4.0, http://www.cbs.dtu.dk/services/SignalP/) (Bendtsen et al., 2004b; Petersen et al., 2011), Phobius (http://phobius.binf.ku.dk/) (Käll et al., 2007), WoLF PSORT (http://wolfpsort.org/) (Horton et al., 2007), and TargetP (http://www.cbs.dtu.dk/services/TargetP/) (Emanuelsson et al., 2007). These predictors were previously evaluated favorably and are widely used by the fungal secretome research community (Min, 2010).

TMHMM (http://www.cbs.dtu.dk/services/TMHMM)

was used to identify proteins having transmembrane

domains (Krogh et al., 2001) and Scan-Prosite (called

PS-Scan in standalone version) (http://www.expasy.