PlantSecKB: the Plant Secretome and Subcellular Proteome KnowledgeBase
including SignalP 4.0, TargetP, and Phobius for
secretory signal peptide prediction, PS-Scan for
removing ER proteins, and TMHMM for removing
membrane proteins - significantly improved the
prediction accuracy for secretomes (Min, 2010;
Meinken and Min, 2012). For secretome prediction,
our method reached a sensitivity of 91.1%, a
specificity of 98.7%, and a Matthews correlation
coefficient (MCC) of 88.5% for dataset A, and a
sensitivity of 76.8%, a specificity of 98.9%, and an
MCC of 74.5% for dataset B; both results were much
better than those obtained using WoLF PSORT or
MultiLoc alone (Meinken and Min, 2012). Thus the
prediction of secreted proteins is relatively reliable,
but the accuracies for predicting other subcellular
locations still need to be improved.
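The filtering logic described above can be sketched in a few lines of Python. This is a minimal illustration, not the published implementation: the record type `ToolCalls`, its field names, and the `min_sp_votes` threshold are hypothetical, and the text here does not state how many of the three signal peptide predictors must agree, so requiring all three by default is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolCalls:
    """Per-protein calls from each tool (hypothetical field names)."""
    signalp_sp: bool      # SignalP 4.0 predicts a secretory signal peptide
    targetp_sp: bool      # TargetP predicts a secretory signal peptide
    phobius_sp: bool      # Phobius predicts a secretory signal peptide
    tmhmm_membrane: bool  # TMHMM flags the protein as a membrane protein
    psscan_er: bool       # PS-Scan flags the protein as an ER protein


def is_secreted(calls: ToolCalls, min_sp_votes: int = 3) -> bool:
    """Return True if the protein passes the combined secretome filter.

    min_sp_votes is an assumed consensus threshold: the number of signal
    peptide predictors (out of three) that must agree.
    """
    votes = sum([calls.signalp_sp, calls.targetp_sp, calls.phobius_sp])
    if votes < min_sp_votes:
        return False
    # Remove membrane proteins (TMHMM) and ER proteins (PS-Scan).
    return not (calls.tmhmm_membrane or calls.psscan_er)


if __name__ == "__main__":
    candidate = ToolCalls(True, True, True, False, False)
    print(is_secreted(candidate))
```

Keeping the consensus threshold as a parameter makes it easy to compare a strict rule (all three predictors agree) with a looser majority vote on a benchmark set.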
Table 1 Evaluation of prediction accuracies of plant protein subcellular locations

                     Dataset A (total 15028)                Dataset B (total 6908)
Subcellular          Total     Total    Sn    Sp   MCC      Total     Total    Sn    Sp   MCC
location         positives negatives   (%)   (%)   (%)  positives negatives   (%)   (%)   (%)
Secreted              1485     13543  91.1  98.7  88.5        263      6645  76.8  98.9  74.5
Mitochondrial          919     14109  65.2  82.6  28.4        402      6506  61.4  77.5  21.1
Chloroplast           8124      6904  27.5  90.9  23.5       4918      1990  28.2  90.7  20.4
ER                     256     14772  22.3 100.0  46.0         87      6821  18.4 100.0  42.7
Cytosol                 77     14951  61.0  78.9   7.0         23      6885  52.2  75.3   3.7
Golgi Apparatus        260     14768   1.5  99.9   6.3         54      6854   0.0 100.0  -0.2
Peroxisome             136     14892  24.3  99.7  31.6         52      6856  13.5  99.5  15.0
Nucleus               3099     11929  62.2  89.2  50.7        788      6120  68.8  85.5  42.7
Plasma Membrane         91     14937  35.2  95.1  10.7         14      6894  21.4  98.9   8.5
Vacuole                273     14755   5.1  99.0   5.5        121      6787   2.5  99.8   6.8
Cytoskeleton           305     14723  13.8  99.7  24.3        186      6722  21.0  99.7  36.0

Note: Sn: sensitivity; Sp: specificity; MCC: Matthews correlation coefficient
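For readers who want to recompute the Sn, Sp, and MCC columns of Table 1, the standard definitions can be sketched as below. The function names are illustrative, and the counts in the usage lines are not taken from the source: they are rounded back-calculations from the Secreted row of Dataset A (1485 positives, 13543 negatives), so the output only approximates the reported values. The functions return fractions, whereas the table reports percentages.

```python
import math


def sensitivity(tp: int, fn: int) -> float:
    """Fraction of true positives recovered: TP / (TP + FN)."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Fraction of true negatives recovered: TN / (TN + FP)."""
    return tn / (tn + fp)


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient, in the range [-1, 1]."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Common convention when any marginal total is zero.
        return 0.0
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Hypothetical counts, back-calculated from the rounded percentages.
    tp, fn = 1353, 132      # 1485 positives in total
    tn, fp = 13367, 176     # 13543 negatives in total
    print(f"Sn  = {100 * sensitivity(tp, fn):.1f}%")
    print(f"Sp  = {100 * specificity(tn, fp):.1f}%")
    print(f"MCC = {100 * mcc(tp, tn, fp, fn):.1f}%")
```

Because the MCC uses all four cells of the confusion matrix, it stays informative for highly unbalanced classes such as the Golgi apparatus or vacuole rows, where specificity alone is close to 100% even though very few positives are recovered.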
1.4 Manual curation and community annotation
PlantSecKB supports community curation of
subcellular locations of plant proteins based on
published experimental evidence. A submission tool
was developed so that community members can provide
a subcellular location annotation for a protein together
with a literature source supporting it. After validation
by our curators, these data are incorporated
into the database. To date, based on published
experimental evidence, we have manually curated a total
of 736 secreted proteins from rice (Jung et al., 2008;
Cho et al., 2009; Cho and Kim, 2009; Chen et al.,
2009; Zhang et al., 2009; Shinano et al., 2011),
Arabidopsis (De-la-Pena et al., 2010), and sorghum
(Ngara et al., 2011). Manual curation is an ongoing
process, so more secreted proteins, submitted by the
community or identified by our curators, will be curated
and integrated into the database in the future. The information
from computational prediction, UniProtKB annotation
and manual curation is integrated and displayed on
the annotation page (Figure 1). The annotated
entries are linked to the tools used, UniProtKB,
the RefSeq database and PubMed in the National
Center for Biotechnology Information (NCBI)
(http://www.ncbi.nlm.nih.gov/).
2 Overview of the Database Content and Tools
2.1 Data and tool access
PlantSecKB is accessed through the database web
interface at http://proteomics.ysu.edu/secretomes/plant.php.
The interface provides utilities for searching
proteins obtained from UniProtKB, links to BLAST,
an EST data search page, and the community
annotation page (Figure 1). All plant proteins obtained
from UniProt can be searched by UniProt accession
number (AC) or ID, gene name, keyword(s) in the
protein function, or species. Sub-proteomes including
curated secreted proteins, complete secretome,
Computational
Molecular Biology