PlantSecKB: the Plant Secretome and Subcellular Proteome KnowledgeBase
            
            
            
            
including SignalP 4.0, TargetP, and Phobius for secretory signal peptide prediction, PS-Scan for removing ER proteins, and TMHMM for removing membrane proteins - significantly improved the prediction accuracy for secretomes (Min, 2010; Meinken and Min, 2012). For secretome prediction, our method reached a sensitivity of 91.1%, a specificity of 98.7%, and a Matthews correlation coefficient (MCC) of 88.5% for dataset A, and a sensitivity of 76.8%, a specificity of 98.9%, and an MCC of 74.5% for dataset B. These values were much better than those obtained using WoLF PSORT or MultiLoc alone (Meinken and Min, 2012). The prediction of secreted proteins is therefore relatively reliable, whereas the accuracies for predicting other subcellular locations still need to be improved.
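The filtering logic described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the field names, the `min_sp_votes` threshold, and the rule that all three signal peptide predictors must agree by default are assumptions, since this section does not state how the individual tool outputs are combined.

```python
from dataclasses import dataclass


@dataclass
class ToolCalls:
    """Per-protein calls from each tool (hypothetical field names)."""
    signalp: bool       # SignalP 4.0 predicts a signal peptide
    targetp: bool       # TargetP predicts the secretory pathway
    phobius: bool       # Phobius predicts a signal peptide
    er_retention: bool  # PS-Scan finds an ER retention motif
    tm_helices: int     # TMHMM helices, assumed counted outside the signal peptide


def is_predicted_secreted(calls: ToolCalls, min_sp_votes: int = 3) -> bool:
    """Return True if the protein passes all filters (assumed voting rule)."""
    # Step 1: require enough signal peptide predictors to agree.
    votes = sum([calls.signalp, calls.targetp, calls.phobius])
    if votes < min_sp_votes:
        return False
    # Step 2: remove proteins retained in the ER.
    if calls.er_retention:
        return False
    # Step 3: remove membrane proteins.
    return calls.tm_helices == 0


# Example: signal peptide found by all tools, no ER motif, no TM helix.
print(is_predicted_secreted(ToolCalls(True, True, True, False, 0)))
```

Keeping the vote threshold as a parameter makes it easy to compare a strict consensus against a majority rule on a benchmark dataset.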
            
            
Table 1 Evaluation of prediction accuracies of plant protein subcellular locations

                      Dataset A (total 15028)                      Dataset B (total 6908)
Subcellular location  Total      Total      Sn     Sp     MCC      Total      Total      Sn     Sp     MCC
                      positives  negatives  (%)    (%)    (%)      positives  negatives  (%)    (%)    (%)
Secreted              1485       13543      91.1   98.7   88.5     263        6645       76.8   98.9   74.5
Mitochondrial         919        14109      65.2   82.6   28.4     402        6506       61.4   77.5   21.1
Chloroplast           8124       6904       27.5   90.9   23.5     4918       1990       28.2   90.7   20.4
ER                    256        14772      22.3   100.0  46.0     87         6821       18.4   100.0  42.7
Cytosol               77         14951      61.0   78.9   7.0      23         6885       52.2   75.3   3.7
Golgi Apparatus       260        14768      1.5    99.9   6.3      54         6854       0.0    100.0  -0.2
Peroxisome            136        14892      24.3   99.7   31.6     52         6856       13.5   99.5   15.0
Nucleus               3099       11929      62.2   89.2   50.7     788        6120       68.8   85.5   42.7
Plasma Membrane       91         14937      35.2   95.1   10.7     14         6894       21.4   98.9   8.5
Vacuole               273        14755      5.1    99.0   5.5      121        6787       2.5    99.8   6.8
Cytoskeleton          305        14723      13.8   99.7   24.3     186        6722       21.0   99.7   36.0
            
            
Note: Sn: sensitivity; Sp: specificity; MCC: Matthews correlation coefficient
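For readers who want to recompute these measures, the sketch below assumes the standard confusion-matrix definitions of sensitivity, specificity, and MCC; this section does not spell out the formulas. The counts in the example are illustrative only and are not taken from dataset A or B. Table 1 reports MCC as a percentage, which corresponds to the value below multiplied by 100.

```python
import math


def sensitivity(tp: int, fn: int) -> float:
    """Fraction of true positives recovered: TP / (TP + FN)."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def specificity(tn: int, fp: int) -> float:
    """Fraction of true negatives recovered: TN / (TN + FP)."""
    return tn / (tn + fp) if (tn + fp) else 0.0


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient in [-1, 1]; 0.0 if undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Illustrative counts (hypothetical, not from the paper's datasets).
tp, fn, tn, fp = 8, 2, 9, 1
print(f"Sn  = {100 * sensitivity(tp, fn):.1f}%")
print(f"Sp  = {100 * specificity(tn, fp):.1f}%")
print(f"MCC = {100 * mcc(tp, tn, fp, fn):.1f}%")
```

Unlike specificity, MCC accounts for all four cells of the confusion matrix, so it stays low when positives are rare and few of them are recovered. This is why locations such as the Golgi apparatus combine a specificity near 100% with an MCC near zero.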
            
            
              
                1.4 Manual curation and community annotation
              
            
            
PlantSecKB supports community curation of subcellular locations of plant proteins based on published experimental evidence. A submission tool was developed so that community members can provide a subcellular location annotation for a protein together with a literature source supporting it. After validation by our curators, these data are also incorporated into the database. Currently, based on published experimental evidence, we have manually curated a total of 736 secreted proteins from rice (Jung et al., 2008; Cho et al., 2009; Cho and Kim, 2009; Chen et al., 2009; Zhang et al., 2009; Shinano et al., 2011), Arabidopsis (De-la-Pena et al., 2010), and sorghum (Ngara et al., 2011). Manual curation is an ongoing process, so more secreted proteins curated by the community and our curators will be integrated into the database in the future. The information from computational prediction, UniProtKB annotation, and manual curation is integrated and displayed on the annotation page (Figure 1). The annotated entries are linked to the tools used, UniProtKB, the RefSeq database, and PubMed at the National Center for Biotechnology Information (NCBI) (http://www.ncbi.nlm.nih.gov/).
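The submit-then-validate workflow described above might be modelled as in the sketch below. The record fields, function names, and placeholder identifiers are all hypothetical; the actual submission tool and database schema are not described in this section.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Submission:
    """A community annotation awaiting curator review (hypothetical schema)."""
    accession: str      # protein identifier, e.g. a UniProt accession
    location: str       # proposed subcellular location
    pubmed_id: str      # literature source supporting the annotation
    validated: bool = False


def approve(sub: Submission) -> Submission:
    """Curator step: accept a submission only if every field is filled in."""
    if not (sub.accession and sub.location and sub.pubmed_id):
        raise ValueError("submission needs accession, location and literature source")
    sub.validated = True
    return sub


def entries_for_database(subs: List[Submission]) -> List[Submission]:
    """Only curator-validated submissions are incorporated."""
    return [s for s in subs if s.validated]


# Placeholder identifiers, not real accessions or PubMed IDs.
pending = [Submission("ACC0001", "secreted", "00000000"),
           Submission("ACC0002", "vacuole", "00000001")]
approve(pending[0])
print([s.accession for s in entries_for_database(pending)])
```

Requiring a literature source at approval time mirrors the text's emphasis on published experimental evidence.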
            
            
              
                2 Overview of the Database Content and Tools
              
            
            
              
                2.1 Data and tool access
              
            
            
PlantSecKB is accessed through the database web interface at http://proteomics.ysu.edu/secretomes/plant.php. The interface provides various utilities for searching proteins obtained from UniProtKB, links to BLAST, an EST data search page, and the community annotation page (Figure 1). All plant proteins obtained from UniProt can be searched by UniProt accession number (AC) or ID, gene name, keyword(s) in protein function, or species. Sub-proteomes including curated secreted proteins, complete secretome,
            
            
              Computational
            
            
              Molecular Biology