Computational Molecular Biology
mitochondrial membrane proteins, ER proteins, and
others can be searched or downloaded by selecting a
species from a list of those having more than
1 000 protein sequences. Species having fewer
than 1 000 protein entries can be searched by entering
the species name. The BLAST utility can be accessed
through a link on the interface for searching all plant
proteins or secretomes. The interface also provides a link
to an EST data search page. EST data can be searched
by EST identifier, keyword(s), species, or BLAST.
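The species-selection rule described above can be sketched as follows. This is only an illustration of the 1 000-sequence threshold stated in the text, not the database's actual code; the function name and the example counts are hypothetical, and treating a species with exactly 1 000 sequences as name-search only is an assumption, since the text does not cover that case.

```python
from typing import Dict, List, Tuple

# Threshold taken from the text; everything else here is illustrative.
SPECIES_LIST_THRESHOLD = 1000


def split_species(counts: Dict[str, int],
                  threshold: int = SPECIES_LIST_THRESHOLD) -> Tuple[List[str], List[str]]:
    """Return (species offered in the selection list, species searchable by name only)."""
    # Species with more than `threshold` sequences appear in the selection list.
    listed = sorted(s for s, n in counts.items() if n > threshold)
    # All others (including exactly `threshold`, by assumption) are found by typing the name.
    by_name = sorted(s for s, n in counts.items() if n <= threshold)
    return listed, by_name
```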
Figure 1 Overview of the PlantSecKB user interface and annotation page. (A) User interface. A UniProt accession number, keywords, or
species can be used to search the database. Secretomes or other subcellular proteomes can be searched or downloaded. The user
interface provides links to the BLAST utility, the EST database, and the curation submission form. (B) Annotation page displaying the
subcellular location annotation, predictions, and sequence of a protein. A barley alpha-amylase is used as an example.
The annotation display page for each UniProt protein
contains information obtained from three
sources: (1) features predicted computationally
with the seven programs
mentioned above; (2) subcellular locations annotated
in UniProtKB; and (3) our manual curation based on
experimental evidence obtained from recent literature.
An overview of the database features is shown in
Figure 1. Manually curated secreted proteins consist
of proteins retrieved from UniProtKB/Swiss-Prot whose
subcellular locations are labeled as “reviewed”, as well as
proteins curated by our team. Proteins curated
internally or submitted by the community are
supported by experimental evidence for their
subcellular location annotation and by the related literature.
The annotation page also contains the primary protein
sequence (Figure 1).
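One way to picture the three annotation sources on a single protein page is as a small record type. The sketch below is an assumption about how such a record could be modeled, not the PlantSecKB schema; the class names, field names, the accession "EXAMPLE1", and the program label "program_A" are all hypothetical placeholders.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class CuratedEvidence:
    location: str   # experimentally supported subcellular location
    reference: str  # literature citation backing the annotation


@dataclass
class ProteinAnnotation:
    accession: str  # UniProt accession number
    sequence: str   # primary protein sequence
    # Source 1: computational predictions, keyed by program name.
    predictions: Dict[str, str] = field(default_factory=dict)
    # Source 2: subcellular locations annotated in UniProtKB.
    uniprot_locations: List[str] = field(default_factory=list)
    # Source 3: manual curation with experimental evidence.
    curated: List[CuratedEvidence] = field(default_factory=list)

    def evidence_sources(self) -> List[str]:
        """List which of the three sources hold data for this protein."""
        sources = []
        if self.predictions:
            sources.append("predicted")
        if self.uniprot_locations:
            sources.append("uniprotkb")
        if self.curated:
            sources.append("curated")
        return sources


# Hypothetical record with predictions and a UniProtKB location but no curation.
record = ProteinAnnotation(
    accession="EXAMPLE1",
    sequence="MAK",
    predictions={"program_A": "signal peptide"},
    uniprot_locations=["Secreted"],
)
print(record.evidence_sources())  # ['predicted', 'uniprotkb']
```

Keeping the three sources in separate fields, rather than merging them into one location value, mirrors the page layout described above and leaves any ranking between sources to the user.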
EST data annotation contains the primary EST
sequence, the protein sequence predicted with
OrfPredictor (Min et al., 2005a), functional annotation
based on BLASTX, a prediction of the completeness of the
open reading frame from TargetIdentifier (Min et al.,
2005b), and related information generated with the
tools for subcellular location prediction applied to the
predicted protein sequences. As EST data may contain
errors introduced during sequencing and assembly,
the data should be used with caution.
Nevertheless, the EST information provided in the
database will be useful for data mining and for designing
experiments to further examine gene function
and the subcellular locations of the encoded proteins.
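To illustrate what deriving a protein sequence from an EST involves, the following is a naive six-frame longest-ORF scan. It is a simplified stand-in and not the method used by OrfPredictor or TargetIdentifier, whose algorithms are not described in this section; the function names and the toy sequence are hypothetical. The returned flag only records whether the ORF ends at a stop codon inside the read, a rough proxy for 3' completeness.

```python
from typing import Optional, Tuple

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reverse_complement(seq: str) -> str:
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq: str, offset: int) -> str:
    """Translate from `offset`; codons with ambiguous bases become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X")
        for i in range(offset, len(seq) - 2, 3)
    )


def longest_orf(est: str) -> Optional[Tuple[str, str, bool]]:
    """Return (frame, peptide, ends_with_stop) for the longest ATG-initiated ORF."""
    est = est.upper()
    best: Optional[Tuple[str, str, bool]] = None
    for strand, seq in (("+", est), ("-", reverse_complement(est))):
        for offset in range(3):
            segments = translate(seq, offset).split("*")
            for idx, segment in enumerate(segments):
                start = segment.find("M")
                if start == -1:
                    continue
                peptide = segment[start:]
                # Every segment except the last was terminated by a stop codon.
                ends_with_stop = idx < len(segments) - 1
                if best is None or len(peptide) > len(best[1]):
                    best = (f"{strand}{offset + 1}", peptide, ends_with_stop)
    return best


print(longest_orf("CCATGGCTAAATGACC"))  # ('+3', 'MAK', True)
```

Because a single-base insertion or deletion shifts the reading frame, sequencing errors of the kind mentioned above can truncate or misassign an ORF found this way, which is one reason predicted peptides from ESTs call for caution.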
2.2 Data summary
PlantSecKB contains a total of 1 415 921 protein
sequences including 33 643 entries from the