Computational Molecular Biology 2014, Vol. 4, No. 9, 1-6
http://cmb.biopublisher.ca
1 Methods
1.1 Sequence Retrieval
This study focuses on the de novo assembly and sequence annotation of five legume species: Arachis hypogaea L. (peanut; SRR1212866), Cicer arietinum L. (SRR627764), Phaseolus vulgaris L. (SRR1283084), Trigonella foenum-graecum L. (SRR066197) and Vicia sativa L. (SRR403901), all taken from the NCBI database for de novo transcriptome analysis. The raw data were downloaded from the NCBI SRA (http://trace.ncbi.nlm.nih.gov/Traces/sra/) and were generated on the Illumina HiSeq 2000 platform and the LS454 platform (454 GS FLX). The raw sequences were converted to FASTQ format for further annotation using the SRA Toolkit from NCBI (http://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?view=software).
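The conversion step can be sketched as follows. This is a minimal illustration, not the authors' actual commands: it only builds `fastq-dump` command lines for the five accessions listed above. The `--split-files` option and the `fastq` output directory are assumptions, since the text does not state which options were used.

```python
# Accessions as listed in Section 1.1.
ACCESSIONS = {
    "Arachis hypogaea": "SRR1212866",
    "Cicer arietinum": "SRR627764",
    "Phaseolus vulgaris": "SRR1283084",
    "Trigonella foenum-graecum": "SRR066197",
    "Vicia sativa": "SRR403901",
}


def fastq_dump_command(accession, outdir="fastq"):
    """Build a fastq-dump command line for one SRA run.

    The output directory and --split-files are assumptions made for
    this sketch; the paper does not report the options used.
    """
    return ["fastq-dump", "--split-files", "--outdir", outdir, accession]


def build_commands(accessions=ACCESSIONS, outdir="fastq"):
    """Map each species name to its fastq-dump command line."""
    return {
        species: fastq_dump_command(acc, outdir)
        for species, acc in accessions.items()
    }


if __name__ == "__main__":
    for species, cmd in build_commands().items():
        print(species, "->", " ".join(cmd))
```

Each command list could be passed to `subprocess.run` on a machine where the SRA Toolkit is installed; the sketch deliberately stops at building the commands so that it runs without the toolkit.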
1.2 NGS QC Toolkit
NGS QC Toolkit is an application for quality checking of sequencing data and filtering to retain high-quality reads. It is a standalone, open-source application freely available at http://www.nipgr.res.in/ngsqctoolkit.html. The toolkit comprises user-friendly tools for QC of sequencing data generated on the Roche 454 and Illumina platforms, together with additional tools to aid QC (sequence format converter and trimming tools) and analysis (statistics tools). A variety of options are provided so that QC can be run with user-defined parameters. The toolkit is expected to be very useful for the QC of NGS data and to facilitate better downstream analysis (Patel RK, et al).
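The kind of quality filtering described above can be illustrated with a small sketch. It is not the NGS QC Toolkit itself: the function names are hypothetical, and the cutoffs (Phred score of at least 20 for at least 70% of the bases, Phred+33 encoding) are assumptions chosen for illustration, since the parameters used in this study are not stated.

```python
def phred_scores(quality_line, offset=33):
    """Decode a FASTQ quality string into Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality_line]


def is_high_quality(quality_line, min_score=20, min_fraction=0.70, offset=33):
    """Return True if enough bases reach the minimum Phred score.

    min_score and min_fraction are illustrative cutoffs, not values
    reported in the paper.
    """
    scores = phred_scores(quality_line, offset)
    if not scores:
        return False
    good = sum(1 for s in scores if s >= min_score)
    return good / len(scores) >= min_fraction


def filter_fastq(lines, **kwargs):
    """Yield (header, sequence, quality) for reads that pass the filter.

    `lines` is an iterable of FASTQ lines in strict four-line records.
    """
    records = [line.rstrip("\n") for line in lines]
    for i in range(0, len(records) - 3, 4):
        header, seq, _plus, qual = records[i:i + 4]
        if is_high_quality(qual, **kwargs):
            yield header, seq, qual


if __name__ == "__main__":
    # Two made-up reads: the first is all high quality, the second mostly low.
    demo = ["@r1", "ACGT", "+", "IIII", "@r2", "ACGT", "+", "!!!I"]
    for header, seq, qual in filter_fastq(demo):
        print(header, seq, qual)
```

Adapter removal, which the toolkit also performs, is not covered by this sketch.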
1.3 De novo sequence assembly by CLC GENOMICS WORKBENCH 7
CLC Genomics Workbench is a comprehensive and user-friendly package for analyzing, comparing and visualizing next generation sequencing data. It was used for de novo assembly of the sequences with the default parameters of its de novo assembly tool (http://www.clcbio.com/products/clc-genomics-workbench/).
1.4 BLASTX
The assembled contigs were then annotated; the first step was to identify translated protein sequences from the contigs. BLASTX at NCBI (http://www.ncbi.nlm.nih.gov/blast/Blast.cgi?PROGRAM=blastx&PAGE_TYPE=BlastSearch&LINK_LOC=blasthome) was run with a few parameters changed: the non-redundant protein database (nr) was selected as the database, Eudicots was selected in the organism option and, under Algorithm parameters, Max target sequences was set to 10 and Expect threshold was set to 6.
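For readers who prefer a command-line run, the settings above map roughly onto the standalone BLAST+ `blastx` program as sketched below. This is an approximation under stated assumptions, not the procedure used in the study, which relied on the NCBI web interface. The file names are hypothetical, XML output (`-outfmt 5`) is an assumption, and the Eudicots organism restriction is not reproduced here.

```python
def blastx_command(query_fasta, out_path, db="nr", evalue=6, max_target_seqs=10):
    """Build a BLAST+ blastx command line mirroring the reported settings.

    evalue and max_target_seqs follow the values given in the text;
    the XML output format is an assumption made for this sketch.
    """
    return [
        "blastx",
        "-query", query_fasta,
        "-db", db,
        "-evalue", str(evalue),
        "-max_target_seqs", str(max_target_seqs),
        "-outfmt", "5",  # XML output (assumed)
        "-out", out_path,
    ]


if __name__ == "__main__":
    # "contigs.fasta" and "blastx_hits.xml" are hypothetical file names.
    print(" ".join(blastx_command("contigs.fasta", "blastx_hits.xml")))
```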
1.5 Blast2GO
Blast2GO is an all-in-one tool for the functional annotation of (novel) sequences and the analysis of annotation data (http://www.blast2go.com/b2ghome). Based on the results of the protein database annotation, Blast2GO was employed to obtain the functional classification of the unigenes based on GO terms. The transcript contigs were classified under the three GO categories: molecular function, cellular component and biological process (Ness et al., 2011; Shi et al., 2011; Wang et al., 2010). The WEGO tool (http://www.wego.genomics.org.cn) was used to perform the GO functional classification for all of the unigenes and to understand the distribution of gene functions in these species at the macro level. The KEGG database (http://www.genome.jp/kegg/pathway.html) was used to annotate the pathways of these unigenes.
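The macro-level GO classification can be pictured as counting, for each top-level GO category, how many unigenes carry at least one term in it. The sketch below assumes a simple in-memory mapping from unigene to GO terms; the contig names and their term assignments are made up for illustration, and the category labels follow the Gene Ontology namespace names rather than any Blast2GO or WEGO output format.

```python
GO_CATEGORIES = ("biological_process", "cellular_component", "molecular_function")

# Hypothetical annotations: unigene -> list of (GO ID, category).
EXAMPLE = {
    "contig_1": [("GO:0003677", "molecular_function"),
                 ("GO:0005634", "cellular_component")],
    "contig_2": [("GO:0006355", "biological_process"),
                 ("GO:0003677", "molecular_function")],
    "contig_3": [("GO:0016020", "cellular_component")],
}


def count_by_category(annotations):
    """Count unigenes with at least one GO term in each category."""
    counts = {category: 0 for category in GO_CATEGORIES}
    for _unigene, terms in annotations.items():
        # A unigene is counted once per category, however many terms it has.
        for category in {cat for _go_id, cat in terms}:
            counts[category] = counts.get(category, 0) + 1
    return counts


if __name__ == "__main__":
    for category, n in count_by_category(EXAMPLE).items():
        print(category, n)
```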
1.6 SSR mining
We employed MIcroSAtellite (MISA) (http://pgrc.ipk-gatersleben.de/misa/) for microsatellite mining; it produces various statistical summaries of the microsatellites found in the transcripts.
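The idea behind microsatellite mining can be sketched with regular expressions. This is not MISA, and the minimum repeat counts below (10 for mononucleotides, 6 for dinucleotides, 5 for tri- to hexanucleotides) are assumptions for illustration, since the thresholds used in this study are not stated. The function name is hypothetical, and compound SSRs are not handled.

```python
import re

# Motif length -> minimum number of repeats (illustrative thresholds only).
THRESHOLDS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def find_ssrs(seq, thresholds=THRESHOLDS):
    """Return (motif, repeat_count, start, end) tuples with 1-based coordinates."""
    seq = seq.upper()
    hits = []
    for unit, min_rep in sorted(thresholds.items()):
        # A motif of `unit` bases followed by at least min_rep - 1 more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are repeats of a shorter unit, e.g. "AA" or "ATAT".
            if any(motif == motif[:k] * (unit // k)
                   for k in range(1, unit) if unit % k == 0):
                continue
            hits.append((motif, len(m.group(0)) // unit, m.start() + 1, m.end()))
    return hits


if __name__ == "__main__":
    # Made-up sequence containing an (AT)7 and an (A)12 repeat.
    demo = "GGC" + "AT" * 7 + "CCG" + "A" * 12 + "TTG"
    for hit in find_ssrs(demo):
        print(hit)
```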
1.7 Plant transcription factor
PlantTFcat, an online plant transcription factor and transcriptional regulator categorization and analysis tool, was used to identify plant transcription factors in the sequences (http://plantgrn.noble.org/PlantTFcat/).
2 Results and Discussion
2.1 Sequence Comparison
(Table 1).
2.2 NGS QC Toolkit
Sequences were filtered with this tool by removing adaptors and other contaminating sequences; sequence quality was then checked with the same tool, and the final high-quality filtered sequence file was used for de novo sequence assembly (Table 2).