Genomics and Applied Biology 2014, Vol. 5, No. 5, 1-6
http://gab.biopublisher.ca
1 Methods
1.1 Sequence Retrieval
This study focuses on the de novo assembly and sequence annotation of Phaseolus vulgaris L. run SRR1283084 from the NCBI database. Raw data were downloaded from the NCBI SRA (http://trace.ncbi.nlm.nih.gov/Traces/sra/?run=SRR1283084). The run was generated on the Illumina HiSeq 2000 platform; the sample is single-ended, with 20.4 M spots and 46.4% GC content. The raw sequence was converted into FASTQ format for further analysis using the SRA Toolkit from NCBI (http://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?view=software).
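Summary figures such as the spot count and GC content quoted above can be recomputed from the converted FASTQ file. The following is a minimal sketch, assuming the simple four-line-per-record FASTQ layout; the function name `fastq_stats` and the example reads are hypothetical and are not data from SRR1283084.

```python
import io

def fastq_stats(handle):
    """Return (read_count, base_count, gc_percent) for a FASTQ stream.

    Assumes each record occupies exactly four lines
    (header, sequence, '+', quality).
    """
    reads = bases = gc = 0
    for i, line in enumerate(handle):
        if i % 4 == 1:  # the second line of each record is the sequence
            seq = line.strip().upper()
            reads += 1
            bases += len(seq)
            gc += seq.count("G") + seq.count("C")
    gc_percent = 100.0 * gc / bases if bases else 0.0
    return reads, bases, gc_percent

# Tiny made-up example with two reads of four bases each.
example = io.StringIO("@r1\nGGCC\n+\nIIII\n@r2\nATAT\n+\nIIII\n")
print(fastq_stats(example))  # (2, 8, 50.0)
```

For a real file, the same function can be called on an open file handle, for example `with open("reads.fastq") as fh: fastq_stats(fh)`, where the file name is a placeholder.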
1.2 NGS QC Toolkit
NGS QC Toolkit is an application for quality checking and filtering of sequencing data to retain high-quality reads. It is a standalone, open-source application freely available at http://www.nipgr.res.in/ngsqctoolkit.html. The toolkit comprises user-friendly tools for QC of sequencing data generated on the Roche 454 and Illumina platforms, together with additional tools to aid QC (sequence format converter and trimming tools) and analysis (statistics tools). A variety of options facilitate QC with user-defined parameters, which supports better downstream analysis of NGS data (Patel RK, et al).
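The idea of filtering reads at user-defined quality parameters can be illustrated with a short sketch. This is not the toolkit's own code: the thresholds (Phred score of at least 20 in at least 70% of bases), the Phred+33 quality encoding, and the function names are assumptions made for illustration, since the settings used in this study are not stated.

```python
def is_high_quality(quality_line, min_phred=20, min_fraction=0.70, offset=33):
    """True if at least min_fraction of bases have Phred score >= min_phred.

    min_phred, min_fraction and offset are assumed values, not settings
    reported in the text.
    """
    if not quality_line:
        return False
    good = sum(1 for ch in quality_line if ord(ch) - offset >= min_phred)
    return good / len(quality_line) >= min_fraction

def filter_fastq(records, **kwargs):
    """Keep (header, sequence, quality) tuples that pass is_high_quality."""
    return [rec for rec in records if is_high_quality(rec[2], **kwargs)]

# Made-up reads: 'I' encodes Q40 and '#' encodes Q2 under Phred+33.
records = [
    ("@r1", "ACGTACGTAC", "IIIIIIIII#"),  # 9 of 10 bases pass, read is kept
    ("@r2", "ACGTACGTAC", "IIII######"),  # 4 of 10 bases pass, read is dropped
]
kept = filter_fastq(records)
print([rec[0] for rec in kept])  # ['@r1']
```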
1.3 De novo sequence assembly by CLC Genomics Workbench 7
CLC Genomics Workbench is a comprehensive and user-friendly package for analyzing, comparing, and visualizing next-generation sequencing data. It was used for de novo assembly of the filtered reads with the default parameters of its de novo assembly tool (http://www.clcbio.com/products/clc-genomics-workbench/).
1.4 BLASTX
The assembled contigs were then annotated; the first step was to identify translated protein sequences from the contigs. BLASTX at NCBI (http://www.ncbi.nlm.nih.gov/blast/Blast.cgi?PROGRAM=blastx&PAGE_TYPE=BlastSearch&LINK_LOC=blasthome) was run with a few parameters changed from their defaults: the non-redundant protein database (nr) was selected as the database, Eudicots was selected in the organism option, and in the algorithm parameters Max target sequences was set to 10 and Expect threshold was set to 6.
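The effect of the two algorithm parameters can be sketched on a downloaded hit table. The example below assumes the standard 12-column tabular BLAST layout (query, subject, percent identity, alignment length, mismatches, gap opens, query start and end, subject start and end, E-value, bit score). The function name `top_hits` and the example rows are hypothetical.

```python
from collections import defaultdict

def top_hits(lines, max_targets=10, max_evalue=6.0):
    """Group tabular BLAST hits by query and keep the best hits per query.

    Hits above max_evalue are dropped; the rest are sorted by ascending
    E-value, then descending bit score, and cut to max_targets.
    """
    hits = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        fields = line.rstrip("\n").split("\t")
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue <= max_evalue:
            hits[query].append((subject, evalue, bitscore))
    return {
        q: sorted(h, key=lambda x: (x[1], -x[2]))[:max_targets]
        for q, h in hits.items()
    }

# Made-up rows for one contig; protB exceeds the E-value threshold.
example = [
    "contig_1\tprotA\t90.0\t100\t10\t0\t1\t300\t1\t100\t1e-50\t200\n",
    "contig_1\tprotB\t60.0\t100\t40\t0\t1\t300\t1\t100\t8.0\t30\n",
    "contig_1\tprotC\t95.0\t100\t5\t0\t1\t300\t1\t100\t1e-60\t220\n",
]
result = top_hits(example)
print([hit[0] for hit in result["contig_1"]])  # ['protC', 'protA']
```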
1.5 Blast2GO
Blast2GO is an all-in-one tool for functional annotation of (novel) sequences and the analysis of annotation data (http://www.blast2go.com/b2ghome). Based on the results of the protein database annotation, Blast2GO was employed to obtain the functional classification of the unigenes based on GO terms. The transcript contigs were classified under the three GO categories: molecular function, cellular component and biological process (Ness et al., 2011; Shi et al., 2011; Wang et al., 2010). The WEGO tool (http://www.wego.genomics.org.cn) was used to perform the GO functional classification for all of the unigenes and to understand the distribution of gene functions of this species at the macro level. The KEGG database (http://www.genome.jp/kegg/pathway.html) was used to annotate the pathways of these unigenes.
1.6 SSR mining
We employed MIcroSAtellite (MISA) (http://pgrc.ipk-gatersleben.de/misa/) for microsatellite mining; it reports the SSRs identified in each transcript together with summary statistics.
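Microsatellite mining of this kind amounts to searching each transcript for short motifs repeated in tandem. The sketch below uses a regular expression rather than MISA itself. The minimum repeat counts (10 for mono-, 6 for di-, and 5 for tri- to hexanucleotide motifs) are assumed MISA-style defaults, as the thresholds used in this study are not stated, and the function name and example sequence are hypothetical.

```python
import re

# Minimum repeat count per motif length (assumed, MISA-style defaults).
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}

def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return (motif, repeat_count, start, end) tuples, 0-based half-open."""
    seq = seq.upper()
    found = []
    for size, min_rep in sorted(min_repeats.items()):
        # A motif of `size` bases followed by at least min_rep - 1 copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (size, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are repeats of a shorter unit (e.g. "ATAT"),
            # because the shorter motif size already covers them.
            if any(motif == motif[:k] * (size // k)
                   for k in range(1, size) if size % k == 0):
                continue
            found.append((motif, len(m.group(0)) // size, m.start(), m.end()))
    return found

# Made-up transcript with one (AT)7 and one (AAG)5 repeat.
example = "GGC" + "AT" * 7 + "CCT" + "AAG" * 5 + "T"
print(find_ssrs(example))  # [('AT', 7, 3, 17), ('AAG', 5, 20, 35)]
```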
1.7 Plant transcription factor
PlantTFcat, an online plant transcription factor and transcriptional regulator categorization and analysis tool, was used to identify plant transcription factors in the sequences (http://plantgrn.noble.org/PlantTFcat/).
2 Results and Discussion
2.1 NGS QC Toolkit
The reads were filtered with this tool by removing adaptors and other contaminants, and sequence quality was checked. The resulting high-quality filtered sequence file was then used for de novo sequence assembly (Table 1).
Table 1 NGS QC Toolkit Result
File Details              Original File    High Quality (HQ) Filter File
Total number of reads     20444892         13418027
Total number of bases     1042689492       684319377
Percentage of HQ reads    --               65.63%
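The figures in Table 1 can be cross-checked with simple arithmetic. The short sketch below recomputes the percentage of HQ reads and the mean read length implied by the read and base counts; the variable names are illustrative only.

```python
# Values taken from Table 1.
original_reads, hq_reads = 20444892, 13418027
original_bases, hq_bases = 1042689492, 684319377

# Share of reads retained after filtering.
hq_percent = 100.0 * hq_reads / original_reads
print(f"HQ reads: {hq_percent:.2f}%")  # HQ reads: 65.63%

# Mean read length before and after filtering (bases per read).
print(original_bases / original_reads, hq_bases / hq_reads)  # 51.0 51.0
```

Both files give 51 bases per read, so the filtering removed whole reads without changing read length.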