Genomics and Applied Biology 2015, Vol. 6, No. 2, 1-7
http://gab.biopublisher.ca
Collins et al.).
Methods
1. Sequence Retrieval
This study focuses on the de novo assembly and sequence annotation of Vicia sativa L. from run SRR403901 in the NCBI database. Raw data were downloaded from the NCBI SRA (http://trace.ncbi.nlm.nih.gov/Traces/sra/?run=SRR403901). The run was generated on the Illumina HiSeq 2000 platform and is single-ended, with 12.4 M spots and 42.4% GC content. The raw sequence was converted into FASTQ format for further annotation using the SRA Toolkit from NCBI (http://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?view=software).
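The run statistics quoted above (number of spots and GC content) can be re-checked on the converted FASTQ file. The following is a minimal sketch, assuming a plain four-line-per-record FASTQ layout; the function name `fastq_stats` and the two demo reads are hypothetical and are not part of the study's pipeline.

```python
import io


def fastq_stats(handle):
    """Return (read_count, gc_percent) for a four-line-per-record FASTQ stream."""
    reads = 0
    gc = 0
    bases = 0
    for i, line in enumerate(handle):
        if i % 4 == 1:  # the second line of each record holds the bases
            seq = line.strip().upper()
            reads += 1
            bases += len(seq)
            gc += seq.count("G") + seq.count("C")
    gc_percent = 100.0 * gc / bases if bases else 0.0
    return reads, gc_percent


# Hypothetical two-read example; a real run would pass an open FASTQ file.
demo = io.StringIO("@r1\nGGCC\n+\nIIII\n@r2\nATAT\n+\nIIII\n")
reads, gc_percent = fastq_stats(demo)
print(reads, round(gc_percent, 1))
```

For a real file, `fastq_stats(open(path))` streams the records line by line, so the whole run never has to be held in memory.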
2. NGS QC Toolkit
NGS QC Toolkit is an application for quality checking of sequencing data and filtering for high-quality reads. It is a standalone, open-source application freely available at http://www.nipgr.res.in/ngsqctoolkit.html. The toolkit comprises user-friendly tools for QC of sequencing data generated on the Roche 454 and Illumina platforms, together with additional tools to aid QC (sequence format converter and trimming tools) and analysis (statistics tools). A variety of options are provided so that QC can be run with user-defined parameters. The toolkit is expected to be very useful for QC of NGS data to facilitate better downstream analysis (Patel RK, et al).
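The kind of read filtering described above can be illustrated with a short sketch. It is not the NGS QC Toolkit code: the cutoffs (`min_score=20`, `min_fraction=0.7`), the Phred+33 quality encoding and all function names are assumptions chosen for illustration, since the section does not state which settings were used.

```python
def phred_scores(quality_line, offset=33):
    """Decode a FASTQ quality string into Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality_line]


def is_high_quality(quality_line, min_score=20, min_fraction=0.7, offset=33):
    """True if at least min_fraction of the bases have Phred >= min_score."""
    scores = phred_scores(quality_line, offset)
    if not scores:
        return False
    good = sum(1 for s in scores if s >= min_score)
    return good / len(scores) >= min_fraction


def filter_reads(records, **kwargs):
    """Keep (name, sequence, quality) records that pass is_high_quality."""
    return [rec for rec in records if is_high_quality(rec[2], **kwargs)]


# Hypothetical reads: 'I' encodes Phred 40 and '#' encodes Phred 2.
records = [
    ("r1", "ACGTACGTAC", "IIIIIIIIII"),  # 10 of 10 bases pass
    ("r2", "ACGTACGTAC", "IIII######"),  # 4 of 10 bases pass
]
kept = filter_reads(records)
print([name for name, _, _ in kept])
```

Both cutoffs are keyword arguments, so a stricter or looser filter can be tried without changing the functions.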
3. De novo sequence assembly by CLC GENOMICS WORKBENCH 7
CLC Genomics Workbench is a comprehensive and user-friendly package for analyzing, comparing, and visualizing next-generation sequencing data. It was used for de novo assembly of the sequence data with the default parameters of its de novo assembly tool (http://www.clcbio.com/products/clc-genomics-workbench/).
4. BLASTX
The assembled contigs were then annotated. The first step was to identify translated protein sequences from the contigs. BLASTX at NCBI (PROGRAM=blastx&PAGE_TYPE=BlastSearch&LINK_LOC=blasthome) was run with a few parameters changed: the non-redundant protein database (nr) was selected as Database, Eudicots was selected in the Organism option, and under Algorithm parameters Max target sequences was set to 10 and Expect threshold was set to 6.
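The effect of the two algorithm parameters can be sketched on tabular BLAST output. This is an illustration only: it assumes the hits were exported in the standard 12-column tabular format (query, subject, percent identity, alignment length, mismatches, gap opens, query start/end, subject start/end, E-value, bit score), which the section does not state, and the function name `top_hits` and the demo rows are hypothetical.

```python
from collections import defaultdict


def top_hits(lines, max_targets=10, expect=6.0):
    """Keep, per query, up to max_targets hits with E-value <= expect."""
    hits = defaultdict(list)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue <= expect:
            hits[query].append((subject, evalue, bitscore))
    result = {}
    for query, rows in hits.items():
        # Best hits first: lowest E-value, then highest bit score.
        rows.sort(key=lambda r: (r[1], -r[2]))
        result[query] = rows[:max_targets]
    return result


# Hypothetical rows for two contigs.
demo = [
    "contig_1\tprotA\t90.0\t100\t10\t0\t1\t300\t1\t100\t1e-50\t200\n",
    "contig_1\tprotB\t40.0\t50\t30\t0\t1\t150\t1\t50\t8.0\t20\n",
    "contig_2\tprotC\t75.0\t80\t20\t0\t1\t240\t1\t80\t1e-20\t120\n",
]
best = top_hits(demo)
for query, rows in sorted(best.items()):
    print(query, [subject for subject, _, _ in rows])
```

The defaults mirror the values given in the text (10 target sequences, expect threshold 6); the `protB` row is dropped because its E-value of 8.0 exceeds the threshold.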
5. Blast2GO
Blast2GO is an all-in-one tool for functional annotation of (novel) sequences and the analysis of annotation data (http://www.blast2go.com/b2ghome). Based on the results of the protein database annotation, Blast2GO was employed to obtain the functional classification of the unigenes based on GO terms. The transcript contigs were classified under the three GO categories: molecular function, cellular component and biological process (Ness et al., 2011; Shi et al., 2011; Wang et al., 2010). The WEGO tool (http://www.wego.genomics.org.cn) was used to perform the GO functional classification for all of the unigenes and to understand the distribution of the gene functions of this species at the macro level. The KEGG database (http://www.genome.jp/kegg/pathway.html) was used to annotate the pathways of these unigenes.
6. SSR mining
We employed MIcroSAtellite (MISA) (http://pgrc.ipk-gatersleben.de/misa/) for microsatellite mining; it produces statistical summaries of the microsatellites detected in the transcripts.
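The idea behind microsatellite mining can be shown with a small regular-expression search. This is a sketch and not MISA itself: the minimum repeat counts in `MIN_REPEATS` are illustrative assumptions (the section does not state the settings used), and `find_ssrs` and the demo sequence are hypothetical.

```python
import re

# Assumed thresholds: motif length -> minimum number of tandem repeats.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return (motif, repeat_count, start, end) tuples, 1-based inclusive."""
    seq = seq.upper()
    found = []
    for unit_len in sorted(min_repeats):
        n = min_repeats[unit_len]
        # One motif of unit_len bases followed by at least n - 1 more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, n - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            # Skip motifs that are copies of a shorter unit, e.g. "ATAT".
            if any(motif == motif[:k] * (unit_len // k)
                   for k in range(1, unit_len) if unit_len % k == 0):
                continue
            count = len(m.group(0)) // unit_len
            found.append((motif, count, m.start() + 1, m.end()))
    return sorted(found, key=lambda hit: hit[2])


# Hypothetical contig with one dinucleotide and one trinucleotide repeat.
demo_seq = "TTC" + "AG" * 7 + "CCG" + "ATC" * 5 + "GG"
ssrs = find_ssrs(demo_seq)
for motif, count, start, end in ssrs:
    print(motif, count, start, end)
```

Compound microsatellites (two repeats separated by a short gap) are not handled by this sketch.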
7. Plant transcription factor
PlantTFcat, an online plant transcription factor and transcriptional regulator categorization and analysis tool, was used to identify plant transcription factors in the sequences (http://plantgrn.noble.org/PlantTFcat/).
Results and Discussions
1. NGS QC Toolkit
Sequence was filtered with this tool by removing