Legume Genomics and Genetics 2014, Vol.5, No.7, 1-7
patterns, and discovers new exons and genes
(Mortazavi et al., 2008; Wang et al., 2009). The
transcriptome sequencing data were assembled using
various assembly tools, and functional annotation of
genes and pathway analysis were carried out with
various bioinformatics tools. The large number of
transcripts reported in the current study will serve as a
valuable genetic resource for
Trigonella foenum-graecum L.
High-throughput short-read sequencing is one of the
latest sequencing technologies to be released to the
genomics community. For example, a single run on the
Illumina Genome Analyser yields on average 30 to 40
million single-end (~35 nt) sequences.
However, the resulting output can easily overwhelm
genomic analysis systems designed for the length of
traditional Sanger sequencing, or even the smaller
volumes of data resulting from 454 (Roche)
sequencing technology. Typically, the initial use of
short-read sequencing was confined to matching data
from genomes that were nearly identical to the
reference genome. Transcriptome analysis on a global
gene expression level is an ideal application of
short-read sequencing. Traditionally such analysis
involved complementary DNA (cDNA) library
construction, Sanger sequencing of ESTs, and
microarray analysis. Next generation sequencing has
become a feasible method for increasing sequencing
depth and coverage while reducing time and cost
compared to the traditional Sanger method (Collins
et al., 2008).
1 Methods
1.1 Sequence Retrieval
This study focuses on the de novo assembly and
sequence annotation of
Trigonella foenum-graecum L.
using data from the NCBI database. Raw data were
downloaded from NCBI SRA (http://trace.ncbi.nlm. SRR066197). The run was
generated on the LS454 platform (454 GS FLX); the
sample is single-ended, with 627,117 spots and 45.2%
GC content. The raw sequences were converted into
FASTA file format for further annotation by using the
SRA Toolkit.
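Once the reads are in FASTA format, summary figures such as the read count and GC content reported above can be checked independently. The following is a minimal Python sketch, not the procedure used in this study; the function names and the two example reads are hypothetical, and ambiguous bases such as N are assumed to be excluded from the GC calculation.

```python
from typing import Dict, Iterable


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into a {read_id: sequence} dictionary."""
    records: Dict[str, str] = {}
    current_id = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the ID;
            # fall back to a running number if the header is empty.
            tokens = line[1:].split()
            current_id = tokens[0] if tokens else "read%d" % (len(records) + 1)
            records[current_id] = ""
        elif current_id is not None:
            # Sequences may span several lines, so append to the current record.
            records[current_id] += line.upper()
    return records


def gc_percent(sequences: Iterable[str]) -> float:
    """Return GC content as a percentage of all unambiguous (A/C/G/T) bases."""
    gc = 0
    total = 0
    for seq in sequences:
        for base in seq:
            if base in "GC":
                gc += 1
                total += 1
            elif base in "AT":
                total += 1
    return 100.0 * gc / total if total else 0.0


# Hypothetical two-read example; these are not reads from SRR066197.
example = [">read1 length=8", "ATGCGCAT", ">read2 length=4", "GGNA"]
reads = parse_fasta(example)
print("reads:", len(reads))
print("GC%%: %.1f" % gc_percent(reads.values()))
```

For a real data set, the lines of the converted FASTA file would be passed to parse_fasta in place of the example list.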
1.2 NGS QC Toolkit
NGS QC Toolkit is an application for quality checking
of sequencing data and filtering of high-quality reads.
It is a standalone, open-source application that is
freely available. The toolkit
is comprised of user-friendly tools for QC of
sequencing data generated using Roche 454 and
Illumina platforms, and additional tools to aid QC
(sequence format converter and trimming tools) and
analysis (statistics tools). A variety of options have
been provided to facilitate the QC at user-defined
parameters. The toolkit is expected to be very useful
for the QC of NGS data to facilitate better
downstream analysis (Patel et al., 2011).
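The idea behind filtering at user-defined parameters can be illustrated with a short Python sketch. This is not the NGS QC Toolkit implementation; the function names are hypothetical, and the cut-off values (Phred score 20, 70% high-quality bases, minimum length 100) are assumptions chosen only for illustration.

```python
from typing import List, Sequence, Tuple

# A read is represented as (read ID, bases, per-base Phred quality scores).
Read = Tuple[str, str, List[int]]


def high_quality_fraction(quals: Sequence[int], min_qual: int) -> float:
    """Fraction of bases whose Phred score is at least min_qual."""
    if not quals:
        return 0.0
    return sum(1 for q in quals if q >= min_qual) / len(quals)


def filter_reads(
    reads: Sequence[Read],
    min_qual: int = 20,            # assumed cut-off, user-adjustable
    min_hq_fraction: float = 0.7,  # assumed cut-off, user-adjustable
    min_length: int = 100,         # assumed cut-off, user-adjustable
) -> List[Read]:
    """Keep reads that are long enough and mostly made of high-quality bases."""
    kept: List[Read] = []
    for read_id, bases, quals in reads:
        if len(bases) < min_length:
            continue  # too short
        if high_quality_fraction(quals, min_qual) < min_hq_fraction:
            continue  # too many low-quality bases
        kept.append((read_id, bases, quals))
    return kept


# Hypothetical reads, shortened for readability (hence min_length=3 below).
demo_reads: List[Read] = [
    ("r1", "ACGT", [30, 30, 30, 30]),
    ("r2", "ACGT", [30, 10, 10, 10]),
    ("r3", "AC", [40, 40]),
]
print([r[0] for r in filter_reads(demo_reads, min_length=3)])
```

Exposing the cut-offs as keyword arguments mirrors the user-defined parameters described above, so the same function can be rerun with stricter or looser settings.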
1.3 De novo sequence assembly by CLC
CLC is a comprehensive and user-friendly analysis
package for analyzing, comparing, and visualizing
next-generation sequencing data. This package was
used for de novo assembly of the sequences with the
default parameters of its de novo assembly tool.
The assembled file was then used for annotation, in
which the first step was to identify translated protein
sequences from the contigs. BLASTX at NCBI (
INK_LOC=blasthome) was performed with a few
parameters changed: the non-redundant protein
database (nr) was selected as Database; an organism
was selected in the Organism option; and, under
Algorithm parameters, Max target sequences was set
to 10 and Expect threshold was set to 6.
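The effect of the two thresholds (Expect threshold 6, Max target sequences 10) can be sketched in Python for results saved as tab-delimited text. This sketch assumes the standard 12-column BLAST tabular layout (query ID, subject ID, percent identity, alignment length, mismatches, gap opens, query start, query end, subject start, subject end, e-value, bit score); the study itself used the NCBI web interface, so the function name and the example hits are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# A hit is represented as (subject ID, e-value, bit score).
Hit = Tuple[str, float, float]


def top_hits(
    lines: Iterable[str],
    max_evalue: float = 6.0,  # Expect threshold stated in the text
    max_targets: int = 10,    # Max target sequences stated in the text
) -> Dict[str, List[Hit]]:
    """Keep, per query contig, the best hits that pass the e-value threshold."""
    hits: Dict[str, List[Hit]] = defaultdict(list)
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        if len(fields) < 12:
            continue  # not a standard 12-column tabular record
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue <= max_evalue:
            hits[query].append((subject, evalue, bitscore))
    result: Dict[str, List[Hit]] = {}
    for query, query_hits in hits.items():
        # Best hits first: lowest e-value, then highest bit score.
        query_hits.sort(key=lambda h: (h[1], -h[2]))
        result[query] = query_hits[:max_targets]
    return result


# Hypothetical hits for one contig; the IDs are placeholders.
demo_lines = [
    "\t".join(["contig1", "subjA", "90.0", "100", "5", "0",
               "1", "300", "1", "100", "1e-50", "200"]),
    "\t".join(["contig1", "subjB", "50.0", "30", "10", "1",
               "1", "90", "1", "30", "8.0", "25"]),
    "\t".join(["contig1", "subjC", "80.0", "90", "9", "0",
               "1", "270", "1", "90", "1e-30", "150"]),
]
for query, query_hits in top_hits(demo_lines).items():
    print(query, [subject for subject, _evalue, _bits in query_hits])
```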
1.5 Blast2GO
Blast2GO is an all-in-one tool for functional
annotation of (novel) sequences and the analysis of
annotation data.
Based on the results of the protein database annotation,
Blast2GO was employed to obtain the functional
classification of the unigenes based on GO terms. The
transcript contigs were classified under the three GO
categories: molecular function, cellular component and
biological process (Ness et al., 2011; Shi et al., 2011;
Wang et al., 2010). The WEGO (http://www.wego. tool was used to perform the GO
functional classification for all of the unigenes and to
understand the distribution of the gene functions of
this species at the macro level. The KEGG database
was used to annotate the pathways of these unigenes.
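A macro-level view of gene functions amounts to counting how many unigenes carry at least one term in each of the three GO categories (molecular function, cellular component, biological process). The Python sketch below illustrates that tally; it is not the Blast2GO or WEGO implementation, and the function names, contig names and annotations are hypothetical examples.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Each unigene maps to a list of (GO ID, GO category) annotations.
Annotations = Dict[str, List[Tuple[str, str]]]


def count_by_category(annotations: Annotations) -> Counter:
    """Count unigenes that carry at least one term in each GO category."""
    counts: Counter = Counter()
    for _unigene, terms in annotations.items():
        # A set ensures a unigene is counted once per category,
        # even if it has several terms in that category.
        for category in {cat for _go_id, cat in terms}:
            counts[category] += 1
    return counts


def category_percentages(annotations: Annotations) -> Dict[str, float]:
    """Express the per-category counts as a percentage of annotated unigenes."""
    annotated = sum(1 for terms in annotations.values() if terms)
    if annotated == 0:
        return {}
    counts = count_by_category(annotations)
    return {cat: 100.0 * n / annotated for cat, n in counts.items()}


# Hypothetical annotations for three contigs (contig names are placeholders).
demo_annotations: Annotations = {
    "contig1": [("GO:0003824", "molecular_function"),
                ("GO:0008152", "biological_process")],
    "contig2": [("GO:0005634", "cellular_component"),
                ("GO:0016020", "cellular_component")],
    "contig3": [("GO:0008152", "biological_process")],
}
for category, n in sorted(count_by_category(demo_annotations).items()):
    print(category, n)
```

Counting each unigene once per category keeps the totals comparable between categories, since a single contig often carries several terms from the same category.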