Page 5 - Legume Genomics and Genetics

Legume Genomics and Genetics 2014, Vol.5, No.7, 1-7

http://lgg.biopublisher.ca


patterns, and discovers new exons and genes (Mortazavi et al., 2008; Wang et al., 2009). The transcriptome sequencing data were assembled using assembly tools, and functional annotation of genes and pathway analysis were carried out with various bioinformatics tools. The large number of transcripts reported in the current study will serve as a valuable genetic resource for Trigonella foenum-graecum L.

High-throughput short-read sequencing is one of the latest sequencing technologies to be released to the genomics community. For example, a single run on the Illumina Genome Analyser can produce, on average, 30 to 40 million single-end (~35 nt) sequences. However, the resulting output can easily overwhelm genomic analysis systems designed for the read lengths of traditional Sanger sequencing, or even for the smaller volumes of data produced by 454 (Roche) sequencing technology. Initially, the use of short-read sequencing was typically confined to matching data from genomes that were nearly identical to the reference genome. Transcriptome analysis at the level of global gene expression is an ideal application of short-read sequencing. Traditionally, such analysis involved complementary DNA (cDNA) library construction, Sanger sequencing of ESTs, and microarray analysis. Next-generation sequencing has become a feasible method for increasing sequencing depth and coverage while reducing time and cost compared with the traditional Sanger method (Collins et al., 2008).

1 Methods

1.1 Sequence retrieval

This study focuses on the de novo assembly and sequence annotation of Trigonella foenum-graecum L. using run SRR066197 from the NCBI database. Raw data were downloaded from the NCBI SRA (http://trace.ncbi.nlm.nih.gov/Traces/sra/?run=SRR066197). The run was generated on the LS454 platform (454 GS FLX); the sample is single-ended, with 627,117 spots and 45.2% GC content. The raw sequences were converted to FASTA format for further annotation using the SRA Toolkit from NCBI (http://trace.ncbi.nlm.nih.gov/Traces/sra/sra.cgi?view=software).
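Once the run is in FASTA format, its basic statistics (number of reads and GC content) can be checked against the values reported by the SRA. The following is a minimal Python sketch of such a check, not the procedure used in this study; the function names read_fasta and fasta_summary are illustrative, and the input is assumed to be plain, uncompressed FASTA text.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def fasta_summary(lines):
    """Return read count, total bases, and GC percentage for FASTA input."""
    n_reads = total_bases = gc_bases = 0
    for _, seq in read_fasta(lines):
        seq = seq.upper()
        n_reads += 1
        total_bases += len(seq)
        gc_bases += seq.count("G") + seq.count("C")
    gc_percent = 100.0 * gc_bases / total_bases if total_bases else 0.0
    return {"reads": n_reads, "bases": total_bases, "gc_percent": gc_percent}
```

Because fasta_summary accepts any iterable of lines, an open file handle for the converted FASTA file can be passed to it directly.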

1.2 NGS QC Toolkit

NGS QC Toolkit is an application for the quality check of sequencing data and the filtering of high-quality reads. The toolkit is a standalone, open-source application freely available at http://www.nipgr.res.in/ngsqctoolkit.html. It comprises user-friendly tools for QC of sequencing data generated on the Roche 454 and Illumina platforms, together with additional tools to aid QC (sequence format converter and trimming tools) and analysis (statistics tools). A variety of options are provided to perform QC with user-defined parameters. The toolkit is expected to be very useful for the QC of NGS data and to facilitate better downstream analysis (Patel et al., 2011).
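The idea behind filtering for high-quality reads can be illustrated with a short Python sketch. This is not the NGS QC Toolkit code; it assumes that each read carries a list of integer Phred quality scores, and the cut-offs (min_quality=20, min_fraction=0.70) are illustrative values, since the parameters actually used are not reported in the text.

```python
def is_high_quality(quals, min_quality=20, min_fraction=0.70):
    """Return True if enough bases in a read meet the quality cut-off.

    quals        -- list of integer Phred scores, one per base
    min_quality  -- assumed per-base quality cut-off (illustrative)
    min_fraction -- assumed fraction of bases that must pass (illustrative)
    """
    if not quals:
        return False
    passing = sum(1 for q in quals if q >= min_quality)
    return passing / len(quals) >= min_fraction


def filter_reads(reads, min_quality=20, min_fraction=0.70):
    """Keep only high-quality reads.

    reads -- iterable of (read_id, sequence, quals) tuples
    """
    return [
        read for read in reads
        if is_high_quality(read[2], min_quality, min_fraction)
    ]
```

A read is kept only when at least the stated fraction of its bases reaches the quality cut-off; stricter or looser filtering is obtained by changing the two parameters.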

1.3 De novo sequence assembly by CLC Genomics Workbench 7

CLC Genomics Workbench is a comprehensive and user-friendly package for analyzing, comparing, and visualizing next-generation sequencing data. This package was used for de novo assembly of the sequences with the default parameters of its de novo assembly tool (http://www.clcbio.com/products/clc-genomics-workbench/).
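The output of a de novo assembly is commonly summarized by the number of contigs, their total and maximum length, and the N50 (the length of the contig at which the cumulative length of contigs, sorted from longest to shortest, first reaches half of the total assembly length). The Python sketch below shows one way to compute these values from a list of contig lengths; it is an illustration only and is independent of CLC Genomics Workbench, and the function names are hypothetical.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths (0 if empty)."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first contig that brings the running sum to half the
        # total assembly length defines the N50.
        if running >= half_total:
            return length
    return 0


def contig_summary(lengths):
    """Return basic assembly statistics for a list of contig lengths."""
    lengths = list(lengths)
    return {
        "contigs": len(lengths),
        "total_length": sum(lengths),
        "longest": max(lengths) if lengths else 0,
        "n50": n50(lengths),
    }
```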

1.4 BLASTX

The assembled contigs were then annotated; the first step was to identify translated protein sequences from the contigs. BLASTX at NCBI (http://www.ncbi.nlm.nih.gov/blast/Blast.cgi?PROGRAM=blastx&PAGE_TYPE=BlastSearch&LINK_LOC=blasthome) was performed with a few parameters changed: the non-redundant protein database (nr) was selected as the database, Eudicots was selected in the organism option, and, under Algorithm parameters, Max target sequences was set to 10 and Expect threshold was set to 6.
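BLASTX compares a nucleotide query with a protein database by first translating the query in all six reading frames. The Python sketch below illustrates only that translation step, using the standard genetic code; it is not BLAST itself, the function names are illustrative, and codons containing characters other than A, C, G, or T are assumed to translate to "X".

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate a DNA sequence from its first base; '*' marks a stop."""
    seq = seq.upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X")  # unknown codons become X
        for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translations(seq):
    """Return a dict mapping frame labels (+1..+3, -1..-3) to proteins."""
    frames = {}
    for strand, strand_seq in (("+", seq.upper()),
                               ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(strand_seq[offset:])
    return frames
```

Each of the six translated sequences is what a BLASTX-style search would then align against the protein database.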

1.5 Blast2GO

Blast2GO is an all-in-one tool for the functional annotation of (novel) sequences and the analysis of annotation data (http://www.blast2go.com/b2ghome). Based on the results of the protein database annotation, Blast2GO was employed to obtain the functional classification of the unigenes based on GO terms. The transcript contigs were classified under the three GO categories: molecular function, cellular component, and biological process (Ness et al., 2011; Shi et al., 2011; Wang et al., 2010). The WEGO tool (http://www.wego.genomics.org.cn) was used to perform the GO functional classification for all of the unigenes and to understand the distribution of gene functions of this species at the macro level. The KEGG database (http://www.genome.jp/kegg/pathway.html) was used to annotate the pathways of these unigenes.
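The GO functional classification described above amounts to counting how many unigenes are assigned to each term within each GO category. A minimal Python sketch of that tally follows; it does not reproduce Blast2GO or WEGO, and the input format (tuples of unigene identifier, category, and term) and the function name are assumptions made for illustration.

```python
from collections import Counter, defaultdict

GO_CATEGORIES = (
    "biological_process",
    "cellular_component",
    "molecular_function",
)


def count_unigenes_per_term(annotations):
    """Count distinct unigenes per GO term within each GO category.

    annotations -- iterable of (unigene_id, category, term) tuples
                   (hypothetical format, for illustration only)
    """
    unigenes_by_term = defaultdict(set)
    for unigene_id, category, term in annotations:
        if category not in GO_CATEGORIES:
            raise ValueError(f"unknown GO category: {category}")
        # A set ensures each unigene is counted once per term.
        unigenes_by_term[(category, term)].add(unigene_id)

    counts = {category: Counter() for category in GO_CATEGORIES}
    for (category, term), unigenes in unigenes_by_term.items():
        counts[category][term] = len(unigenes)
    return counts
```

Sorting each category's counter with most_common() gives the distribution of gene functions in the form usually plotted for this kind of classification.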