
Computational Molecular Biology 2014, Vol. 4, No. 9, 1-6

http://cmb.biopublisher.ca


Research Article Open Access

Comparative study of five Legume species based on De Novo Sequence Assembly and Annotation

Sagar S. Patel¹, Dipti B. Shah¹, Hetalkumar J. Panchal²

1. G. H. Patel Post Graduate Department of Computer Science and Technology, Sardar Patel University, Vallabh Vidyanagar, Gujarat-388120, India

2. Gujarat Agricultural Biotechnology Institute, Navsari Agricultural University, Surat, Gujarat-395007, India

Corresponding author email: sgr308@gmail.com

Computational Molecular Biology, 2014, Vol.4, No.9 doi: 10.5376/cmb.2014.04.0009

Received: 03 Sep., 2014

Accepted: 25 Sep., 2014

Published: 23 Oct., 2014

Patel et al. This is an open access article published under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Preferred citation for this article:

Patel et al., 2014, Comparative study of five Legume species based on De Novo Sequence Assembly and Annotation, Computational Molecular Biology, Vol.4, No.9, 1-6 (doi: 10.5376/cmb.2014.04.0009)

Abstract

Legume species are important crops in tropical and subtropical regions of the world, and some, such as peanut, are major oilseed crops. Recently, RNA sequencing (RNA-seq) with next-generation sequencing technology has provided a powerful approach for analysing the transcriptome. This study focuses on RNA-seq data for five legume species retrieved from the NCBI Sequence Read Archive (SRA): Arachis hypogaea L. (peanut; SRR1212866), Cicer arietinum L. (SRR627764), Phaseolus vulgaris L. (SRR1283084), Trigonella foenum-graecum L. (SRR066197) and Vicia sativa L. (SRR403901). The comparison covers several features: the number of reads generated, the assembled contigs and their N50, the searches of these contigs against known proteins and genes, the number of genes annotated with Gene Ontology (GO) functional categories, and the sequences mapped to pathways by searching the Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway database. These data will be useful for gene discovery and functional studies, and the large number of transcripts reported here will serve as a valuable genetic resource for these five legume species.

Keywords

De Novo assembly; Bioinformatics; Legume species; Sequence Assembly and Annotation

Introduction

Next-generation sequencing of the transcriptome by high-throughput RNA sequencing is increasingly becoming the technology of choice to detect and quantify known and novel transcripts in plants. This method of transcriptome analysis is fast and simple because it does not require cloning of the cDNAs; direct sequencing of the cDNAs can generate short reads at extraordinary depth. After sequencing, the resulting reads can be assembled into a genome-scale transcription profile. RNA-seq is a more comprehensive and efficient way to measure transcriptome composition, obtain RNA expression patterns, and discover new exons and genes (Mortazavi et al., 2008; Wang et al., 2009). In this study, the transcriptome sequencing data were assembled using various assembly tools, and functional annotation of genes and pathway analysis were carried out with various bioinformatics tools. The large number of transcripts reported in the current study will serve as a valuable genetic resource for the five legume species described.
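The abstract compares the assemblies partly by their N50. As a minimal sketch (not the authors' pipeline, which this section does not specify), N50 can be computed from a list of contig lengths: sort the lengths in descending order, accumulate them, and report the length at which the running total first reaches half of all assembled bases. The function name `n50` and the contig lengths below are hypothetical, chosen only for illustration.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly: the length L such that contigs of
    length >= L together contain at least half of all assembled bases."""
    if not contig_lengths:
        raise ValueError("no contigs supplied")
    total = sum(contig_lengths)
    running = 0
    # Walk from the longest contig down, accumulating bases.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        # Stop at the first contig where the running sum covers >= 50% of the total.
        if 2 * running >= total:
            return length


# Hypothetical contig lengths in bp (illustration only, not data from this study).
example = [2000, 1500, 1000, 800, 500, 200]
print(n50(example))
```

Here the total is 6,000 bp; the two longest contigs (2,000 + 1,500 = 3,500 bp) are the first to cover at least half of it, so the N50 is 1,500 bp. Because N50 depends only on the length distribution, it says nothing about whether contigs are correctly assembled, which is why it is usually reported alongside annotation results such as those listed in the abstract.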

High-throughput short-read sequencing is one of the most recent sequencing technologies released to the genomics community. For example, a single run on the Illumina Genome Analyser can produce, on average, 30 to 40 million single-end sequences of about 35 nt. However, this output can easily overwhelm genomic analysis systems designed for the read lengths of traditional Sanger sequencing, or even for the smaller data volumes produced by Roche 454 sequencing. Initially, short-read sequencing was typically confined to matching reads from genomes that were nearly identical to an existing reference genome. Transcriptome analysis at the level of global gene expression is an ideal application of short-read sequencing. Traditionally, such analysis involved complementary DNA (cDNA) library construction, Sanger sequencing of expressed sequence tags (ESTs), and microarray analysis. Next-generation sequencing has become a feasible way to increase sequencing depth and coverage while reducing time and cost compared with the traditional Sanger method (L. J. Collins et al.).
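The per-run figures quoted above translate into total sequence yield with simple arithmetic. The sketch below, using hypothetical helper names, multiplies read count by read length and then divides by a target size to give an average depth. The 50 Mb target is an assumption chosen only for illustration; real transcriptome coverage is highly uneven because transcripts are expressed at very different levels, so this mean is only a rough guide.

```python
def run_yield_bases(n_reads, read_length_nt):
    """Total sequenced bases for a single-end run: number of reads x read length."""
    return n_reads * read_length_nt


def mean_coverage(total_bases, target_size_bp):
    """Average depth if reads were spread evenly over a target of the given size."""
    return total_bases / target_size_bp


# Figures from the text: 30-40 million single-end reads of ~35 nt per run.
low = run_yield_bases(30_000_000, 35)   # 1,050,000,000 bases (~1.05 Gb)
high = run_yield_bases(40_000_000, 35)  # 1,400,000,000 bases (~1.4 Gb)

# Hypothetical 50 Mb target (assumption for illustration only, not from the study).
TARGET_BP = 50_000_000
print(low, high, mean_coverage(low, TARGET_BP), mean_coverage(high, TARGET_BP))
```

Under these assumptions a single run yields roughly 1.05 to 1.4 Gb, or an average of about 21x to 28x over the hypothetical target, which illustrates why such output can overwhelm tools built for Sanger-scale data.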