Computational Molecular Biology 2014, Vol. 4, No. 9, 1-6
http://cmb.biopublisher.ca
Research Article Open Access
Comparative study of five Legume species based on De Novo Sequence
Assembly and Annotation
Sagar S. Patel 1, Dipti B. Shah 1, Hetalkumar J. Panchal 2
1. G. H. Patel Post Graduate Department of Computer Science and Technology, Sardar Patel University, Vallabh Vidyanagar, Gujarat-388120, India
2. Gujarat Agricultural Biotechnology Institute, Navsari Agricultural University, Surat, Gujarat- 395007, India
Corresponding author email
Computational Molecular Biology, 2014, Vol.4, No.9 doi: 10.5376/cmb.2014.04.0009
Received: 03 Sep., 2014
Accepted: 25 Sep., 2014
Published: 23 Oct., 2014
© 2014 Patel et al. This is an open access article published under the terms of the Creative Commons Attribution License, which permits unrestricted use,
distribution, and reproduction in any medium, provided the original work is properly cited.
Preferred citation for this article:
Patel et al., 2014, Comparative study of five Legume species based on De Novo Sequence Assembly and Annotation, Computational Molecular
Biology, Vol.4, No.9, 1-6 (doi
Abstract
Legumes are important oilseed crops in tropical and subtropical regions of the world. Recently, next-generation
sequencing technology, termed RNA-seq, has provided a powerful approach for analysing the transcriptome. This study focuses on
RNA-seq data for five legume species obtained from the NCBI database: Arachis hypogaea L. (peanut; SRR1212866),
Cicer arietinum L. (SRR627764), Phaseolus vulgaris L. (SRR1283084), Trigonella foenum-graecum L. (SRR066197) and
Vicia sativa L. (SRR403901). The comparison covers several features: the reads generated and the N50 value of each assembly;
the assembled contigs, which were further searched against known proteins and genes; the number of genes annotated with
Gene Ontology (GO) functional categories; and the sequences mapped to pathways by searching against the Kyoto Encyclopedia of
Genes and Genomes (KEGG) pathway database. These data will be useful for gene discovery and functional studies, and the large
number of transcripts reported in the current study will serve as a valuable genetic resource for these five legume species.
Keywords
De Novo assembly; Bioinformatics; Legume species; Sequence Assembly and Annotation
Introduction
Next-generation sequencing methods for high-throughput
RNA sequencing (transcriptome sequencing) are
becoming the technology of choice for detecting
and quantifying known and novel
transcripts in plants. This transcriptome analysis
method is fast and simple because it does not require
cloning of the cDNAs. Direct sequencing of these
cDNAs can generate short reads at an extraordinary
depth. After sequencing, the resulting reads can be
assembled into a genome-scale transcription profile. It
is a more comprehensive and efficient way to measure
transcriptome composition, obtain RNA expression
patterns, and discover new exons and genes
(Mortazavi et al., 2008; Wang et al., 2009). In this study,
transcriptome sequencing data were assembled using various
assembly tools, and functional annotation of genes and
pathway analysis were carried out with various bioinformatics
tools. The large number of transcripts reported in the
current study will serve as a valuable genetic resource
for the five legume species described.
High-throughput short-read sequencing is one of the
latest sequencing technologies to be released to the
genomics community. For example, a single run on the
Illumina Genome Analyser can, on average, yield
30 to 40 million or more single-end (~35 nt) sequences.
However, the resulting output can easily overwhelm
genomic analysis systems designed for the length of
traditional Sanger sequencing, or even the smaller
volumes of data resulting from 454 (Roche)
sequencing technology. Typically, the initial use of
short-read sequencing was confined to matching data
from genomes that were nearly identical to the
reference genome. Transcriptome analysis on a global
gene expression level is an ideal application of
short-read sequencing. Traditionally such analysis
involved complementary DNA (cDNA) library
construction, Sanger sequencing of ESTs, and
microarray analysis. Next-generation sequencing has
become a feasible method for increasing sequencing
depth and coverage while reducing time and cost
compared with the traditional Sanger method (L J
Collins et al.).
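
The scale implied by the run figures quoted above can be made concrete with a little arithmetic. The Python sketch below multiplies a read count by the read length to estimate the bases produced per run, then divides by a target size to estimate average depth. The function names and the 50 Mb target size are illustrative assumptions, not values taken from this study.

```python
def run_yield_bases(n_reads: int, read_length_nt: int) -> int:
    """Total bases produced by one run of single-end reads."""
    return n_reads * read_length_nt


def mean_depth(total_bases: int, target_size_nt: int) -> float:
    """Average number of times each base of the target is sequenced."""
    return total_bases / target_size_nt


READ_LENGTH_NT = 35             # ~35 nt single-end reads, as quoted in the text
ASSUMED_TARGET_NT = 50_000_000  # hypothetical 50 Mb transcriptome (assumption)

for n_reads in (30_000_000, 40_000_000):
    total = run_yield_bases(n_reads, READ_LENGTH_NT)
    depth = mean_depth(total, ASSUMED_TARGET_NT)
    print(f"{n_reads:,} reads -> {total / 1e9:.2f} Gb, ~{depth:.0f}x depth")
```

Under these assumptions a single run would give roughly 1.05 to 1.40 Gb of sequence, which is the order of magnitude that strains analysis pipelines built for Sanger-length data.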