Legume Genomics and Genetics 2014, Vol.5, No.7, 1-7
http://lgg.biopublisher.ca
1.6 SSR mining
We employed MIcroSAtellite (MISA) (http://pgrc.ipk-gatersleben.de/misa/)
for microsatellite mining. MISA reports summary statistics on the
simple sequence repeats (SSRs) it finds in the transcripts.
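As a rough illustration of what microsatellite mining involves, the sketch below scans a sequence for perfect SSRs with a regular expression. It is not MISA itself: the function name `find_ssrs` and the minimum repeat counts (10 for mononucleotide, 6 for dinucleotide, 5 for tri- to hexanucleotide motifs) are assumptions chosen for the example, not thresholds reported in this study.

```python
import re

# Hypothetical thresholds (not from the paper): motif length -> minimum repeats.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(motif):
    """True if the motif is not itself a repeat of a shorter unit (e.g. 'ATAT')."""
    k = len(motif)
    return not any(
        k % d == 0 and motif == motif[:d] * (k // d) for d in range(1, k)
    )


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (1-based start, motif, repeat count) for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for k, n in sorted(min_repeats.items()):
        # A motif of length k followed by at least n - 1 further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if is_primitive(motif):
                hits.append((match.start() + 1, motif, len(match.group(0)) // k))
    return sorted(hits)


# Toy sequence: an (AT)7 repeat and an (AAG)5 repeat separated by spacers.
toy = "GGC" + "AT" * 7 + "CCT" + "AAG" * 5 + "T"
ssrs = find_ssrs(toy)
```

Skipping non-primitive motifs keeps a poly-A run from also being reported as an (AA)n dinucleotide repeat. Compound and imperfect SSRs, which MISA can also report, are outside the scope of this sketch.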
1.7 Plant transcription factor
PlantTFcat, an online plant transcription factor and
transcriptional regulator categorization and analysis
tool (http://plantgrn.noble.org/PlantTFcat/), was used to
identify plant transcription factors in the sequences.
Transcripts encoding transcription factors were
identified by sequence comparison to known
transcription factor gene families.
2 Results and Discussion
2.1 NGS QC Toolkit
Reads were filtered with NGS QC Toolkit to remove
adaptors and other contaminating sequences, and read
quality was checked with the same tool. The resulting
high-quality filtered file was used for de novo
sequence assembly (Table 1).
Table 1 NGS QC Toolkit Result
File Details              Original File    High Quality (HQ) Filter File
Total number of reads     627117           609237
Total number of bases     146335656        141577237
Percentage of HQ reads    --               97.15%
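The percentage of HQ reads in Table 1 follows directly from the two read counts. The short check below recomputes it and, as an additional figure not reported in the table, the fraction of bases retained. The helper name `percent_retained` is introduced here only for illustration.

```python
# Values copied from Table 1.
total_reads, hq_reads = 627117, 609237
total_bases, hq_bases = 146335656, 141577237


def percent_retained(kept, total):
    """Share of the original data that survives filtering, in percent."""
    return 100.0 * kept / total


reads_pct = percent_retained(hq_reads, total_reads)
bases_pct = percent_retained(hq_bases, total_bases)

print(f"HQ reads: {reads_pct:.2f}%")  # matches the 97.15% in Table 1
print(f"HQ bases: {bases_pct:.2f}%")
```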
2.2 De novo Sequence Assembly
CLC Genomics Workbench 7 was used for de novo
sequence assembly with its default parameters
(Mismatch Cost=2, Insertion Cost=3,
Deletion Cost=3, Length Fraction=0.5, Similarity
Fraction=0.8, Word size=21). The assembly generated
7256 contigs with an average length of 445 bp;
other details are shown in Table 2.
Table 2 Contig measurement
Description        Length
N75                348
N50                470
N25                667
Minimum            86
Maximum            3231
Average            445
Count (Contigs)    7256
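For readers unfamiliar with the N25/N50/N75 statistics in Table 2, the sketch below shows how such values are conventionally computed from a list of contig lengths. The function name `nx` and the toy lengths are illustrative; the actual contig lengths from this assembly are not given in the text, so the Table 2 values cannot be reproduced here.

```python
def nx(lengths, fraction):
    """Length L such that contigs of length >= L hold at least
    `fraction` of all assembled bases (fraction=0.5 gives N50)."""
    ordered = sorted(lengths, reverse=True)
    target = fraction * sum(ordered)
    running = 0
    for length in ordered:
        running += length
        if running >= target:
            return length
    raise ValueError("lengths must be non-empty")


# Toy contig lengths (hypothetical, 1500 bases in total).
toy_lengths = [100, 200, 300, 400, 500]
n25 = nx(toy_lengths, 0.25)
n50 = nx(toy_lengths, 0.50)
n75 = nx(toy_lengths, 0.75)
```

Because the contigs are accumulated from longest to shortest, N25 >= N50 >= N75 always holds, which agrees with the ordering 667 >= 470 >= 348 in Table 2.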
3 Functional annotation with BLASTX and Blast2GO
3.1 BLASTX
BLASTX was performed to align the contigs against the
non-redundant (nr) protein database using an E-value
threshold of 10^-6. Out of 7256 transcript contigs, 1983
had BLAST hits to known proteins with highly
significant similarity and 167 had no BLAST results
(Table 3). Table 4 and Figure 1 show the species
distribution of the hits: 2515 sequences showed
significant similarity to Medicago truncatula, and the
fewest hits among the named species were to
Fragaria vesca and Solanum lycopersicum (11 each).
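To make the E-value threshold concrete, the sketch below filters BLAST hits and keeps the best hit per contig. It assumes the results are available in the standard 12-column tabular format (BLAST+ `-outfmt 6`). The text does not say how the search was run or exported, and the function name and example rows are hypothetical.

```python
import csv
import io

# Column order of the standard BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def significant_best_hits(handle, max_evalue=1e-6):
    """Map each query to its lowest-E-value hit at or below max_evalue."""
    best = {}
    reader = csv.DictReader(handle, fieldnames=FIELDS, delimiter="\t")
    for row in reader:
        evalue = float(row["evalue"])
        if evalue > max_evalue:
            continue  # hit is not significant at the chosen threshold
        current = best.get(row["qseqid"])
        if current is None or evalue < current[1]:
            best[row["qseqid"]] = (row["sseqid"], evalue)
    return best


# Hypothetical example rows: contig_2 has only a weak hit and is dropped.
example = io.StringIO(
    "contig_1\tprotA\t85.0\t120\t18\t0\t1\t360\t10\t129\t1e-40\t250\n"
    "contig_1\tprotB\t60.0\t100\t40\t0\t1\t300\t5\t104\t1e-08\t90\n"
    "contig_2\tprotC\t35.0\t50\t32\t1\t10\t159\t3\t52\t0.002\t35\n"
)
hits = significant_best_hits(example)
```

Contigs absent from the returned dictionary correspond to sequences without a significant hit at the chosen threshold.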
Table 3 Blast Result
Without Blast Results    167
Without Blast Hits       2656
With Blast Results       1983
With Mapping Results     192
Annotated Sequences      2258
Total Sequences          7256
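The five category rows of Table 3 add up exactly to the 7256 total, which suggests that each contig is counted in exactly one category. This is an interpretation of the numbers, not something the text states. The check below verifies the sum and prints each category as a share of all contigs.

```python
# Counts copied from Table 3.
table3 = {
    "Without Blast Results": 167,
    "Without Blast Hits": 2656,
    "With Blast Results": 1983,
    "With Mapping Results": 192,
    "Annotated Sequences": 2258,
}
total_sequences = 7256

category_sum = sum(table3.values())

# Share of all contigs per category, in percent.
shares = {label: 100.0 * n / total_sequences for label, n in table3.items()}
for label, n in table3.items():
    print(f"{label:<24}{n:>6}{shares[label]:>8.1f}%")
```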
Table 4 Blast Result of Species Distribution
Species                      Blast Hit
Medicago truncatula          2515
Cicer arietinum              923
Glycine max                  184
Phaseolus vulgaris           97
Lotus japonicus              65
Medicago sativa              44
Pisum sativum                42
Vitis vinifera               38
Populus trichocarpa          30
Theobroma cacao              24
Ricinus communis             23
Morus notabilis              20
Erythranthe guttata          20
Trigonella foenum-graecum    19
Cucumis sativus              18
Prunus persica               16
Citrus clementina            16
Jatropha curcas              15
Trifolium pratense           15
Arabidopsis lyrata           14
Citrus sinensis              14
Eutrema salsugineum          14
Arabidopsis thaliana         13
Eucalyptus grandis           13
Vicia faba                   12
Solanum tuberosum            12
Zea mays                     12
Fragaria vesca               11
Solanum lycopersicum         11
others                       183
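The species counts in Table 4 can be summarized as shares of all hits, as in the sketch below. The counts are copied from the table. The total works out to 4433, which equals the sum of the "With Blast Results", "With Mapping Results" and "Annotated Sequences" rows of Table 3 (1983 + 192 + 2258). That agreement is an observation from the numbers rather than a statement made in the text.

```python
# Hit counts copied from Table 4.
species_hits = {
    "Medicago truncatula": 2515, "Cicer arietinum": 923, "Glycine max": 184,
    "Phaseolus vulgaris": 97, "Lotus japonicus": 65, "Medicago sativa": 44,
    "Pisum sativum": 42, "Vitis vinifera": 38, "Populus trichocarpa": 30,
    "Theobroma cacao": 24, "Ricinus communis": 23, "Morus notabilis": 20,
    "Erythranthe guttata": 20, "Trigonella foenum-graecum": 19,
    "Cucumis sativus": 18, "Prunus persica": 16, "Citrus clementina": 16,
    "Jatropha curcas": 15, "Trifolium pratense": 15, "Arabidopsis lyrata": 14,
    "Citrus sinensis": 14, "Eutrema salsugineum": 14,
    "Arabidopsis thaliana": 13, "Eucalyptus grandis": 13, "Vicia faba": 12,
    "Solanum tuberosum": 12, "Zea mays": 12, "Fragaria vesca": 11,
    "Solanum lycopersicum": 11, "others": 183,
}

total_hits = sum(species_hits.values())

# Rank species by hit count and report the three largest shares.
ranked = sorted(species_hits.items(), key=lambda item: item[1], reverse=True)
for name, count in ranked[:3]:
    print(f"{name:<22}{count:>6}{100.0 * count / total_hits:>7.1f}%")
```

On these counts, Medicago truncatula accounts for more than half of all hits, with Cicer arietinum a distant second.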