Genomics and Applied Biology, 2010, Vol.1 No.3
http://gab.sophiapublisher.com
covers 34 109 bp. Further analysis revealed that, in
poplar, the average total exon length per gene is 1 176
bp, encoding 392 amino acids. From this analysis,
we conclude that, on average, exon sequences
account for 43.7% of the gene sequences. The total
length of transcripts (sequences covered by mRNAs)
was found to account for only 12.5% of the valid
A, T, G, C readings of the assembled genome
sequences. There are about 90 thousand EST
sequences of P. trichocarpa deposited in GenBank.
We summarized the gene models captured by these
ESTs and found that they covered only about 16.8%
of the total genes in the poplar genome. The poor gene
coverage may arise because EST sequences depend
heavily on the tissue library and normally contain
highly redundant sequences (Susko and Roger,
2004; Wang et al., 2005).
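
The arithmetic behind the exon figures above can be checked with a short
sketch. The two input values are taken from the text. The implied mean gene
length is only a back-of-envelope inference that assumes the 43.7% figure is
a ratio of average lengths; it is not a value reported in the analysis.

```python
# Values quoted in the text
mean_exon_bp = 1176     # average total exon length per gene (bp)
exon_fraction = 0.437   # share of gene sequence that is exonic

# Three nucleotides per codon
codons = mean_exon_bp // 3

# ASSUMPTION: exon_fraction is (mean exon length) / (mean gene length),
# so the mean gene length can be back-calculated. This is illustrative only.
implied_gene_bp = mean_exon_bp / exon_fraction

print(f"codons per gene: {codons}")
print(f"implied mean gene length: {implied_gene_bp:.0f} bp")
```
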
2 Discussion
Most eukaryotic genomes have numerous duplicated
genes, many of which appear to have arisen from
one or more cycles of ancient polyploidy
(paleopolyploidy) (Adams and Wendel, 2005;
Adams et al., 2004; Blanc and Wolfe, 2004).
Following paleopolyploidy (genome doubling),
there is extensive loss of duplicated genes (Adams
and Wendel, 2005; Adams et al., 2004; Blanc and
Wolfe, 2004). Cytological studies revealed that
almost all Populus species exist in the diploid form with a
haploid chromosome number of 19 (Smith,
1943). However, the poplar genome sequencing project
revealed that the modern poplar genome arose from
an ancient whole-genome duplication event, known
as the “salicoid duplication”. Thus, poplar chromosomes
share high sequence homology (Tuskan et al., 2006).
However, the pattern of gene-rich, average, and
gene-poor chromosomes differs from the
pattern of homology among chromosomes
revealed by the poplar genome sequencing project
(Tuskan et al., 2006). This suggests that, after the
salicoid duplication, genes were lost at different
rates on chromosomes that share large duplicated
segments.
ESTs are generated by partially sequencing
randomly isolated gene transcripts that have been
converted into cDNA (Adams et al., 1991). EST
sequencing has played an important role in the
identification, discovery and characterization of
organisms, as it provides an attractive and efficient
alternative to whole genome sequencing (Lijoi et al.,
2007). Concerted, high-budget EST sequencing
projects have been carried out for many
economically and ecologically important plant
species. In higher plants, most of the genome
consists of noncoding sequences, and
gene sequences account for only a small part of the
genome. In poplar, the total transcript sequences
account for only 12.5% of the genome sequences.
Alignment analysis of the EST sequences from P.
trichocarpa deposited in GenBank revealed that over
100 thousand ESTs cover only 16.8% of the putative
gene models annotated in the poplar Vista browser.
In higher organisms, such as mammals and
Arabidopsis, even concerted, high-budget EST
sequencing captures only about 50%~60% of genes
(Pers. Comm.). Thus, the set of genes captured by a limited
number of ESTs is very incomplete. There are two
reasons. First, EST sequences are heavily
redundant. Second, EST sequences are tissue-type
dependent: in a particular tissue library, some genes
are highly expressed and are therefore
sequenced repeatedly, whereas many other
genes have little chance of being sequenced. Although
EST sequencing is an attractive alternative to whole
genome sequencing for gene identification, the
power of small-scale EST sequencing studies should
be evaluated carefully, since with limited effort and
budget the genes captured by ESTs are fairly limited.
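
The effect of redundancy and skewed expression on gene coverage can be
illustrated with a toy simulation. This is a minimal sketch, not the method
used in this study: the function name, the number of genes and ESTs, and the
Zipf-like expression weights are all hypothetical choices made for
illustration. Each EST is drawn at random from a library in which a gene's
chance of being sampled is proportional to its expression weight.

```python
import random


def simulate_est_coverage(n_genes=1000, n_ests=2000, skew=1.0, seed=0):
    """Return the fraction of genes hit by at least one simulated EST.

    ASSUMPTION: the expression weight of the gene at rank r is 1 / r**skew.
    skew=0 gives uniform expression; larger values give a library dominated
    by a few highly expressed genes. The model is illustrative, not fitted.
    """
    rng = random.Random(seed)
    weights = [1.0 / (rank ** skew) for rank in range(1, n_genes + 1)]
    # Sample ESTs with replacement, so abundant transcripts recur often
    sampled = rng.choices(range(n_genes), weights=weights, k=n_ests)
    return len(set(sampled)) / n_genes


uniform = simulate_est_coverage(skew=0.0)  # every gene equally expressed
skewed = simulate_est_coverage(skew=1.0)   # a few genes dominate the library

print(f"coverage with uniform expression: {uniform:.1%}")
print(f"coverage with skewed expression:  {skewed:.1%}")
```

With the same number of ESTs, the skewed library recovers fewer distinct
genes than the uniform one, because many reads resample the same abundant
transcripts. Doubling the number of ESTs in the skewed case also yields less
than double the coverage, which is the diminishing return described above.
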
In our analysis, we used gene models from the JGI
annotation database in the Vista browser. These gene
models showed high conservation with the
Arabidopsis gene sets and are most likely “real”
genes. Given the percentage of unreliable annotations in
the current Jamboree models, we believe that it was
better to focus on conserved real genes. In our