Computational Molecular Biology
Tea (Camellia sinensis) is one of the most popular
beverages in the world, owing to its diverse cultivars
and qualities, its taste, its stimulative effect, and its
health benefits. It has received much attention for its
attractive aroma, pleasant taste, and numerous medicinal
benefits, and has been consumed socially and habitually
since 3000 B.C. (Kliman and Bernal, 2005). Many
secondary metabolites, such as polyphenols, alkaloids
(e.g. caffeine), vitamins (A, B1, B2, E, C),
polysaccharides, volatile oils, and minerals are found
in tea leaves (Lin et al., 2003).
Despite its fundamental importance in several areas of
genetics, codon usage bias (CUB) has long been difficult
to measure. Advances in sequencing technology have
provided an abundance of genomic data from different
organisms, and the study of CUB is gaining renewed
attention with the advent of whole genome sequencing of
numerous organisms. In this paper, we study the CUB of
Camellia sinensis by analyzing the codon adaptation
index (CAI), relative codon usage bias (RCBS), frequency
of optimal codons (Fop), relative synonymous codon usage
(RSCU), effective number of codons (ENc), GC content,
GC skew and AT skew.
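As a minimal illustration of the composition measures listed above, the following Python sketch computes GC content, GC skew and AT skew for a single sequence. It assumes the conventional definitions GC skew = (G - C)/(G + C) and AT skew = (A - T)/(A + T); the function name base_composition and the toy sequence are hypothetical and are not taken from this study.

```python
def base_composition(seq):
    """Return (GC content in %, GC skew, AT skew) for a nucleotide sequence.

    Assumed definitions (not stated in the text):
      GC content = 100 * (G + C) / (A + C + G + T)
      GC skew    = (G - C) / (G + C)
      AT skew    = (A - T) / (A + T)
    Characters other than A, C, G and T (for example N) are ignored.
    """
    seq = seq.upper()
    a, c, g, t = (seq.count(base) for base in "ACGT")
    total = a + c + g + t
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    gc_content = 100.0 * (g + c) / total
    # A skew is undefined when its denominator is zero; report 0.0 in that case.
    gc_skew = (g - c) / (g + c) if (g + c) else 0.0
    at_skew = (a - t) / (a + t) if (a + t) else 0.0
    return gc_content, gc_skew, at_skew


# Toy sequence for illustration only (not a Camellia sinensis gene).
gc_content, gc_skew, at_skew = base_composition("ATGGGCGTTAAA")
```

A positive GC skew indicates more G than C on the analyzed strand, and a positive AT skew indicates more A than T.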
1 Results
1.1 Overall codon usage analysis
Since the whole genome sequence is not available for
Camellia sinensis, only ten (10) genes were used in
this study. Table 1 lists the selected genes with their
accession numbers along with the overall RCBS, CAI,
GC%, GC1s, GC2s and GC3s values. The coding sequences
of Camellia sinensis were found to be rich in A and/or
T. In P. aeruginosa, by contrast, codons ending in G
and/or C are predominant in the entire coding region
(Gupta and Ghosh, 2001). However, the overall codon
usage values may obscure some heterogeneity of codon
usage bias among the genes that might be superimposed
on the extreme genomic composition of this organism.
Table 1 RCBS, CAI, CDS length, GC content analysis and accession number for Camellia sinensis genes

Sl. no  Gene name                                                 Accession number  CDS length (bp)  CAI     RCBS    GC (%)  GC1 (%)  GC2 (%)  GC3 (%)
1       Acetyl CoA carboxylase                                    DQ366599          1800             0.6351  0.0413  47.6    55.0     42.3     45.3
2       Polyphenol oxidase (PPO)                                  FJ656220          1800             0.6118  0.0408  49.1    52.3     40.5     54.3
3       pRB mRNA for retinoblastoma related protein               AB247284          3078             0.5455  0.0252  42.9    47.9     43.9     37.1
4       cycD3-2 mRNA for cyclin D3-2                              AB247283          1119             0.3605  0.0637  43.3    50.4     34.3     45.0
5       cycD3 mRNA for cyclin D3-1                                AB247282          1116             0.5152  0.0629  46.7    53.0     38.2     48.9
6       cycb mRNA for cyclin B                                    AB247280          1323             0.4688  0.0541  45.4    53.7     39.5     43.1
7       Stearoyl acyl carrier protein desaturase                  KC242133          1191             0.3339  0.0599  45.3    54.2     37.8     44.1
8       Cultivar Longjing43 glycerol-3-phosphate acyltransferase  KC920896          1353             0.4935  0.0529  46.4    52.5     42.1     44.6
9       Omega-3 fatty acid desaturase (FAD8)                      KC847167          1359             0.5694  0.0536  46.7    53.0     42.2     44.8
10      AMP deaminase                                             KC700025          2571             0.5943  0.0298  44.2    52.5     37.5     42.6
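The positional GC values in Table 1 can be illustrated with a short Python sketch. It assumes that GC1, GC2 and GC3 denote the percentage of G or C at the first, second and third position of every codon in an in-frame CDS; the function name positional_gc and the toy sequence are hypothetical. The stricter GC3s measure is usually restricted to synonymous codons (excluding Met, Trp and stop codons), which this sketch does not attempt.

```python
def positional_gc(cds):
    """Return (GC, GC1, GC2, GC3) in percent for an in-frame coding sequence.

    Assumption: every complete codon, including the stop codon if present,
    contributes to the counts. A trailing partial codon is ignored.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    if usable == 0:
        raise ValueError("need at least one complete codon")
    cds = cds[:usable]

    def gc_percent(bases):
        # Percentage of positions in `bases` that are G or C.
        return 100.0 * sum(base in "GC" for base in bases) / len(bases)

    # Slicing with a step of 3 picks out one codon position at a time.
    return (
        gc_percent(cds),
        gc_percent(cds[0::3]),
        gc_percent(cds[1::3]),
        gc_percent(cds[2::3]),
    )


# Toy CDS for illustration only: ATG GCC AAA TGA
gc, gc1, gc2, gc3 = positional_gc("ATGGCCAAATGA")
```

Under this definition the overall GC value is the mean of the three positional values, which is consistent with the rows of Table 1 to within rounding.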
1.2 Codon usage variation
The effective number of codons used by a gene (Nc)
and the (G+C) percentage at the third synonymous
codon position (GC3s) were used to study the codon
usage variation among the genes of Camellia sinensis.
Wright (1990) suggested that a plot of Nc against GC3s
could be used effectively to explore the codon usage
variation among genes, and demonstrated that comparing
the actual distribution of genes with the distribution
expected under no selection could indicate whether the
codon usage bias of the genes is influenced by factors
other than the genomic GC composition.
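The expected distribution under no selection is commonly drawn using the approximation given by Wright (1990), Nc = 2 + s + 29/(s^2 + (1 - s)^2), where s is GC3s expressed as a fraction. The Python sketch below evaluates this curve; the function name expected_nc is a hypothetical helper, and the sketch does not reproduce the analysis or figures of this study.

```python
def expected_nc(gc3s):
    """Expected Nc when codon usage is shaped only by GC3s (Wright, 1990).

    `gc3s` is a fraction between 0 and 1, not a percentage.
    """
    if not 0.0 <= gc3s <= 1.0:
        raise ValueError("gc3s must be a fraction between 0 and 1")
    s = gc3s
    return 2.0 + s + 29.0 / (s ** 2 + (1.0 - s) ** 2)


# Points of the reference curve at 5% steps of GC3s, ready for plotting.
curve = [(step / 100.0, expected_nc(step / 100.0)) for step in range(0, 101, 5)]
```

Genes whose observed Nc lies well below this curve at their GC3s value are usually interpreted as having codon usage shaped by factors other than base composition. A percentage such as those in Table 1 would need to be divided by 100 before being passed to the function.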
Figure 1 shows the Nc distribution of different genes
in Camellia sinensis. The mean and standard deviation
of Nc are 15.2 and 0.42637, respectively,