
Computational Molecular Biology


Tea (Camellia sinensis) is one of the most popular beverages in the world, owing to the availability of diverse cultivars and qualities, its taste, its stimulating effect, and its health benefits. It has received much attention for its attractive aroma, pleasant taste, and numerous medicinal benefits, and people have consumed it socially and habitually since 3000 B.C. (Kliman and Bernal, 2005). Many secondary metabolites, such as polyphenols, alkaloids (e.g. caffeine), vitamins (A, B1, B2, E, C), polysaccharides, volatile oils, and minerals, are found in tea leaves (Lin et al., 2003).

Despite its fundamental importance in several areas of genetics, codon usage bias (CUB) has long been difficult to measure. Advances in sequencing technology have provided an abundance of genomic data from different organisms, and with the advent of whole-genome sequencing of numerous organisms the study of CUB is gaining renewed attention. In this paper, we study CUB in Camellia sinensis by analyzing the codon adaptation index (CAI), relative codon usage bias (RCBS), frequency of optimal codons (Fop), relative synonymous codon usage (RSCU), effective number of codons (ENc), GC content, GC skew and AT skew.
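As an illustration of one of the measures listed above, the following is a minimal sketch of how RSCU might be computed for a single coding sequence, assuming the standard genetic code. RSCU is taken here in its usual sense: the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally. The function name `rscu` and the toy sequence are hypothetical and are not taken from this study.

```python
from collections import Counter

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))


def rscu(cds):
    """Return {codon: RSCU} for one coding sequence (hypothetical helper)."""
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    counts = Counter(cds[i:i + 3] for i in range(0, usable, 3))

    # Group codons into synonymous families by the amino acid they encode.
    families = {}
    for codon, aa in CODON_TABLE.items():
        families.setdefault(aa, []).append(codon)

    values = {}
    for aa, family in families.items():
        # Skip stop codons and single-codon amino acids (Met, Trp).
        if aa == "*" or len(family) < 2:
            continue
        total = sum(counts[c] for c in family)
        if total == 0:
            continue  # amino acid absent from this sequence
        for codon in family:
            # Observed count divided by the count expected under equal usage.
            values[codon] = counts[codon] * len(family) / total
    return values


toy = "GCTGCTGCCGCA"  # four alanine codons, made up for illustration
print(rscu(toy))
```

In the toy sequence GCT is used twice among four alanine codons, so its RSCU is 2.0, while the unused codon GCG scores 0.0. Values above 1 indicate codons used more often than expected under equal usage.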

1 Results

1.1 Overall codon usage analysis

Since the whole genome sequence is not available for Camellia sinensis, only ten (10) genes were used in this study. Table 1 lists the selected genes with their accession numbers, along with the overall RCBS, CAI, GC%, GC1s, GC2s and GC3s values. The coding sequences of Camellia sinensis were found to be rich in A and/or T. In P. aeruginosa, by contrast, codons ending in G and/or C are predominant throughout the coding region (Gupta and Ghosh, 2001). However, the overall codon usage values may obscure some heterogeneity of codon usage bias among the genes that might be superimposed on the extreme genomic composition of this organism.
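The composition measures used in this analysis can be sketched as follows. This is a simplified illustration, not the procedure used in the study: it counts G and C at each codon position over all codons, whereas GC1s, GC2s and GC3s in the strict sense are restricted to synonymous codons. The skews use the common definitions (G - C)/(G + C) and (A - T)/(A + T). The function names and the toy sequence are hypothetical.

```python
def gc_profile(cds):
    """GC percentage overall and at codon positions 1-3 (hypothetical helper)."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # drop a trailing partial codon
    if usable == 0:
        raise ValueError("sequence is shorter than one codon")
    cds = cds[:usable]

    def gc_percent(bases):
        return 100.0 * sum(b in "GC" for b in bases) / len(bases)

    profile = {"GC": gc_percent(cds)}
    for pos in range(3):
        # Every third base starting at pos gives one codon position.
        profile[f"GC{pos + 1}"] = gc_percent(cds[pos::3])
    return profile


def skew(seq, first, second):
    """(first - second) / (first + second), e.g. GC skew or AT skew."""
    seq = seq.upper()
    x, y = seq.count(first), seq.count(second)
    return (x - y) / (x + y) if (x + y) else 0.0  # 0.0 when neither base occurs


toy = "ATGGCCTAA"  # made-up sequence: ATG GCC TAA
print(gc_profile(toy))
print("GC skew:", skew(toy, "G", "C"))
print("AT skew:", skew(toy, "A", "T"))
```

Running the same functions over each coding sequence would give one row of composition values per gene, in the layout of Table 1.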

Table 1 RCBS, CAI, CDS length, GC content and accession numbers for Camellia sinensis genes

| Sl. no | Gene name | Accession number | CDS length (bp) | CAI | RCBS | GC (%) | GC1 (%) | GC2 (%) | GC3 (%) |
|---|---|---|---|---|---|---|---|---|---|
| 1 | Acetyl CoA carboxylase | DQ366599 | 1800 | 0.6351 | 0.0413 | 47.6 | 55.0 | 42.3 | 45.3 |
| 2 | Polyphenol oxidase (PPO) | FJ656220 | 1800 | 0.6118 | 0.0408 | 49.1 | 52.3 | 40.5 | 54.3 |
| 3 | pRB mRNA for retinoblastoma-related protein | AB247284 | 3078 | 0.5455 | 0.0252 | 42.9 | 47.9 | 43.9 | 37.1 |
| 4 | cycD3-2 mRNA for cyclin D3-2 | AB247283 | 1119 | 0.3605 | 0.0637 | 43.3 | 50.4 | 34.3 | 45.0 |
| 5 | cycD3 mRNA for cyclin D3-1 | AB247282 | 1116 | 0.5152 | 0.0629 | 46.7 | 53.0 | 38.2 | 48.9 |
| 6 | cycb mRNA for cyclin B | AB247280 | 1323 | 0.4688 | 0.0541 | 45.4 | 53.7 | 39.5 | 43.1 |
| 7 | Stearoyl acyl carrier protein desaturase | KC242133 | 1191 | 0.3339 | 0.0599 | 45.3 | 54.2 | 37.8 | 44.1 |
| 8 | Cultivar Longjing43 glycerol-3-phosphate acyltransferase | KC920896 | 1353 | 0.4935 | 0.0529 | 46.4 | 52.5 | 42.1 | 44.6 |
| 9 | Omega-3 fatty acid desaturase (FAD8) | KC847167 | 1359 | 0.5694 | 0.0536 | 46.7 | 53.0 | 42.2 | 44.8 |
| 10 | AMP deaminase | KC700025 | 2571 | 0.5943 | 0.0298 | 44.2 | 52.5 | 37.5 | 42.6 |
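The GC columns of Table 1 can be summarised with a few lines of arithmetic. The sketch below simply averages the ten reported values per column. The numbers are copied from the table, while the variable names are illustrative and the averaging itself is not part of the original analysis.

```python
from statistics import mean

# (GC, GC1, GC2, GC3) percentages for genes 1-10, copied from Table 1.
TABLE_1_GC = [
    (47.6, 55.0, 42.3, 45.3),
    (49.1, 52.3, 40.5, 54.3),
    (42.9, 47.9, 43.9, 37.1),
    (43.3, 50.4, 34.3, 45.0),
    (46.7, 53.0, 38.2, 48.9),
    (45.4, 53.7, 39.5, 43.1),
    (45.3, 54.2, 37.8, 44.1),
    (46.4, 52.5, 42.1, 44.6),
    (46.7, 53.0, 42.2, 44.8),
    (44.2, 52.5, 37.5, 42.6),
]

labels = ("GC", "GC1", "GC2", "GC3")
# zip(*rows) transposes the rows into one tuple per column.
means = {label: mean(column) for label, column in zip(labels, zip(*TABLE_1_GC))}

for label, value in means.items():
    print(f"{label}: {value:.2f}%")
```

For these ten genes the mean overall GC content comes out a little under 46%, with the first codon position the most GC-rich and the second the least, which is in line with the observation that the coding sequences lean towards A and T.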

1.2 Codon usage variation

The effective number of codons used by a gene (Nc) and the (G+C) percentage at the third position of synonymous codons (GC3s) were used to study codon usage variation among the genes of Camellia sinensis. Wright (1990) suggested that a plot of Nc against GC3s can be used effectively to explore codon usage variation among genes, and demonstrated that comparing the actual distribution of genes with the distribution expected under no selection can indicate whether the codon usage bias of the genes is shaped by influences other than genomic GC composition.
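The expected distribution under no selection is commonly drawn as a curve of Nc against GC3s. A minimal sketch follows, assuming the approximation usually attributed to Wright (1990), Nc = 2 + s + 29 / (s^2 + (1 - s)^2), where s is GC3s expressed as a fraction. The formula is quoted from the general literature rather than from the text above, and the function name `expected_nc` is hypothetical.

```python
def expected_nc(gc3s):
    """Expected Nc when only GC3s composition shapes codon choice.

    gc3s is a fraction between 0 and 1, not a percentage.
    """
    if not 0.0 <= gc3s <= 1.0:
        raise ValueError("gc3s must be a fraction in [0, 1]")
    s = gc3s
    return 2.0 + s + 29.0 / (s ** 2 + (1.0 - s) ** 2)


# Points along the reference curve, in steps of 5% GC3s.
curve = [(s / 100, expected_nc(s / 100)) for s in range(0, 101, 5)]
for s, nc in curve:
    print(f"GC3s = {s:.2f}  expected Nc = {nc:.2f}")
```

Genes that fall well below this curve on an Nc versus GC3s plot use fewer codons than GC composition alone would predict, which is the pattern read as evidence of additional influences on codon usage.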

Figure 1 shows the Nc distribution of different genes in Camellia sinensis. The mean and standard deviation of Nc are 15.2 and 0.42637, respectively,
