Computational Molecular Biology
            
            
            
            
Tea (Camellia sinensis) is one of the most popular beverages in the world, owing to the availability of diverse cultivars and qualities, its taste and stimulant effect, and its health benefits. It has received much attention for its attractive aroma, pleasant taste, and numerous medicinal benefits, and has been consumed socially and habitually since 3000 B.C. (Kliman and Bernal, 2005). Tea leaves contain many secondary metabolites, such as polyphenols and alkaloids (e.g. caffeine), as well as vitamins (A, B1, B2, E, C), polysaccharides, volatile oils, and minerals (Lin et al., 2003).
            
            
Codon usage bias (CUB) is of fundamental importance in several areas of genetics, yet measuring it has long been a struggle. Advances in sequencing technology have provided an abundance of genomic data from different organisms, and with the whole-genome sequencing of numerous organisms the study of CUB is gaining renewed attention. In this paper, we study CUB in Camellia sinensis by analyzing the codon adaptation index (CAI), relative codon bias strength (RCBS), frequency of optimal codons (Fop), relative synonymous codon usage (RSCU), effective number of codons (ENc), GC content, GC skew and AT skew.
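Several of the indices listed above build on relative synonymous codon usage; CAI, for instance, is computed from the RSCU values of a reference gene set. As a minimal sketch, assuming the standard genetic code and a single in-frame coding sequence, the RSCU of a codon is its observed count divided by the mean count of all codons in its synonymous family. Stop codons are skipped, and amino acids absent from the gene are left undefined. The function name `rscu` is illustrative and not taken from any particular package.

```python
from collections import Counter

# Standard genetic code: codons enumerated with bases in T, C, A, G order
# (TTT, TTC, TTA, TTG, TCT, ...); "*" marks stop codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

# Synonymous families: amino acid -> list of its sense codons.
FAMILIES: dict[str, list[str]] = {}
for codon, aa in CODON_TABLE.items():
    if aa != "*":
        FAMILIES.setdefault(aa, []).append(codon)


def rscu(cds: str) -> dict[str, float]:
    """RSCU = observed codon count / mean count over its synonymous family."""
    cds = cds.upper()
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    # Count sense codons only; stops and codons with ambiguous bases are ignored.
    counts = Counter(c for c in codons if CODON_TABLE.get(c, "*") != "*")
    values: dict[str, float] = {}
    for aa, family in FAMILIES.items():
        total = sum(counts[c] for c in family)
        if total == 0:
            continue  # amino acid absent from this gene, so RSCU is undefined
        mean_count = total / len(family)
        for c in family:
            values[c] = counts[c] / mean_count
    return values


if __name__ == "__main__":
    print(rscu("ATGGCTGCTGCCTAA"))
```

For the toy input `rscu("ATGGCTGCTGCCTAA")`, GCT gets an RSCU of about 2.67 and GCC about 1.33, while the unused alanine codons GCA and GCG get 0; by construction, the values within a family sum to the family size.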
            
            
              
                1 Results
              
            
            
              
                1.1 Overall codon usage analysis
              
            
            
Since the whole genome sequence of Camellia sinensis is not available, only ten genes were used in this study. Table 1 lists the selected genes with their accession numbers, together with their CAI, RCBS, GC, GC1, GC2 and GC3 values. The coding sequences of Camellia sinensis were found to be rich in A and/or T (overall GC content 42.9 to 49.1%; Table 1). By contrast, in Pseudomonas aeruginosa, codons ending in G and/or C predominate throughout the coding region (Gupta and Ghosh, 2001). However, overall codon usage values may obscure heterogeneity of codon usage bias among genes that might be superimposed on the overall genomic composition of this organism.
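The GC columns in Table 1 and the A/T-richness observation rest on position-specific base composition. The sketch below, assuming a clean in-frame CDS, computes overall GC and GC at the first, second and third codon positions, plus GC skew (G − C)/(G + C) and AT skew (A − T)/(A + T). Note that it computes GC3 over all third positions; GC3s proper is restricted to synonymous third positions (excluding Met, Trp and stop codons). `gc_profile` is a hypothetical helper name.

```python
def gc_profile(cds: str) -> dict[str, float]:
    """Overall and position-specific GC (%) plus GC and AT skew for an in-frame CDS."""
    cds = cds.upper()
    cds = cds[: len(cds) - len(cds) % 3]  # drop any trailing partial codon

    def gc_percent(seq: str) -> float:
        # Denominator counts only unambiguous bases (A, C, G, T).
        acgt = sum(seq.count(b) for b in "ACGT")
        return 100.0 * (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0

    a, c, g, t = (cds.count(b) for b in "ACGT")
    return {
        "GC": gc_percent(cds),
        "GC1": gc_percent(cds[0::3]),  # first codon positions
        "GC2": gc_percent(cds[1::3]),  # second codon positions
        "GC3": gc_percent(cds[2::3]),  # all third positions (not GC3s)
        "GC_skew": (g - c) / (g + c) if g + c else 0.0,
        "AT_skew": (a - t) / (a + t) if a + t else 0.0,
    }


if __name__ == "__main__":
    print(gc_profile("GGGAAACCT"))
```

For `gc_profile("GGGAAACCT")` the overall GC is about 55.6%, GC3 about 33.3%, GC skew 0.2 and AT skew 0.5. Because every codon contributes one base to each position, overall GC is simply the mean of GC1, GC2 and GC3 for a CDS made of complete codons with no ambiguous bases.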
            
            
Table 1 RCBS, CAI, CDS length, GC content analysis and accession number for Camellia sinensis genes

Sl. no | Gene name | Accession number | CDS length (bp) | CAI | RCBS | GC (%) | GC1 (%) | GC2 (%) | GC3 (%)
1 | Acetyl CoA carboxylase | DQ366599 | 1800 | 0.6351 | 0.0413 | 47.6 | 55.0 | 42.3 | 45.3
2 | Polyphenol oxidase (PPO) | FJ656220 | 1800 | 0.6118 | 0.0408 | 49.1 | 52.3 | 40.5 | 54.3
3 | pRB mRNA for retinoblastoma-related protein | AB247284 | 3078 | 0.5455 | 0.0252 | 42.9 | 47.9 | 43.9 | 37.1
4 | cycD3-2 mRNA for cyclin D3-2 | AB247283 | 1119 | 0.3605 | 0.0637 | 43.3 | 50.4 | 34.3 | 45.0
5 | cycD3 mRNA for cyclin D3-1 | AB247282 | 1116 | 0.5152 | 0.0629 | 46.7 | 53.0 | 38.2 | 48.9
6 | cycb mRNA for cyclin B | AB247280 | 1323 | 0.4688 | 0.0541 | 45.4 | 53.7 | 39.5 | 43.1
7 | Stearoyl acyl carrier protein desaturase | KC242133 | 1191 | 0.3339 | 0.0599 | 45.3 | 54.2 | 37.8 | 44.1
8 | Cultivar Longjing43 glycerol-3-phosphate acyltransferase | KC920896 | 1353 | 0.4935 | 0.0529 | 46.4 | 52.5 | 42.1 | 44.6
9 | Omega-3 fatty acid desaturase (FAD8) | KC847167 | 1359 | 0.5694 | 0.0536 | 46.7 | 53.0 | 42.2 | 44.8
10 | AMP deaminase | KC700025 | 2571 | 0.5943 | 0.0298 | 44.2 | 52.5 | 37.5 | 42.6
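RCBS, reported in Table 1, measures how far a gene's codon frequencies depart from what independent base composition at the three codon positions would predict. The following is a minimal sketch of one common formulation, assumed here because the section does not give the formula: with f(xyz) the relative frequency of codon xyz in the gene and f1, f2, f3 the relative base frequencies at positions 1, 2 and 3, d_xyz = f(xyz)/(f1(x) f2(y) f3(z)) − 1, and RCBS is the geometric mean of (1 + d_xyz) over all L codons of the gene, minus 1. Whether the stop codon is counted is also an assumption; this sketch uses every complete A/C/G/T codon.

```python
import math
from collections import Counter


def rcbs(cds: str) -> float:
    """Relative codon bias strength: geometric mean of f(xyz) / (f1(x) f2(y) f3(z)), minus 1."""
    cds = cds.upper()
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    codons = [c for c in codons if set(c) <= set("ACGT")]  # skip ambiguous codons
    n = len(codons)
    if n == 0:
        raise ValueError("no complete A/C/G/T codons in sequence")
    codon_freq = Counter(codons)
    # Base counts at codon positions 1, 2 and 3.
    pos_freq = [Counter(c[k] for c in codons) for k in range(3)]
    log_sum = 0.0
    for c in codons:
        observed = codon_freq[c] / n
        expected = (pos_freq[0][c[0]] / n) * (pos_freq[1][c[1]] / n) * (pos_freq[2][c[2]] / n)
        log_sum += math.log(observed / expected)  # ln(1 + d_xyz)
    return math.exp(log_sum / n) - 1.0
```

A gene made of one repeated codon scores 0, while `rcbs("AAATTT")` gives 3.0, because each of its two codons is four times more frequent than its positional base frequencies predict. The mean log term equals the Kullback-Leibler divergence of the codon distribution from the product of its positional base distributions, so under this formulation RCBS is never negative.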
            
            
              
                1.2 Codon usage variation
              
            
            
The effective number of codons used by a gene (Nc) and the G+C percentage at synonymous third codon positions (GC3s) are used to study codon usage variation among the genes of Camellia sinensis. Wright (1990) suggested that a plot of Nc against GC3s can be used to explore codon usage variation among genes: comparing the observed distribution of genes with the distribution expected under no selection indicates whether their codon usage bias is influenced by factors other than genomic GC composition.
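Wright's null expectation can be written down directly. If codon usage is shaped only by GC composition at synonymous third positions (s = GC3s, as a fraction), the expected effective number of codons is Nc = 2 + s + 29 / (s² + (1 − s)²); genes lying well below this curve suggest influences beyond compositional bias. The sketch computes that curve and a simple relative deviation of an observed Nc from it; the helper names and the deviation measure are illustrative choices, not the analysis used in this paper.

```python
def expected_nc(gc3s: float) -> float:
    """Expected Nc under GC3s-driven codon usage alone (Wright, 1990); gc3s is a fraction in [0, 1]."""
    if not 0.0 <= gc3s <= 1.0:
        raise ValueError("gc3s must be a fraction between 0 and 1")
    s = gc3s
    return 2 + s + 29 / (s ** 2 + (1 - s) ** 2)


def nc_deviation(observed_nc: float, gc3s: float) -> float:
    """Hypothetical helper: relative shortfall of the observed Nc below the expected curve.

    Positive values mean the gene is more biased than GC3s alone would predict.
    """
    expected = expected_nc(gc3s)
    return (expected - observed_nc) / expected


if __name__ == "__main__":
    # Trace the expected curve across the GC3s range (e.g. to overlay on an Nc-GC3s plot).
    for s in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(f"GC3s = {s:.2f}  expected Nc = {expected_nc(s):.2f}")
```

At s = 0.5 the expected Nc is 60.5, close to the maximum of 61, and it drops to 31 and 32 at s = 0 and s = 1 respectively.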
            
            
Figure 1 shows the Nc distribution of different genes in Camellia sinensis. The mean and standard deviation of Nc are 15.2 and 0.42637, respectively,
            
            