Computational Molecular Biology
the base compositions are biased at three sites in the
sequence, divided by the expected frequency. RCBS is
the overall score of a gene, summarizing the influence
of the RCB of each of its codons. RCB reflects the level
of gene expression, and the expression measure of a gene
is denoted by RCBS (Hertog et al., 1993). An RCBS
value close to 0 indicates a lack of bias for the
codons; the score is thus useful for comparing different
sets of genes.
Gene expression level is related to codon usage
difference of a gene with respect to biased nucleotide
composition at the three codon sites. Let f(x,y,z) be
the normalized codon frequency for the codon triplet
(x,y,z) of a gene. Then the relative codon bias (RCB)
of a codon triplet (x,y,z) in a gene is defined as:
where f1(x) is the normalized frequency of x at the
first codon position, f2(y) is the normalized frequency
of y at the second codon position, and f3(z) is the
normalized frequency of z at the third codon position
of the gene. The frequencies f1, f2 and f3 are
derived from the set of codon samples of a gene, and
they are normalized over the gene length in codons
in an attempt to compensate for the expected increase
of RCB with the total number of codons. This
normalization allows the degree of codon bias of a gene
to be quantified in such a way that comparisons can be
made both within and between genomes. As defined
above, d_xyz contains somewhat more quantitative
information than other measures, since it considers
codon usage as well as the base compositional bias.
The expression measure of a gene is then:
where d^i_xyz is the codon usage difference of the
i-th codon of the gene and L is the number of codons
in the gene.
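
The RCB and RCBS definitions above can be illustrated with a short Python sketch. It is only a sketch under stated assumptions: the function names `split_codons` and `rcbs` are hypothetical, the input is taken to be an in-frame coding sequence, every codon (including any stop codon) is counted, and the geometric mean is evaluated in log space for numerical stability.

```python
import math
from collections import Counter


def split_codons(seq):
    """Split a coding sequence into complete codons (trailing partial codon dropped)."""
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def rcbs(seq):
    """RCBS = (prod_i (1 + d_i))**(1/L) - 1, with d the relative codon bias."""
    codons = split_codons(seq)
    n = len(codons)  # L in the text: number of codons in the gene
    if n == 0:
        raise ValueError("sequence contains no complete codon")

    codon_counts = Counter(codons)
    # Base counts at codon positions 1, 2 and 3 (f1, f2, f3 before normalization).
    base_counts = [Counter(c[k] for c in codons) for k in range(3)]

    def d(codon):
        # Expected frequency under independent positions: f1(x) * f2(y) * f3(z).
        expected = 1.0
        for k in range(3):
            expected *= base_counts[k][codon[k]] / n
        observed = codon_counts[codon] / n  # f(x, y, z)
        return (observed - expected) / expected

    # 1 + d = observed / expected > 0 for every codon present, so log1p is safe.
    log_sum = sum(math.log1p(d(c)) for c in codons)
    return math.expm1(log_sum / n)
```

For a gene made of a single repeated codon the observed and expected frequencies coincide, so the sketch returns 0, in line with the statement that values near 0 indicate a lack of bias.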
Gene expressivity was also measured by calculating
the codon adaptation index, CAI (Sharp and Li,
1986). It essentially measures the distance from a
given gene to a reference gene with respect to their
amino-acid codon usages. CAI defines translationally
optimal codons as those that appear frequently in
highly expressed genes, i.e.
where L is the length of gene g and w_c(l) is the
relative adaptiveness of the codon c in the reference
genes (not g). Relative adaptiveness is defined as:
where f_c is the frequency of codon c, which is the
l-th codon in gene g, a is the amino acid encoded by
c, and {C_a} is the set of synonymous codons encoding
amino acid a. Certain codons will appear multiple times
in the gene. Hence we can rewrite the equation to sum
over codons rather than length, and use counts rather
than frequencies. This makes the dependence on the
actual gene clearer. The more usual form is:
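
As an illustration of the count-based form of CAI described above, the following Python sketch derives relative adaptiveness values from a set of reference genes and then scores a gene. Several points are assumptions rather than part of the text: the names `relative_adaptiveness` and `cai` are hypothetical, the standard genetic code is used, stop codons are ignored, and codons never seen in the reference set are simply skipped instead of being given a pseudocount.

```python
import math
from collections import Counter

# Standard genetic code, bases ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def split_codons(seq):
    """Split a coding sequence into complete codons (trailing partial codon dropped)."""
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def relative_adaptiveness(reference_genes):
    """w_c = f_c / max(f_s) over the synonymous codons s of the same amino acid."""
    counts = Counter()
    for gene in reference_genes:
        counts.update(split_codons(gene))
    w = {}
    for codon, aa in CODON_TABLE.items():
        if aa == "*":
            continue  # assumption: stop codons are not scored
        family_max = max(counts[s] for s, a in CODON_TABLE.items() if a == aa)
        # Assumption: codons absent from the reference set get no weight at all.
        if counts[codon] > 0:
            w[codon] = counts[codon] / family_max
    return w


def cai(gene, w):
    """CAI = exp((1 / o_tot) * sum_c o_c * log(w_c)), summing over distinct codons."""
    o = Counter(c for c in split_codons(gene) if c in w)  # o_c: codon counts in the gene
    o_tot = sum(o.values())
    if o_tot == 0:
        raise ValueError("no scorable codons in gene")
    return math.exp(sum(n * math.log(w[c]) for c, n in o.items()) / o_tot)
```

Because the ratio of two counts within one synonymous family equals the ratio of the corresponding frequencies, the sketch works with raw counts throughout. Single-codon amino acids (Met, Trp) always receive w = 1 here; many implementations exclude them from the score.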
The effective number of codons (Nc) measures how
many different codons are effectively used in a sequence
(Wright, 1990). The values of Nc range from 20,
where only one codon is used per amino acid, to 61
(for the standard genetic code), where all possible
synonymous codons are used with equal frequency. Nc
measures bias toward the use of a smaller subset of
codons, away from equal use of synonymous codons.
For example, as mentioned above, highly expressed
genes use fewer codons due to selection. The
underlying idea of Nc is similar to the concept of
zygosity from population genetics, which refers to the
similarity of the alleles of a gene.
In the context of codon usage, multiple synonymous
codons are treated analogously to multiple alleles.
Homozygosity for an amino acid, Z_a, measures the
degree of similarity and is computed from the
relative codon frequencies f_ac:
The number of effective codons for an amino acid is
the inverse of homozygosity:
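Equations for the definitions above, in order of appearance:

Relative codon bias of the codon triplet (x,y,z):

$$ d_{xyz} = \frac{f(x,y,z) - f_1(x)\,f_2(y)\,f_3(z)}{f_1(x)\,f_2(y)\,f_3(z)} $$

Expression measure of a gene:

$$ \mathrm{RCBS} = \left( \prod_{i=1}^{L} \left( 1 + d^{i}_{xyz} \right) \right)^{1/L} - 1 $$

Codon adaptation index of gene g:

$$ \mathrm{CAI}(g) = \left( \prod_{l=1}^{L} w_{c(l)} \right)^{1/L} = \exp\left( \frac{1}{L} \sum_{l=1}^{L} \log w_{c(l)} \right) $$

Relative adaptiveness of codon c:

$$ w_c = \frac{f_c}{\max_{s \in \{C_a\}} f_s} $$

CAI summed over codons, with o_c the count of codon c in g and o_tot the total count:

$$ \mathrm{CAI}(g) = \exp\left( \frac{1}{o_{tot}} \sum_{c \in C} o_c \log w_c \right) $$

Homozygosity for amino acid a:

$$ Z_a = \frac{O_a \sum_{c \in C_a} f_{ac}^{2} - 1}{O_a - 1} $$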
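
The homozygosity Z_a and its inverse, the number of effective codons for an amino acid, can be sketched in Python as follows. This is a minimal illustration, not a full Nc implementation: the function names are hypothetical, O_a is assumed to be the number of codons observed for amino acid a in the gene, the standard genetic code is assumed, and amino acids for which Z_a is undefined or not positive are left out.

```python
from collections import Counter, defaultdict

# Standard genetic code, bases ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def homozygosity(seq):
    """Z_a = (O_a * sum_c f_ac**2 - 1) / (O_a - 1) for each amino acid a."""
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3
    by_aa = defaultdict(Counter)
    for i in range(0, usable, 3):
        aa = CODON_TABLE.get(seq[i:i + 3])
        if aa is not None and aa != "*":
            by_aa[aa][seq[i:i + 3]] += 1

    z = {}
    for aa, counts in by_aa.items():
        o_a = sum(counts.values())  # assumed meaning of O_a: codons observed for a
        if o_a < 2:
            continue  # the denominator O_a - 1 would be zero
        sum_sq = sum((n / o_a) ** 2 for n in counts.values())  # sum of f_ac squared
        z[aa] = (o_a * sum_sq - 1) / (o_a - 1)
    return z


def effective_codons_per_amino_acid(seq):
    """Inverse of homozygosity, skipping amino acids where Z_a is not positive."""
    return {aa: 1.0 / z_a for aa, z_a in homozygosity(seq).items() if z_a > 0}
```

A leucine run that uses only CTG gives Z_a = 1 and one effective codon, while an even mix of two leucine codons gives a smaller Z_a and therefore a larger effective number.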