Computational Molecular Biology 2015, Vol.5, No.6, 1-4
resistance protein, etc.), and thus are referred to as
pathogenicity islands. It has also been reported that
several biotic factors influence the pathogenicity of
bacteria and the genomic features of pathogenic
genes.
1 Methodology
The coding sequences (CDS) of the whole genome of Vibrio
cholerae N16961 were retrieved from the NCBI ftp site,
and the virulence gene set (VGS) was downloaded from the
Pathogenicity Island Database.
Genes belonging to the virulence gene set were
removed from the whole-genome CDS so that no
CDS occurs in both gene sets.
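The removal step above can be sketched as a set difference on record identifiers. This is a minimal illustration, assuming both gene sets are available as FASTA text and that a gene is identified by the first word of its header line; the function names and the matching-by-ID rule are hypothetical and are not taken from the original workflow.

```python
def read_fasta(text):
    """Parse FASTA text into a dict mapping record ID -> sequence.

    Assumption: the record ID is the first whitespace-delimited token
    after '>' on the header line.
    """
    records = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            current = line[1:].split()[0]
            records[current] = []
        elif current is not None:
            records[current].append(line.upper())
    return {rec_id: "".join(parts) for rec_id, parts in records.items()}


def remove_virulence_genes(genome_cds, virulence_cds):
    """Return the genome CDS whose IDs do not occur in the virulence gene set."""
    return {
        rec_id: seq
        for rec_id, seq in genome_cds.items()
        if rec_id not in virulence_cds
    }
```

If the two sources use different identifiers for the same gene, matching on sequence rather than on ID may be the safer choice.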
Codon compositions (A3s, T3s, G3s, C3s, and
GC3s) of the virulence gene set were obtained using
the software CodonW (written by John Peden),
taken from (fttp://molbiol.ox.ac.uk/cu/codonW.tar.Z/).
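To make the third-position indices concrete, the following is a simplified sketch of how base frequencies at the third codon position can be counted. It is an approximation for illustration only: it assumes the standard codon assignments, skips Met, Trp and stop codons, and uses a single common denominator for all bases, so its output is not guaranteed to match the A3s, T3s, G3s, C3s and GC3s values reported by CodonW. The function name is hypothetical.

```python
from collections import Counter

# Codons with no synonymous alternative (Met, Trp) plus the stop codons.
# Assumption: standard codon assignments.
EXCLUDED = {"ATG", "TGG", "TAA", "TAG", "TGA"}


def third_position_composition(cds):
    """Return fractions of A, T, G, C and G+C at third codon positions.

    Only complete, unambiguous codons outside EXCLUDED are counted.
    Returns None if no codon qualifies.
    """
    cds = cds.upper()
    counts = Counter()
    for i in range(0, len(cds) - len(cds) % 3, 3):
        codon = cds[i:i + 3]
        if codon in EXCLUDED or set(codon) - set("ACGT"):
            continue
        counts[codon[2]] += 1
    total = sum(counts.values())
    if total == 0:
        return None
    comp = {base + "3": counts[base] / total for base in "ATGC"}
    comp["GC3"] = comp["G3"] + comp["C3"]
    return comp
```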
The nucleotide content (A%, T%, G%, and C%) of
each CDS of the VGS was analyzed using the MEGA 4.0
software for Windows. The resulting data were
then analyzed statistically using statistical software.
We also measured the extent of codon usage bias
(NCdiff) in this bacterial genome. To measure NCdiff,
we downloaded all the ribosomal protein coding genes
from the NCBI ftp site and generated two sets of coding
sequences: the ribosomal protein coding genes and the
rest of the genes. Using CodonW, we computed the ENC
values of the ribosomal genes and of the rest of the genes.
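The text does not state the formula used to combine the two ENC values. As an illustration only, the sketch below assumes one definition found in the codon usage literature, the relative difference between the ENC of the non-ribosomal genes and the ENC of the ribosomal protein genes; the function name, argument names and the formula itself are assumptions, not the authors' stated method.

```python
def nc_diff(nc_rest, nc_ribosomal):
    """Relative ENC difference between the rest of the genes and ribosomal genes.

    Assumed definition (not given in the text):
        (nc_rest - nc_ribosomal) / nc_rest
    A larger value means the ribosomal protein genes are more biased
    than the rest of the genome.
    """
    if nc_rest <= 0:
        raise ValueError("nc_rest must be positive")
    return (nc_rest - nc_ribosomal) / nc_rest
```

For example, with hypothetical ENC values of 50 for the rest of the genes and 40 for the ribosomal genes, this definition gives 0.2.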
2 Results and Discussion
To investigate whether mutational pressure influences
the codon usage bias in the VGS, a correlation analysis
was performed between the codon compositions (A3s, T3s,
G3s, C3s, and GC3s), the nucleotide compositions (A%,
T%, G%, C%, and GC%), and the ENC values (Table 1).
The results indicate that most of the codon
compositions correlate with the nucleotide
compositions. In contrast, the ENC value shows
no correlation with any of the nucleotide compositions.
These results indicate that the codon usage bias of the
VGS is influenced by the nucleotide compositions,
and hence by mutational bias.
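The correlation analysis described above can be illustrated with a short sketch. The text does not name the coefficient or the statistical package used, so the Pearson product-moment coefficient here is an assumption, and the function name is hypothetical. Each input list would hold one value per CDS, for example GC3s in one list and GC% in the other.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)
```

A significance test for each coefficient, such as the one behind the asterisks in Table 1, is not covered by this sketch.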
Table 1: Correlation between the codon compositions (A3s,
T3s, G3s, C3s, and GC3s), the ENC values, and the nucleotide
compositions (A%, T%, G%, C%, and GC%) of the
coding sequences of the virulence genes.
Note: * 0.01 < P < 0.05
We also plotted GC12 against GC3 for both CDS sets,
i.e. the whole-genome CDS and the VGS CDS. GC12 is the
average of GC1 and GC2; GC3 is plotted against this
average, and the correlation is used to assess whether
the mutational forces shaping codon usage bias differ
between the two CDS sets. In both cases the correlation
values are more or less similar, indicating that
mutational pressure influences both gene sets similarly.
For the virulence gene set, the plot of GC3 against GC12
shows a comparatively weak but significant correlation
(r = 0.2749, p < 0.1). These findings indicate that
the forces shaping the compositional patterns
of the whole genome and the VGS are the same for all
codon positions and act on the three codon
positions in a similar way.
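The quantities plotted above can be computed per CDS as in the following minimal sketch. It assumes the sequence is in frame from its first base and counts every complete codon, including the start and stop codons; whether the original analysis excluded any codons is not stated. The function name is hypothetical.

```python
def gc_by_codon_position(cds):
    """Return GC1, GC2, GC3 and GC12 (mean of GC1 and GC2) as fractions."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    if usable == 0:
        raise ValueError("sequence contains no complete codon")
    gc_counts = [0, 0, 0]
    for i in range(usable):
        if cds[i] in "GC":
            gc_counts[i % 3] += 1  # i % 3 is the codon position (0, 1, 2)
    n_codons = usable // 3
    gc1, gc2, gc3 = (count / n_codons for count in gc_counts)
    return {"GC1": gc1, "GC2": gc2, "GC3": gc3, "GC12": (gc1 + gc2) / 2}
```

Collecting the GC12 and GC3 values over all CDS of a gene set gives the two coordinates for each point in the plot.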
2.1 GC content distribution among the whole
genome and Virulence gene set
From Figure 1 and Figure 2, it can be inferred that
GC content is more or less uniformly distributed
among the CDS of the whole genome and the VGS. It can
therefore be predicted that the whole genome and the VGS
have a similar nucleotide composition and
may share the same pattern of codon usage.
2.2 Extent of codon usage bias
         A%        T%        C%        G%        GC%
A3s      0.8041    0.0707   -0.4902   -0.5241   -0.6393
T3s      0.4029    0.6070   -0.5921   -0.6762   -0.3935
C3s     -0.2190   -0.6002    0.6797    0.1969    0.8144
G3s     -0.6766   -0.1759    0.3012*   0.6587   -0.5269
GC3s    -0.4680   -0.2950*   0.6952    0.4453    0.5445
ENC     -0.0981    0.0995    0.0351   -0.0340    0.0092