Genomics and Applied Biology 2015, Vol. 6, No. 6, 1-7

http://gab.biopublisher.ca

Khobondo et al. (2015) confirmed the existence of codon usage bias (CUB) in the porcine genome, which might suggest weak selection of preferred codons for translation accuracy. The codon usage bias was subtly influenced by nucleotide composition factors (GC, GC3, CDS length), among others. In that study, there was a negative correlation between the genomic codon adaptation index (gCAI), a proxy of codon usage bias, and GC content or GC3s. However, this finding contradicted other findings (Hershberg and Petrov, 2010) and was attributed to differences in genome isochore structure, to the variability (in space and time) of gene expression in mammals, or to differences in the methodology used to calculate codon adaptation index (CAI) variants. A negative correlation was also reported between pig gCAI and gene length, consistent with other reports in organisms such as yeast, Caenorhabditis elegans, Drosophila melanogaster, Arabidopsis thaliana, Populus tremula and Silene latifolia (Qiu et al., 2011). This correlation suggests that metabolic systems prefer to express those genes that are less costly (Hahn and Kern, 2005). Despite this evidence on CUB, it is not known how this phenomenon may affect gene functionality and paucity. Therefore, this study was done to relate pig CUB (the 5% of genes showing the highest bias and the 5% showing the lowest bias) to gene ontologies and functional genomics.

2 Materials and Methods

2.1 Sequence data

The genome sequence used for analysis was downloaded from Ensembl v68 (Sus scrofa build 10.2) using BioMart. A total of 23,269 coding sequences (CDS) were extracted from the reference genome of a female Duroc pig. Only the 21,550 CDS with more than 50 amino acids (150 bp) were included in the analysis. Gene ontology (GO) terms were also downloaded from the Ensembl genome browser.
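
The length filter described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function names `parse_fasta` and `filter_cds` are hypothetical, the input is assumed to be FASTA text exported from BioMart, and "more than 50 amino acids (150 bp)" is assumed to mean a CDS strictly longer than 150 bp.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a dict of {record id: sequence}."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the id.
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line.upper())
    return {key: "".join(parts) for key, parts in records.items()}


def filter_cds(records, min_codons=50):
    """Keep CDS longer than `min_codons` codons (assumption: length > 150 bp)."""
    return {name: seq for name, seq in records.items()
            if len(seq) > min_codons * 3}
```

Under these assumptions, applying `filter_cds(parse_fasta(text))` to the exported CDS would correspond to the step that reduced the 23,269 extracted sequences to the 21,550 analysed ones.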

2.2 Codon indices: Genomic Codon Adaptation Index (gCAI)

The genomic codon adaptation index (gCAI) used in this study was computed earlier (Khobondo et al., 2015) with an in-house Perl script as the geometric mean of the relative synonymous codon usage (RSCU) of a gene's codons, divided by the highest possible geometric mean of RSCU given the same amino acid (AA) sequence.

The gCAI value is therefore a proxy for codon bias: because values are normalized using codon frequencies at equilibrium, there is no assumption of expression bias (Khobondo et al., 2015).
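
To make the ratio of geometric means concrete, the following Python sketch computes RSCU values from a reference set of sequences and a CAI-like index for a single gene. It is an illustration under stated assumptions, not the authors' Perl script: the names `rscu` and `cai_like` are hypothetical, the reference set is assumed to be a pooled collection of CDS, the standard genetic code is assumed, and stop codons and single-codon amino acids (Met, Trp) are excluded by choice. The equilibrium-frequency normalization mentioned above is not reproduced here.

```python
import math
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa
               for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Synonymous codon families, keyed by amino acid.
FAMILIES = defaultdict(list)
for _codon, _aa in CODON_TABLE.items():
    FAMILIES[_aa].append(_codon)


def split_codons(seq):
    """Split a sequence into codons, ignoring a trailing partial codon."""
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def rscu(reference_seqs):
    """RSCU = observed codon count / mean count over its synonymous family."""
    counts = Counter()
    for seq in reference_seqs:
        counts.update(c for c in split_codons(seq) if c in CODON_TABLE)
    values = {}
    for family in FAMILIES.values():
        total = sum(counts[c] for c in family)
        for c in family:
            values[c] = counts[c] * len(family) / total if total else 0.0
    return values


def cai_like(seq, rscu_values):
    """Geometric mean of RSCU divided by the maximum attainable geometric mean.

    Returns None when the sequence has no informative codons, and 0.0 when it
    uses a codon never seen in the reference set.
    """
    log_sum, n = 0.0, 0
    for codon in split_codons(seq):
        aa = CODON_TABLE.get(codon)
        if aa is None or aa == "*" or len(FAMILIES[aa]) == 1:
            continue  # skip ambiguous codons, stops, and Met/Trp
        weight = rscu_values[codon]
        if weight == 0.0:
            return 0.0  # a zero factor makes the geometric mean zero
        best = max(rscu_values[c] for c in FAMILIES[aa])
        log_sum += math.log(weight / best)
        n += 1
    return math.exp(log_sum / n) if n else None
```

Because both geometric means are taken over the same amino acid sequence, the ratio reduces to the geometric mean of the per-codon ratios RSCU / max RSCU, which is what the loop accumulates in log space.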

2.3 Analysis tools

An in-house Perl script was used to derive codon indices as described by Khobondo et al. (2015). The five percent (5%) most biased and the 5% least biased genes according to gCAI were extracted and grouped into two categories (high and low bias). Because not all pig genes have associated gene names, the genes without gene names were searched with BLAST against the human RefSeq mRNAs and the human reference protein sequences (blastn and blastp, respectively), and the best human hit was assigned as the gene name. Human orthologs of porcine genes were used to perform gene ontology (GO) analysis. BiNGO v2.44 (Maere et al., 2005), a plugin of Cytoscape v2.8.3 (Shannon et al., 2003), was used to identify enriched GO terms using the human gene annotation as background. A hypergeometric test was used to assess the significance of the enriched terms, and the Benjamini and Hochberg correction was applied for multiple comparisons. Over-represented GO terms from BiNGO were validated using a Perl script that compared the GO terms in each of the two files (highest or lowest biased genes) with all GO terms downloaded from the Ensembl genome browser. Statistical significance was computed using a chi-square test. To guard against false enrichment, a P-value threshold of 0.0001 was used as the significance level for the GO analysis.
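
The selection and enrichment steps above can be sketched in Python as follows. This is a hedged illustration and not a reimplementation of BiNGO or of the authors' Perl script: the function names `split_tails`, `hypergeom_upper_tail` and `benjamini_hochberg` are hypothetical, gene scores are assumed to be held in a dict of {gene id: gCAI}, and the enrichment P-value is assumed to be the one-sided upper tail of the hypergeometric distribution.

```python
from math import comb


def split_tails(scores, fraction=0.05):
    """Return (lowest, highest) gene-id lists, each holding `fraction` of genes."""
    ranked = sorted(scores, key=scores.get)
    k = max(1, int(len(ranked) * fraction))
    return ranked[:k], ranked[-k:]


def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) when n genes are drawn from N, of which K carry the GO term."""
    upper = min(n, K)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

In this sketch, a GO term would be called enriched in a tail when its adjusted P-value falls below the chosen threshold (0.0001 in the text), with `N` the number of annotated background genes, `K` the background genes carrying the term, `n` the size of the tail and `k` the tail genes carrying the term.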

3 Results

3.1 High codon usage bias and Gene Ontology terms

Gene ontology analysis of the 5% highest and 5% lowest CUB genes, performed with BiNGO and validated by the in-house Perl script, found 28 and 71 GO terms to be significantly enriched in the high and low CUB genes, respectively. The significant GO terms covered all three gene ontology domains: cellular components, biological processes and molecular functions. Notable associated GO terms such as cell surface, plasma membrane, nucleolus, nucleoplasm and nucleus, which denote anatomical structures, are cellular components related to biological processes. The over-representation of ribosome and actin binding terms, associated with translation and with holding the cellular matrix (mentioned above), was expected in highly biased genes. The same applies to heme binding for oxygen supply in all
