Computational Molecular Biology 2015, Vol. 5, No. 5, 1-9
there are a number of serum markers that aid the
clinical diagnosis of breast, colon and ovarian
cancer. However, detecting more genetic markers
effectively is widely considered to require more
advanced techniques such as DNA microarray
technology, which profiles cancer genetically by
allowing thousands of genes significantly
associated with cancer types to be examined
(Golub TR 2001), (Elvidge G 2006).
Herein we present computational methods to predict
genetic markers of three types of cancer based on
microarray data. Different algorithms are used to
preprocess the data and to assess the quality of the
intensity values when screening the datasets.
Statistical techniques are used to predict
differentially expressed genes, both upregulated and
downregulated. We compare the datasets in a
meta-analysis, which relies on high-level computing
power, to predict markers of multiple cancer types. A
gene-gene network study shows various significant
genes that are regulated specifically in one or more
of the cancer types. We believe that these novel
genes, identified by gene expression profiling, will
provide highly valuable markers for new approaches in
diagnostics and therapeutics.
Materials and Methods
For the meta-analysis, datasets for three cancers
(breast, colon and ovarian) were retrieved from the
GEO database. The breast cancer dataset (GEO ID:
GSE30543) has 6 samples of SUM149 cells, with
replicates treated with control siRNAs and with siRNA
targeting TIG1 (Wang X1 et al., 2013). The colon cancer
dataset (GSE34299) has 4 samples, comprising HT29
parental cell lines and HT29RC PLX4720-resistant cell
lines grown in increasing concentrations of the drug to
develop acquired resistance (Mao M1 et al., 2012).
The ovarian cancer dataset (GSE35972) has 6 samples,
comprising untreated and NSC319726-treated TOV112D
cells with different biological replicates
(Yu X1 et al., 2012).
All datasets, with their different samples, were
analyzed using the GPL570 (HG-U133_Plus_2) Affymetrix
human genome array platform. All probe sets of
HG-U95Av2 are identically replicated on the diseased
transcript variant. The RNA probe sets were derived from
RefSeq, dbEST and GenBank. The sequence clusters
were created from the UniGene database, and gene
names were refined using publicly available databases.
Statistical analysis software, namely R and
BioConductor, was used for pre-processing and
differential gene expression analysis to classify
breast, ovarian and colon cancer genes that could be
used as potential drug targets.
Pre-processing of Raw Microarray data
The affy and affycoretools packages of BioConductor
were used to pre-process the data (R Core Team 2012),
(Robinson MD et al., 2010). Different algorithms, such
as RMA and MAS5, were used to pre-process the data and
correct both foreground and background intensities of
all probes. Different statistical techniques were used
to normalize the probe sets, such as the constant,
quantiles and invariant set methods, together with PM
(perfect match) and MM (mismatch) corrections.
However, the signal intensity of an MM probe can often
be larger than that of the PM probe, implying that the
MM probe is detecting true signal as well as background
signal. After correction, the intensity levels were used
for differential gene expression analysis in each
disease dataset.
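As an illustration of the quantile normalization mentioned above, the
following minimal Python sketch forces every sample (column) of a small
probes-by-samples matrix onto the same intensity distribution. It is not
the implementation used by the affy package: the function name
quantile_normalize and the example values are hypothetical, ties are
broken by row order rather than averaged, and no background correction
or probe-set summarization is attempted.

```python
def quantile_normalize(matrix):
    """Quantile-normalize a matrix given as a list of rows.

    Rows are probes and columns are samples (an assumption of this sketch).
    Returns a new matrix in which every column has the same sorted values.
    """
    n_rows = len(matrix)
    n_cols = len(matrix[0])
    columns = [[row[j] for row in matrix] for j in range(n_cols)]

    # For each sample, the row indices ordered from lowest to highest intensity.
    order = [sorted(range(n_rows), key=col.__getitem__) for col in columns]

    # Reference distribution: mean of the k-th smallest value across samples.
    reference = [
        sum(columns[j][order[j][k]] for j in range(n_cols)) / n_cols
        for k in range(n_rows)
    ]

    # Replace each value by the reference value of the same rank.
    normalized = [[0.0] * n_cols for _ in range(n_rows)]
    for j in range(n_cols):
        for rank, i in enumerate(order[j]):
            normalized[i][j] = reference[rank]
    return normalized


# Hypothetical intensities: 4 probes (rows) measured in 3 samples (columns).
example = [
    [5.0, 4.0, 3.0],
    [2.0, 1.0, 4.0],
    [3.0, 3.0, 6.0],
    [4.0, 2.0, 8.0],
]
normalized = quantile_normalize(example)
```

After normalization the sorted values of each column are identical, so
differences between samples reflect only the rank of each probe within
its sample.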
Differential gene Expression analysis
After pre-processing of the datasets, the resultant CEL
files were used for differential gene expression
analysis. The limma package was used to identify
differentially expressed genes in the data arising from
the microarray RNA samples. For datasets of control and
cancer samples, the change in expression levels between
the two sample groups shows which genes are up-regulated
(increased in expression) or down-regulated (decreased
in expression). Genes can be clustered by their
expression patterns across a set of samples, or
samples can be clustered by similar expression patterns
across genes. Each sample group will contain
numerous replicates. The group expression level for a
probe will be summarized as the mean of the
expression levels in the group replicates. Thus,
differential expression problems are a comparison of
means. When there are two sample groups this is a t
test of some kind (Prashantha N et al., 2013). The
clustering of up-regulated and down-regulated genes
was determined by a clustering experiment. Hierarchical
clustering of the differentially expressed genes was
performed with respect to the B values (the log-odds of
differential expression) and the correlation
coefficient of the control-cancer datasets. The
relationships among objects are represented by a tree
whose branch