Cancer Genetics and Epigenetics 2016, Vol.4, No.2, 1-9
Screening of differentially expressed genes
In this study, significance analysis of microarrays (SAM), implemented in the samr R package, was used to identify
differentially expressed genes. SAM is a statistical method for analyzing microarray gene expression data and
screening for differentially expressed genes. It uses permutations of the sample labels to estimate the false
discovery rate (FDR), thereby controlling the error rate across multiple tests. The data in this study comprised 1099
breast tumor samples, 110 adjacent normal samples and 20,531 genes. Genes were selected as differentially expressed
if q < 1 and fold change > 2 or fold change < 0.5, and were then divided into up- and down-regulated genes in breast
tumors.
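The screening step can be illustrated with a simplified, pure-Python sketch. For each gene it computes the two-class SAM relative difference d = (mean_tumor - mean_normal) / (s + s0), calls genes with |d| above a cutoff, estimates the FDR as the median number of genes called across label permutations divided by the number called on the real labels, and then splits the called genes with the fold-change thresholds given above. This is not the samr implementation: the fixed fudge factor s0, the symmetric cutoff, the number of permutations, the linear-scale fold change and the toy data are all assumptions for illustration, and samr additionally uses an asymmetric threshold on ordered statistics and a correction for the proportion of null genes.

```python
import random
import statistics


def sam_d(x, y, s0=0.1):
    """Two-class SAM relative difference (mean_x - mean_y) / (s + s0).

    s0 is fixed here (assumption); samr estimates it from the data.
    """
    nx, ny = len(x), len(y)
    mx, my = statistics.fmean(x), statistics.fmean(y)
    # Pooled within-group sum of squares, as in the SAM two-class statistic
    ss = sum((v - mx) ** 2 for v in x) + sum((v - my) ** 2 for v in y)
    s = ((1 / nx + 1 / ny) * ss / (nx + ny - 2)) ** 0.5
    return (mx - my) / (s + s0)


def permutation_fdr(tumor, normal, cutoff, n_perm=100, seed=0):
    """Call genes with |d| >= cutoff and estimate their FDR by permutation.

    tumor, normal: dict mapping gene -> list of expression values
    (same sample order for every gene).
    """
    rng = random.Random(seed)
    genes = list(tumor)
    n_t = len(tumor[genes[0]])
    n_all = n_t + len(normal[genes[0]])
    called = [g for g in genes if abs(sam_d(tumor[g], normal[g])) >= cutoff]
    null_counts = []
    for _ in range(n_perm):
        # One shuffle of the sample labels, applied to all genes at once
        order = list(range(n_all))
        rng.shuffle(order)
        count = 0
        for g in genes:
            pooled = tumor[g] + normal[g]
            perm = [pooled[i] for i in order]
            if abs(sam_d(perm[:n_t], perm[n_t:])) >= cutoff:
                count += 1
        null_counts.append(count)
    expected_false = statistics.median(null_counts)
    fdr = min(1.0, expected_false / len(called)) if called else 0.0
    return called, fdr


def split_by_fold_change(tumor, normal, genes, up=2.0, down=0.5):
    """Keep called genes whose tumor/normal mean ratio is > up or < down."""
    up_genes, down_genes = [], []
    for g in genes:
        fc = statistics.fmean(tumor[g]) / statistics.fmean(normal[g])
        if fc > up:
            up_genes.append(g)
        elif fc < down:
            down_genes.append(g)
    return up_genes, down_genes


# Hypothetical toy data: 4 tumor and 4 normal samples per gene
tumor = {"UP": [10, 11, 12, 10], "DOWN": [1, 1.5, 1, 1.2], "FLAT": [4, 4.2, 3.9, 4.1]}
normal = {"UP": [2, 3, 2, 3], "DOWN": [5, 6, 5.5, 6], "FLAT": [4.1, 4, 4.2, 3.9]}

called, fdr = permutation_fdr(tumor, normal, cutoff=3.0)
up_genes, down_genes = split_by_fold_change(tumor, normal, called)
print(called, fdr, up_genes, down_genes)
```

Shuffling the labels once per permutation and applying the same order to every gene keeps the gene-gene correlation structure intact, which is the reason SAM permutes samples rather than individual values.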
Functional enrichment analysis for gene lists
Functional annotation and pathway enrichment analysis of the differentially expressed genes were conducted using the
DAVID (Database for Annotation, Visualization and Integrated Discovery) bioinformatics tools. DAVID provides a
comprehensive set of functional annotation tools that help researchers understand the biological meaning of large
gene lists. Here, the DAVID online tools were used to test all the differentially expressed genes for enrichment in GO
(Gene Ontology) terms and KEGG (Kyoto Encyclopedia of Genes and Genomes) pathways, with significance controlled at
FDR < 0.05.
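The idea behind term enrichment with an FDR cutoff can be sketched as a one-sided hypergeometric (over-representation) test per term followed by Benjamini-Hochberg adjustment. DAVID's own statistic is a modified Fisher exact test (the EASE score), so this plain hypergeometric test is a stand-in chosen for illustration, and the term names, gene IDs and background below are hypothetical.

```python
from math import comb


def hypergeom_sf(k, M, n, N):
    """P(X >= k) for X ~ Hypergeometric(M genes, n annotated, N drawn)."""
    hits = sum(comb(n, i) * comb(M - n, N - i) for i in range(k, min(n, N) + 1))
    return hits / comb(M, N)


def enrichment_pvalues(term_genes, study_genes, background):
    """One-sided over-representation p-value for each annotation term."""
    bg = set(background)
    study = set(study_genes) & bg
    M, N = len(bg), len(study)
    pvals = {}
    for term, genes in term_genes.items():
        annotated = set(genes) & bg
        k = len(annotated & study)  # study genes carrying this annotation
        pvals[term] = hypergeom_sf(k, M, len(annotated), N)
    return pvals


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values (FDR), keyed like the input."""
    ranked = sorted(pvals.items(), key=lambda kv: kv[1])
    m = len(ranked)
    adjusted, running_min = {}, 1.0
    # Walk from the largest p-value down so adjusted values stay monotone
    for rank in range(m, 0, -1):
        term, p = ranked[rank - 1]
        running_min = min(running_min, p * m / rank)
        adjusted[term] = running_min
    return adjusted


# Hypothetical example: 100 background genes, 20 differentially expressed
background = [f"g{i}" for i in range(100)]
study = background[:20]
terms = {"TERM_A": background[:10], "TERM_B": background[50:60]}
fdr = benjamini_hochberg(enrichment_pvalues(terms, study, background))
significant = sorted(t for t, q in fdr.items() if q < 0.05)
print(significant)
```

Python's `math.comb` works on exact integers, so the tail probability stays accurate even for a background of roughly 20,000 genes, at the cost of speed.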
Construction of KEGG pathway network
The Kyoto Encyclopedia of Genes and Genomes (KEGG) is a database resource for understanding high-level functions
and utilities of biological systems, such as the cell, the organism and the ecosystem, from molecular-level
information, especially large-scale molecular datasets generated by genome sequencing and other high-throughput
experimental technologies. We downloaded KEGG pathway XML files from the KEGG website, used the XML R package
to extract the interactions between genes, integrated this interaction information and built a KEGG pathway network.
The network was analyzed and visualized with Cytoscape, an open-source software platform for visualizing molecular
interaction networks and biological pathways.
Prognostic analysis
The association between gene expression and breast cancer prognosis was tested with the Cox proportional hazards
regression model. Age and stage were included as covariates because both are associated with survival. Genes that
significantly affected survival were entered into a subsequent multivariate analysis, and a risk score was calculated
for each sample as the sum, over these genes, of each gene's regression coefficient multiplied by its expression value
in that sample: risk score = Σ (β_i × x_i), where β_i is the coefficient of gene i and x_i its expression value.
Survival analysis was conducted using the Kaplan-Meier method, and the log-rank test was used to assess statistical
significance. All statistical analyses were carried out in the open-source R software, and p < 0.05 was considered
significant.
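The risk score, the Kaplan-Meier estimator and the log-rank test can be sketched in standard-library Python. This sketch does not fit the Cox model: the coefficients are hypothetical inputs, as are the toy survival times. The section does not say how samples were divided into risk groups, so the median split below is an assumption.

```python
import math
import statistics
from itertools import groupby


def risk_scores(coefs, expr):
    """Risk score per sample: sum over genes of Cox coefficient * expression."""
    return {s: sum(coefs[g] * vals[g] for g in coefs) for s, vals in expr.items()}


def split_high_low(scores):
    """Hypothetical grouping: samples above the median score are high risk."""
    cut = statistics.median(scores.values())
    high = sorted(s for s, v in scores.items() if v > cut)
    low = sorted(s for s, v in scores.items() if v <= cut)
    return high, low


def kaplan_meier(times, events):
    """Kaplan-Meier estimate as [(t, S(t))] at each time with >= 1 death.

    events: 1 = death observed, 0 = censored.
    """
    at_risk, surv, curve = len(times), 1.0, []
    for t, grp in groupby(sorted(zip(times, events)), key=lambda r: r[0]):
        grp = list(grp)
        deaths = sum(e for _, e in grp)
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= len(grp)  # deaths and censorings both leave the risk set
    return curve


def logrank(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square, p) with 1 df."""
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    o_minus_e, var = 0.0, 0.0
    for t in sorted({t for t, e, _ in data if e == 1}):
        n = sum(1 for tt, _, _ in data if tt >= t)
        n_a = sum(1 for tt, _, g in data if tt >= t and g == 0)
        d = sum(1 for tt, e, _ in data if tt == t and e == 1)
        d_a = sum(1 for tt, e, g in data if tt == t and e == 1 and g == 0)
        o_minus_e += d_a - d * n_a / n  # observed minus expected deaths in group a
        if n > 1:
            var += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if var == 0:
        return 0.0, 1.0
    chi2 = o_minus_e ** 2 / var
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    return chi2, math.erfc(math.sqrt(chi2 / 2))


# Hypothetical coefficients and expression values
coefs = {"GENE1": 0.5, "GENE2": -1.0}
expr = {"s1": {"GENE1": 2, "GENE2": 1}, "s2": {"GENE1": 4, "GENE2": 0.5}}
print(split_high_low(risk_scores(coefs, expr)))
print(logrank([1, 2, 3, 4, 5], [1] * 5, [10, 11, 12, 13, 14], [1] * 5))
```

In practice the Cox fit and these tests would come from an established survival library; the hand-written versions only make the arithmetic behind the reported p-values explicit.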
Results
Differentially expressed genes in breast cancer and their functional analysis
To find potential breast cancer-related genes, the SAM algorithm was used to identify genes differentially expressed
between 1099 breast tumor samples and 110 adjacent normal samples from the TCGA database. Genes with q < 1 and
fold change > 2 or fold change < 0.5 were considered differentially expressed between cancer and normal tissue. A
total of 5880 differentially expressed genes were obtained, including 1715 upregulated and 4165 downregulated genes
in cancer samples. Thus, most of the differentially expressed genes in breast cancer (4165 of 5880, about 71%) were
downregulated.
The DAVID online tool was then used for GO functional analysis of these 5880 differentially expressed genes, with
FDR < 0.05. This yielded 227 significantly enriched GO terms; the genes were mainly enriched in cell adhesion,
biological adhesion, cell-cell signaling, behavior, regulation of system process, ion transport and many other
biological processes. In addition, we performed KEGG pathway enrichment analysis for these genes and obtained eight
significant KEGG pathways, including neuroactive ligand-receptor interaction, cytokine