Cancer Genetics and Epigenetics 2016, Vol.4, No.2, 1-9
with estrogen exposures. Abnormal growth factor signaling between stromal cells and epithelial cells can
promote the growth of malignant cells. In addition, overexpression of leptin in breast adipose tissue can
promote cell proliferation and cancer.
The development, recurrence and metastasis of breast cancer form a multi-gene, multi-step synergistic process
in which activation of oncogenes and inactivation of tumor suppressor genes are important in tumorigenesis.
Thus, changes in gene expression profiles have been a hot topic in research on the pathogenesis, recurrence
and metastasis of breast cancer. Microarray technology can effectively filter out breast cancer-specific
differentially expressed genes. Several studies have already screened differentially expressed genes in breast
cancer patients. Grb14 was found to be highly expressed in 23.1% of breast cancers; this overexpression is
associated with better disease-free and overall survival and can serve as an independent prognostic factor.
Gelsolin, a major actin-binding protein, is widely expressed in normal cells and is downregulated in a variety
of cell types, including mammary epithelial cells.
In recent years, many studies have been devoted to breast cancer prognosis, since the discovery of new
prognostic markers will provide guidance for treatment. Staub et al. demonstrated that in three tumor types
(colorectal cancer, breast cancer and glioma), patients with low expression of the modules co-expressed with
WIPF1 generally have a good prognosis. Proteases have also been studied as biomarkers in breast cancer
prognosis and diagnosis. Song et al. confirmed that overexpression of the transforming acidic coiled-coil
protein 3 (TACC3) gene was significantly correlated with poor prognosis in breast cancer, particularly in
patients with tumor grade 2-4, and multivariate analysis showed that TACC3 expression is an independent
prognostic factor in breast cancer patients. Zhu et al. demonstrated that the histone methyltransferase
hSETD1A was associated with prognosis in triple-negative breast cancer. They showed that hSETD1A positivity
was correlated with shorter survival in these patients, suggesting that it can be used as a prognostic
biomarker of triple-negative breast cancer.
This study aims to use high-throughput gene sequencing data to discover new breast cancer prognostic markers
by screening differentially expressed genes and exploiting network analysis. First, RNA-Seq data from The
Cancer Genome Atlas (TCGA) database were used to search for breast cancer prognostic biomarkers. Significance
analysis of microarrays (SAM) was used to find the differentially expressed genes between tumor and normal
breast samples. We then obtained the GO terms and KEGG pathways significantly enriched in these genes. Next, a
KEGG pathway network was constructed from the gene interactions extracted from the pathways, and the
topological properties of the network were analyzed. The hub nodes in this network were selected as candidate
genes. The association between these candidate genes and patient survival was then tested by Cox proportional
hazards regression analysis to obtain the genes with a significant impact on survival. Finally, these genes
were entered into a multivariate analysis, a risk score was calculated for each sample, the samples were
divided into high- and low-risk groups according to this score, and the difference in survival between the two
groups was tested for significance.
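The risk-score step can be illustrated with a minimal sketch. The text does not give the formula or the cutoff, so two common conventions are assumed here: the risk score is the sum of each gene's expression weighted by its multivariate Cox coefficient, and samples are split at the median score. The gene names, coefficients and expression values are hypothetical and serve only to make the example runnable.

```python
from statistics import median

def risk_scores(expression, coefficients):
    """Return a linear risk score per sample.

    expression:   dict mapping sample ID -> dict of gene -> expression value
    coefficients: dict mapping gene -> Cox regression coefficient (assumed given)
    """
    return {
        sample: sum(beta * values[gene] for gene, beta in coefficients.items())
        for sample, values in expression.items()
    }

def split_by_median(scores):
    """Split samples into high- and low-risk groups at the median score.

    The median cutoff is an assumption; the paper's cutoff is not stated here.
    """
    cutoff = median(scores.values())
    high = sorted(s for s, v in scores.items() if v > cutoff)
    low = sorted(s for s, v in scores.items() if v <= cutoff)
    return high, low

# Hypothetical coefficients and expression values, for illustration only.
coefficients = {"GENE_A": 0.5, "GENE_B": -0.25}
expression = {
    "S1": {"GENE_A": 2.0, "GENE_B": 4.0},
    "S2": {"GENE_A": 6.0, "GENE_B": 0.0},
    "S3": {"GENE_A": 4.0, "GENE_B": 4.0},
    "S4": {"GENE_A": 8.0, "GENE_B": 8.0},
}

scores = risk_scores(expression, coefficients)
high, low = split_by_median(scores)
```

In practice the coefficients would come from a fitted Cox model, and the survival difference between the two groups would then be assessed with a test such as the log-rank test.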
Materials and Methods
Gene expression dataset of breast cancer
In recent years, the development of high-throughput sequencing technology has made it possible for researchers
to understand the mechanisms of diseases, including cancer, at the whole-genome level. RNA-SeqV2 level 3 gene
expression data for breast cancer were downloaded from the TCGA database and processed in the following steps:
(1) entries with a value of 0 were replaced with the minimum value of the data; (2) the gene expression data
were log2-transformed; (3) because the samples came from multiple batches, the ComBat function of the sva R
package was used to remove batch effects. The final dataset comprised 1209 samples (1099 breast tumor samples
and 110 normal control samples) and 20,531 genes.
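Preprocessing steps (1) and (2) can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes that "the minimum value of the data" means the smallest non-zero value in the whole matrix, and it omits the batch-correction step (3), which the paper performs in R with the sva package. The function names and the toy matrix are hypothetical.

```python
import math

def replace_zeros(matrix):
    """Replace every 0 entry with the smallest positive value in the matrix.

    Assumption: the minimum is taken over all non-zero entries of the
    whole matrix rather than per gene or per sample.
    """
    positives = [v for row in matrix for v in row if v > 0]
    if not positives:
        raise ValueError("matrix has no positive entries")
    floor = min(positives)
    return [[v if v > 0 else floor for v in row] for row in matrix]

def log2_transform(matrix):
    """Apply log2 to every entry; all entries must be positive."""
    return [[math.log2(v) for v in row] for row in matrix]

def preprocess(matrix):
    """Run zero replacement followed by the log2 transform."""
    return log2_transform(replace_zeros(matrix))

# Toy genes-by-samples matrix, for illustration only.
expr = [
    [0.0, 4.0, 16.0],
    [2.0, 0.0, 8.0],
]
processed = preprocess(expr)
```

Replacing zeros before the transform avoids taking the logarithm of zero. A small pseudocount added to every entry is a common alternative, but it is not what the text describes.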