 
          Cancer Genetics and Epigenetics 2016, Vol.4, No.2, 1-9
        
        
        
        
        
          Screening of differentially expressed genes
        
        
          In this study, significance analysis of microarrays (SAM), as implemented in the samr R package, was used to
          identify differentially expressed genes. SAM is a statistical method for analyzing microarray gene expression
          data and screening for differentially expressed genes. It uses a permutation algorithm to estimate the false
          discovery rate (FDR), thereby controlling the error rate under multiple testing. The data in this study
          comprised 1099 breast tumor samples, 110 adjacent normal samples and 20,531 genes. The thresholds for selecting
          differentially expressed genes were q < 1 and fold change > 2 or fold change < 0.5, which yielded the genes
          up- and down-regulated in breast tumors, respectively.
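The selection rule above can be illustrated with a short Python sketch. It is not the samr workflow used in the study: it assumes that a fold change (tumor versus normal, not log-transformed) and a q-value have already been computed for each gene, and the `GeneStat` class, the `classify` function and the gene names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class GeneStat:
    gene: str           # gene identifier (hypothetical names in the example)
    fold_change: float  # tumor / normal ratio, not log-transformed (assumption)
    q_value: float      # q-value on the same scale as the cutoff (assumption)


def classify(stats, q_cutoff=1.0, up_fc=2.0, down_fc=0.5):
    """Split genes into up- and down-regulated lists using the stated thresholds."""
    up, down = [], []
    for s in stats:
        if s.q_value >= q_cutoff:
            continue  # fails the q-value criterion
        if s.fold_change > up_fc:
            up.append(s.gene)
        elif s.fold_change < down_fc:
            down.append(s.gene)
    return up, down


example = [
    GeneStat("GENE_A", 3.2, 0.1),  # passes q, fold change > 2
    GeneStat("GENE_B", 0.3, 0.2),  # passes q, fold change < 0.5
    GeneStat("GENE_C", 1.4, 0.0),  # passes q, fold change not extreme enough
    GeneStat("GENE_D", 5.0, 4.0),  # large fold change but fails q
]
up_genes, down_genes = classify(example)
```

Genes whose fold change lies between 0.5 and 2 are left unclassified, regardless of their q-value.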
        
        
          Functional enrichment analysis for gene lists
        
        
          Functional annotation and pathway enrichment analysis of the differentially expressed genes were conducted
          using the DAVID (Database for Annotation, Visualization and Integrated Discovery) bioinformatics tools. DAVID
          provides a comprehensive set of functional annotation tools that help researchers understand the biological
          significance of a large set of genes. Here, the DAVID online tools were used for enrichment analysis of all
          differentially expressed genes against GO (Gene Ontology) functions and KEGG (Kyoto Encyclopedia of Genes and
          Genomes) pathways, with the significance level controlled at FDR < 0.05.
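To make the idea of enrichment testing concrete, the sketch below computes a plain one-sided hypergeometric p-value for one annotation term and then applies a Benjamini-Hochberg adjustment across terms. This is an illustration only: DAVID itself reports a modified Fisher exact statistic (the EASE score), and the choice of Benjamini-Hochberg here is an assumption, not something stated in the text. The function names are hypothetical.

```python
from math import comb


def hypergeom_pvalue(k, n, K, N):
    """P(X >= k): k of the n selected genes carry a term that annotates
    K of the N background genes."""
    upper = min(n, K)
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values in the same order as the input."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

A term would then be called enriched when its adjusted value falls below 0.05, mirroring the FDR < 0.05 cutoff used here.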
        
        
          Construction of KEGG pathway network
        
        
          The Kyoto Encyclopedia of Genes and Genomes (KEGG) is a database resource for understanding high-level
          functions and utilities of biological systems, such as the cell, the organism and the ecosystem, from
          molecular-level information, especially large-scale molecular datasets generated by genome sequencing and
          other high-throughput experimental technologies. We downloaded KEGG pathway XML files from the KEGG website,
          used the XML R package to extract the interactions between genes, integrated this interaction information and
          built a KEGG pathway network. Cytoscape software was used for analysis and visualization of this network.
          Cytoscape is an open-source software platform for visualizing molecular interaction networks and biological
          pathways.
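The extraction step can be sketched in Python with the standard library instead of the XML R package used in the study. The sketch assumes the pathway files follow the KGML layout, in which `entry` elements of type `gene` list one or more gene identifiers and `relation` elements link two entries. The toy pathway, its identifiers and the function names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Toy KGML-like document; the pathway and gene identifiers are invented.
SAMPLE_KGML = """
<pathway name="path:hsa00000" org="hsa" number="00000" title="Toy pathway">
  <entry id="1" name="hsa:1111 hsa:2222" type="gene"/>
  <entry id="2" name="hsa:3333" type="gene"/>
  <entry id="3" name="cpd:C00001" type="compound"/>
  <relation entry1="1" entry2="2" type="PPrel">
    <subtype name="activation" value="--&gt;"/>
  </relation>
  <relation entry1="2" entry2="3" type="PCrel"/>
</pathway>
"""


def kgml_gene_edges(kgml_text):
    """Return a set of (source_gene, target_gene, subtypes) edges."""
    root = ET.fromstring(kgml_text.strip())
    # Map each gene-type entry id to the gene identifiers it lists.
    genes_by_entry = {
        e.get("id"): e.get("name", "").split()
        for e in root.findall("entry")
        if e.get("type") == "gene"
    }
    edges = set()
    for rel in root.findall("relation"):
        sources = genes_by_entry.get(rel.get("entry1"), [])
        targets = genes_by_entry.get(rel.get("entry2"), [])
        subtypes = tuple(sorted(s.get("name") for s in rel.findall("subtype")))
        # Relations touching non-gene entries yield no gene-gene edges.
        for src in sources:
            for dst in targets:
                edges.add((src, dst, subtypes))
    return edges


def merge_networks(kgml_texts):
    """Union the edges of several pathways into one network."""
    network = set()
    for text in kgml_texts:
        network |= kgml_gene_edges(text)
    return network


edges = kgml_gene_edges(SAMPLE_KGML)
```

Storing edges in a set means that an interaction appearing in several pathways is kept only once after merging; the resulting edge list could then be exported as a table for import into Cytoscape.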
        
        
          Prognostic analysis
        
        
          The association between gene expression and breast cancer prognosis was tested using the Cox proportional
          hazards regression model. Because age and stage are correlated with survival to some degree, they were
          included as covariates. Genes that significantly affected survival were used in the subsequent multivariate
          analysis, and a risk score was calculated for each sample as the sum, over these genes, of each gene's
          regression coefficient multiplied by its expression value in that sample. Survival analysis was conducted
          using the Kaplan-Meier method, with the log-rank test used to assess statistical significance. All statistical
          analyses were carried out in the open-source R software, and p < 0.05 was considered significant.
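The risk score and the Kaplan-Meier estimate described above can be sketched as follows. This is a minimal illustration in Python rather than the R code used in the study. The coefficients are assumed to come from an already fitted Cox model, and the gene names, values and function names are hypothetical.

```python
def risk_score(coefficients, expression):
    """Sum of (Cox regression coefficient * expression value) over genes."""
    return sum(beta * expression[gene] for gene, beta in coefficients.items())


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one event.

    times  : follow-up time per sample
    events : 1 if the event (death) was observed, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all samples that share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


# Hypothetical coefficients and expression values for two samples.
coefs = {"GENE_A": 0.5, "GENE_B": -1.0}
sample_1 = {"GENE_A": 2.0, "GENE_B": 1.0}
sample_2 = {"GENE_A": 4.0, "GENE_B": 0.5}
scores = [risk_score(coefs, s) for s in (sample_1, sample_2)]
```

Samples could then be grouped by risk score and their Kaplan-Meier curves compared with a log-rank test; how the groups were formed is not specified here.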
        
        
          Results
        
        
          Differentially expressed genes in breast cancer and their functional analysis
        
        
          To find potential breast cancer-related genes, the SAM algorithm was employed to identify genes differentially
          expressed between 1099 breast tumor samples and 110 adjacent normal samples from the TCGA database. Genes with
          q < 1 and fold change > 2 or fold change < 0.5 were selected as differentially expressed between cancer and
          normal tissue. A total of 5880 differentially expressed genes were obtained, including 1715 upregulated and
          4165 downregulated genes in cancer samples. Thus, most of the differentially expressed genes in breast cancer
          were downregulated.
        
        
          The DAVID online bioinformatics tool was then used for GO functional analysis of these 5880 differentially
          expressed genes, with FDR controlled at < 0.05. A total of 227 significantly enriched GO terms were obtained;
          these genes were mainly enriched in cell adhesion, biological adhesion, cell-cell signaling, behavior,
          regulation of system process, ion transport and many other biological processes. In addition, we conducted
          KEGG pathway enrichment analysis for these genes and obtained eight significant KEGG pathways, including
          neuroactive ligand-receptor interaction, cytokine