Traditional clustering approaches for gene expression data are not well suited to the complexity and heterogeneity of tumors, in which small sets of genes may be aberrantly co-expressed in specific subsets of tumors. Biclustering algorithms, which perform local clustering on subsets of genes and conditions, help address this problem. We have proposed a graph-based Tunable Biclustering Algorithm (TuBA) (Chapter 2) built on a novel pairwise proximity measure that leverages the size of the data sets to identify subsets of tumor samples that co-express subsets of genes at their highest or lowest levels relative to other samples.
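The abstract does not spell out TuBA's proximity measure (Chapter 2 does), so the following is only a simplified sketch of the general idea of graph-based biclustering on extreme expression levels, not TuBA's actual measure. It assumes that a gene's "highest level" samples are those at or above a per-gene percentile, links two genes when their top-sample sets share at least a minimum number of samples, and reads each connected component of the resulting gene graph (together with the shared samples) as a bicluster. The names `top_sample_sets`, `find_biclusters`, `percentile`, and `min_overlap` are hypothetical.

```python
from itertools import combinations

import numpy as np


def top_sample_sets(expr: np.ndarray, percentile: float) -> list[set[int]]:
    """For each gene (row of `expr`), return the indices of the samples
    (columns) whose expression is at or above that gene's `percentile`-th
    percentile, i.e. the samples in which the gene is most highly expressed."""
    thresholds = np.percentile(expr, percentile, axis=1)
    return [set(np.flatnonzero(row >= t).tolist()) for row, t in zip(expr, thresholds)]


def find_biclusters(expr, percentile=90.0, min_overlap=2):
    """Hypothetical sketch (not TuBA's method): connect two genes if their
    top-sample sets share at least `min_overlap` samples, then report each
    connected component of the gene graph (2+ genes) with the samples shared
    along its edges."""
    tops = top_sample_sets(expr, percentile)
    n_genes = len(tops)
    parent = list(range(n_genes))  # union-find forest over genes

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    edges = []  # (gene_i, gene_j, shared samples) for every kept edge
    for i, j in combinations(range(n_genes), 2):
        shared = tops[i] & tops[j]
        if len(shared) >= min_overlap:
            edges.append((i, j, shared))
            parent[find(i)] = find(j)  # merge the two genes' components

    # All unions are done, so find() now returns each component's final root.
    clusters = {}
    for i, j, shared in edges:
        genes, samples = clusters.setdefault(find(i), (set(), set()))
        genes.update((i, j))
        samples.update(shared)
    return [(sorted(g), sorted(s)) for g, s in clusters.values()]


# Toy matrix: 5 genes x 10 samples; each row is a permutation of 0..9.
expr = np.array([
    [9, 8, 7, 0, 1, 2, 3, 4, 5, 6],  # gene 0: high in samples 0-2
    [7, 9, 8, 6, 5, 4, 3, 2, 1, 0],  # gene 1: high in samples 0-2
    [0, 1, 2, 3, 4, 5, 6, 9, 8, 7],  # gene 2: high in samples 7-9
    [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],  # gene 3: high in samples 7-9
    [0, 1, 2, 9, 8, 7, 3, 4, 5, 6],  # gene 4: high in samples 3-5 (alone)
])
print(find_biclusters(expr, percentile=70, min_overlap=2))
# [([0, 1], [0, 1, 2]), ([2, 3], [7, 8, 9])]
```

To look for co-expression at the lowest levels instead, the same sketch can be run on `-expr`, since the top percentile of the negated values corresponds to the bottom of the original distribution.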
We applied TuBA to three large gene expression datasets encompassing a total of 3,940 breast invasive carcinoma (BRCA) patients (Chapter 3). We demonstrated significant agreement between the results obtained for each dataset, and found that about 50% of the altered co-expression signatures were associated with a subtype of the disease that exhibits low expression of the estrogen receptor 1 (ER) and human epidermal growth factor receptor 2 (HER2) genes. Tumors belonging to this subtype are labelled ER-/HER2-. Since only 15% of all BRCA patients are estimated to have tumors of this subtype, the fact that half of the signatures map to it shows that our algorithm was able to highlight the tremendous heterogeneity of alterations within these tumors. Notably, more than 50% of these signatures were associated with DNA alterations that amplify (or delete) gene copies, which in turn result in higher (or lower) levels of gene expression. Thus, TuBA was especially effective in identifying transcriptionally active copy number variations in tumor samples. Finally, TuBA identified biclusters associated with the tumor microenvironment, including biclusters associated with infiltrating immune and stromal cells. These can improve our understanding of the role played by the microenvironment in modulating tumor progression.
We showed that TuBA outperforms other algorithms in identifying co-expressed genes located in transcriptionally active copy number altered sites (Chapter 4). Moreover, from a differential co-expression perspective, TuBA offers an advantage over other methods because it requires no prior specification of subsets of samples (conditions); the nature of our proximity measure ensures that such differential co-expression signatures are preferentially identified.
In summary, our method identified a multitude of altered transcriptional profiles associated with the heterogeneity of diseased states in breast cancer. Exploring the diversity of these aberrant signatures can help identify potential biomarkers of clinical relevance and thereby improve treatment outcomes, especially for ER-/HER2- breast cancers. Although transcriptomic alterations are not the ultimate determinants of disease progression, our algorithm holds promise for improving therapeutic selection and design by identifying significantly altered transcriptional patterns associated with tumors.