Breast cancer is the second leading cause of cancer-related death among women and has a major impact on the lives of those affected by it. Previous research has shown that miRNAs, a type of non-coding RNA, play a key role in the onset and development of breast tumors. Genomics, transcriptomics and proteomics studies have shed light on the intricate involvement of miRNAs in mediating breast cancer progression. These small molecules are involved in many biological processes, and their full complexity is not yet understood. Studying miRNAs can lead to more efficient methods of screening and treatment for breast tumors while providing other key information at the molecular level. As confirmed by previous studies, miRNAs are good candidates for diagnostic as well as prognostic markers. This project takes a panoptic approach to studying miRNAs in breast cancer genomics and proteomics and pursues the following goals:

Visualize miRNA expression data through heatmaps and identify subtypes.
- We hypothesize that miRNA expression will differ across subtypes.
Use heatmaps to examine miRNA expression across race.
- We hypothesize that miRNA expression will vary across race.
Perform PCA to deduce potential miRNA biomarkers for each subtype of breast cancer.
- We hypothesize that we will find at least one unique biomarker for each subtype of breast cancer.
Find the top 20 miRNA pairs with statistically significant correlation in expression.
- We hypothesize that these pairs will consist of miRNAs from the same family.
Study networks and pathways.
- We hypothesize that we will find networks within subtypes of breast cancer. We also expect that miRNAs will be involved in more than one disease pathway.

Data from The Cancer Genome Atlas (TCGA) data portal, a publicly accessible data portal, were used for this project.
The data were generated through miRNA sequencing on an Illumina Genome Analyzer. We used a combination of self-written Perl and R scripts and web-based databases to filter, sort and analyze our data [1, 2].

Visual interpretation of our heatmaps showed different patterns of miRNA expression in the various subtypes of breast cancer. Patterns across race were less distinct on visual interpretation, but this may be attributable to the small size of the cohort. Our PCA revealed potential biomarkers for each subtype of breast cancer. We suggest hsa-mir-127 and hsa-mir-379 as potential biomarkers of the basal subtype; hsa-mir-19a, hsa-mir-126, hsa-mir-20a and hsa-mir-30a as potential biomarkers of the HER2 subtype; hsa-mir-222 as a potential biomarker of the Luminal A subtype; and hsa-mir-152, hsa-mir-26b and hsa-mir-200c as potential biomarkers of the Luminal B subtype. Graphing these potential biomarkers across all subtypes visually confirmed our findings. We found 20 pairs of miRNAs with statistically significant correlation in expression, 6 of which were pairs of unrelated miRNAs. We found miRNA-gene networks within two subtypes of breast cancer and generated a schematic to show the first layer of interaction. We used an online tool to generate another schematic that illustrates miRNA involvement in various pathways.

We encourage future studies to further validate our proposed biomarkers and to improve on our methods. Our study lacked data from normal tissue samples, the inclusion of which we believe would have yielded more thorough results. In conclusion, we believe that computational analysis of miRNA expression data is a powerful approach, and that its results will become more accurate as more data become available.
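As an illustration of the pairwise correlation step described above, the following is a minimal Python sketch and not the Perl or R scripts used in the study. It ranks miRNA pairs by the absolute value of their Pearson correlation across samples. The function names, the miRNA labels and the toy expression values are assumptions made for illustration, and no significance test or multiple-testing correction is included, both of which a real analysis would need.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # constant profile: correlation is undefined, treat as 0
    return cov / sqrt(var_x * var_y)


def top_correlated_pairs(expression, k=20):
    """Return up to k (miRNA, miRNA, r) tuples, largest |r| first."""
    scored = [
        (a, b, pearson(expression[a], expression[b]))
        for a, b in combinations(sorted(expression), 2)
    ]
    scored.sort(key=lambda item: abs(item[2]), reverse=True)
    return scored[:k]


# Hypothetical labels and invented values; not TCGA data.
toy = {
    "mirA": [1.0, 2.0, 3.0, 4.0],
    "mirB": [2.0, 4.0, 6.0, 8.5],
    "mirC": [4.0, 1.0, 3.0, 2.0],
}
best = top_correlated_pairs(toy, k=1)
```

Here each dictionary key is one miRNA and each list holds its expression across the same ordered set of samples. Ranking by absolute value keeps strongly negative correlations alongside strongly positive ones.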