Selected publication:
Baumgartner, C.
Establishing a survival analysis pipeline in R to streamline the analysis of large gene expression datasets.
[ Diploma Thesis/Master Thesis (UNI) ] Universität Graz; 2025.
- Authors of Med Uni Graz:
- Supervisors:
- Feichtinger Julia
- Abstract:
- This research establishes a survival analysis pipeline in R to streamline the analysis of large-scale datasets. The pipeline aims to simplify these analyses for a variety of research questions that draw on large datasets, for instance those from The Cancer Genome Atlas Program (TCGA). Survival analysis is a crucial statistical tool in cancer research: it analyzes the time from a defined starting point, such as diagnosis or the start of treatment, to a defined event, such as relapse or death. By incorporating different factors, such as gene expression and treatment type, survival analysis contributes to the understanding of disease progression and patient survival and to the identification of potential prognostic biomarkers, thereby aiding the development of treatment strategies. The pipeline established here was initially created for the analysis of chemokine receptor expression in lymphoma patients as part of a research project by Uhl et al. The results showed that high CCR7 expression is associated with poor survival prognosis in lymphoma patients and that the expression pattern of Richter Syndrome samples is more similar to that of non-neoplastic germinal B-cells than to that of other diffuse large B-cell lymphomas (DLBCLs). Subsequently, the pipeline was extended and given a modular structure so that it can be applied to a wider range of datasets and research needs. To further test the established pipeline, the prognostic potential of selected cytokines and related signalling molecules with cis non-coding natural antisense transcript (ncNAT) partner genes was assessed. The analysis found significant survival differences related to the expression of CD27 and CD27-AS1, the only sense-antisense pair with significant results, as well as of IL6 and IL6R. Interestingly, all significant results for CD27 associate high CD27 expression with better survival, while all significant results for CD27-AS1 associate high expression with lower survival rates, though in different cancer types.
The established pipeline demonstrated capabilities in quickly processing large datasets for different specifications and research demands. Its design supports the addition of further statistical methodologies and the customization of parameters to facilitate its application for individual research questions within a widely accessible software environment. This project underlines the importance of structured, reproducible and efficient data processing methods to advance research across diverse scientific fields.
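The core idea summarized in the abstract, comparing time-to-event outcomes between patients with high and low expression of a gene, can be illustrated with a minimal sketch. This is not the thesis pipeline, which is written in R. It is a plain-Python Kaplan-Meier estimator combined with a median split, and both the median cutoff and the toy data are assumptions made only for illustration. The function names `kaplan_meier` and `split_by_expression` are hypothetical.

```python
from statistics import median


def kaplan_meier(times, events):
    """Return [(time, survival_probability)] at each distinct event time.

    times  : follow-up time per patient (e.g. days since diagnosis)
    events : 1 if the event (e.g. death) was observed, 0 if censored
    """
    at_risk = len(times)
    survival = 1.0
    curve = []
    # Walk through the distinct time points in ascending order.
    for t in sorted(set(times)):
        deaths = sum(1 for ti, ei in zip(times, events) if ti == t and ei == 1)
        leaving = sum(1 for ti in times if ti == t)  # events plus censored
        if deaths > 0:
            # Product-limit update: multiply by the fraction surviving time t.
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve


def split_by_expression(expression):
    """Label each sample 'high' or 'low' relative to the cohort median.

    The median cutoff is an assumption; other cutoffs are equally possible.
    """
    cutoff = median(expression)
    return ["high" if x > cutoff else "low" for x in expression]


# Hypothetical toy cohort (made-up values, not data from the thesis).
times = [5, 8, 12, 12, 20, 25, 30, 34]
events = [1, 1, 1, 0, 1, 0, 1, 0]
expression = [9.1, 8.7, 7.9, 3.2, 8.1, 2.5, 3.9, 1.8]

groups = split_by_expression(expression)
curves = {}
for label in ("high", "low"):
    idx = [i for i, g in enumerate(groups) if g == label]
    curves[label] = kaplan_meier(
        [times[i] for i in idx], [events[i] for i in idx]
    )

for label, curve in curves.items():
    print(label, curve)
```

A real analysis would additionally test whether the two curves differ significantly, for example with a log-rank test, which this sketch leaves out.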