Back to 2025 Abstracts
MACHINE LEARNING PREDICTION OF STAGE AND SURVIVAL USING CANCER STEMNESS-RELATED ISOFORM EXPRESSION IN GASTRIC ADENOCARCINOMA
Yifan Liang*1, Kyle Lien1, Zhiyong Mi1, Paul Kuo1,2
1University of South Florida Morsani College of Medicine, Tampa, FL; 2Bay Pines Veterans Affairs Healthcare System, Bay Pines, FL
Introduction: Cancer stemness describes the ability of cancer cells to differentiate and self-renew; evidence suggests that it is associated with recurrence, metastasis, and resistance to therapy. Alternative splicing has been shown to affect cancer stemness in many cancer types. However, it remains unclear whether differences in isoform expression are large enough to be quantified and used to predict prognosis. We aim to develop machine learning (ML) models that use cancer stemness-related isoform mRNA expression data from gastric adenocarcinoma (STAD) patients to predict staging and survival, including time-to-death.
Methods: Transcriptional and clinical data for 392 STAD patients were downloaded from TSVdb, a web tool for exploring mRNA alternative splicing in TCGA samples. The downloaded data included isoform expression, clinical sample type, and clinical staging. Expression of 380 isoforms across a 120-gene cancer stemness-related signature was used as the feature set for ML analysis. Patients with stage I STAD were defined as low-stage, and those with stage II, III, or IV STAD as high-stage. Data were split into training and testing sets at a ratio of 6:4 for stage classification and 8:2 for survival analysis. Support Vector Machine (SVM), ElasticNet, GradientBoosting, and Random Forest algorithms were implemented for stage classification. Survival SVM with linear and RBF kernels, Random Survival Forest, GradientBoosting, and ElasticNet algorithms were implemented for survival analysis. To account for class imbalance, the synthetic minority oversampling technique (SMOTE) was applied to the staging data. Hyperparameters were tuned with grid search, and performance metrics were computed with 5-fold cross-validation. The ML algorithms were implemented in Python 3.9 with the scikit-learn 1.0.2, scikit-survival 0.21.0, and imbalanced-learn 0.12.3 libraries.
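The stage binarization and oversampling steps described above can be illustrated with a minimal plain-Python sketch. It is not the imbalanced-learn implementation used in the study: the helper names (`binarize_stage`, `smote`), the toy two-feature data, and the choice of `k` are all assumptions made for illustration. The core SMOTE idea it shows is that each synthetic minority sample is placed at a random point on the segment between a minority sample and one of its nearest minority-class neighbours.

```python
import math
import random


def binarize_stage(stage):
    """Map a stage label to the two classes: stage I -> 0 (low), II-IV -> 1 (high)."""
    return 0 if stage == "I" else 1


def smote(minority, n_new, k=5, seed=0):
    """Simplified SMOTE: build n_new synthetic samples by interpolating between
    a minority sample and one of its k nearest minority-class neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy training set (hypothetical values): stage plus expression of two isoforms.
train = [
    ("I", [1.0, 2.0]), ("I", [1.5, 1.8]), ("I", [0.8, 2.4]),
    ("II", [3.0, 0.5]), ("III", [3.4, 0.9]), ("III", [2.9, 0.2]),
    ("IV", [4.1, 0.4]), ("II", [3.8, 1.1]), ("IV", [3.3, 0.7]),
]
X = [features for _, features in train]
y = [binarize_stage(stage) for stage, _ in train]

# Oversample the low-stage (minority) class until both classes are the same size.
minority = [x for x, label in zip(X, y) if label == 0]
n_majority = sum(1 for label in y if label == 1)
new_samples = smote(minority, n_new=n_majority - len(minority), k=2)

X_balanced = X + new_samples
y_balanced = y + [0] * len(new_samples)
print(y_balanced.count(0), y_balanced.count(1))  # prints: 6 6
```

In this sketch the oversampling is applied to the training portion only, so that no synthetic sample can end up in a test set. The abstract does not state at which point of the workflow SMOTE was applied.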
Results: The AUC for stage classification models ranged from 0.849 to 0.922 (Figure a). Random Forest was the highest-performing model (AUC = 0.922, Accuracy = 0.968, Recall = 1.0, F1 = 0.984). The concordance index (C-index) for survival analysis models ranged from 0.645 to 0.720, and mean AUC ranged from 0.692 to 0.764. GradientBoosting Survival had the highest mean AUC (C-index = 0.705, Mean AUC = 0.764). For all survival models, time-dependent AUC dropped markedly for predictions after ~400 days (Figure b).
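For readers unfamiliar with the C-index reported for the survival models, the following is a minimal sketch of Harrell's concordance index in plain Python. It is a simplified illustration, not the scikit-survival routine used in the study: the function name `concordance_index`, the toy follow-up data, and the handling of ties (risk ties count as half, tied times are not treated specially) are assumptions of this sketch.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's C-index: share of comparable patient pairs in which the
    patient with the higher predicted risk has the shorter observed time."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when patient i had an observed event
            # (death) strictly before patient j's last recorded time.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # ties in predicted risk count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy cohort (hypothetical values): follow-up in days, event flag
# (True = death observed, False = censored), and model risk scores.
times = [100, 250, 400, 600, 800]
events = [True, True, False, True, False]
risk = [0.9, 0.4, 0.7, 0.5, 0.1]

c_index = concordance_index(times, events, risk)
print(c_index)  # prints: 0.75
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which gives a scale for reading the reported range of 0.645 to 0.720.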
Conclusions: We show that cancer stemness-related isoform expression can be used to predict staging and survival in STAD. Stage classification and survival analysis with isoform expression are complex, non-linear problems. Broader hyperparameter tuning and feature selection may improve model performance. Further work is ongoing to validate these models with external datasets and develop pan-cancer ML models using cancer stemness-related isoform expression to predict prognosis.
Figure a. ROC curves for stage classification models
Figure b. Time-dependent AUC for survival analysis models