A LIPSCHITZ-STABILITY-BASED HYBRID MODEL FOR CANCER SURVIVAL PREDICTION USING ENSEMBLE LEARNING ON THE PLCO DATASET.

Shaesta Mujawar, Anita Chaware

Abstract

Early and reliable cancer detection remains central to improving survival outcomes, yet the clinical utility of machine-learning (ML) systems depends on both predictive accuracy and model robustness. This study introduces a stability-driven ensemble learning framework applied to the Prostate, Lung, Colorectal, and Ovarian (PLCO) cancer-screening dataset to achieve interpretable and reproducible prediction performance. The workflow integrates comprehensive preprocessing (missing-value imputation, feature scaling, and categorical encoding) with multi-criteria feature selection using SelectKBest (ANOVA), Recursive Feature Elimination (RFE), L1-regularized Lasso, and Mutual Information. To enhance generalization, nine base classifiers (Logistic Regression, Random Forest, Support Vector Machine, Decision Tree, K-Nearest Neighbors, Artificial Neural Network, XGBoost, LightGBM, and CatBoost) were trained and combined through a stacked ensemble architecture with Logistic Regression as the meta-learner. Model performance was assessed with standard metrics (accuracy, precision, recall, and F1-score) and with a novel robustness indicator based on empirical Lipschitz constants, which quantify the sensitivity of predictions and local explanations (SHAP values) to bounded input perturbations. In the experiments, the stacked ensemble consistently achieved the highest predictive scores (accuracy ≈ 95.5%, F1 ≈ 0.865) with significantly lower median Lipschitz constants, indicating smoother decision boundaries and enhanced robustness. Among feature selectors, RFE with Logistic Regression offered the best balance of predictive power and stability, while the compact feature sets generated by Lasso improved interpretability. The proposed empirical Lipschitz framework provides a principled measure of stability beyond conventional resampling approaches, bridging the gap between accuracy and trustworthiness in clinical AI.
Overall, this work establishes a mathematically grounded foundation for developing stable, explainable, and high-performing cancer-risk models suitable for integration into clinical decision-support systems.
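
The abstract does not state the exact formula behind the empirical Lipschitz constant, so the following is only a minimal sketch of one common sampling-based estimator, not the authors' implementation. For a point x it takes the largest observed ratio |f(x) - f(x')| / ||x - x'|| over random perturbations x' within a fixed radius of x, then reports the median over several points. The names empirical_lipschitz, median_lipschitz, and toy_predict, the default radius, and the sample counts are all hypothetical; toy_predict is a hand-written logistic scorer standing in for any trained classifier's probability output.

```python
import math
import random
from statistics import median


def empirical_lipschitz(predict, x, radius=0.1, n_samples=200, rng=None):
    """Largest observed |f(x) - f(x')| / ||x - x'|| for random x' within `radius` of x."""
    if rng is None:
        rng = random.Random(0)
    fx = predict(x)
    worst = 0.0
    for _ in range(n_samples):
        # Random direction from an isotropic Gaussian, normalised to unit length.
        direction = [rng.gauss(0.0, 1.0) for _ in x]
        norm = math.sqrt(sum(d * d for d in direction))
        if norm == 0.0:
            continue
        # Perturbation length drawn from (0, radius], so the input change is bounded.
        length = radius * (1.0 - rng.random())
        x_pert = [xi + length * d / norm for xi, d in zip(x, direction)]
        worst = max(worst, abs(predict(x_pert) - fx) / length)
    return worst


def median_lipschitz(predict, points, radius=0.1, n_samples=200, seed=0):
    """Median of the per-point estimates, used as a single robustness score."""
    rng = random.Random(seed)
    return median(
        empirical_lipschitz(predict, x, radius, n_samples, rng) for x in points
    )


# Hypothetical stand-in for a trained classifier's predicted probability.
TOY_WEIGHTS = [1.5, -2.0, 0.5]


def toy_predict(x):
    z = sum(w * xi for w, xi in zip(TOY_WEIGHTS, x))
    return 1.0 / (1.0 + math.exp(-z))


points = [[0.2, -0.1, 0.4], [1.0, 0.5, -0.3], [-0.7, 0.9, 0.0]]
score = median_lipschitz(toy_predict, points)
print(f"median empirical Lipschitz estimate: {score:.4f}")
```

A lower score means the prediction changes less per unit of input perturbation. The same estimator could in principle be applied to a function returning SHAP attributions by replacing the absolute difference with a vector norm, although the abstract does not say how the authors handle that case.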
