Hybrid Modelling Approaches Validation for Innovative Water Quality Prediction using N-Fold Oversampling
Abstract
This research validates hybrid machine learning (ML) and deep learning (DL) methodologies for predicting water quality, using the water_potability dataset, which comprises 3,276 samples described by essential physicochemical attributes and exhibits class imbalance. The study introduces composite models that pair traditional classifiers, namely Support Vector Machine (SVM), Decision Tree (DT), and Gradient Boosting (XGBoost), with Convolutional Neural Networks (CNN), aiming to harness their complementary strengths for better classification performance. To mitigate the effects of class imbalance, the Synthetic Minority Oversampling Technique (SMOTE) is applied, improving model robustness and generalization. Comparative evaluation shows that the SVM-CNN hybrid achieves the highest predictive accuracy, 81.20%, surpassing the XGBoost-CNN and DT-CNN configurations, which attain 78.90% and 75.60%, respectively. These outcomes underscore the potential of hybrid ML-DL architectures for modeling complex nonlinear interactions and for improving the reliability of water potability assessment in environmental informatics.
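The abstract names SMOTE but does not describe how it creates samples. The following is a minimal sketch, not the authors' code: it assumes complete (already imputed) numeric feature rows and a binary label in which `1` marks the minority class, and the helper names `smote` and `balance` are hypothetical. It shows the core SMOTE idea of generating a synthetic point on the line segment between a minority sample and one of its k nearest minority-class neighbours, repeated until both classes have the same count. This simplified variant picks base samples at random rather than cycling through every minority sample.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length feature rows."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic rows by interpolating between a random minority row
    and one of its k nearest neighbours within the minority class."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbours than exist
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of `base`, excluding the row itself
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: euclidean(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # random position along the segment, in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


def balance(X, y, minority_label=1, k=5, seed=0):
    """Append synthetic minority rows until both classes are equally sized."""
    minority = [row for row, label in zip(X, y) if label == minority_label]
    majority_count = sum(1 for label in y if label != minority_label)
    deficit = max(majority_count - len(minority), 0)
    new_rows = smote(minority, deficit, k=k, seed=seed)
    return X + new_rows, y + [minority_label] * len(new_rows)


if __name__ == "__main__":
    X = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [5.0, 5.0], [6.0, 6.0]]
    y = [0, 0, 0, 0, 1, 1]
    X_bal, y_bal = balance(X, y, minority_label=1, k=1)
    print(y_bal.count(0), y_bal.count(1))
```

In a cross-validated (N-fold) setup, oversampling of this kind should be applied only to the training portion of each fold, so that synthetic points derived from held-out samples do not leak into the evaluation.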