HEART DISEASE PREDICTION USING MACHINE LEARNING ON A REAL-TIME CLINICAL DATASET WITH MULTI-DIAGNOSTIC FEATURES
Abstract
Cardiovascular disease remains a major health concern and the leading cause of death, which makes early and accurate prediction essential. Machine learning (ML) can uncover hidden patterns in clinical data and support early diagnosis. This paper presents a baseline ML model built on a real clinical dataset of 1100 anonymized patient records obtained from Ayush Multi Speciality Hospital and Research Centre (AMSHRC) Pvt. Ltd., Vijayapur, Karnataka, India. A structured pre-processing pipeline was applied, comprising missing-value treatment, outlier correction, categorical encoding, and scaling of numerical features. Exploratory analysis, correlation analysis, and model-based feature importance were used to examine clinically relevant predictors. Six supervised ML algorithms (Logistic Regression, Support Vector Machine, Decision Tree, K-Nearest Neighbors, Random Forest, and XGBoost) were trained on an 80:20 stratified split and evaluated using Accuracy, Precision, Recall, F1-score, ROC-AUC, MCC, and the confusion matrix. Stratified 10-fold cross-validation was used to obtain a reliable estimate of performance. All models showed strong predictive performance; Random Forest and XGBoost achieved the best accuracy (0.93) and ROC-AUC (0.98). CAG, ECG, ECHO, TMT, and the nature of chest pain emerged as the strongest predictors. This research provides a solid foundation for ML-based estimation of heart disease from clinically rich data. Future work will expand the dataset, explore automated and manual feature extraction, and apply hyperparameter optimization, nested cross-validation, and explainability methods such as SHAP to build a more robust and reliable model.
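For readers unfamiliar with the evaluation metrics named above, the following is a minimal sketch of how Accuracy, Precision, Recall, F1-score, and MCC can be derived from a binary confusion matrix. It is an illustration only, not the authors' implementation: the function names `confusion_counts` and `binary_metrics`, the label convention (1 = disease present), and the ten example labels are all assumptions made for this sketch and are not taken from the study's data. ROC-AUC is omitted because it requires predicted scores rather than hard labels.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels, assuming 1 = disease present."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def binary_metrics(tp, fp, fn, tn):
    """Compute standard metrics from confusion-matrix counts.

    Any ratio with a zero denominator is reported as 0.0.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    # Matthews correlation coefficient (MCC)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "mcc": mcc,
    }


# Hypothetical labels for illustration only (not from the hospital dataset).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1, 1, 0]

counts = confusion_counts(y_true, y_pred)
metrics = binary_metrics(*counts)
print(counts)
print(metrics)
```

MCC is worth reporting alongside accuracy because it uses all four cells of the confusion matrix, so it stays informative when the two classes are imbalanced.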