A HYBRID K-MEANS AND MACHINE LEARNING APPROACH FOR HEART DISEASE PREDICTION: A COMPARATIVE STUDY ON BINARY AND TERTIARY RISK CLASSIFICATION
Abstract
Heart disease is one of the deadliest diseases in the world, causing a large number of deaths every year. This paper proposes a hybrid approach that combines K-means clustering with nine machine learning techniques to predict heart disease and forecast its future occurrence. Two datasets of 1,026 records each were generated: a binary dataset (infected/uninfected) and a three-class dataset (infected/uninfected/likely infected), both labeled according to WHO clinical criteria. The feature space was transformed, the data were clustered with the K-means algorithm, and classification was then performed on the clustered data. The trained classifiers included Decision Tree, Random Forest, XGBoost, and Gradient Boosting, and used clinical parameters such as age, blood pressure, and cholesterol level as input features. The Decision Tree achieved the highest accuracy, 89.27%, on the three-class prediction problem, outperforming Gradient Boosting and Random Forest. The hybrid K-means and machine learning approach shows good predictive performance and potential clinical value for identifying patients at risk of future heart disease, which could facilitate early intervention.
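The pipeline described in the abstract (transform the features, cluster with K-means, then classify) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say how the cluster output is passed to the classifiers, so appending the cluster id as an extra feature is an assumption. The data are synthetic stand-ins produced with scikit-learn's `make_classification`, and the helper name `add_cluster_feature`, the choice of three clusters, 13 features, and a tree depth of 5 are all hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier


def add_cluster_feature(X_train, X_test, n_clusters=3, seed=0):
    """Scale the features, fit K-means on the training split only, and
    append each record's cluster id as one extra column (an assumed way
    of combining clustering with classification)."""
    scaler = StandardScaler().fit(X_train)
    X_train_s = scaler.transform(X_train)
    X_test_s = scaler.transform(X_test)

    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    kmeans.fit(X_train_s)

    train_aug = np.column_stack([X_train_s, kmeans.predict(X_train_s)])
    test_aug = np.column_stack([X_test_s, kmeans.predict(X_test_s)])
    return train_aug, test_aug


# Synthetic three-class data with 1,026 records; NOT the paper's dataset.
X, y = make_classification(
    n_samples=1026, n_features=13, n_informative=6,
    n_classes=3, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

train_aug, test_aug = add_cluster_feature(X_train, X_test)

# One of the classifiers named in the abstract, trained on the augmented data.
clf = DecisionTreeClassifier(max_depth=5, random_state=0)
clf.fit(train_aug, y_train)
accuracy = clf.score(test_aug, y_test)
print(f"Test accuracy on synthetic data: {accuracy:.4f}")
```

Fitting the scaler and K-means on the training split only keeps information from the test records out of the clustering step. The accuracy printed here reflects the synthetic data and is not comparable to the 89.27% reported above.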