Heart Disease Prediction Using Machine Learning

Cardiovascular diseases are a major global health problem and are responsible for a substantial number of deaths. Early identification and prevention are vital for reducing their impact. In recent years, machine learning has proven effective at predicting and diagnosing a range of medical conditions, heart disease among them. By drawing on large datasets and modern algorithms, these models can help identify individuals at risk of heart disease and support timely intervention. This article explores heart disease prediction with machine learning, covering its promise, its challenges, and its implications for healthcare.

"Heart disease" is an umbrella term covering conditions such as coronary artery disease, heart failure, and irregular heart rhythms. Identifying people at risk of these conditions early can substantially improve patient outcomes through prompt intervention and lifestyle changes.

The goal is to predict whether a patient has heart disease, given a set of clinical parameters.

The data comes from the UCI Machine Learning Repository: https://archive.ics.uci.edu/ml/datasets/heart+Disease

Features

Below are the details and descriptions of the data features.

  1. age - age in years
  2. sex - (1 = male; 0 = female)
  3. cp - chest pain type
    • 0: Typical angina: chest pain related to decreased blood supply to the heart
    • 1: Atypical angina: chest pain not related to heart
    • 2: Non-anginal pain: typically esophageal spasms (non-heart related)
    • 3: Asymptomatic: chest pain not showing signs of disease
  4. trestbps - resting blood pressure (in mm Hg on admission to the hospital); anything above 130-140 typically causes concern
  5. chol - serum cholesterol in mg/dl
    • serum = LDL + HDL + 0.2 * triglycerides
    • above 200 is cause for concern
  6. fbs - (fasting blood sugar > 120 mg/dl) (1 = true; 0 = false)
    • > 126 mg/dL signals diabetes
  7. restecg - resting electrocardiographic results
    • 0: Nothing to note
    • 1: ST-T Wave abnormality
      • can range from mild symptoms to severe problems
      • signals non-normal heartbeat
    • 2: Possible or definite left ventricular hypertrophy
      • Enlarged heart's main pumping chamber
  8. thalach - maximum heart rate achieved
  9. exang - exercise-induced angina (1 = yes; 0 = no)
  10. oldpeak - ST depression induced by exercise relative to rest; reflects the stress on the heart during exercise (an unhealthy heart shows more stress)
  11. slope - the slope of the peak exercise ST segment
    • 0: Upsloping: better heart rate with exercise (uncommon)
    • 1: Flatsloping: minimal change (typical healthy heart)
    • 2: Downsloping: signs of an unhealthy heart
  12. ca - number of major vessels (0-3) colored by fluoroscopy
    • colored vessel means the doctor can see the blood passing through
    • the more blood movement, the better (no clots)
  13. thal - thallium stress result
    • 1,3: Normal
    • 6: fixed defect: used to be a defect but ok now
    • 7: reversible defect: no proper blood movement when exercising
  14. target - have disease or not (1=yes, 0=no) (= the predicted attribute)

Code:

Importing Libraries

Loading Dataset
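The loading step could look roughly like the sketch below. The helper name `load_heart_data` and the in-memory CSV are assumptions made for illustration; the single data row is made up and only stands in for the real file, whose name is not given here.

```python
import io

import pandas as pd

# Column names taken from the feature list above.
EXPECTED_COLUMNS = [
    "age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
    "thalach", "exang", "oldpeak", "slope", "ca", "thal", "target",
]


def load_heart_data(path_or_buffer):
    """Read a CSV and check that all 14 expected columns are present."""
    df = pd.read_csv(path_or_buffer)
    missing = [col for col in EXPECTED_COLUMNS if col not in df.columns]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return df


# Stand-in for the real file: a header plus one made-up row.
toy_csv = io.StringIO(
    ",".join(EXPECTED_COLUMNS) + "\n"
    + "50,0,2,120,210,0,1,160,0,1.0,1,0,3,1\n"
)
df = load_heart_data(toy_csv)
print(df.shape)
print(df.head())
```

With the real data, the same helper would be called with the path of the downloaded CSV file instead of the in-memory buffer.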

EDA (Exploratory Data Analysis)

EDA plays a vital role in comprehending the dataset and extracting valuable insights. EDA encompasses a range of techniques aimed at thoroughly examining and visually representing the data, with the objective of unveiling underlying patterns, relationships, and possible anomalies that may exist within the dataset.
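A first pass at EDA usually checks the size of the data, the column types, missing values, and summary statistics. The sketch below uses a tiny made-up DataFrame (not real patient data) with a few of the column names from the feature list, so the calls can be run on their own.

```python
import pandas as pd

# Made-up toy rows, for illustration only; not taken from the real dataset.
df = pd.DataFrame({
    "age": [52, 61, 45, 58, 39, 67],
    "sex": [1, 0, 1, 1, 0, 1],
    "chol": [212.0, 248.0, None, 230.0, 199.0, 275.0],
    "target": [1, 1, 0, 0, 1, 0],
})

print(df.shape)          # number of rows and columns
print(df.dtypes)         # data type of each column
print(df.isna().sum())   # missing values per column
print(df.describe())     # summary statistics for numeric columns
```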

The bar chart of target counts shows that more samples in the dataset have heart disease than do not.
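The class balance behind such a bar chart can be computed with `value_counts()`. The values below are made up for illustration and do not reproduce the real counts.

```python
import pandas as pd

# Made-up target labels (1 = disease, 0 = no disease), for illustration only.
target = pd.Series([1, 1, 0, 1, 0, 1, 1, 0], name="target")

counts = target.value_counts()                 # absolute counts per class
shares = target.value_counts(normalize=True)   # fraction of samples per class

print(counts)
print(shares)
```

Calling `counts.plot(kind="bar")` would draw the counts as a bar chart, assuming matplotlib is installed.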

Heart Disease Frequency according to sex

From the bar chart above, the proportion of females with heart disease is higher than that of males in this dataset.
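A comparison like this can be made with `pd.crosstab`, looking at both raw counts and per-group proportions. The toy data below is made up; it is chosen only to show that a group can have fewer positive cases in absolute terms yet a higher rate.

```python
import pandas as pd

# Made-up rows (sex: 1 = male, 0 = female), for illustration only.
toy = pd.DataFrame({
    "sex":    [0, 0, 0, 0, 1, 1, 1, 1, 1, 1],
    "target": [1, 1, 1, 0, 1, 1, 0, 0, 0, 0],
})

# Absolute counts of each target value per sex.
counts = pd.crosstab(toy["sex"], toy["target"])

# Row-normalized table: share of each target value within each sex.
rates = pd.crosstab(toy["sex"], toy["target"], normalize="index")

print(counts)
print(rates)
```

The same pattern applies to other categorical features, for example replacing `toy["sex"]` with the chest pain column.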

Age vs. Max Heart Rate for Heart Disease

In the scatter plot, patients with heart disease tend to reach a higher maximum heart rate.

In this histogram, approximately half of the samples are between 55 and 65 years old. Most of the rest range from their 40s to their 70s, with a few samples aged 30-40 and above 70.

Heart Disease Frequency per Chest Pain Type

cp - chest pain type

  • 0: Typical angina: chest pain related to decreased blood supply to the heart
  • 1: Atypical angina: chest pain not related to heart
  • 2: Non-anginal pain: typically esophageal spasms (non-heart related)
  • 3: Asymptomatic: chest pain not showing signs of disease

Most heart disease patients in this dataset report non-anginal pain (cp = 2), and some report typical angina (cp = 0) or atypical angina (cp = 1). Although atypical angina and non-anginal pain are described as not related to the heart, the data shows that patients with heart disease still report these chest pain types. Drawing a firm conclusion would require the opinion of healthcare professionals.

Correlation

1. Positive correlation: both variables increase or decrease together.

2. Negative correlation: as one variable increases, the other decreases.

Result 1: Chest pain type (cp) and target have a positive correlation: higher cp values are associated with a higher likelihood of heart disease in this dataset.
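The correlation matrix can be computed with `DataFrame.corr()`, which uses Pearson correlation by default. The sketch below uses made-up values for three of the columns, arranged so that one feature correlates positively and one negatively with the target.

```python
import pandas as pd

# Made-up rows, for illustration only.
toy = pd.DataFrame({
    "cp":     [0, 0, 1, 2, 2, 3, 0, 1],
    "exang":  [1, 1, 0, 0, 0, 0, 1, 1],
    "target": [0, 0, 1, 1, 1, 1, 0, 0],
})

corr = toy.corr()  # pairwise Pearson correlation between all columns

# Correlation of each feature with the target, strongest positive first.
target_corr = corr["target"].drop("target").sort_values(ascending=False)

print(corr)
print(target_corr)
```

A heatmap of `corr` (for example with seaborn's `heatmap`, if installed) is a common way to visualize the full matrix.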

Modeling

Train Base Models

Here we will employ the following machine-learning models:

  • Logistic Regression
  • K-Nearest Neighbors Classifier
  • Random Forest Classifier
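Training the three base models could be sketched as follows. The data here is synthetic, generated with `make_classification` as a stand-in for the 13 clinical features, and the helper name `fit_and_score`, the 80/20 split, and the random seeds are assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in data: 13 features and a binary target.
X, y = make_classification(
    n_samples=300, n_features=13, n_informative=6, random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "KNN": KNeighborsClassifier(),
    "Random Forest": RandomForestClassifier(random_state=42),
}


def fit_and_score(models, X_train, X_test, y_train, y_test):
    """Fit each model on the training split and return its test accuracy."""
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = model.score(X_test, y_test)
    return scores


model_scores = fit_and_score(models, X_train, X_test, y_train, y_test)
print(model_scores)
```

Because the data is synthetic, the scores printed here say nothing about how the models rank on the heart disease data.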

Base Model Comparison

For the base models, Logistic Regression and Random Forest perform considerably better than KNN.

Hyperparameter Tuning

We will tune hyperparameters in three ways:

  • by hand
  • RandomizedSearchCV
  • GridSearchCV

Tune By Hand

Output:

Maximum KNN score on the test data: 75.41%

After tuning the number of neighbors (k), the KNN classifier improves, but its performance is still lower than that of Logistic Regression and Random Forest.
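Tuning k by hand amounts to a loop over candidate values of `n_neighbors`. The sketch below uses synthetic data and an assumed range of 1 to 20, so its numbers will not match the score reported above.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in data: 13 features and a binary target.
X, y = make_classification(
    n_samples=300, n_features=13, n_informative=6, random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

neighbors = range(1, 21)  # candidate values of k (assumed range)
test_scores = []
for k in neighbors:
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train, y_train)
    test_scores.append(knn.score(X_test, y_test))

# First k that reaches the highest test accuracy.
best_k = neighbors[test_scores.index(max(test_scores))]
print(f"Best k: {best_k}, test accuracy: {max(test_scores):.2%}")
```

Selecting k on the test data, as this loop does, tends to give an optimistic estimate; a separate validation split or cross-validation is the safer choice.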

Hyperparameter tuning with RandomizedSearchCV

With RandomizedSearchCV, the performance of the Random Forest model improves, but the Logistic Regression model still scores higher.
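A randomized search over Random Forest hyperparameters might look like the sketch below. The search space, `n_iter=5`, and the 5-fold setting are assumptions chosen to keep the example fast, and the data is synthetic.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV, train_test_split

# Synthetic stand-in data: 13 features and a binary target.
X, y = make_classification(
    n_samples=300, n_features=13, n_informative=6, random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Assumed search space; each candidate is sampled at random from these lists.
rf_grid = {
    "n_estimators": [10, 30, 50, 70, 90],
    "max_depth": [None, 3, 5, 10],
    "min_samples_split": [2, 4, 8, 12],
    "min_samples_leaf": [1, 3, 5, 9],
}

rs_rf = RandomizedSearchCV(
    RandomForestClassifier(random_state=42),
    param_distributions=rf_grid,
    n_iter=5,          # number of random combinations to try
    cv=5,              # 5-fold cross-validation on the training split
    random_state=42,
)
rs_rf.fit(X_train, y_train)

print(rs_rf.best_params_)
rs_rf_score = rs_rf.score(X_test, y_test)  # accuracy of the refit best model
print(rs_rf_score)
```

The same pattern works for Logistic Regression by swapping the estimator and the search space.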

Hyperparameter Tuning with GridSearchCV

Logistic Regression achieves the same scores under all of the hyperparameter tuning methods.

Of the three classifiers, Logistic Regression has the best performance score during the training stage.
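The grid search for Logistic Regression in this section could be sketched as follows. Unlike the randomized search, `GridSearchCV` tries every combination in the grid. The grid of `C` values and the `liblinear` solver are assumptions, and the data is synthetic.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in data: 13 features and a binary target.
X, y = make_classification(
    n_samples=300, n_features=13, n_informative=6, random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Assumed grid: 20 regularization strengths on a log scale, one solver.
log_reg_grid = {
    "C": np.logspace(-4, 4, 20),
    "solver": ["liblinear"],
}

gs_log_reg = GridSearchCV(LogisticRegression(), param_grid=log_reg_grid, cv=5)
gs_log_reg.fit(X_train, y_train)

print(gs_log_reg.best_params_)
gs_score = gs_log_reg.score(X_test, y_test)  # accuracy of the refit best model
print(gs_score)
```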

Evaluating tuned machine learning classifier beyond accuracy

We will be using the following metrics for evaluation:

  • ROC curve and AUC score
  • Confusion matrix
  • Classification report
  • Precision
  • Recall
  • F1-score
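The metrics listed above are all available in `sklearn.metrics`. The sketch below computes them for a plain Logistic Regression model on synthetic data; the model settings and the split are assumptions, not the tuned classifier from this article.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 13 features and a binary target.
X, y = make_classification(
    n_samples=300, n_features=13, n_informative=6, random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
y_pred = clf.predict(X_test)               # hard class predictions
y_proba = clf.predict_proba(X_test)[:, 1]  # probability of the positive class

auc = roc_auc_score(y_test, y_proba)       # area under the ROC curve
cm = confusion_matrix(y_test, y_pred, labels=[0, 1])
report = classification_report(y_test, y_pred)  # precision, recall, F1 per class

print(f"AUC: {auc:.3f}")
print(cm)
print(report)
```

In the confusion matrix, rows are the true classes and columns are the predicted classes, so the diagonal holds the correct predictions.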

Classification Report

Calculate evaluation metrics with cross-validation using cross_val_score()
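A minimal sketch of this step, assuming 5-fold cross-validation and a plain Logistic Regression model on synthetic data, loops over the scoring names and averages the fold scores:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in data: 13 features and a binary target.
X, y = make_classification(
    n_samples=300, n_features=13, n_informative=6, random_state=42
)

clf = LogisticRegression(max_iter=1000)

cv_metrics = {}
for metric in ["accuracy", "precision", "recall", "f1"]:
    # One score per fold; the mean summarizes performance across folds.
    fold_scores = cross_val_score(clf, X, y, cv=5, scoring=metric)
    cv_metrics[metric] = fold_scores.mean()

print(cv_metrics)
```

Averaging over folds makes the estimate less dependent on one particular train/test split.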

Feature Importance

Feature importance is another way of asking, "which features contributed most to the outcome of the model and how did they contribute?"

Finding feature importance is different for each machine learning model.

Feature importance can also guide future data collection.
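For Logistic Regression, one common way to inspect feature importance is to look at the fitted coefficients in `coef_`. The sketch below assumes that approach. Its data is synthetic: the columns only borrow the names from the feature list, so the resulting ranking has no clinical meaning.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

feature_names = [
    "age", "sex", "cp", "trestbps", "chol", "fbs", "restecg",
    "thalach", "exang", "oldpeak", "slope", "ca", "thal",
]

# Synthetic values labeled with the dataset's column names (illustration only).
X_values, y = make_classification(
    n_samples=300, n_features=13, n_informative=6, random_state=42
)
X = pd.DataFrame(X_values, columns=feature_names)

clf = LogisticRegression(max_iter=1000).fit(X, y)

# One coefficient per feature; sign gives direction, magnitude gives strength.
importance = pd.Series(clf.coef_[0], index=feature_names)
importance = importance.sort_values(key=abs, ascending=False)

print(importance)
```

Coefficient magnitudes are only comparable when the features are on similar scales, so scaling the inputs first is often advisable. Tree-based models such as Random Forest expose `feature_importances_` instead.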

Based on the visualization,

  • chest pain type (cp)
  • resting electrocardiographic results (restecg)
  • the slope of the peak exercise ST segment (slope)

have strong feature importance.

On the other hand, sex has the least importance.

Incorporating machine learning techniques for heart disease prediction presents a set of hurdles to overcome. These challenges encompass the necessity for extensive, varied, and carefully curated datasets, the possibility of biases within data acquisition, and the interpretability of the models themselves. Resolving these obstacles necessitates a cooperative effort among healthcare experts, data scientists, and regulatory entities to guarantee the ethical and efficient application of machine learning algorithms.

Conclusion

Heart disease prognosis utilizing machine learning stands as a notable stride forward in healthcare. Capitalizing on sophisticated algorithms and extensive data analysis, we are able to embrace forward-thinking and tailored methodologies in combating heart disease. As research and technological progress march on, it becomes imperative for stakeholders to unite, guaranteeing judicious execution, confronting obstacles, and maximizing the advantages of this revolutionary technology. By means of early identification and preventive measures, we endeavor to shape a future where global cardiovascular well-being experiences marked enhancement.