Coders Packet

Pima Indian Diabetes Prediction using Python (Machine Learning)

By Debolina Poddar

In this Python project, we apply machine learning classification algorithms to predict whether or not each patient in the dataset has diabetes.

The dataset was obtained from Kaggle.

About the dataset:

The dataset consists of eight medical predictor variables and one target variable. The columns are as follows:

1. Pregnancies

2. Glucose

3. BloodPressure

4. SkinThickness

5. Insulin

6. BMI

7. DiabetesPedigreeFunction

8. Age

9. Outcome (target variable: 1 = diabetes, 0 = no diabetes)

STEP 1:

Import the necessary packages.

STEP 2:

Import the dataset from the local folder.
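Steps 1 and 2 can be sketched with only the Python standard library. This is a minimal sketch, assuming the CSV header uses the column names of the commonly distributed Kaggle file; the helper name `load_dataset` and the file name `diabetes.csv` are hypothetical. In practice, a library such as pandas (`pandas.read_csv`) is the more usual choice.

```python
import csv
from typing import Dict, Iterable, List

# Column names assumed to match the header of the Kaggle CSV
COLUMNS = [
    "Pregnancies", "Glucose", "BloodPressure", "SkinThickness", "Insulin",
    "BMI", "DiabetesPedigreeFunction", "Age", "Outcome",
]


def load_dataset(lines: Iterable[str]) -> List[Dict[str, float]]:
    """Parse CSV lines (e.g. an open file) into rows of floats keyed by column name."""
    reader = csv.DictReader(lines)
    # Accessing fieldnames reads the header row; fail early if a column is missing
    missing = set(COLUMNS) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    return [{name: float(row[name]) for name in COLUMNS} for row in reader]


# Usage with a local file (hypothetical file name):
# with open("diabetes.csv", newline="") as f:
#     rows = load_dataset(f)
```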

STEP 3:

Exploratory data analysis (EDA): This step is about getting an overall understanding of the data. It is done to discover the data's properties, visualize it, and make sure the data is correct and ready to use with machine learning algorithms.
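As a rough illustration of this step, the sketch below computes per-column summary statistics and the class balance using only the standard library, taking rows as a list of dictionaries keyed by column name. The function names are hypothetical. It also flags zero values in Glucose, BloodPressure, SkinThickness, Insulin and BMI, where a zero is not physiologically plausible; in this dataset such zeros are commonly treated as missing values.

```python
from collections import Counter
from statistics import mean
from typing import Dict, List

Row = Dict[str, float]

# Columns where 0 is not a plausible measurement; in this dataset such zeros
# are commonly treated as missing values
ZERO_AS_MISSING = ["Glucose", "BloodPressure", "SkinThickness", "Insulin", "BMI"]


def summarize(rows: List[Row]) -> Dict[str, Dict[str, float]]:
    """Return mean, min, max and the number of zeros for every column."""
    summary = {}
    for column in rows[0]:
        values = [row[column] for row in rows]
        summary[column] = {
            "mean": mean(values),
            "min": min(values),
            "max": max(values),
            "zeros": sum(1 for v in values if v == 0),
        }
    return summary


def suspicious_zeros(summary: Dict[str, Dict[str, float]]) -> Dict[str, int]:
    """Zero counts for columns where a zero most likely means 'not recorded'."""
    return {
        column: int(summary[column]["zeros"])
        for column in ZERO_AS_MISSING
        if column in summary and summary[column]["zeros"] > 0
    }


def class_balance(rows: List[Row]) -> Dict[int, int]:
    """Number of patients per Outcome value (0 = no diabetes, 1 = diabetes)."""
    return dict(Counter(int(row["Outcome"]) for row in rows))
```

Checking the class balance early also shows whether the classes are imbalanced, which affects how the split and the metrics should be read.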

STEP 4:

Split the dataset into a training set and a test set, so the models are trained on one part of the data and evaluated on data they have not seen.
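The split ratio is not stated here. A minimal sketch of a stratified split, assuming an 80/20 ratio and a fixed seed (both hypothetical choices), keeps the proportion of diabetic and non-diabetic patients similar in both sets. With scikit-learn, the equivalent would typically be `train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)`.

```python
import random
from typing import Dict, List, Tuple

Row = Dict[str, float]


def stratified_split(rows: List[Row], test_fraction: float = 0.2, seed: int = 42,
                     label: str = "Outcome") -> Tuple[List[Row], List[Row]]:
    """Shuffle and split rows so each class keeps roughly the same share
    in the training and test sets."""
    rng = random.Random(seed)  # fixed seed makes the split reproducible
    by_class: Dict[float, List[Row]] = {}
    for row in rows:
        by_class.setdefault(row[label], []).append(row)

    train: List[Row] = []
    test: List[Row] = []
    for group in by_class.values():
        group = group[:]  # copy so the caller's list is not reordered
        rng.shuffle(group)
        n_test = round(len(group) * test_fraction)
        test.extend(group[:n_test])
        train.extend(group[n_test:])

    # Mix the classes again so neither set is ordered by label
    rng.shuffle(train)
    rng.shuffle(test)
    return train, test
```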

STEP 5:

Model building:

Support Vector Machine (SVM)

Naive Bayes

Decision Tree

K-Nearest Neighbors (KNN)

Random Forest Classifier
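Since KNN turned out to be the best model, a from-scratch sketch may help show how it works: each test patient receives the majority label among its k closest training patients. The class name `KNNClassifier`, the default `k=5` and the z-score scaling are assumptions, not the project's settings. Scaling matters for KNN because a column such as Insulin spans a much larger numeric range than DiabetesPedigreeFunction and would otherwise dominate the distance. In scikit-learn, the five models above correspond to classes such as `SVC`, `GaussianNB`, `DecisionTreeClassifier`, `KNeighborsClassifier` and `RandomForestClassifier`; which variants and hyperparameters the project used is not stated.

```python
import math
from collections import Counter
from typing import List, Sequence


class KNNClassifier:
    """Minimal k-nearest-neighbours classifier with z-score feature scaling."""

    def __init__(self, k: int = 5):
        self.k = k  # an odd k avoids tied votes for a binary target

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[int]) -> "KNNClassifier":
        n = len(X)
        n_features = len(X[0])
        # Scaling parameters are computed from the training data only
        self.means = [sum(row[j] for row in X) / n for j in range(n_features)]
        self.stds = []
        for j in range(n_features):
            var = sum((row[j] - self.means[j]) ** 2 for row in X) / n
            self.stds.append(math.sqrt(var) or 1.0)  # guard against constant columns
        self.X = [self._scale(row) for row in X]
        self.y = list(y)
        return self

    def _scale(self, row: Sequence[float]) -> List[float]:
        return [(v - m) / s for v, m, s in zip(row, self.means, self.stds)]

    def _neighbour_labels(self, row: Sequence[float]) -> List[int]:
        x = self._scale(row)
        # Sort training points by Euclidean distance and keep the k closest labels
        dists = sorted((math.dist(x, xi), yi) for xi, yi in zip(self.X, self.y))
        return [label for _, label in dists[: self.k]]

    def predict_one(self, row: Sequence[float]) -> int:
        return Counter(self._neighbour_labels(row)).most_common(1)[0][0]

    def predict_score(self, row: Sequence[float]) -> float:
        """Fraction of the k neighbours labelled 1; usable as a score for AUC."""
        labels = self._neighbour_labels(row)
        return sum(1 for label in labels if label == 1) / len(labels)

    def predict(self, X: Sequence[Sequence[float]]) -> List[int]:
        return [self.predict_one(row) for row in X]
```

To use it on this dataset, build `X` from the eight predictor columns of each training row and `y` from `Outcome`.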

Finally, we evaluated the trained models using the following metrics:

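Accuracy = (TP + TN) / (TP + TN + FP + FN)

Recall = TP / (TP + FN)

Precision = TP / (TP + FP)

where TP, TN, FP and FN are the numbers of true positives, true negatives, false positives and false negatives.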
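These three formulas can be computed directly from the true and predicted labels. A minimal sketch, treating 1 (diabetes) as the positive class; the function names are hypothetical, and returning 0.0 when a denominator is zero is a convention chosen here.

```python
from typing import Dict, Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int],
                     positive: int = 1) -> Dict[str, int]:
    """Count TP, TN, FP and FN, treating `positive` as the positive class."""
    counts = {"tp": 0, "tn": 0, "fp": 0, "fn": 0}
    for t, p in zip(y_true, y_pred):
        if p == positive:
            counts["tp" if t == positive else "fp"] += 1
        else:
            counts["fn" if t == positive else "tn"] += 1
    return counts


def _ratio(num: int, den: int) -> float:
    # Convention used here: report 0.0 instead of dividing by zero
    return num / den if den else 0.0


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    c = confusion_counts(y_true, y_pred)
    total = c["tp"] + c["tn"] + c["fp"] + c["fn"]
    return {
        "accuracy": _ratio(c["tp"] + c["tn"], total),
        "recall": _ratio(c["tp"], c["tp"] + c["fn"]),
        "precision": _ratio(c["tp"], c["tp"] + c["fp"]),
    }
```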

I have also included the Area Under the ROC Curve (AUC), since it summarizes how well a model separates the two classes across all decision thresholds, which makes it a good way to compare models.
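AUC can be read as the probability that a randomly chosen diabetic patient receives a higher score than a randomly chosen non-diabetic one. The sketch below computes it directly from that definition by comparing every positive/negative pair, counting ties as half; the function name is hypothetical, and the scores should be continuous, such as predicted probabilities. For a binary problem this gives the same value as `sklearn.metrics.roc_auc_score`.

```python
from typing import Sequence


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half a correct ranking
    return wins / (len(pos) * len(neg))


print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # prints 0.75
```

The pairwise loop is quadratic, which is acceptable for a dataset of this size; a rank-based formula scales better for large data.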

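For this problem, correctly identifying patients who do have diabetes (true positives) matters more than correctly identifying those who do not (true negatives). As such, greater focus is placed on accuracy and recall, since recall measures the fraction of diabetic patients that the model actually detects.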
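A small worked example shows why accuracy alone can be misleading here. Assuming the commonly distributed version of the dataset (768 patients: 500 with Outcome 0 and 268 with Outcome 1), a trivial model that predicts "no diabetes" for everyone still reaches about 65% accuracy on the full dataset while detecting no diabetic patients at all:

```python
# Class counts assumed from the commonly distributed Kaggle CSV (768 rows)
n_negative = 500  # Outcome == 0 (no diabetes)
n_positive = 268  # Outcome == 1 (diabetes)

# A trivial baseline that always predicts Outcome == 0
tp, fn = 0, n_positive   # every diabetic patient is missed
tn, fp = n_negative, 0   # every non-diabetic patient is classified correctly

accuracy = (tp + tn) / (tp + tn + fp + fn)
recall = tp / (tp + fn)
print(f"accuracy = {accuracy:.3f}, recall = {recall:.3f}")
# prints: accuracy = 0.651, recall = 0.000
```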

The test accuracies of the various models are generally within the same range, from approximately 73% to 81%.

Based on accuracy and recall, KNN produced the best overall results, and it also has a good AUC score.
