Coders Packet

Diabetes Classification Using Machine Learning in Python

By Shravya Chinta

In this packet, we build a Machine Learning model in Python that predicts whether a person is diabetic.


In this project, we predict whether a person is diabetic using the given Diabetes Data set, which has the following columns - ['Pregnancies', 'Glucose', 'BloodPressure', 'SkinThickness', 'Insulin', 'BMI', 'DiabetesPedigreeFunction', 'Age', 'Outcome']. The first eight columns are the input features, and 'Outcome' is the label to predict.


This project uses the K-Nearest Neighbours (KNN) algorithm. It is written in Python in a Jupyter Notebook, and the data set is stored as a .csv file.
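As a rough illustration of the KNN approach described above, here is a minimal sketch using scikit-learn. It is not the packet's actual notebook code: the data is synthetic (random numbers with a made-up rule linking 'Outcome' to 'Glucose' and 'BMI'), and the 75/25 split, the feature scaling step, and k = 5 are all assumptions chosen for the example.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler

FEATURES = ['Pregnancies', 'Glucose', 'BloodPressure', 'SkinThickness',
            'Insulin', 'BMI', 'DiabetesPedigreeFunction', 'Age']

# Synthetic stand-in data (NOT the real data set): random values plus a
# made-up rule tying Outcome to Glucose and BMI so there is a pattern to learn.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(200, len(FEATURES))), columns=FEATURES)
df['Outcome'] = (df['Glucose'] + 0.5 * df['BMI'] > 0).astype(int)

X = df[FEATURES]
y = df['Outcome']

# Hold out 25% of the rows for testing (an assumed split ratio).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

# KNN compares distances, so the features are put on a common scale first.
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

# k = 5 is an arbitrary choice for this sketch.
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train_scaled, y_train)

predictions = knn.predict(X_test_scaled)    # array of 0s and 1s
accuracy = knn.score(X_test_scaled, y_test) # fraction of correct predictions
print(f"Test accuracy on synthetic data: {accuracy:.2f}")
```

With the real data set, the synthetic DataFrame would be replaced by the one read from the .csv file, and the value of k would normally be tuned rather than fixed.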

We perform exploratory data analysis on the health data to gain domain knowledge, then build a binary classifier. We plot a bar graph showing the number of examples in each class. Using KNN, we classify each person as 0 or 1, indicating whether or not they are diabetic.
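The class-count bar graph mentioned above could be produced along these lines. This is only a sketch: the label values are made up for illustration, and the axis labels and title are my own choices rather than the packet's.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside Jupyter
import matplotlib.pyplot as plt
import pandas as pd

# Made-up labels standing in for the 'Outcome' column of the data set.
outcome = pd.Series([0, 1, 0, 0, 1, 0, 1, 0], name='Outcome')

# Count how many examples fall into each class, ordered by class label.
class_counts = outcome.value_counts().sort_index()

fig, ax = plt.subplots()
ax.bar(class_counts.index.astype(str), class_counts.values)
ax.set_xlabel('Outcome')
ax.set_ylabel('Number of examples')
ax.set_title('Examples per class')
```

A plot like this makes any class imbalance visible before training, which matters when judging the classifier's accuracy later.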

Python Modules:

NumPy, Pandas, Sklearn and Matplotlib

NumPy: NumPy stands for Numerical Python. It is a Python library built around a multidimensional array type, which serves as an alternative to regular Python lists and lets us analyze numeric data much more efficiently.
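A small sketch of the difference between lists and NumPy arrays is shown here. The numbers are made up for illustration and are not taken from the data set.

```python
import numpy as np

glucose_list = [148, 85, 183, 89]  # made-up values

# With a plain list, element-wise arithmetic needs an explicit loop.
scaled_list = [g / 100 for g in glucose_list]

# With a NumPy array, the same operation applies to every element at once.
glucose = np.array(glucose_list)
scaled = glucose / 100

# Arrays can be multidimensional: here, 2 rows (people) by 2 columns.
table = np.array([[148, 33.6],
                  [85, 26.6]])
column_means = table.mean(axis=0)  # one mean per column
print(scaled, table.shape, column_means)
```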

Pandas: Pandas is a Python library that provides data manipulation tools for data analysis. Using Pandas, we can read the CSV file that holds the data set for the model.
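Reading a CSV with Pandas might look like the sketch below. To keep it self-contained, it reads from an in-memory string instead of a file; the three rows are invented for illustration, and only the column names come from the project description. With the real data set, the path to the .csv file would be passed to `pd.read_csv` instead.

```python
import io
import pandas as pd

# Invented rows using the project's column names (NOT real data).
csv_text = """Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,DiabetesPedigreeFunction,Age,Outcome
2,120,70,25,80,28.5,0.45,35,0
5,160,82,30,150,34.1,0.80,50,1
1,95,64,20,60,23.9,0.20,24,0
"""

# read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)                       # (rows, columns)
print(df.head())                      # first few rows
print(df['Outcome'].value_counts())   # examples per class
```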

Sklearn: Sklearn (scikit-learn) is a Python library that provides Machine Learning algorithms for tasks such as Classification, Regression, Clustering, etc.

Matplotlib: Matplotlib is a Python library that is used for data visualization. It can plot charts such as bar plots, scatter plots, histograms, etc. We can use this library through the ‘import matplotlib.pyplot’ statement.


