Coders Packet

Diabetes Dataset Analysis in Python

By Abhishek Kumar Singh

  • Jupyter_Notebook.ipynb
  • pima-indians-diabetes.names
  • In this project, we predict whether a patient is diabetic by analyzing the Diabetes Dataset in Python.

    This project assumes basic knowledge of Python programming and familiarity with Jupyter Notebook.

    Knowledge of Logistic Regression is a plus.

    The following libraries must be installed in the working environment:

    1. pandas

    2. numpy

    3. scikit-learn

    Logistic Regression:

    Logistic Regression is one of the most popular machine learning algorithms. It is a statistical model that uses a logistic function to model a binary dependent (target) variable. It estimates the probability that an observation belongs to a given class from the independent (input) variables.
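
    To make the logistic function concrete, here is a minimal sketch in plain Python. The weights, bias, and input values are hypothetical numbers chosen only for illustration; they are not fitted to the diabetes data, and the helper name predict_probability is an assumption of this sketch.

```python
import math


def sigmoid(z):
    """Logistic function: maps any real number into the open interval (0, 1)."""
    # Two branches keep math.exp from overflowing for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_probability(features, weights, bias):
    """Probability of the positive class for one observation."""
    # Linear combination of the inputs, then squashed by the logistic function.
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


# Hypothetical coefficients and inputs (illustration only, not learned values).
weights = [0.8, -0.5]
bias = 0.1
p = predict_probability([1.0, 2.0], weights, bias)

# A common convention is to predict the positive class when p >= 0.5.
label = 1 if p >= 0.5 else 0
print(f"probability={p:.3f}, predicted class={label}")
```

    The model never outputs a class directly. It outputs a probability, and the yes/no answer comes from comparing that probability with a threshold.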

    Why is Logistic Regression used in this project?

    There are many machine learning algorithms, so why logistic regression? The dataset contains multiple parameters, which may be numerical, categorical, or both. The question we are asking has a yes/no answer (for example, does the patient have diabetes or not? Does the patient have cancer or not?), so the target variable is clearly categorical. When the target is categorical and the other parameters are numerical, or a mix of numerical and categorical, we need an algorithm suited to that situation. For this reason, I find Logistic Regression the best fit for this problem.
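
    The workflow described above can be sketched with scikit-learn as follows. This is not the notebook's code: the data here is synthetic, generated with make_classification as a stand-in for the real dataset, and the sample count, feature count, split ratio, and random seeds are all assumptions made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (NOT the diabetes dataset): numeric inputs, binary target.
X, y = make_classification(
    n_samples=500,
    n_features=8,
    n_informative=5,
    n_redundant=0,
    random_state=0,
)

# Hold out a quarter of the rows to measure accuracy on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# max_iter is raised from the default so the solver has room to converge.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Hard yes/no predictions and the accuracy they achieve on the held-out rows.
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)

# Class probabilities for the first held-out row: [P(class 0), P(class 1)].
proba = model.predict_proba(X_test[:1])

print(f"accuracy on held-out data: {accuracy:.3f}")
print(f"class probabilities for first test row: {proba[0]}")
```

    To run this on the real data, replace the make_classification call with code that loads the dataset into a feature matrix X and a 0/1 target vector y, for example with pandas.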

    You can also explore other machine learning algorithms and see whether you can achieve better accuracy.


