In this project, I built an XGBoost machine learning model to predict a person's annual medical charges.
The model uses features such as the person's age, sex, and BMI. Below is a walkthrough of the project:
1) First, read the data from the given CSV file into a pandas DataFrame.
2) Next, explore the data by plotting the variables against each other to see how they relate and to gain some insight into the data. The Python libraries matplotlib, seaborn, and plotly are used for this.
3) Now identify the input and target variables and separate out the two. Further, separate the input data into numerical and categorical features.
4) Next, scale the numerical features to the range 0 to 1 and encode the categorical features as numbers so that the model can be trained efficiently. Scaling is done with the MinMaxScaler class and encoding with the OneHotEncoder class, both from the scikit-learn library.
5) Next we create separate training and testing sets from our processed data using the train_test_split function from the scikit-learn library.
6) Now it's time to train the model. For this, we install the xgboost package and import the XGBRegressor class from it. Then we fit the model to the training data and generate predictions for the test data.
7) Finally, we measure the performance of our model using the RMSE (Root Mean Squared Error) between the targets and the model's predictions. You can always tune the hyperparameters to get slightly better results, but a base XGBoost model with default settings often already performs well on tabular data like this compared with many other ML algorithms.
This approach can also be used for large datasets and is easy to implement with the steps given above. So go ahead and train your own ML model using XGBoost.