This project predicts the winner of League of Legends Diamond-ranked games in Python using machine learning algorithms. Of the models tried, logistic regression gives the highest accuracy, 73.026%.
I tried several machine learning algorithms and compared their accuracy-
Gaussian Naive Bayes-
It is a variant of Naive Bayes (based on Bayes' theorem) that assumes the features follow a Gaussian (normal) distribution. It supports continuous data: when we work with continuous features, the model assumes that the values within each class are normally distributed. The code is given below-
    from sklearn.naive_bayes import GaussianNB
    from sklearn.metrics import accuracy_score

    # fit the model
    clf_nb = GaussianNB()
    clf_nb.fit(X_train, y_train)
    pred_nb = clf_nb.predict(X_test)

    # get the accuracy score (true labels first, predictions second)
    accuracy_nb = accuracy_score(y_test, pred_nb)
    print(accuracy_nb*100)
One way to create a simple model is to assume that the data is described by a Gaussian distribution with no covariance between dimensions, that is, the features are independent of each other. If we find the mean and standard deviation of each feature within each class, we can easily fit the model.
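The idea of fitting the model from per-class means and standard deviations can be sketched in plain Python. This is a minimal illustration, not scikit-learn's implementation: the helper names (`fit_gaussian_nb`, `predict_gaussian_nb`) and the toy data are assumptions made up for this example.

```python
import math
from collections import defaultdict
from statistics import mean, pstdev


def fit_gaussian_nb(X, y):
    """Estimate per-class priors and per-feature mean/std (hypothetical helper)."""
    rows_by_class = defaultdict(list)
    for row, label in zip(X, y):
        rows_by_class[label].append(row)

    model = {}
    for label, rows in rows_by_class.items():
        columns = list(zip(*rows))  # one tuple of values per feature
        model[label] = {
            "prior": len(rows) / len(X),
            "means": [mean(col) for col in columns],
            # small floor keeps the density defined when a feature is constant
            "stds": [max(pstdev(col), 1e-9) for col in columns],
        }
    return model


def log_gaussian(x, mu, sigma):
    """Log of the normal density with mean mu and standard deviation sigma."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)


def predict_gaussian_nb(model, row):
    """Return the class with the highest log prior plus summed log likelihoods."""
    def score(label):
        params = model[label]
        return math.log(params["prior"]) + sum(
            log_gaussian(x, mu, sigma)
            for x, mu, sigma in zip(row, params["means"], params["stds"])
        )
    return max(model, key=score)


# Toy data (made up for illustration): two features, two classes
X_toy = [[1.0, 2.1], [1.2, 1.9], [0.8, 2.0], [3.0, 4.2], [3.2, 3.8], [2.9, 4.0]]
y_toy = [0, 0, 0, 1, 1, 1]

nb_model = fit_gaussian_nb(X_toy, y_toy)
print(predict_gaussian_nb(nb_model, [1.1, 2.0]))  # point near the class 0 cluster
print(predict_gaussian_nb(nb_model, [3.1, 4.1]))  # point near the class 1 cluster
```

Summing the per-feature log densities is where the independence assumption enters: each feature contributes its own term, with no covariance between them.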
Decision Tree Classifier-
It is a supervised machine learning algorithm with a tree-like structure, where each internal node represents a feature, each branch represents a decision rule, and each leaf represents an outcome. It learns how to partition the data based on the feature (attribute) values.
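To illustrate how a single decision rule can be learned, here is a small sketch that searches for the threshold on one feature that best separates the classes, scored by Gini impurity. The function names, the feature name `gold_diff`, and the values are hypothetical and only serve the example; a real tree repeats this search over all features and recursively on each partition.

```python
def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_threshold(values, labels):
    """Find the threshold on one feature that minimises weighted Gini impurity."""
    best = (None, float("inf"))
    distinct = sorted(set(values))
    # candidate thresholds: midpoints between consecutive distinct values
    for lo, hi in zip(distinct, distinct[1:]):
        threshold = (lo + hi) / 2
        left = [lab for v, lab in zip(values, labels) if v <= threshold]
        right = [lab for v, lab in zip(values, labels) if v > threshold]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if weighted < best[1]:
            best = (threshold, weighted)
    return best


# Made-up feature values and win labels, for illustration only
gold_diff = [-900, -400, -150, 200, 650, 1200]
blue_wins = [0, 0, 0, 1, 1, 1]

threshold, impurity = best_threshold(gold_diff, blue_wins)
print(threshold, impurity)  # the learned rule is "gold_diff <= threshold"
```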
    # fit the decision tree model
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import GridSearchCV
    from sklearn.tree import DecisionTreeClassifier

    dt = DecisionTreeClassifier()

    # search the best params with 5-fold cross-validation
    grid = {'min_samples_split': [5, 10, 20, 50, 100]}
    clfi_tree = GridSearchCV(dt, grid, cv=5)
    clfi_tree.fit(X_train, y_train)
    predictor_tree = clfi_tree.predict(X_test)

    # get the accuracy score
    accuracy_tree = accuracy_score(y_test, predictor_tree)
    print(accuracy_tree*100)
First, we divide the columns into two parts: the target variable and the independent variables (features). Then, to measure model performance, we further divide the dataset into training data and test data using the train_test_split() function. After the split, we build the decision tree model using scikit-learn.
Let's estimate how accurately the model has predicted the outcome.
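The split-then-score workflow can be sketched without any library. The `holdout_split` and `accuracy` helpers below are hand-rolled, simplified stand-ins written for this example (they are not scikit-learn's functions), and the data and the "model" are made up.

```python
import random


def holdout_split(X, y, test_size=0.25, seed=0):
    """Shuffle row indices and slice them into train and test parts."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    n_test = max(1, round(len(X) * test_size))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return (
        [X[i] for i in train_idx], [X[i] for i in test_idx],
        [y[i] for i in train_idx], [y[i] for i in test_idx],
    )


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Made-up rows: one feature per row, label is 1 when the feature is positive
X_demo = [[v] for v in (-5, -3, -1, 2, 4, 6, -2, 7)]
y_demo = [int(row[0] > 0) for row in X_demo]

X_tr, X_te, y_tr, y_te = holdout_split(X_demo, y_demo, test_size=0.25, seed=42)
print(len(X_tr), len(X_te))  # 6 training rows, 2 test rows

# A trivial stand-in "model": predict a win whenever the feature is positive
y_hat = [int(row[0] > 0) for row in X_te]
print(accuracy(y_te, y_hat) * 100)
```

Scoring only on the held-out rows is what makes the accuracy an estimate of performance on unseen games rather than on data the model has already seen.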
Random Forest Classifier-
It is an easy-to-use supervised machine learning algorithm. It builds multiple decision trees and combines their predictions to get a more accurate outcome.
    # fit the model
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import GridSearchCV

    rf = RandomForestClassifier()

    # search the best params with 5-fold cross-validation
    grid = {'n_estimators': [100, 200, 300, 400, 500], 'max_depth': [2, 5, 10]}
    clfi_rf = GridSearchCV(rf, grid, cv=5)
    clfi_rf.fit(X_train, y_train)
    predictor_rf = clfi_rf.predict(X_test)

    # get the accuracy score
    accuracy_rf = accuracy_score(y_test, predictor_rf)
    print(accuracy_rf*100)
pseudocode-
1. Randomly select "k" features from the total "m" features of the dataset, where k < m.
2. Among the "k" features, calculate the node "d" using the best split point.
3. Split the node into daughter nodes.
4. Repeat the above steps until the desired number of nodes has been reached.
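Steps 1 to 3 of the pseudocode can be sketched in plain Python as follows. This is an illustrative sketch under assumptions, not scikit-learn's implementation: the `split_node` helper, the choice of Gini impurity as the split criterion, and the toy rows are all made up for the example. Step 4 would apply `split_node` again to each daughter node.

```python
import random


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def split_node(rows, labels, k, rng):
    """Sample k features, pick the best split among them, return daughter nodes."""
    m = len(rows[0])
    candidates = rng.sample(range(m), k)  # step 1: k of the m features
    best = None
    for f in candidates:  # step 2: search for the best split point
        values = sorted({row[f] for row in rows})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [i for i, row in enumerate(rows) if row[f] <= t]
            right = [i for i, row in enumerate(rows) if row[f] > t]
            score = (len(left) * gini([labels[i] for i in left])
                     + len(right) * gini([labels[i] for i in right])) / len(rows)
            if best is None or score < best["score"]:
                # step 3: remember the row indices of the two daughter nodes
                best = {"feature": f, "threshold": t, "score": score,
                        "left": left, "right": right}
    return best  # None if every sampled feature is constant


# Toy rows with m = 3 features; only feature 0 separates the classes perfectly
rows_toy = [
    [1.0, 5.0, 0.3],
    [2.0, 1.0, 0.9],
    [3.0, 4.0, 0.1],
    [6.0, 2.0, 0.8],
    [7.0, 5.5, 0.2],
    [8.0, 1.5, 0.7],
]
labels_toy = [0, 0, 0, 1, 1, 1]

node = split_node(rows_toy, labels_toy, k=2, rng=random.Random(0))
print(node["feature"], node["threshold"], node["left"], node["right"])
```

Because each node only sees a random subset of the features, different trees end up splitting on different features, which is what makes their combined prediction more robust than a single tree.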
Logistic Regression-
It is a supervised machine learning classification algorithm that is used to predict the probability of a categorical variable. The dependent variable is binary, coded as 0 (no, failure) and 1 (yes, success). In other words, it predicts P(Y=1) as a function of X.
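A minimal sketch of how P(Y=1) is computed as a function of X: the features are combined into a weighted sum and passed through the sigmoid function. The weights, bias, and feature values below are hypothetical; a fitted model learns its weights from the training data.

```python
import math


def sigmoid(z):
    """Map any real number to the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(features, weights, bias):
    """P(Y=1 | X) as the sigmoid of a weighted sum of the features."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


# Hypothetical weights; a real model learns these from the training data
weights = [0.8, -0.5]
bias = 0.1

p_win = predict_proba([1.5, 0.4], weights, bias)
label = int(p_win >= 0.5)  # threshold the probability to get a 0/1 class
print(round(p_win, 3), label)
```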
    # fit logistic regression model
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score

    lm = LogisticRegression()
    lm.fit(X_train, y_train)

    # get accuracy score
    predictor_lm = lm.predict(X_test)
    accuracy_lm = accuracy_score(y_test, predictor_lm)
    print(accuracy_lm*100)
steps-
1. Import the packages, functions, and classes.
2. Get the data and transform it. Then create a classification model and fit it to the training data.
3. Evaluate the model to see how accurately it works.
Submitted by Sudipta Ghosh (Sudipta)