
Creating a machine learning pipeline in Python

By KARTIK VASHIST

A pipeline is a chained set of functions created to avoid repeating the same preprocessing steps by hand whenever new data is added to the original data set; instead, the steps are applied automatically.

Machine learning algorithms generally need a large amount of data to perform well. Whenever new data is added to the existing data set, it has to be preprocessed in the same way to avoid errors while running the algorithm.

Functions such as StandardScaler and SimpleImputer need to be called repeatedly on the new data to ensure that the features are scaled to the same range and that any missing values are filled in, respectively. Doing this by hand becomes tedious and time-consuming. To avoid applying these functions again and again, the concept of a pipeline is used in machine learning.
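
For illustration, here is a rough sketch of this manual, repetitive approach (assuming scikit-learn and NumPy are installed; the data values are made up):

import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler

# Original data set with one missing value (np.nan)
X = np.array([[25.0, 50000.0],
              [32.0, np.nan],
              [47.0, 81000.0]])

imputer = SimpleImputer(strategy="mean")   # fill missing values with the column mean
X = imputer.fit_transform(X)

scaler = StandardScaler()                  # scale features to the same range
X = scaler.fit_transform(X)

# ...and both steps have to be repeated by hand every time new rows are added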

A machine learning pipeline bundles all of these functions so that the required processing runs automatically whenever new data is added to the original data set. This avoids calling the data preprocessing functions and methods again and again, and it combines the separate components into a single object, which makes the algorithm easier to use. A Pipeline can be imported from the sklearn.pipeline module and filled with steps such as StandardScaler (to scale the features of the data set to the same range) and SimpleImputer (to fill in any missing values in the data set). The new data can then be fitted with this machine learning pipeline. A flow diagram of the pipeline is shown below:

                           DATA --> FEATURE SCALING --> SIMPLE IMPUTER --> MODEL EVALUATION

Thus, whenever new data is sent through these components, it is sequentially preprocessed and transformed by each of them in turn.
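
As a minimal sketch of such a pipeline (assuming scikit-learn; the step names and the LogisticRegression model at the end are illustrative choices, and the imputer is placed first here so that the scaler only sees complete data):

import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

# Small made-up data set with a missing value
X = np.array([[25.0, 50000.0],
              [32.0, 64000.0],
              [47.0, np.nan],
              [51.0, 98000.0]])
y = np.array([0, 0, 1, 1])

# Steps run in the order they are listed
pipe = Pipeline(steps=[
    ("imputer", SimpleImputer(strategy="mean")),   # fill missing values
    ("scaler", StandardScaler()),                  # scale features to the same range
    ("model", LogisticRegression()),               # final estimator
])

pipe.fit(X, y)                                     # every step is fitted in sequence

# New data passes through the same preprocessing steps before prediction
print(pipe.predict([[40.0, 70000.0]]))

Whenever new rows are appended to the data set, calling fit again re-runs imputation, scaling, and model training in one step, without invoking each function separately.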
