When we have a large dataset with many columns but need only a subset of them, or when some columns contain irrelevant data, we need to delete those columns from the DataFrame. Below are two common ways to do this in pandas.
Using drop() method
The drop() method of the pandas package removes rows or columns from a DataFrame. Its two most commonly used parameters are labels and axis. The labels parameter specifies the row or column labels you want to drop, and the axis parameter specifies whether those labels refer to rows (axis=0, the default) or columns (axis=1). By default, drop() returns a new DataFrame and leaves the original unchanged.
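Here is a minimal sketch on a small, made-up DataFrame. It shows how axis changes what the label means: the same drop() call removes a row when axis=0 and a column when axis=1.

```python
import pandas as pd

# Small example frame (made-up data for illustration)
data = pd.DataFrame({'A': [10, 20, 30], 'B': [40, 50, 60]})

# axis=0 (the default): the label 1 is treated as a row index label
without_row = data.drop(1, axis=0)

# axis=1: the label 'B' is treated as a column name
without_col = data.drop('B', axis=1)

print(without_row)
print(without_col)
```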
Syntax:
DataFrame.drop(labels=None, axis=0, index=None, columns=None, level=None, inplace=False, errors='raise')
Program:
import pandas as pd  # importing pandas package

# creating a dataframe with 4 columns
data = pd.DataFrame({'A': [10, 20, 30], 'B': [40, 50, 60], 'C': [70, 80, 90], 'D': [100, 110, 120]})
print(data)  # printing the data
print()
data = data.drop('D', axis=1)  # dropping column 'D' of the dataframe
print(data)  # printing the data after dropping a column
Output:
    A   B   C    D
0  10  40  70  100
1  20  50  80  110
2  30  60  90  120

    A   B   C
0  10  40  70
1  20  50  80
2  30  60  90
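The Syntax line above also lists the columns, errors, and inplace parameters. The sketch below (same example frame) drops several columns at once with columns=, uses errors='ignore' so that a missing name such as 'E' is skipped instead of raising a KeyError, and finally uses inplace=True, which modifies data itself and returns None.

```python
import pandas as pd

data = pd.DataFrame({'A': [10, 20, 30], 'B': [40, 50, 60],
                     'C': [70, 80, 90], 'D': [100, 110, 120]})

# columns= is a shorthand for labels=... with axis=1 and accepts a list
subset = data.drop(columns=['C', 'D'])

# errors='ignore' drops 'D' and silently skips 'E', which does not exist
safe = data.drop(columns=['D', 'E'], errors='ignore')

print(subset)
print(safe)

# inplace=True changes data directly; the call itself returns None
result = data.drop(columns='D', inplace=True)
print(data)
```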
Using the loc[] indexer
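The loc indexer in pandas performs label-based indexing (using row and column names) and lets you access a group of rows and columns by labels or by a boolean condition. It works much like Python's slicing operator. To remove columns with loc, select all rows with : and list only the columns you want to keep; any column left out of the list is excluded from the result.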
Syntax:
DataFrame.loc[row_label, column_label]
Program:
import pandas as pd  # importing pandas package

# creating a dataframe with 4 columns
data = pd.DataFrame({'A': [10, 20, 30], 'B': [40, 50, 60], 'C': [70, 80, 90], 'D': [100, 110, 120]})
print(data)  # printing the data
print()
data = data.loc[:, ['A', 'B', 'C']]  # keeping only the specified columns
print(data)  # printing the data after removing a column
Output:
    A   B   C    D
0  10  40  70  100
1  20  50  80  110
2  30  60  90  120

    A   B   C
0  10  40  70
1  20  50  80
2  30  60  90
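Because loc also accepts a boolean condition, columns can be removed without typing out every name to keep. The following is a minimal sketch on the same example frame. The first selection excludes named columns with Index.isin, and the second keeps only the columns whose maximum value is below 100 (an arbitrary threshold chosen for illustration).

```python
import pandas as pd

data = pd.DataFrame({'A': [10, 20, 30], 'B': [40, 50, 60],
                     'C': [70, 80, 90], 'D': [100, 110, 120]})

# Boolean mask over the column labels: True for every column NOT listed
keep = ~data.columns.isin(['B', 'D'])
trimmed = data.loc[:, keep]

# Condition on the values: data.max() gives one maximum per column,
# and the resulting boolean Series is aligned with the column labels
small = data.loc[:, data.max() < 100]

print(trimmed)
print(small)
```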