Indexing and Selecting Data with Pandas in Python

The task assigned to me was Indexing and Selecting Data with Pandas in Python. My key learnings as an intern are:

Learning to index and select data in Pandas helps us work with large datasets easily. We can access rows and columns using .loc[] for labels and .iloc[] for positions.
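To see the difference between .loc[] and .iloc[] more clearly, here is a small sketch using a hypothetical DataFrame whose index labels (10, 20, 30, 40) do not match the row positions. A label slice with .loc[] includes the end label, while a position slice with .iloc[] excludes the end position.

```python
import pandas as pd

# Hypothetical data: index labels deliberately differ from row positions
df = pd.DataFrame(
    {"Name": ["Alice", "Bob", "Charlie", "David"], "Age": [25, 30, 35, 40]},
    index=[10, 20, 30, 40],
)

# .loc uses index labels; a label slice INCLUDES the end label (30)
by_label = df.loc[10:30, "Name"]
print(by_label.tolist())  # ['Alice', 'Bob', 'Charlie']

# .iloc uses integer positions; a position slice EXCLUDES the end position (2)
by_position = df.iloc[0:2, 0]
print(by_position.tolist())  # ['Alice', 'Bob']

# The same cell reached by label (20, "Age") and by position (1, 1)
print(df.loc[20, "Age"], df.iloc[1, 1])  # 30 30
```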

Filtering rows with boolean conditions, setting custom indexes, and retrieving single values with .at[] (by label) and .iat[] (by position), which are optimized for scalar access, make data handling faster. Multi-level indexing organizes hierarchical data, and random sampling supports quick exploratory analysis. Together, these skills make data manipulation and analysis in Python simple and efficient.
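The code in this post does not demonstrate multi-level indexing, so here is a minimal sketch of it. The sales DataFrame and its Region, Year, and Revenue columns are hypothetical, made up only for illustration.

```python
import pandas as pd

# Hypothetical sales data with two grouping keys
sales = pd.DataFrame(
    {
        "Region": ["East", "East", "West", "West"],
        "Year": [2022, 2023, 2022, 2023],
        "Revenue": [100, 120, 90, 110],
    }
)

# Build a two-level index (Region, Year); sorting keeps label lookups efficient
sales = sales.set_index(["Region", "Year"]).sort_index()

# Select every row under one outer label; the result is indexed by Year
east = sales.loc["East"]
print(east)

# Select one cell with a (Region, Year) tuple plus a column name
print(sales.loc[("West", 2023), "Revenue"])  # 110

# Cross-section on the inner level: all regions for the year 2022
print(sales.xs(2022, level="Year"))
```

Using a tuple with .loc[] addresses both levels at once, while .xs() is convenient when you want to fix an inner level and keep all outer labels.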


Here is the code for the task:

import pandas as pd

# Creating a sample DataFrame
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [25, 30, 35, 40, 45],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix'],
    'Salary': [70000, 80000, 90000, 100000, 110000]
}

df = pd.DataFrame(data)

# Displaying the original DataFrame
print("Original DataFrame:\n", df)

# Selecting specific columns
print("\nSelecting 'Name' and 'Salary' columns:\n", df[['Name', 'Salary']])

# Selecting rows using loc (label-based; loc also accepts a boolean mask)
print("\nSelecting row where Name is 'Charlie':\n", df.loc[df['Name'] == 'Charlie'])

# Selecting rows using iloc (position-based)
print("\nSelecting the second row:\n", df.iloc[1])

# Filtering data based on conditions
filtered_df = df[df['Age'] > 30]
print("\nFiltering rows where Age > 30:\n", filtered_df)

# Selecting a specific value using .at and .iat
print("\nSalary of the first person (using .at):", df.at[0, 'Salary'])
print("Age of the second person (using .iat):", df.iat[1, 1])

# Setting 'Name' as the index
df.set_index('Name', inplace=True)
print("\nDataFrame with 'Name' as index:\n", df)

# Resetting index back to default
df.reset_index(inplace=True)
print("\nDataFrame after resetting index:\n", df)

# Random sampling
print("\nRandomly selecting 2 rows:\n", df.sample(2))

Output:

Original DataFrame:
       Name  Age         City  Salary
0    Alice   25     New York   70000
1      Bob   30  Los Angeles   80000
2  Charlie   35      Chicago   90000
3    David   40      Houston  100000
4      Eve   45      Phoenix  110000

Selecting 'Name' and 'Salary' columns:
       Name  Salary
0    Alice   70000
1      Bob   80000
2  Charlie   90000
3    David  100000
4      Eve  110000

Selecting row where Name is 'Charlie':
       Name  Age     City  Salary
2  Charlie   35  Chicago   90000

Selecting the second row:
 Name              Bob
Age                30
City      Los Angeles
Salary          80000
Name: 1, dtype: object

Filtering rows where Age > 30:
       Name  Age     City  Salary
2  Charlie   35  Chicago   90000
3    David   40  Houston  100000
4      Eve   45  Phoenix  110000

Salary of the first person (using .at): 70000
Age of the second person (using .iat): 30

DataFrame with 'Name' as index:
          Age         City  Salary
Name
Alice     25     New York   70000
Bob       30  Los Angeles   80000
Charlie   35      Chicago   90000
David     40      Houston  100000
Eve       45      Phoenix  110000

DataFrame after resetting index:
       Name  Age         City  Salary
0    Alice   25     New York   70000
1      Bob   30  Los Angeles   80000
2  Charlie   35      Chicago   90000
3    David   40      Houston  100000
4      Eve   45      Phoenix  110000

Randomly selecting 2 rows:
     Name  Age      City  Salary
3  David   40   Houston  100000
0  Alice   25  New York   70000
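Because sample() picks rows at random, the two rows above will likely differ when you run the code yourself. A minimal sketch of making sampling reproducible with the random_state parameter (the seed values 42 and 0 are arbitrary choices, and the DataFrame is a trimmed copy of the one above):

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie", "David", "Eve"],
    "Age": [25, 30, 35, 40, 45],
})

# The same seed returns the same rows on every call and every run
first = df.sample(2, random_state=42)
second = df.sample(2, random_state=42)
print(first.equals(second))  # True

# frac samples a fraction of the rows instead of a fixed count (0.4 * 5 = 2)
half = df.sample(frac=0.4, random_state=0)
print(len(half))  # 2
```

Fixing the seed is useful when sharing results, since a reader then sees exactly the same sample.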

For more of my contributions, refer to:
Python Pandas Data Science Library
