The task assigned to me was Indexing and Selecting Data with Pandas in Python. My key learnings as an intern are:
Learning to index and select data in Pandas makes it much easier to work with large datasets. Rows and columns can be accessed with .loc[] by label and with .iloc[] by integer position.
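The label/position distinction matters most when slicing, or once the row labels stop matching the row positions. The sketch below reuses part of this post's sample data (trimmed to two columns for brevity). It shows that a .loc[] slice includes its end label while an .iloc[] slice excludes its end position, and that both accessors can reach the same cell once 'Name' becomes the index.

```python
import pandas as pd

# A trimmed copy of this post's sample data, with the default 0..4 RangeIndex.
people = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [25, 30, 35, 40, 45],
})

# With integer labels, the same bounds behave differently:
# .loc[0:2] is label-based and includes the end label (rows 0, 1, 2),
# .iloc[0:2] is position-based and excludes the end position (rows 0, 1).
print(len(people.loc[0:2]))   # 3
print(len(people.iloc[0:2]))  # 2

# Once 'Name' is the index, labels and positions no longer coincide.
indexed = people.set_index('Name')
print(indexed.loc['Bob', 'Age'])  # 30 (row label 'Bob', column label 'Age')
print(indexed.iloc[1, 0])         # 30 (second row, first remaining column)

# Label slices include both endpoints: Bob, Charlie and David.
print(indexed.loc['Bob':'David', 'Age'].tolist())  # [30, 35, 40]
```

Because .loc[] slices are end-inclusive and .iloc[] slices are not, mixing the two on a default integer index is a common source of off-by-one errors.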
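Boolean conditions filter rows, set_index() turns a column into a custom row index, and .at[] and .iat[] retrieve a single value by label or by position; for scalar access they are faster than the general-purpose .loc[] and .iloc[]. Multi-level indexing (a MultiIndex) organizes hierarchical data, and random sampling with .sample() gives a quick look at a large dataset. Together, these skills make data manipulation and analysis in Python simpler and more efficient.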
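Multi-level indexing is the one technique in this list that the main code later in this post does not demonstrate. The sketch below uses hypothetical sales data (the Region, Year and Revenue columns are invented for illustration) to build a MultiIndex with set_index(), then selects by the outer level with .loc[], by a full tuple key, and by the inner level with .xs(). It also passes random_state to .sample(), which fixes the random seed so the same rows come back on every run.

```python
import pandas as pd

# Hypothetical sales records, used only to illustrate a two-level index.
sales = pd.DataFrame({
    'Region': ['East', 'East', 'West', 'West'],
    'Year': [2023, 2024, 2023, 2024],
    'Revenue': [100, 120, 90, 110],
}).set_index(['Region', 'Year'])  # builds a MultiIndex from two columns

# Outer level: every row under 'East' (the result is indexed by Year).
east = sales.loc['East']
print(east)

# Full key: one label per level, passed as a tuple.
print(sales.loc[('West', 2024), 'Revenue'])  # 110

# Inner level: a cross-section of all regions for 2023.
print(sales.xs(2023, level='Year'))

# Reproducible sampling: the same random_state returns the same rows.
first = sales.sample(n=2, random_state=42)
second = sales.sample(n=2, random_state=42)
print(first.equals(second))  # True
```

Without random_state, .sample() draws different rows each time, which is fine for a quick look but not for results that must be reproduced later.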
Indexing and Selecting Data with Pandas in Python
The code for this task is as follows:
import pandas as pd

# Creating a sample DataFrame
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [25, 30, 35, 40, 45],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix'],
    'Salary': [70000, 80000, 90000, 100000, 110000]
}
df = pd.DataFrame(data)

# Displaying the original DataFrame
print("Original DataFrame:\n", df)

# Selecting specific columns
print("\nSelecting 'Name' and 'Salary' columns:\n", df[['Name', 'Salary']])

# Selecting rows using loc with a boolean condition (label-based)
print("\nSelecting row where Name is 'Charlie':\n", df.loc[df['Name'] == 'Charlie'])

# Selecting rows using iloc (position-based)
print("\nSelecting the second row:\n", df.iloc[1])

# Filtering data based on conditions
filtered_df = df[df['Age'] > 30]
print("\nFiltering rows where Age > 30:\n", filtered_df)

# Selecting a specific value using .at (row label, column label)
# and .iat (row position, column position)
print("\nSalary of the first person (using .at):", df.at[0, 'Salary'])
print("Age of the second person (using .iat):", df.iat[1, 1])

# Setting 'Name' as the index
df.set_index('Name', inplace=True)
print("\nDataFrame with 'Name' as index:\n", df)

# Resetting index back to default
df.reset_index(inplace=True)
print("\nDataFrame after resetting index:\n", df)

# Random sampling (the selected rows change on every run)
print("\nRandomly selecting 2 rows:\n", df.sample(2))
Output (the last two rows come from random sampling and will differ between runs):
Original DataFrame:
       Name  Age         City  Salary
0    Alice   25     New York   70000
1      Bob   30  Los Angeles   80000
2  Charlie   35      Chicago   90000
3    David   40      Houston  100000
4      Eve   45      Phoenix  110000

Selecting 'Name' and 'Salary' columns:
       Name  Salary
0    Alice   70000
1      Bob   80000
2  Charlie   90000
3    David  100000
4      Eve  110000

Selecting row where Name is 'Charlie':
       Name  Age     City  Salary
2  Charlie   35  Chicago   90000

Selecting the second row:
 Name              Bob
Age                30
City      Los Angeles
Salary          80000
Name: 1, dtype: object

Filtering rows where Age > 30:
       Name  Age     City  Salary
2  Charlie   35  Chicago   90000
3    David   40  Houston  100000
4      Eve   45  Phoenix  110000

Salary of the first person (using .at): 70000
Age of the second person (using .iat): 30

DataFrame with 'Name' as index:
          Age         City  Salary
Name
Alice     25     New York   70000
Bob       30  Los Angeles   80000
Charlie   35      Chicago   90000
David     40      Houston  100000
Eve       45      Phoenix  110000

DataFrame after resetting index:
       Name  Age         City  Salary
0    Alice   25     New York   70000
1      Bob   30  Los Angeles   80000
2  Charlie   35      Chicago   90000
3    David   40      Houston  100000
4      Eve   45      Phoenix  110000

Randomly selecting 2 rows:
     Name  Age      City  Salary
3  David   40   Houston  100000
0  Alice   25  New York   70000

For further reading, refer to: Python Pandas Data Science Library