Selecting Rows with loc
Learn to select rows using label-based indexing with loc
What is loc?
loc selects rows by label (index name).
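Think of it as selecting by the row's name, not its position. With the default index the labels are the integers 0, 1, 2, ..., so they happen to match the positions.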
import pandas as pd
df = pd.DataFrame({
    'Name': ['John', 'Sarah', 'Mike', 'Emma'],
    'Age': [25, 30, 28, 32],
    'City': ['NYC', 'LA', 'Chicago', 'Miami']
})
print(df)
Output:
    Name  Age     City
0   John   25      NYC
1  Sarah   30       LA
2   Mike   28  Chicago
3   Emma   32    Miami
Select Single Row
row = df.loc[0]
print(row)
Output:
Name    John
Age       25
City     NYC
Name: 0, dtype: object
Returns a Series for a single row.
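A quick check of the return type can make this concrete. The sketch below rebuilds the same sample DataFrame and compares a single label with a one-item list of labels; the variable names as_series and as_frame are just illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Sarah', 'Mike', 'Emma'],
    'Age': [25, 30, 28, 32],
    'City': ['NYC', 'LA', 'Chicago', 'Miami']
})

# A single label returns a Series whose index is the column names
as_series = df.loc[0]
print(type(as_series).__name__)   # Series

# A one-item list of labels returns a one-row DataFrame instead
as_frame = df.loc[[0]]
print(type(as_frame).__name__)    # DataFrame
print(as_frame.shape)             # (1, 3)
```

Wrapping the label in a list is a handy way to keep a DataFrame when later code expects one.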
Select Multiple Rows
rows = df.loc[[0, 2]]
print(rows)
Output:
   Name  Age     City
0  John   25      NYC
2  Mike   28  Chicago
Returns a DataFrame for multiple rows.
Slicing Rows
subset = df.loc[0:2]
print(subset)
Important: Includes end point! Gets rows 0, 1, AND 2.
This is different from Python slicing!
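To see the difference side by side, here is a small sketch that slices a plain Python list and a DataFrame with the same 0:2 bounds. The list called names is only there for comparison.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Sarah', 'Mike', 'Emma'],
    'Age': [25, 30, 28, 32]
})

names = ['John', 'Sarah', 'Mike', 'Emma']

# Plain Python slicing stops before the end index
print(names[0:2])                   # ['John', 'Sarah']

# loc slicing is label-based and includes the end label
print(list(df.loc[0:2, 'Name']))    # ['John', 'Sarah', 'Mike']
```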
Select Rows and Columns
result = df.loc[0, 'Name']
print(result)
Output:
John
Multiple rows, multiple columns:
subset = df.loc[[0, 1], ['Name', 'City']]
print(subset)
Output:
    Name  City
0   John   NYC
1  Sarah    LA
All Rows, Specific Columns
names_ages = df.loc[:, ['Name', 'Age']]
print(names_ages)
The : means all rows.
Specific Rows, All Columns
first_two = df.loc[0:1, :]
print(first_two)
Usually you can omit the second : and write:
first_two = df.loc[0:1]
Boolean Indexing
Most powerful feature of loc!
adults = df.loc[df['Age'] > 28]
print(adults)
What this does:
- df['Age'] > 28 creates True/False for each row
- loc uses these to select rows
Output:
    Name  Age   City
1  Sarah   30     LA
3   Emma   32  Miami
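It can help to look at the True/False values on their own before passing them to loc. This sketch stores the condition in a variable (the name mask is just a convention, not something pandas requires) and then uses it to filter.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Sarah', 'Mike', 'Emma'],
    'Age': [25, 30, 28, 32],
    'City': ['NYC', 'LA', 'Chicago', 'Miami']
})

# The comparison produces one boolean per row
mask = df['Age'] > 28
print(mask.tolist())            # [False, True, False, True]

# loc keeps only the rows where the mask is True
adults = df.loc[mask]
print(adults.index.tolist())    # [1, 3]
```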
Multiple Conditions
AND condition (&):
result = df.loc[(df['Age'] > 25) & (df['City'] == 'LA')]
print(result)
OR condition (|):
result = df.loc[(df['Age'] < 27) | (df['City'] == 'Miami')]
print(result)
NOT condition (~):
not_nyc = df.loc[~(df['City'] == 'NYC')]
print(not_nyc)
Important: Use parentheses around each condition!
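One way to avoid parenthesis problems is to name each condition first and then combine the names. The sketch below does this with the same sample data; older_than_25 and in_la are illustrative names. The parentheses matter in the inline form because Python evaluates & before > and ==.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Sarah', 'Mike', 'Emma'],
    'Age': [25, 30, 28, 32],
    'City': ['NYC', 'LA', 'Chicago', 'Miami']
})

# Name each condition so the combined expression stays readable
older_than_25 = df['Age'] > 25    # [False, True, True, True]
in_la = df['City'] == 'LA'        # [False, True, False, False]

both = df.loc[older_than_25 & in_la]      # AND
either = df.loc[older_than_25 | in_la]    # OR
not_la = df.loc[~in_la]                   # NOT

print(both.index.tolist())      # [1]
print(either.index.tolist())    # [1, 2, 3]
print(not_la.index.tolist())    # [0, 2, 3]
```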
Custom Index
df_custom = pd.DataFrame({
    'Product': ['Laptop', 'Phone', 'Tablet'],
    'Price': [999, 599, 399]
}, index=['A', 'B', 'C'])
print(df_custom)
Output:
  Product  Price
A  Laptop    999
B   Phone    599
C  Tablet    399
Select by custom index:
row = df_custom.loc['B']
print(row)
Slice by labels:
subset = df_custom.loc['A':'C']
print(subset)
Practice Example
The scenario: Analyze employee records.
import pandas as pd
employees = pd.DataFrame({
    'Name': ['John', 'Sarah', 'Mike', 'Emma', 'David', 'Lisa'],
    'Department': ['Sales', 'IT', 'Sales', 'HR', 'IT', 'Sales'],
    'Salary': [50000, 75000, 55000, 60000, 80000, 52000],
    'Years': [3, 7, 4, 5, 9, 2],
    'Remote': [False, True, False, True, True, False]
})
print("All employees:")
print(employees)
print()
print("Single employee (row 0):")
print(employees.loc[0])
print()
print("First three employees:")
print(employees.loc[0:2])
print()
print("Names and salaries only:")
print(employees.loc[:, ['Name', 'Salary']])
print()
print("High earners (salary > 60000):")
high_earners = employees.loc[employees['Salary'] > 60000]
print(high_earners)
print()
print("IT department:")
it_dept = employees.loc[employees['Department'] == 'IT']
print(it_dept)
print()
print("Remote workers with 5+ years:")
experienced_remote = employees.loc[
    employees['Remote'] & (employees['Years'] >= 5)  # Remote is already True/False
]
print(experienced_remote)
print()
print("Sales OR high salary:")
sales_or_high = employees.loc[
    (employees['Department'] == 'Sales') | (employees['Salary'] > 70000)
]
print(sales_or_high)
String Methods
contacts = pd.DataFrame({
    'Name': ['John Doe', 'Sarah Smith', 'Mike Jones'],
    'Email': ['john@email.com', 'sarah@email.com', 'mike@email.com']
})
# Rows whose Email contains the text 'email'
email_users = contacts.loc[contacts['Email'].str.contains('email')]
print(email_users)
Other string methods:
starts_with_j = contacts.loc[contacts['Name'].str.startswith('J')]
ends_with_com = contacts.loc[contacts['Email'].str.endswith('.com')]
isin() Method
# df here is the Name/Age/City DataFrame from the start of the lesson
cities_to_find = ['NYC', 'LA']
result = df.loc[df['City'].isin(cities_to_find)]
print(result)
What this does: Selects rows where City is NYC or LA.
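As a sketch of what isin() saves you, the example below compares it with the equivalent chain of == conditions and also shows the inverse using ~. It assumes the Name/Age/City sample data from the start of the lesson.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Sarah', 'Mike', 'Emma'],
    'Age': [25, 30, 28, 32],
    'City': ['NYC', 'LA', 'Chicago', 'Miami']
})

cities_to_find = ['NYC', 'LA']

# isin() checks each value against the whole list at once
in_list = df.loc[df['City'].isin(cities_to_find)]

# The same filter written as chained OR conditions
same = df.loc[(df['City'] == 'NYC') | (df['City'] == 'LA')]
print(in_list.equals(same))              # True

# ~ flips the mask: rows whose City is NOT in the list
not_in_list = df.loc[~df['City'].isin(cities_to_find)]
print(not_in_list['City'].tolist())      # ['Chicago', 'Miami']
```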
between() Method
mid_age = df.loc[df['Age'].between(26, 30)]
print(mid_age)
Includes both endpoints by default.
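To confirm that both endpoints are included, this sketch compares between(26, 30) with the equivalent >= and <= conditions. Sarah is exactly 30, so she is a useful check on the upper endpoint.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Sarah', 'Mike', 'Emma'],
    'Age': [25, 30, 28, 32]
})

with_between = df.loc[df['Age'].between(26, 30)]

# The same range written with two comparisons
with_comparisons = df.loc[(df['Age'] >= 26) & (df['Age'] <= 30)]

print(with_between['Name'].tolist())            # ['Sarah', 'Mike']
print(with_between.equals(with_comparisons))    # True
```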
Combining loc with Columns
result = df.loc[df['Age'] > 28, 'Name']
print(result)
Returns a Series of names where age > 28.
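The shape of the result depends on how the column part is written. In this sketch, a single column label gives a Series, while a list of labels gives a DataFrame, even if the list has only one item. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Sarah', 'Mike', 'Emma'],
    'Age': [25, 30, 28, 32],
    'City': ['NYC', 'LA', 'Chicago', 'Miami']
})

over_28 = df['Age'] > 28

# Single column label: a Series
names = df.loc[over_28, 'Name']
print(names.tolist())           # ['Sarah', 'Emma']

# List of column labels: a DataFrame
names_cities = df.loc[over_28, ['Name', 'City']]
print(names_cities.shape)       # (2, 2)
```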
Setting Values with loc
df.loc[0, 'Age'] = 26
print(df)
Multiple values:
df.loc[0:1, 'City'] = 'Boston'
print(df)
Conditional update:
df.loc[df['Age'] < 30, 'Category'] = 'Young'
df.loc[df['Age'] >= 30, 'Category'] = 'Senior'
print(df)
Key Points to Remember
loc selects by label (index name), not position.
Use the loc[row, column] format. Each part can be a label, a list of labels, a slice, or a condition.
Slicing with loc includes the endpoint: loc[0:2] gets 0, 1, AND 2.
Boolean indexing is powerful: loc[df['Age'] > 30] filters rows.
Multiple conditions need parentheses and & (AND) or | (OR).
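The points above can be sketched in one place using a string index, which makes it obvious that loc works with labels. The products DataFrame mirrors the custom index example from earlier in the lesson.

```python
import pandas as pd

products = pd.DataFrame({
    'Product': ['Laptop', 'Phone', 'Tablet'],
    'Price': [999, 599, 399]
}, index=['A', 'B', 'C'])

# Single label for both row and column: one value
print(products.loc['B', 'Price'])                        # 599

# List of row labels
print(products.loc[['A', 'C'], 'Product'].tolist())      # ['Laptop', 'Tablet']

# Label slice: the end label 'B' is included
print(products.loc['A':'B', 'Price'].tolist())           # [999, 599]

# Boolean condition for the rows
cheap = products.loc[products['Price'] < 600, 'Product']
print(cheap.tolist())                                    # ['Phone', 'Tablet']

# Labels, not positions: 0 is not a label in this index
try:
    products.loc[0]
except KeyError:
    print("0 is not a label in this index")
```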
Common Mistakes
Mistake 1: Forgetting parentheses
df.loc[df['Age'] > 25 & df['City'] == 'NYC'] # Error! & is evaluated before > and ==
df.loc[(df['Age'] > 25) & (df['City'] == 'NYC')] # Correct
Mistake 2: Using 'and' instead of &
df.loc[(df['Age'] > 25) and (df['City'] == 'NYC')] # Error! ValueError: truth value of a Series is ambiguous
df.loc[(df['Age'] > 25) & (df['City'] == 'NYC')] # Correct
Mistake 3: Single = instead of ==
df.loc[df['City'] = 'NYC'] # Error! SyntaxError
df.loc[df['City'] == 'NYC'] # Correct
Mistake 4: Assuming Python slicing behavior
df.loc[0:2] # Gets 0, 1, 2 (includes endpoint!)
# Different from list[0:2] which gets 0, 1
Mistake 5: Chained assignment
df[df['Age'] > 30]['Name'] = 'Senior' # May not work! Assigns to a temporary copy
df.loc[df['Age'] > 30, 'Name'] = 'Senior' # Correct
What's Next?
You now know how to use loc for label-based selection. Next, you'll learn about iloc - selecting rows by position (numeric index).