Selecting Rows with loc

What is loc?

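loc selects rows (and, optionally, columns) by label, meaning the values in the index.

Think of it as selecting by the row name, not position. With the default index the labels are the integers 0, 1, 2, ..., so they happen to match the positions.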

code.py

import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Sarah', 'Mike', 'Emma'],
    'Age': [25, 30, 28, 32],
    'City': ['NYC', 'LA', 'Chicago', 'Miami']
})

print(df)

Output:

    Name  Age     City
0   John   25      NYC
1  Sarah   30       LA
2   Mike   28  Chicago
3   Emma   32    Miami

Select Single Row

code.py

row = df.loc[0]
print(row)

Output:

Name     John
Age        25
City      NYC
Name: 0, dtype: object

Selecting a single label returns a Series.
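If you want a one-row DataFrame instead of a Series, one option is to pass the label inside a list. The sketch below rebuilds the same sample DataFrame so it runs on its own; the variable names as_series and as_frame are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Sarah', 'Mike', 'Emma'],
    'Age': [25, 30, 28, 32],
    'City': ['NYC', 'LA', 'Chicago', 'Miami']
})

as_series = df.loc[0]    # single label -> Series
as_frame = df.loc[[0]]   # one-element list -> DataFrame with one row

print(type(as_series).__name__)  # Series
print(type(as_frame).__name__)   # DataFrame
print(as_frame.shape)            # (1, 3)
```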

Select Multiple Rows

code.py

rows = df.loc[[0, 2]]
print(rows)

Output:

   Name  Age     City
0  John   25      NYC
2  Mike   28  Chicago

Selecting a list of labels returns a DataFrame.

Slicing Rows

code.py

subset = df.loc[0:2]
print(subset)

Important: The end point is included! This returns rows 0, 1, AND 2.

This is different from regular Python slicing, where the end is excluded!
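To see the difference side by side, here is a small sketch that compares a loc slice with a plain Python list slice over the same data. The sample DataFrame is rebuilt so the snippet is self-contained; name_list is just an illustrative helper variable.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Sarah', 'Mike', 'Emma'],
    'Age': [25, 30, 28, 32],
    'City': ['NYC', 'LA', 'Chicago', 'Miami']
})

# loc slice: labels 0 through 2, end label included
subset = df.loc[0:2]
print(subset.index.tolist())   # [0, 1, 2]
print(subset['Name'].tolist()) # ['John', 'Sarah', 'Mike']

# Plain Python list slice: end position excluded
name_list = df['Name'].tolist()
print(name_list[0:2])          # ['John', 'Sarah']
```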

Select Rows and Columns

code.py

result = df.loc[0, 'Name']
print(result)

Output:

John

Multiple rows, multiple columns:

code.py

subset = df.loc[[0, 1], ['Name', 'City']]
print(subset)

Output:

    Name City
0   John  NYC
1  Sarah   LA

All Rows, Specific Columns

code.py

names_ages = df.loc[:, ['Name', 'Age']]
print(names_ages)

The : on its own means all rows.

Specific Rows, All Columns

code.py

first_two = df.loc[0:1, :]
print(first_two)

When you want all columns, you can usually omit the second : entirely:

code.py

first_two = df.loc[0:1]

Boolean Indexing

Boolean indexing is one of the most useful features of loc!

code.py

adults = df.loc[df['Age'] > 28]
print(adults)

What this does:

df['Age'] > 28 creates a True/False value for each row
loc keeps the rows where the value is True
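The two steps above can be made visible by storing the condition in its own variable first. This is only a sketch; the name mask is a common convention, not something loc requires.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Sarah', 'Mike', 'Emma'],
    'Age': [25, 30, 28, 32],
    'City': ['NYC', 'LA', 'Chicago', 'Miami']
})

# Step 1: the comparison produces one True/False value per row
mask = df['Age'] > 28
print(mask.tolist())             # [False, True, False, True]

# Step 2: loc keeps only the rows where the mask is True
adults = df.loc[mask]
print(adults['Name'].tolist())   # ['Sarah', 'Emma']
```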

Output:

    Name  Age   City
1  Sarah   30     LA
3   Emma   32  Miami

Multiple Conditions

AND condition (&):

code.py

result = df.loc[(df['Age'] > 25) & (df['City'] == 'LA')]
print(result)

OR condition (|):

code.py

result = df.loc[(df['Age'] < 27) | (df['City'] == 'Miami')]
print(result)

NOT condition (~):

code.py

not_nyc = df.loc[~(df['City'] == 'NYC')]
print(not_nyc)

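Important: Use parentheses around each condition! In Python, & and | are evaluated before comparisons like > and ==, so without parentheses the expression is grouped the wrong way.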
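For reference, here is a sketch that runs the three conditions above against the sample DataFrame and prints which names each one keeps. The DataFrame is rebuilt so the snippet runs on its own; the result variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Sarah', 'Mike', 'Emma'],
    'Age': [25, 30, 28, 32],
    'City': ['NYC', 'LA', 'Chicago', 'Miami']
})

# AND: both conditions must be True
and_result = df.loc[(df['Age'] > 25) & (df['City'] == 'LA')]
print(and_result['Name'].tolist())  # ['Sarah']

# OR: at least one condition must be True
or_result = df.loc[(df['Age'] < 27) | (df['City'] == 'Miami')]
print(or_result['Name'].tolist())   # ['John', 'Emma']

# NOT: flips True and False
not_nyc = df.loc[~(df['City'] == 'NYC')]
print(not_nyc['Name'].tolist())     # ['Sarah', 'Mike', 'Emma']
```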

Custom Index

code.py

df_custom = pd.DataFrame({
    'Product': ['Laptop', 'Phone', 'Tablet'],
    'Price': [999, 599, 399]
}, index=['A', 'B', 'C'])

print(df_custom)

Output:

   Product  Price
A   Laptop    999
B    Phone    599
C   Tablet    399

Select by custom index:

code.py

row = df_custom.loc['B']
print(row)

Slice by labels:

code.py

subset = df_custom.loc['A':'C']
print(subset)

Practice Example

The scenario: Analyze employee records.

code.py

import pandas as pd

employees = pd.DataFrame({
    'Name': ['John', 'Sarah', 'Mike', 'Emma', 'David', 'Lisa'],
    'Department': ['Sales', 'IT', 'Sales', 'HR', 'IT', 'Sales'],
    'Salary': [50000, 75000, 55000, 60000, 80000, 52000],
    'Years': [3, 7, 4, 5, 9, 2],
    'Remote': [False, True, False, True, True, False]
})

print("All employees:")
print(employees)
print()

print("Single employee (row 0):")
print(employees.loc[0])
print()

print("First three employees:")
print(employees.loc[0:2])
print()

print("Names and salaries only:")
print(employees.loc[:, ['Name', 'Salary']])
print()

print("High earners (salary > 60000):")
high_earners = employees.loc[employees['Salary'] > 60000]
print(high_earners)
print()

print("IT department:")
it_dept = employees.loc[employees['Department'] == 'IT']
print(it_dept)
print()

print("Remote workers with 5+ years:")
experienced_remote = employees.loc[
    # 'Remote' is already a boolean column, so it can be used as a condition directly
    employees['Remote'] & (employees['Years'] >= 5)
]
print(experienced_remote)
print()

print("Sales OR high salary:")
sales_or_high = employees.loc[
    (employees['Department'] == 'Sales') | (employees['Salary'] > 70000)
]
print(sales_or_high)

String Methods

code.py

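# A separate DataFrame, so the original df stays available for later examples
contacts = pd.DataFrame({
    'Name': ['John Doe', 'Sarah Smith', 'Mike Jones'],
    'Email': ['john@email.com', 'sarah@email.com', 'mike@email.com']
})

# Rows whose email address contains the substring 'email'
email_users = contacts.loc[contacts['Email'].str.contains('email')]
print(email_users)

Other string methods:

code.py

starts_with_j = contacts.loc[contacts['Name'].str.startswith('J')]
ends_with_com = contacts.loc[contacts['Email'].str.endswith('.com')]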
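Real data often has missing values or mixed capitalization. As a sketch, str.contains accepts case=False to ignore case and na=False to treat missing values as "no match". The contacts data below is made up for illustration, with one missing email and one in upper case.

```python
import pandas as pd

# Hypothetical data: one missing email, one upper-case email
contacts = pd.DataFrame({
    'Name': ['John Doe', 'Sarah Smith', 'Mike Jones'],
    'Email': ['john@email.com', None, 'MIKE@EMAIL.COM']
})

# case=False ignores capitalization; na=False turns missing values into False
mask = contacts['Email'].str.contains('email', case=False, na=False)
print(mask.tolist())                        # [True, False, True]

matches = contacts.loc[mask, 'Name']
print(matches.tolist())                     # ['John Doe', 'Mike Jones']
```

Without na=False, the missing value would produce a missing entry in the mask, which loc cannot use as a True/False filter.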

isin() Method

code.py

# df is the Name/Age/City DataFrame from the start of this lesson
cities_to_find = ['NYC', 'LA']
result = df.loc[df['City'].isin(cities_to_find)]
print(result)

What this does: Select rows where City is NYC or LA.
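The opposite question, "rows where City is NOT in the list", can be sketched by combining isin() with the ~ operator shown earlier. The sample DataFrame is rebuilt so the snippet runs on its own.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Sarah', 'Mike', 'Emma'],
    'Age': [25, 30, 28, 32],
    'City': ['NYC', 'LA', 'Chicago', 'Miami']
})

cities_to_find = ['NYC', 'LA']

# Rows where City is one of the listed values
in_list = df.loc[df['City'].isin(cities_to_find)]
print(in_list['Name'].tolist())      # ['John', 'Sarah']

# ~ flips the mask: rows where City is NOT one of the listed values
not_in_list = df.loc[~df['City'].isin(cities_to_find)]
print(not_in_list['Name'].tolist())  # ['Mike', 'Emma']
```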

between() Method

code.py

mid_age = df.loc[df['Age'].between(26, 30)]
print(mid_age)

between() includes both endpoints by default, so ages 26 and 30 would both match.
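The endpoint behavior can be changed with the inclusive parameter. This sketch assumes a reasonably recent pandas version (1.3 or later), where inclusive accepts the strings 'both', 'neither', 'left', and 'right'. The range 28 to 30 is chosen so that the endpoints actually appear in the data.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Sarah', 'Mike', 'Emma'],
    'Age': [25, 30, 28, 32],
    'City': ['NYC', 'LA', 'Chicago', 'Miami']
})

# Default: 28 <= Age <= 30
both = df.loc[df['Age'].between(28, 30)]
print(both['Name'].tolist())     # ['Sarah', 'Mike']

# Exclude both endpoints: 28 < Age < 30
neither = df.loc[df['Age'].between(28, 30, inclusive='neither')]
print(neither['Name'].tolist())  # []

# Keep only the left endpoint: 28 <= Age < 30
left = df.loc[df['Age'].between(28, 30, inclusive='left')]
print(left['Name'].tolist())     # ['Mike']
```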

Combining loc with Columns

code.py

result = df.loc[df['Age'] > 28, 'Name']
print(result)

Returns a Series of names where Age > 28.
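Passing a list of columns instead of a single column name gives back a DataFrame, even when the list has only one entry. A short sketch with the sample data (variable names are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Sarah', 'Mike', 'Emma'],
    'Age': [25, 30, 28, 32],
    'City': ['NYC', 'LA', 'Chicago', 'Miami']
})

# Condition + single column name -> Series
names = df.loc[df['Age'] > 28, 'Name']
print(names.tolist())            # ['Sarah', 'Emma']

# Condition + list of columns -> DataFrame
names_cities = df.loc[df['Age'] > 28, ['Name', 'City']]
print(names_cities.shape)        # (2, 2)
```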

Setting Values with loc

code.py

df.loc[0, 'Age'] = 26
print(df)

Multiple values:

code.py

df.loc[0:1, 'City'] = 'Boston'
print(df)

Conditional update (the Category column is created if it does not exist yet):

code.py

df.loc[df['Age'] < 30, 'Category'] = 'Young'
df.loc[df['Age'] >= 30, 'Category'] = 'Senior'
print(df)
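It can help to look at the column between the two assignments. In this sketch, run on a fresh copy of the sample data, rows that have not matched any condition yet hold a missing value until the second assignment fills them in.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Sarah', 'Mike', 'Emma'],
    'Age': [25, 30, 28, 32],
    'City': ['NYC', 'LA', 'Chicago', 'Miami']
})

# First assignment creates the column; non-matching rows are left missing
df.loc[df['Age'] < 30, 'Category'] = 'Young'
missing_after_first = df['Category'].isna().tolist()
print(missing_after_first)       # [False, True, False, True]

# Second assignment fills in the remaining rows
df.loc[df['Age'] >= 30, 'Category'] = 'Senior'
print(df['Category'].tolist())   # ['Young', 'Senior', 'Young', 'Senior']
```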

Key Points to Remember

loc selects by label (index name), not position.

The format is loc[rows, columns]. Each part can be a single label, a list of labels, a label slice, or a boolean condition.

Slicing with loc includes the endpoint: loc[0:2] gets 0, 1, AND 2.

Boolean indexing is powerful: loc[df['Age'] > 30] filters rows.

Multiple conditions need parentheses and & (AND), | (OR), or ~ (NOT).

Common Mistakes

Mistake 1: Forgetting parentheses

code.py

df.loc[df['Age'] > 25 & df['City'] == 'NYC']  # Error! Python evaluates 25 & df['City'] first
df.loc[(df['Age'] > 25) & (df['City'] == 'NYC')]  # Correct

Mistake 2: Using 'and' instead of &

code.py

df.loc[(df['Age'] > 25) and (df['City'] == 'NYC')]  # Error! 'and' cannot combine two Series
df.loc[(df['Age'] > 25) & (df['City'] == 'NYC')]  # Correct
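To see what the error looks like without crashing a script, the failing line can be wrapped in try/except. This is only a sketch for exploring the behavior; the flag raised_error is an illustrative name.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Sarah', 'Mike', 'Emma'],
    'Age': [25, 30, 28, 32],
    'City': ['NYC', 'LA', 'Chicago', 'Miami']
})

raised_error = False
try:
    # 'and' asks for a single True/False, but a Series holds one value per row
    df.loc[(df['Age'] > 25) and (df['City'] == 'NYC')]
except ValueError as exc:
    raised_error = True
    print('ValueError:', exc)

# & compares the two Series row by row instead
result = df.loc[(df['Age'] > 25) & (df['City'] == 'NYC')]
print(len(result))  # 0 (John is in NYC, but his Age is 25, which is not > 25)
```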

Mistake 3: Single = instead of ==

code.py

df.loc[df['City'] = 'NYC']  # SyntaxError!
df.loc[df['City'] == 'NYC']  # Correct

Mistake 4: Assuming Python slicing behavior

code.py

df.loc[0:2]  # Gets 0, 1, 2 (includes endpoint!)
# Different from list[0:2] which gets 0, 1

Mistake 5: Chained assignment

code.py

df[df['Age'] > 30]['Name'] = 'Senior'  # Assigns to a temporary copy; df may not change!
df.loc[df['Age'] > 30, 'Name'] = 'Senior'  # Correct
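The sketch below runs both versions on a small copy of the sample data. pandas normally warns about the chained pattern, so the warning is silenced here purely to keep the output clean; the variables after_chained and after_loc are illustrative.

```python
import warnings
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Sarah', 'Mike', 'Emma'],
    'Age': [25, 30, 28, 32]
})

# Chained version: df[...] builds a filtered copy, and the assignment lands on that copy
with warnings.catch_warnings():
    warnings.simplefilter('ignore')  # pandas warns about this pattern
    df[df['Age'] > 30]['Name'] = 'Senior'
after_chained = df['Name'].tolist()
print(after_chained)  # ['John', 'Sarah', 'Mike', 'Emma']

# loc version: one indexing step, applied directly to df
df.loc[df['Age'] > 30, 'Name'] = 'Senior'
after_loc = df['Name'].tolist()
print(after_loc)      # ['John', 'Sarah', 'Mike', 'Senior']
```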

What's Next?

You now know how to use loc for label-based selection. Next, you'll learn about iloc, which selects rows by integer position instead of by label.