Working with Indexes

What is an Index?

The index holds the label for each row. By default, pandas assigns a RangeIndex: 0, 1, 2, 3...

code.py

import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Sarah', 'Mike'],
    'Age': [25, 30, 28]
})

print(df)

Output:

    Name  Age
0   John   25
1  Sarah   30
2   Mike   28

The leftmost values (0, 1, 2) are the index, not a data column.

Set Index from Column

code.py

df_indexed = df.set_index('Name')
print(df_indexed)

Output:

       Age
Name
John    25
Sarah   30
Mike    28

Name becomes the index.

Set Index in Place

code.py

df.set_index('Name', inplace=True)
print(df)

Modifies the original DataFrame instead of returning a new one.
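
A minimal sketch of the difference between the two calls, reusing the Name/Age data from above. The variable names indexed and result are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Sarah', 'Mike'],
    'Age': [25, 30, 28]
})

# Without inplace, set_index returns a new DataFrame; df is untouched.
indexed = df.set_index('Name')
print(list(df.columns))       # 'Name' is still a column of df
print(list(indexed.columns))  # 'Name' has moved into the index

# With inplace=True, df itself changes and the call returns None.
result = df.set_index('Name', inplace=True)
print(result)
print(df.index.name)
```

Because the in-place call returns None, writing df = df.set_index('Name', inplace=True) would replace df with None.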

Reset Index

Go back to default 0, 1, 2...

code.py

df_reset = df.reset_index()
print(df_reset)

Name becomes regular column again.

Reset Index and Drop Old

code.py

df_reset = df.reset_index(drop=True)
print(df_reset)

Old index discarded, not saved as column.
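
To compare the two forms side by side, here is a small sketch using the same Name/Age data. The names kept and dropped are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Sarah', 'Mike'],
    'Age': [25, 30, 28]
}).set_index('Name')

# Default: the old index comes back as a regular column.
kept = df.reset_index()
print(list(kept.columns))

# drop=True: the old index is thrown away entirely.
dropped = df.reset_index(drop=True)
print(list(dropped.columns))
```

Use drop=True only when the old labels carry no information you still need.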

Custom Index at Creation

code.py

df = pd.DataFrame({
    'Product': ['Laptop', 'Phone', 'Tablet'],
    'Price': [999, 599, 399]
}, index=['A', 'B', 'C'])

print(df)

Output:

   Product  Price
A   Laptop    999
B    Phone    599
C   Tablet    399

Set Custom Index

code.py

df.index = ['Item1', 'Item2', 'Item3']
print(df)

MultiIndex (Multiple Levels)

code.py

df = pd.DataFrame({
    'Region': ['North', 'North', 'South', 'South'],
    'City': ['NYC', 'Boston', 'Miami', 'Atlanta'],
    'Sales': [100, 150, 120, 130]
})

df_multi = df.set_index(['Region', 'City'])
print(df_multi)

Output:

              Sales
Region City
North  NYC     100
       Boston  150
South  Miami   120
       Atlanta 130
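
Once the MultiIndex is in place, loc accepts either an outer label or a full tuple of labels. The sketch below rebuilds the same frame; north and boston_sales are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    'Region': ['North', 'North', 'South', 'South'],
    'City': ['NYC', 'Boston', 'Miami', 'Atlanta'],
    'Sales': [100, 150, 120, 130]
})
df_multi = df.set_index(['Region', 'City'])

# An outer label selects every row in that group; the result is indexed by City.
north = df_multi.loc['North']
print(north)

# A tuple with one label per level selects a single row; adding a column gives one value.
boston_sales = df_multi.loc[('North', 'Boston'), 'Sales']
print(boston_sales)
```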

Access by Index Label

code.py

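# Rebuild the Name/Age frame from the first example; df was reassigned above
df = pd.DataFrame({'Name': ['John', 'Sarah', 'Mike'], 'Age': [25, 30, 28]})
df = df.set_index('Name')
print(df.loc['John'])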

Gets row for John.

Sort by Index

code.py

df_sorted = df.sort_index()
print(df_sorted)

Sorts rows by index label in ascending order (alphabetical for string labels).

Index Properties

code.py

print("Index:", df.index)
print("Index values:", df.index.tolist())
print("Index name:", df.index.name)
print("Has duplicates:", df.index.has_duplicates)

Rename Index

code.py

df.index.name = 'Employee'
print(df)

Practice Example

The scenario: manage a product catalog with custom indexes.

code.py

import pandas as pd

products = pd.DataFrame({
    'Product_ID': ['P001', 'P002', 'P003', 'P004', 'P005'],
    'Name': ['Laptop', 'Phone', 'Tablet', 'Monitor', 'Keyboard'],
    'Category': ['Electronics', 'Electronics', 'Electronics', 'Electronics', 'Accessories'],
    'Price': [999, 599, 399, 299, 75],
    'Stock': [15, 30, 20, 12, 50]
})

print("Default index:")
print(products)
print()

print("1. Set Product_ID as index:")
products_indexed = products.set_index('Product_ID')
print(products_indexed)
print()

print("2. Access product P002:")
print(products_indexed.loc['P002'])
print()

print("3. Multi-level index (Category, Name):")
products_multi = products.set_index(['Category', 'Name'])
print(products_multi)
print()

print("4. Access Electronics:")
print(products_multi.loc['Electronics'])
print()

print("5. Reset to default index:")
products_reset = products_indexed.reset_index()
print(products_reset)
print()

print("6. Custom alphabetic index:")
products_alpha = products.copy()
products_alpha.index = ['A', 'B', 'C', 'D', 'E']
print(products_alpha)
print()

print("7. Sort by index:")
products_sorted = products_alpha.sort_index(ascending=False)
print(products_sorted)
print()

print("8. Name the index:")
products_named = products_indexed.copy()
products_named.index.name = 'SKU'
print(products_named.head())

Date Index

Common for time series.

code.py

dates = pd.date_range('2024-01-01', periods=5)
df = pd.DataFrame({
    'Sales': [100, 150, 120, 180, 160]
}, index=dates)

print(df)

Output:

            Sales
2024-01-01    100
2024-01-02    150
2024-01-03    120
2024-01-04    180
2024-01-05    160
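
A date index makes it easy to select by day or by date range. A short sketch with the same data (one_day and window are illustrative names):

```python
import pandas as pd

dates = pd.date_range('2024-01-01', periods=5)
df = pd.DataFrame({
    'Sales': [100, 150, 120, 180, 160]
}, index=dates)

# Look up a single day by its timestamp label.
one_day = df.loc[pd.Timestamp('2024-01-03'), 'Sales']
print(one_day)

# Slice a date range with strings; both endpoints are included.
window = df.loc['2024-01-02':'2024-01-04']
print(window)
```

Unlike positional slicing, label slices with loc include the end label.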

Reindex with New Labels

code.py

df = pd.DataFrame({
    'Value': [10, 20, 30]
}, index=['A', 'B', 'C'])

df_new = df.reindex(['A', 'B', 'C', 'D', 'E'])
print(df_new)

Labels D and E are not in the original index, so their rows are filled with NaN.

Fill Missing After Reindex

code.py

df_new = df.reindex(['A', 'B', 'C', 'D'], fill_value=0)
print(df_new)

New rows get 0.
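
One side effect worth checking: NaN is a float, so an integer column becomes float when reindexing introduces missing values. A sketch comparing the two approaches (with_nan and with_zero are illustrative names):

```python
import pandas as pd

df = pd.DataFrame({
    'Value': [10, 20, 30]
}, index=['A', 'B', 'C'])

# Default fill: the new row holds NaN and the column is upcast to float.
with_nan = df.reindex(['A', 'B', 'C', 'D'])
print(with_nan['Value'].dtype)

# fill_value=0: the new row holds 0 and the column stays integer.
with_zero = df.reindex(['A', 'B', 'C', 'D'], fill_value=0)
print(with_zero['Value'].dtype)
```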

Check Index Type

code.py

print("Index type:", type(df.index))
print("Is RangeIndex:", isinstance(df.index, pd.RangeIndex))
print("Is DatetimeIndex:", isinstance(df.index, pd.DatetimeIndex))

Duplicate Indexes

pandas allows duplicate index labels, but they make label lookups ambiguous, so avoid them when you can.

code.py

df = pd.DataFrame({
    'Value': [10, 20, 30]
}, index=['A', 'A', 'B'])

print(df.loc['A'])  # Returns multiple rows

Check for duplicates:

code.py

if df.index.has_duplicates:
    print("Warning: Duplicate indexes found!")
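
If duplicates do turn up, one possible cleanup is to keep only the first row for each label. This is a sketch of one option, not the only way to handle it; dup_mask and deduped are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({
    'Value': [10, 20, 30]
}, index=['A', 'A', 'B'])

# duplicated() marks every repeat of a label after its first occurrence.
dup_mask = df.index.duplicated(keep='first')
print(dup_mask)

# Keep only the rows that are not marked as repeats.
deduped = df[~dup_mask]
print(deduped)
print(deduped.index.has_duplicates)
```

Whether to keep the first row, the last row, or aggregate the duplicates depends on what the data means.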

Drop Index Level

For MultiIndex.

code.py

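# Rebuild the Region/City/Sales frame from the MultiIndex example under its own name
df_sales = pd.DataFrame({'Region': ['North', 'North', 'South', 'South'],
                         'City': ['NYC', 'Boston', 'Miami', 'Atlanta'], 'Sales': [100, 150, 120, 130]})
df_multi = df_sales.set_index(['Region', 'City'])
df_single = df_multi.droplevel('Region')
print(df_single)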

Removes Region level, keeps City.

Swap Index Levels

code.py

df_swapped = df_multi.swaplevel()
print(df_swapped)

City becomes first level, Region second.
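
Swapping levels only changes the order of the levels, not the order of the rows. A sketch that swaps and then sorts so rows are grouped by the new first level (the frame is rebuilt here so the example runs on its own):

```python
import pandas as pd

df = pd.DataFrame({
    'Region': ['North', 'North', 'South', 'South'],
    'City': ['NYC', 'Boston', 'Miami', 'Atlanta'],
    'Sales': [100, 150, 120, 130]
})
df_multi = df.set_index(['Region', 'City'])

# swaplevel() with no arguments swaps the two innermost levels.
df_swapped = df_multi.swaplevel()
print(list(df_swapped.index.names))

# Sorting afterwards orders the rows by City, then Region.
df_swapped_sorted = df_swapped.sort_index()
print(df_swapped_sorted)
```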

Set Index from Range

code.py

df.index = range(100, 103)
print(df)

Index: 100, 101, 102

Preserve Index When Filtering

code.py

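# Rebuild the Name/Age frame from the first example so the Age column exists
df = pd.DataFrame({'Name': ['John', 'Sarah', 'Mike'], 'Age': [25, 30, 28]})
df_filtered = df[df['Age'] > 25]
print(df_filtered.index)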

Filtered rows keep their original index labels, so the index may have gaps.

Reset if needed:

code.py

df_filtered = df_filtered.reset_index(drop=True)
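
The gaps matter when mixing loc and iloc. A small sketch, assuming the Name/Age data from the first example (filtered and renumbered are illustrative names):

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Sarah', 'Mike'],
    'Age': [25, 30, 28]
})

filtered = df[df['Age'] > 25]
print(filtered.index.tolist())   # labels of the surviving rows

# iloc counts positions in the filtered frame; loc still uses the old labels.
first_by_position = filtered.iloc[0]['Name']
first_by_label = filtered.loc[1, 'Name']
print(first_by_position, first_by_label)
print(0 in filtered.index)       # label 0 was filtered out

# After a reset, labels and positions line up again.
renumbered = filtered.reset_index(drop=True)
print(renumbered.index.tolist())
```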

Index as Column

code.py

df['Index_Copy'] = df.index
print(df)

Key Points to Remember

Default index is RangeIndex: 0, 1, 2, 3...

set_index('column') makes column the index. Original unchanged unless inplace=True.

reset_index() converts index back to column and creates default index.

Use drop=True with reset_index() to discard old index.

Use loc to select rows by index label: df.loc['index_label']

MultiIndex allows hierarchical row labels for complex data.

Common Mistakes

Mistake 1: Forgetting assignment

code.py

df.set_index('Name')  # Doesn't change df!
df = df.set_index('Name')  # Correct
# OR
df.set_index('Name', inplace=True)

Mistake 2: Wrong index length

code.py

df.index = ['A', 'B']  # ValueError if df has 3 rows!
# The number of labels must match the number of rows
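
A sketch of what the failure looks like and one way to guard against it. The names new_labels and caught are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Value': [10, 20, 30]})
new_labels = ['A', 'B']

# Assigning too few (or too many) labels raises ValueError.
caught = False
try:
    df.index = new_labels
except ValueError as exc:
    caught = True
    print("Could not set index:", exc)

# Guard: only assign when the lengths agree.
new_labels = ['A', 'B', 'C']
if len(new_labels) == len(df):
    df.index = new_labels
print(df.index.tolist())
```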

Mistake 3: Using iloc with custom index

code.py

df.set_index('Name', inplace=True)
df.iloc['John']  # TypeError! iloc takes integer positions; use loc for labels
df.loc['John']  # Correct
df.iloc[0]  # Also works (position)

Mistake 4: Not handling duplicate indexes

code.py

df.set_index('Name', inplace=True)
row = df.loc['John']  # May return multiple rows!
# Check: df.index.has_duplicates

Mistake 5: Assuming the index is renumbered after sorting

code.py

df_sorted = df.sort_values('Age')
# Rows are reordered but keep their original index labels
df_sorted = df.sort_values('Age').reset_index(drop=True)
# Index is renumbered 0, 1, 2... in the new row order
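
A runnable sketch of the same point, assuming the Name/Age data from the first example. It also shows the ignore_index parameter of sort_values, which renumbers in one step.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Sarah', 'Mike'],
    'Age': [25, 30, 28]
})

# Sorting by a column moves the rows; each row keeps its old label.
df_sorted = df.sort_values('Age')
print(df_sorted.index.tolist())

# Renumber afterwards...
df_renumbered = df_sorted.reset_index(drop=True)
print(df_renumbered.index.tolist())

# ...or ask sort_values to renumber directly.
df_direct = df.sort_values('Age', ignore_index=True)
print(df_direct.index.tolist())
```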

What's Next?

You now know how to work with DataFrame indexes. You've completed the Pandas DataFrames I module! Next modules will cover data aggregation, grouping, merging, and advanced Pandas operations.