
Working with Indexes

Learn to set, reset, and manipulate DataFrame indexes

What is an Index?

The index is the label attached to each row. By default, pandas uses a RangeIndex of consecutive integers: 0, 1, 2, 3, ...

code.pyPython
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Sarah', 'Mike'],
    'Age': [25, 30, 28]
})

print(df)

Output:

    Name  Age
0   John   25
1  Sarah   30
2   Mike   28

The leftmost labels (0, 1, 2) are the index. They are not one of the DataFrame's columns.
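To confirm which kind of index pandas created, you can inspect `df.index` directly. The sketch below rebuilds the same three-row DataFrame. It shows that a RangeIndex stores only a start, stop, and step rather than every label.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Sarah', 'Mike'],
    'Age': [25, 30, 28]
})

# The default index is a RangeIndex: it stores start/stop/step, not a list of labels
print(type(df.index).__name__)                       # RangeIndex
print(df.index.start, df.index.stop, df.index.step)  # 0 3 1
print(df.index.tolist())                             # [0, 1, 2]
```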

Set Index from Column

code.pyPython
df_indexed = df.set_index('Name')
print(df_indexed)

Output:

       Age
Name
John    25
Sarah   30
Mike    28

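Name becomes the index and is removed from the regular columns. set_index returns a new DataFrame, so df itself is unchanged.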
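If you want the column both as the index and as a regular column, set_index accepts `drop=False`. This is a minimal sketch on the same sample data. The variable names `moved` and `kept` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Sarah', 'Mike'],
    'Age': [25, 30, 28]
})

# Default (drop=True): Name moves into the index and leaves the columns
moved = df.set_index('Name')
# drop=False: Name becomes the index but also stays as a regular column
kept = df.set_index('Name', drop=False)

print(moved.columns.tolist())  # ['Age']
print(kept.columns.tolist())   # ['Name', 'Age']
print(kept.index.name)         # Name
```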

Set Index in Place

code.pyPython
df.set_index('Name', inplace=True)
print(df)

With inplace=True, set_index modifies the original DataFrame and returns None, so do not assign the result.
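A common slip is writing `df = df.set_index('Name', inplace=True)`, which replaces df with None. This sketch shows the return value directly.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Sarah', 'Mike'],
    'Age': [25, 30, 28]
})

# With inplace=True, set_index mutates df and returns None
result = df.set_index('Name', inplace=True)
print(result)             # None
print(df.index.tolist())  # ['John', 'Sarah', 'Mike']
```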

Reset Index

Go back to the default integer index (0, 1, 2, ...).

code.pyPython
df_reset = df.reset_index()
print(df_reset)

Name becomes a regular column again.
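The new column takes its name from the index name. When the index has no name, as with the default RangeIndex, pandas calls the column `index`. This is a small sketch contrasting the two cases on the sample data.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Sarah', 'Mike'],
    'Age': [25, 30, 28]
})

# Named index: the column created by reset_index is called 'Name'
named = df.set_index('Name').reset_index()
print(named.columns.tolist())    # ['Name', 'Age']

# Unnamed default index: the old labels land in a column called 'index'
unnamed = df.reset_index()
print(unnamed.columns.tolist())  # ['index', 'Name', 'Age']
```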

Reset Index and Drop Old

code.pyPython
df_reset = df.reset_index(drop=True)
print(df_reset)

The old index is discarded instead of being inserted as a column.

Custom Index at Creation

code.pyPython
df = pd.DataFrame({
    'Product': ['Laptop', 'Phone', 'Tablet'],
    'Price': [999, 599, 399]
}, index=['A', 'B', 'C'])

print(df)

Output:

   Product  Price
A   Laptop    999
B    Phone    599
C   Tablet    399

Set Custom Index

code.pyPython
df.index = ['Item1', 'Item2', 'Item3']
print(df)
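Assigning to `df.index` replaces every label at once. To change only a few labels, `DataFrame.rename(index=...)` with a mapping is an alternative. This sketch rebuilds the product DataFrame from above. The label `'Featured'` is a made-up example.

```python
import pandas as pd

df = pd.DataFrame({
    'Product': ['Laptop', 'Phone', 'Tablet'],
    'Price': [999, 599, 399]
}, index=['A', 'B', 'C'])

# Assigning to df.index replaces every label at once (length must match)
df.index = ['Item1', 'Item2', 'Item3']

# rename(index=...) changes only the labels in the mapping and returns a new DataFrame
renamed = df.rename(index={'Item2': 'Featured'})
print(renamed.index.tolist())  # ['Item1', 'Featured', 'Item3']
```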

MultiIndex (Multiple Levels)

code.pyPython
df = pd.DataFrame({
    'Region': ['North', 'North', 'South', 'South'],
    'City': ['NYC', 'Boston', 'Miami', 'Atlanta'],
    'Sales': [100, 150, 120, 130]
})

df_multi = df.set_index(['Region', 'City'])
print(df_multi)

Output:

                Sales
Region City
North  NYC        100
       Boston     150
South  Miami      120
       Atlanta    130
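With two levels, you can select rows in three ways: by a tuple through both levels, by an outer label alone, or by a label on an inner level with `xs`. This sketch rebuilds the Region/City example to show all three.

```python
import pandas as pd

df = pd.DataFrame({
    'Region': ['North', 'North', 'South', 'South'],
    'City': ['NYC', 'Boston', 'Miami', 'Atlanta'],
    'Sales': [100, 150, 120, 130]
})
df_multi = df.set_index(['Region', 'City'])

# A tuple addresses one row through both levels
print(df_multi.loc[('North', 'Boston'), 'Sales'])  # 150

# A single outer label returns every row in that group, indexed by City
print(df_multi.loc['South'])

# xs selects by a label on a specific level (here the inner City level)
print(df_multi.xs('Miami', level='City'))
```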

Access by Index Label

code.pyPython
df = pd.DataFrame({'Name': ['John', 'Sarah', 'Mike'], 'Age': [25, 30, 28]})
df = df.set_index('Name')
print(df.loc['John'])

Returns the row labeled 'John' as a Series.
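`loc` works with labels and `iloc` works with positions. A label that is not in the index raises KeyError. In this sketch, `'Emma'` is a deliberately missing, made-up name.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Sarah', 'Mike'],
    'Age': [25, 30, 28]
}).set_index('Name')

# loc looks up by label; row and column can be given together
print(df.loc['Sarah', 'Age'])  # 30

# iloc looks up by position, regardless of the labels
print(df.iloc[1]['Age'])       # 30

# A label that is not in the index raises KeyError
missing = False
try:
    df.loc['Emma']
except KeyError:
    missing = True
    print("No row labeled 'Emma'")
```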

Sort by Index

code.pyPython
df_sorted = df.sort_index()
print(df_sorted)

Rows are ordered by index label in ascending order (alphabetically for string labels such as names).
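Sorting by a column keeps each row's original label. `sort_index` can then restore the original row order. This is a minimal sketch on the default-index sample data.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Sarah', 'Mike'],
    'Age': [25, 30, 28]
})

# Sorting by a column keeps each row's original index label
by_age = df.sort_values('Age', ascending=False)
print(by_age.index.tolist())  # [1, 2, 0]

# sort_index puts the rows back in label order
restored = by_age.sort_index()
print(restored.equals(df))    # True
```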

Index Properties

code.pyPython
print("Index:", df.index)
print("Index values:", df.index.tolist())
print("Index name:", df.index.name)
print("Has duplicates:", df.index.has_duplicates)

Rename Index

code.pyPython
df.index.name = 'Employee'
print(df)
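Assigning `df.index.name` changes df itself. `rename_axis` is an alternative that returns a new DataFrame with the renamed index and leaves the original alone. Sketch, assuming the Name-indexed sample data:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Sarah', 'Mike'],
    'Age': [25, 30, 28]
}).set_index('Name')

# rename_axis returns a new DataFrame; df keeps its original index name
renamed = df.rename_axis('Employee')
print(renamed.index.name)  # Employee
print(df.index.name)       # Name
```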

Practice Example

Scenario: manage a product catalog with custom indexes.

code.pyPython
import pandas as pd

products = pd.DataFrame({
    'Product_ID': ['P001', 'P002', 'P003', 'P004', 'P005'],
    'Name': ['Laptop', 'Phone', 'Tablet', 'Monitor', 'Keyboard'],
    'Category': ['Electronics', 'Electronics', 'Electronics', 'Electronics', 'Accessories'],
    'Price': [999, 599, 399, 299, 75],
    'Stock': [15, 30, 20, 12, 50]
})

print("Default index:")
print(products)
print()

print("1. Set Product_ID as index:")
products_indexed = products.set_index('Product_ID')
print(products_indexed)
print()

print("2. Access product P002:")
print(products_indexed.loc['P002'])
print()

print("3. Multi-level index (Category, Name):")
products_multi = products.set_index(['Category', 'Name'])
print(products_multi)
print()

print("4. Access Electronics:")
print(products_multi.loc['Electronics'])
print()

print("5. Reset to default index:")
products_reset = products_indexed.reset_index()
print(products_reset)
print()

print("6. Custom alphabetic index:")
products_alpha = products.copy()
products_alpha.index = ['A', 'B', 'C', 'D', 'E']
print(products_alpha)
print()

print("7. Sort by index:")
products_sorted = products_alpha.sort_index(ascending=False)
print(products_sorted)
print()

print("8. Name the index:")
products_named = products_indexed.copy()
products_named.index.name = 'SKU'
print(products_named.head())

Date Index

A DatetimeIndex is common for time series data.

code.pyPython
dates = pd.date_range('2024-01-01', periods=5)
df = pd.DataFrame({
    'Sales': [100, 150, 120, 180, 160]
}, index=dates)

print(df)

Output:

            Sales
2024-01-01    100
2024-01-02    150
2024-01-03    120
2024-01-04    180
2024-01-05    160
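On a DatetimeIndex, date strings work as labels. A `loc` slice includes both endpoints. This sketch rebuilds the five-day example and selects the middle three days.

```python
import pandas as pd

dates = pd.date_range('2024-01-01', periods=5)
df = pd.DataFrame({
    'Sales': [100, 150, 120, 180, 160]
}, index=dates)

# Label-based slicing with date strings; both ends are included
mid_week = df.loc['2024-01-02':'2024-01-04']
print(mid_week)
print(mid_week['Sales'].sum())  # 450
```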

Reindex with New Labels

code.pyPython
df = pd.DataFrame({
    'Value': [10, 20, 30]
}, index=['A', 'B', 'C'])

df_new = df.reindex(['A', 'B', 'C', 'D', 'E'])
print(df_new)

Labels that did not exist before (D and E) become new rows filled with NaN.
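NaN is a floating-point value, so reindexing with new labels turns an integer column into floats. The sketch below checks the dtype and counts the new empty rows.

```python
import pandas as pd

df = pd.DataFrame({
    'Value': [10, 20, 30]
}, index=['A', 'B', 'C'])

df_new = df.reindex(['A', 'B', 'C', 'D', 'E'])

# The NaN rows force the integer column to be upcast to float64
print(df_new['Value'].dtype)         # float64
print(df_new['Value'].isna().sum())  # 2
```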

Fill Missing After Reindex

code.pyPython
df_new = df.reindex(['A', 'B', 'C', 'D'], fill_value=0)
print(df_new)

New rows get 0 instead of NaN.
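Instead of a constant, `reindex` can copy the previous existing value forward with `method='ffill'`. This requires a sorted (monotonic) index, which `['A', 'B', 'C']` is. Sketch:

```python
import pandas as pd

df = pd.DataFrame({
    'Value': [10, 20, 30]
}, index=['A', 'B', 'C'])

# method='ffill' fills the new label D with the last existing value (from C)
df_ffill = df.reindex(['A', 'B', 'C', 'D'], method='ffill')
print(df_ffill.loc['D', 'Value'])  # 30
```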

Check Index Type

code.pyPython
print("Index type:", type(df.index))
print("Is RangeIndex:", isinstance(df.index, pd.RangeIndex))
print("Is DatetimeIndex:", isinstance(df.index, pd.DatetimeIndex))

Duplicate Indexes

Duplicate labels are allowed but not recommended, because a lookup by label can then return several rows.

code.pyPython
df = pd.DataFrame({
    'Value': [10, 20, 30]
}, index=['A', 'A', 'B'])

print(df.loc['A'])  # Returns multiple rows

Check for duplicates:

code.pyPython
if df.index.has_duplicates:
    print("Warning: Duplicate indexes found!")
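Once duplicates are detected, `Index.duplicated()` marks the repeats so they can be filtered out. Separately, `set_index(..., verify_integrity=True)` refuses duplicate keys up front by raising ValueError. This sketch uses the A/A/B example from above.

```python
import pandas as pd

df = pd.DataFrame({
    'Value': [10, 20, 30]
}, index=['A', 'A', 'B'])

# duplicated() marks every repeat after the first occurrence
print(df.index.duplicated().tolist())  # [False, True, False]

# Keep only the first row for each label
deduped = df[~df.index.duplicated(keep='first')]
print(deduped.index.tolist())          # ['A', 'B']

# verify_integrity=True makes set_index reject duplicate keys
rejected = False
try:
    df.reset_index().set_index('index', verify_integrity=True)
except ValueError as err:
    rejected = True
    print("Rejected:", err)
```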

Drop Index Level

Applies to a MultiIndex.

code.pyPython
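df = pd.DataFrame({'Region': ['North', 'North', 'South', 'South'],
                   'City': ['NYC', 'Boston', 'Miami', 'Atlanta'],
                   'Sales': [100, 150, 120, 130]})
df_multi = df.set_index(['Region', 'City'])
df_single = df_multi.droplevel('Region')  # keep only the City level
print(df_single)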

Removes the Region level and keeps City as the only index level.
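`droplevel` throws the Region labels away. To take Region out of the index but keep it as a column, `reset_index(level=...)` is an alternative. Sketch on the Region/City data:

```python
import pandas as pd

df = pd.DataFrame({
    'Region': ['North', 'North', 'South', 'South'],
    'City': ['NYC', 'Boston', 'Miami', 'Atlanta'],
    'Sales': [100, 150, 120, 130]
})
df_multi = df.set_index(['Region', 'City'])

# droplevel discards the Region labels entirely
dropped = df_multi.droplevel('Region')
# reset_index(level=...) removes the level from the index but keeps it as a column
moved = df_multi.reset_index(level='Region')

print(dropped.columns.tolist())  # ['Sales']
print(moved.columns.tolist())    # ['Region', 'Sales']
print(moved.index.name)          # City
```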

Swap Index Levels

code.pyPython
df_swapped = df_multi.swaplevel()
print(df_swapped)

City becomes the first level and Region the second.
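`swaplevel` only reorders the levels. It does not reorder the rows, so following it with `sort_index` groups rows by the new outer level. Sketch on the Region/City data:

```python
import pandas as pd

df = pd.DataFrame({
    'Region': ['North', 'North', 'South', 'South'],
    'City': ['NYC', 'Boston', 'Miami', 'Atlanta'],
    'Sales': [100, 150, 120, 130]
})
df_multi = df.set_index(['Region', 'City'])

# Swap the levels, then sort rows by the new outer level (City)
df_swapped = df_multi.swaplevel().sort_index()
print(list(df_swapped.index.names))  # ['City', 'Region']
print(df_swapped.index[0])           # ('Atlanta', 'South')
```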

Set Index from Range

code.pyPython
df = pd.DataFrame({'Name': ['John', 'Sarah', 'Mike'], 'Age': [25, 30, 28]})
df.index = range(100, 103)  # one label per row
print(df)

The index is now 100, 101, 102.

Preserve Index When Filtering

code.pyPython
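df = pd.DataFrame({'Name': ['John', 'Sarah', 'Mike'], 'Age': [25, 30, 28]})
df_filtered = df[df['Age'] > 25]
print(df_filtered.index)  # labels 1 and 2 survive; label 0 (John) is gone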

Filtering keeps the original index labels, so the result can have gaps in its index.

Reset if needed:

code.pyPython
df_filtered = df_filtered.reset_index(drop=True)
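Putting the two steps together: filter, look at the surviving labels, then renumber. Because `drop=True` is used, no extra `index` column is added. Sketch on the sample data:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Sarah', 'Mike'],
    'Age': [25, 30, 28]
})

# Filtering keeps the original labels, so John's label 0 disappears
df_filtered = df[df['Age'] > 25]
print(df_filtered.index.tolist())    # [1, 2]

# reset_index(drop=True) renumbers from 0 without adding an 'index' column
df_filtered = df_filtered.reset_index(drop=True)
print(df_filtered.index.tolist())    # [0, 1]
print(df_filtered.columns.tolist())  # ['Name', 'Age']
```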

Index as Column

code.pyPython
df['Index_Copy'] = df.index
print(df)

Key Points to Remember

The default index is a RangeIndex: 0, 1, 2, 3, ...

set_index('column') makes that column the index. The original DataFrame is unchanged unless you reassign the result or pass inplace=True.

reset_index() moves the index back into a column and creates a new default index.

Use drop=True with reset_index() to discard the old index instead of keeping it as a column.

Use index labels with loc: df.loc['index_label']

MultiIndex allows hierarchical row labels for complex data.
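Several of these points fit together in a round trip. `set_index` moves a column into the index for label lookups, and `reset_index` moves it back. For this simple case the result equals the original DataFrame. Sketch on the sample data:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Sarah', 'Mike'],
    'Age': [25, 30, 28]
})

# Column -> index: enables label lookups with loc
indexed = df.set_index('Name')
print(indexed.loc['Mike', 'Age'])  # 28

# Index -> column: back to the default RangeIndex
round_trip = indexed.reset_index()
print(round_trip.equals(df))       # True
```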

Common Mistakes

Mistake 1: Forgetting assignment

code.pyPython
df.set_index('Name')  # Doesn't change df!
df = df.set_index('Name')  # Correct
# OR
df.set_index('Name', inplace=True)

Mistake 2: Wrong index length

code.pyPython
df.index = ['A', 'B']  # Error if df has 3 rows!
# Must match number of rows
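The mismatch surfaces as a ValueError, so it can be caught and checked. Sketch on the three-row sample data:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Sarah', 'Mike'],
    'Age': [25, 30, 28]
})

raised = False
try:
    df.index = ['A', 'B']       # 2 labels for 3 rows
except ValueError as err:
    raised = True
    print("ValueError:", err)

df.index = ['A', 'B', 'C']      # one label per row works
print(df.index.tolist())        # ['A', 'B', 'C']
```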

Mistake 3: Using iloc with custom index

code.pyPython
df.set_index('Name', inplace=True)
df.iloc['John']  # Error! Use loc for labels
df.loc['John']  # Correct
df.iloc[0]  # Also works (position)
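Here is the same mistake as a runnable check. Passing a string to `iloc` raises TypeError, while `loc` with the label and `iloc` with the position both reach the same row. Sketch on the Name-indexed sample data:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Sarah', 'Mike'],
    'Age': [25, 30, 28]
}).set_index('Name')

raised = False
try:
    df.iloc['John']             # iloc only accepts integer positions
except TypeError as err:
    raised = True
    print("TypeError:", err)

print(df.loc['John', 'Age'])    # 25 (by label)
print(df.iloc[0]['Age'])        # 25 (by position)
```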

Mistake 4: Not handling duplicate indexes

code.pyPython
df.set_index('Name', inplace=True)
row = df.loc['John']  # May return multiple rows!
# Check: df.index.has_duplicates
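The return type changes with duplicates. A unique label gives a Series and a duplicated label gives a DataFrame, so code expecting one row can break. In this sketch the second "John" row (age 41) is made-up data added only to create a duplicate.

```python
import pandas as pd

# Hypothetical data: two different people named John
df = pd.DataFrame({
    'Name': ['John', 'Sarah', 'John'],
    'Age': [25, 30, 41]
}).set_index('Name')

# A unique label returns a Series; a duplicated label returns a DataFrame
print(type(df.loc['Sarah']).__name__)  # Series
print(type(df.loc['John']).__name__)   # DataFrame
print(df.index.has_duplicates)         # True
```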

Mistake 5: Losing index after operations

code.pyPython
df_sorted = df.sort_values('Age')
# Index preserved but order changed
df_sorted = df.sort_values('Age').reset_index(drop=True)
# New sequential index
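`sort_values` also accepts `ignore_index=True` (available since pandas 1.0), which gives the same fresh 0..n-1 index in one call. This sketch compares both approaches on the sample data.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Sarah', 'Mike'],
    'Age': [25, 30, 28]
})

# Two ways to get a fresh 0..n-1 index after sorting
a = df.sort_values('Age').reset_index(drop=True)
b = df.sort_values('Age', ignore_index=True)

print(a.index.tolist())    # [0, 1, 2]
print(a.equals(b))         # True
print(a['Name'].tolist())  # ['John', 'Mike', 'Sarah']
```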

What's Next?

You now know how to work with DataFrame indexes. You've completed the Pandas DataFrames I module! Next modules will cover data aggregation, grouping, merging, and advanced Pandas operations.