Working with Indexes

Learn to set, reset, and manipulate DataFrame indexes

What is an Index?

The index holds the label for each row. By default, pandas numbers the rows 0, 1, 2, 3, ... using a RangeIndex.

code.py
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Sarah', 'Mike'],
    'Age': [25, 30, 28]
})

print(df)

Output:

    Name  Age
0   John   25
1  Sarah   30
2   Mike   28

The numbers 0, 1, 2 on the left are the index.
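A quick way to see what pandas created is to inspect `df.index` directly. This small sketch rebuilds the same DataFrame. It shows that the default index is a RangeIndex, which stores only start, stop, and step rather than every label:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Sarah', 'Mike'],
    'Age': [25, 30, 28]
})

# The default index is a RangeIndex: it stores start/stop/step, not every label
print(df.index)        # RangeIndex(start=0, stop=3, step=1)
print(list(df.index))  # [0, 1, 2]
```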

Set Index from Column

code.py
df_indexed = df.set_index('Name')
print(df_indexed)

Output:

       Age
Name
John    25
Sarah   30
Mike    28

Name becomes the index.
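set_index() returns a new DataFrame, so the original keeps its Name column. The sketch below also uses drop=False. This set_index() option keeps Name as a regular column while also using it as the index:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Sarah', 'Mike'],
    'Age': [25, 30, 28]
})

df_indexed = df.set_index('Name')           # Name moves out of the columns
df_keep = df.set_index('Name', drop=False)  # Name stays as a column too

print(df.columns.tolist())          # ['Name', 'Age'] (original unchanged)
print(df_indexed.columns.tolist())  # ['Age']
print(df_keep.columns.tolist())     # ['Name', 'Age']
```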

Set Index in Place

code.py
df.set_index('Name', inplace=True)
print(df)

This modifies the original DataFrame instead of returning a new one.
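With inplace=True the method returns None. That means writing df = df.set_index('Name', inplace=True) would replace df with None. A minimal sketch:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Sarah', 'Mike'],
    'Age': [25, 30, 28]
})

result = df.set_index('Name', inplace=True)
print(result)         # None: the change is applied to df itself
print(df.index.name)  # Name
```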

Reset Index

Go back to the default 0, 1, 2... index.

code.py
df_reset = df.reset_index()
print(df_reset)

Name becomes a regular column again.
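reset_index() undoes set_index(). As a small sanity check, this sketch sets Name as the index, resets it, and compares the result with the starting DataFrame:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Sarah', 'Mike'],
    'Age': [25, 30, 28]
})

# set_index() followed by reset_index() restores the original layout
round_trip = df.set_index('Name').reset_index()
print(round_trip.columns.tolist())  # ['Name', 'Age']
print(round_trip.equals(df))        # True
```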

Reset Index and Drop Old

code.py
df_reset = df.reset_index(drop=True)
print(df_reset)

The old index is discarded instead of being saved as a column.
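To compare the two forms side by side, this sketch resets the same Name-indexed DataFrame with and without drop=True and lists the resulting columns:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Sarah', 'Mike'],
    'Age': [25, 30, 28]
}).set_index('Name')

kept = df.reset_index()              # Name is saved as a column
dropped = df.reset_index(drop=True)  # Name labels are thrown away

print(kept.columns.tolist())     # ['Name', 'Age']
print(dropped.columns.tolist())  # ['Age']
```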

Custom Index at Creation

code.py
df = pd.DataFrame({
    'Product': ['Laptop', 'Phone', 'Tablet'],
    'Price': [999, 599, 399]
}, index=['A', 'B', 'C'])

print(df)

Output:

  Product  Price
A  Laptop    999
B   Phone    599
C  Tablet    399
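With string labels, loc and iloc take different arguments: loc takes the label, iloc takes the integer position. A short sketch using the Product table:

```python
import pandas as pd

df = pd.DataFrame({
    'Product': ['Laptop', 'Phone', 'Tablet'],
    'Price': [999, 599, 399]
}, index=['A', 'B', 'C'])

# loc looks up by label, iloc by integer position
print(df.loc['B', 'Price'])  # 599
print(df.iloc[1]['Price'])   # 599 (second row)
```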

Set Custom Index

code.py
df.index = ['Item1', 'Item2', 'Item3']
print(df)
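Assigning to df.index changes the DataFrame directly. If you prefer a new DataFrame, set_axis() returns a relabelled copy. The sketch also shows what happens when the list length does not match the number of rows:

```python
import pandas as pd

df = pd.DataFrame({
    'Product': ['Laptop', 'Phone', 'Tablet'],
    'Price': [999, 599, 399]
}, index=['A', 'B', 'C'])

# set_axis() returns a relabelled copy instead of modifying df
relabelled = df.set_axis(['Item1', 'Item2', 'Item3'], axis=0)
print(relabelled.index.tolist())  # ['Item1', 'Item2', 'Item3']
print(df.index.tolist())          # ['A', 'B', 'C'] (unchanged)

# Assigning a list of the wrong length raises ValueError
try:
    df.index = ['Item1', 'Item2']
except ValueError as err:
    print("ValueError:", err)
```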

MultiIndex (Multiple Levels)

code.py
df = pd.DataFrame({
    'Region': ['North', 'North', 'South', 'South'],
    'City': ['NYC', 'Boston', 'Miami', 'Atlanta'],
    'Sales': [100, 150, 120, 130]
})

df_multi = df.set_index(['Region', 'City'])
print(df_multi)

Output:

                Sales
Region City
North  NYC        100
       Boston     150
South  Miami      120
       Atlanta    130
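You can select a MultiIndex row with a tuple containing one label per level. Passing only the outer label returns a whole group. A sketch using the same Region/City data:

```python
import pandas as pd

df = pd.DataFrame({
    'Region': ['North', 'North', 'South', 'South'],
    'City': ['NYC', 'Boston', 'Miami', 'Atlanta'],
    'Sales': [100, 150, 120, 130]
})
df_multi = df.set_index(['Region', 'City'])

# A tuple selects one row across both levels
print(df_multi.loc[('North', 'Boston'), 'Sales'])  # 150

# A single outer label returns every city in that region
print(df_multi.loc['South'])
```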

Access by Index Label

code.py
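# Rebuild the Name/Age DataFrame (df currently holds the Region/City data)
df = pd.DataFrame({'Name': ['John', 'Sarah', 'Mike'], 'Age': [25, 30, 28]})
df = df.set_index('Name')
print(df.loc['John'])

Returns John's row as a Series.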
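loc can also take a column label, or a list of row labels, in the same call. This sketch rebuilds the Name-indexed DataFrame and shows the three shapes you get back:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Sarah', 'Mike'],
    'Age': [25, 30, 28]
}).set_index('Name')

print(df.loc['John'])             # a Series: the whole row
print(df.loc['John', 'Age'])      # 25: a single value
print(df.loc[['Sarah', 'Mike']])  # a DataFrame with two rows
```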

Sort by Index

code.py
df_sorted = df.sort_index()
print(df_sorted)

Rows are sorted alphabetically by index label: John, Mike, Sarah.
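sort_index() sorts in ascending order by default, and ascending=False reverses the order. A sketch comparing both on the Name index:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Sarah', 'Mike'],
    'Age': [25, 30, 28]
}).set_index('Name')

print(df.sort_index().index.tolist())                 # ['John', 'Mike', 'Sarah']
print(df.sort_index(ascending=False).index.tolist())  # ['Sarah', 'Mike', 'John']
```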

Index Properties

code.py
print("Index:", df.index)
print("Index values:", df.index.tolist())
print("Index name:", df.index.name)
print("Has duplicates:", df.index.has_duplicates)
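Two more Index properties help before label lookups. is_unique tells you whether every label is distinct, and is_monotonic_increasing tells you whether the labels are already sorted. A sketch:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Sarah', 'Mike'],
    'Age': [25, 30, 28]
}).set_index('Name')

print(df.index.is_unique)                # True: no repeated labels
print(df.index.is_monotonic_increasing)  # False: John, Sarah, Mike is not sorted
print(df.sort_index().index.is_monotonic_increasing)  # True
```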

Rename Index

code.py
df.index.name = 'Employee'
print(df)
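Setting df.index.name modifies the DataFrame in place. The sketch below shows two related calls. rename_axis() returns a copy with a new index name, while rename(index=...) changes individual labels. The label 'Johnny' is only an illustrative value:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Sarah', 'Mike'],
    'Age': [25, 30, 28]
}).set_index('Name')

# rename_axis() changes the index's name and returns a copy
named = df.rename_axis('Employee')
print(named.index.name)  # Employee

# rename(index=...) changes individual labels, not the index name
relabelled = df.rename(index={'John': 'Johnny'})
print(relabelled.index.tolist())  # ['Johnny', 'Sarah', 'Mike']
```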

Practice Example

Scenario: manage a product catalog with custom indexes.

code.py
import pandas as pd

products = pd.DataFrame({
    'Product_ID': ['P001', 'P002', 'P003', 'P004', 'P005'],
    'Name': ['Laptop', 'Phone', 'Tablet', 'Monitor', 'Keyboard'],
    'Category': ['Electronics', 'Electronics', 'Electronics', 'Electronics', 'Accessories'],
    'Price': [999, 599, 399, 299, 75],
    'Stock': [15, 30, 20, 12, 50]
})

print("Default index:")
print(products)
print()

print("1. Set Product_ID as index:")
products_indexed = products.set_index('Product_ID')
print(products_indexed)
print()

print("2. Access product P002:")
print(products_indexed.loc['P002'])
print()

print("3. Multi-level index (Category, Name):")
products_multi = products.set_index(['Category', 'Name'])
print(products_multi)
print()

print("4. Access Electronics:")
print(products_multi.loc['Electronics'])
print()

print("5. Reset to default index:")
products_reset = products_indexed.reset_index()
print(products_reset)
print()

print("6. Custom alphabetic index:")
products_alpha = products.copy()
products_alpha.index = ['A', 'B', 'C', 'D', 'E']
print(products_alpha)
print()

print("7. Sort by index:")
products_sorted = products_alpha.sort_index(ascending=False)
print(products_sorted)
print()

print("8. Name the index:")
products_named = products_indexed.copy()
products_named.index.name = 'SKU'
print(products_named.head())

Date Index

A DatetimeIndex is common for time series data.

code.py
dates = pd.date_range('2024-01-01', periods=5)
df = pd.DataFrame({
    'Sales': [100, 150, 120, 180, 160]
}, index=dates)

print(df)

Output:

            Sales
2024-01-01    100
2024-01-02    150
2024-01-03    120
2024-01-04    180
2024-01-05    160
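A DatetimeIndex lets you select rows with date strings, and label slices include both endpoints. A sketch using the same five days of sales:

```python
import pandas as pd

dates = pd.date_range('2024-01-01', periods=5)
df = pd.DataFrame({'Sales': [100, 150, 120, 180, 160]}, index=dates)

# Date strings work as slice bounds; both endpoints are included
subset = df.loc['2024-01-02':'2024-01-04']
print(subset)

# A single day can be looked up with a Timestamp label
print(df.loc[pd.Timestamp('2024-01-03'), 'Sales'])  # 120
```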

Reindex with New Labels

code.py
df = pd.DataFrame({
    'Value': [10, 20, 30]
}, index=['A', 'B', 'C'])

df_new = df.reindex(['A', 'B', 'C', 'D', 'E'])
print(df_new)

Labels D and E did not exist in df, so their rows are filled with NaN.
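Watch for one side effect. Integer columns cannot hold NaN, so pandas converts them to float when reindex() introduces missing rows. A sketch:

```python
import pandas as pd

df = pd.DataFrame({'Value': [10, 20, 30]}, index=['A', 'B', 'C'])
df_new = df.reindex(['A', 'B', 'C', 'D', 'E'])

print(df_new['Value'].dtype)            # float64: NaN needs a float column
print(df_new['Value'].isna().tolist())  # [False, False, False, True, True]
```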

Fill Missing After Reindex

code.py
df_new = df.reindex(['A', 'B', 'C', 'D'], fill_value=0)
print(df_new)

The new row D gets 0 instead of NaN.
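reindex() can do more than add labels. If you pass a subset of existing labels in a new order, it reorders the rows and drops the ones you leave out. A sketch:

```python
import pandas as pd

df = pd.DataFrame({'Value': [10, 20, 30]}, index=['A', 'B', 'C'])

# Only C and A are requested, so B is dropped and the order follows the list
reordered = df.reindex(['C', 'A'])
print(reordered['Value'].tolist())  # [30, 10]
```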

Check Index Type

code.py
print("Index type:", type(df.index))
print("Is RangeIndex:", isinstance(df.index, pd.RangeIndex))
print("Is DatetimeIndex:", isinstance(df.index, pd.DatetimeIndex))
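To compare the three index types you are most likely to meet, this sketch builds one DataFrame of each and prints the class name of its index:

```python
import pandas as pd

default_df = pd.DataFrame({'Value': [10, 20, 30]})
labelled_df = pd.DataFrame({'Value': [10, 20, 30]}, index=['A', 'B', 'C'])
dated_df = pd.DataFrame({'Value': [10, 20, 30]},
                        index=pd.date_range('2024-01-01', periods=3))

for name, frame in [('default', default_df), ('labelled', labelled_df), ('dated', dated_df)]:
    print(name, type(frame.index).__name__)
# default RangeIndex
# labelled Index
# dated DatetimeIndex
```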

Duplicate Indexes

pandas allows duplicate index labels, but they are not recommended because a single label can then match several rows.

code.py
df = pd.DataFrame({
    'Value': [10, 20, 30]
}, index=['A', 'A', 'B'])

print(df.loc['A'])  # Returns multiple rows

Check for duplicates:

code.py
if df.index.has_duplicates:
    print("Warning: Duplicate indexes found!")
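When duplicates appear, you have two options. You can keep only the first row for each label, or you can make set_index() refuse duplicate keys with verify_integrity=True. The Key column below is a hypothetical example:

```python
import pandas as pd

df = pd.DataFrame({'Value': [10, 20, 30]}, index=['A', 'A', 'B'])

# duplicated() flags repeats; keep='first' leaves the first occurrence unflagged
deduped = df[~df.index.duplicated(keep='first')]
print(deduped)  # rows A (10) and B (30)

# verify_integrity=True makes set_index() raise on duplicate keys
raw = pd.DataFrame({'Key': ['A', 'A', 'B'], 'Value': [10, 20, 30]})
try:
    raw.set_index('Key', verify_integrity=True)
except ValueError as err:
    print("ValueError:", err)
```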

Drop Index Level

For MultiIndex.

code.py
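sales = pd.DataFrame({'Region': ['North', 'North', 'South', 'South'],
                      'City': ['NYC', 'Boston', 'Miami', 'Atlanta'],
                      'Sales': [100, 150, 120, 130]})
df_multi = sales.set_index(['Region', 'City'])
df_single = df_multi.droplevel('Region')  # keep only the City level
print(df_single)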

Removes Region level, keeps City.
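droplevel() also accepts a level position, where 0 is the outermost level. If you want to keep the Region values rather than discard them, reset_index(level=...) moves that level back into a column. A sketch:

```python
import pandas as pd

sales = pd.DataFrame({
    'Region': ['North', 'North', 'South', 'South'],
    'City': ['NYC', 'Boston', 'Miami', 'Atlanta'],
    'Sales': [100, 150, 120, 130]
})
df_multi = sales.set_index(['Region', 'City'])

# Drop the outermost level by position instead of by name
by_position = df_multi.droplevel(0)

# Move Region back into a column instead of discarding it
region_as_column = df_multi.reset_index(level='Region')
print(region_as_column)
```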

Swap Index Levels

code.py
df_swapped = df_multi.swaplevel()
print(df_swapped)

City becomes first level, Region second.
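swaplevel() only changes the order of the levels, and the rows stay where they were. Calling sort_index() afterwards groups the rows by the new outer level. A sketch:

```python
import pandas as pd

sales = pd.DataFrame({
    'Region': ['North', 'North', 'South', 'South'],
    'City': ['NYC', 'Boston', 'Miami', 'Atlanta'],
    'Sales': [100, 150, 120, 130]
})
df_multi = sales.set_index(['Region', 'City'])

# Swap the two levels, then sort so rows are ordered by City
df_swapped = df_multi.swaplevel().sort_index()
print(df_swapped)
```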

Set Index from Range

code.py
df.index = range(100, 103)
print(df)

Index: 100, 101, 102
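Assigning a Python range produces a RangeIndex, the same compact type pandas uses by default. A sketch:

```python
import pandas as pd

df = pd.DataFrame({'Value': [10, 20, 30]})

# A range is stored as a RangeIndex rather than a list of labels
df.index = range(100, 103)
print(df.index)  # RangeIndex(start=100, stop=103, step=1)
```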

Preserve Index When Filtering

code.py
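df = pd.DataFrame({'Name': ['John', 'Sarah', 'Mike'], 'Age': [25, 30, 28]})
df_filtered = df[df['Age'] > 25]
print(df_filtered.index)

The original labels are kept. Only rows 1 and 2 remain, so the index no longer starts at 0.

Reset it if you need a sequential index again: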

code.py
df_filtered = df_filtered.reset_index(drop=True)
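The gap left by filtering matters when you mix label and position lookups. In this sketch, label 0 no longer exists after the filter, but position 0 still does:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Sarah', 'Mike'], 'Age': [25, 30, 28]})
df_filtered = df[df['Age'] > 25]

print(df_filtered.loc[1, 'Name'])   # Sarah: label 1 survived the filter
print(df_filtered.iloc[0]['Name'])  # Sarah: first row by position
# df_filtered.loc[0] would raise KeyError because label 0 was filtered out
```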

Index as Column

code.py
df['Index_Copy'] = df.index
print(df)
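Copying the index into a column leaves the index in place. If you would rather move the labels out of the index, reset_index() does that. For an unnamed index, the new column is called 'index'. A sketch comparing both:

```python
import pandas as pd

df = pd.DataFrame({'Value': [10, 20, 30]}, index=['A', 'B', 'C'])

# Option 1: copy the labels into a new column; the index stays in place
df_copy = df.copy()
df_copy['Index_Copy'] = df_copy.index

# Option 2: move the labels into a column named 'index' and get a new RangeIndex
df_moved = df.reset_index()
print(df_moved.columns.tolist())  # ['index', 'Value']
```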

Key Points to Remember

Default index is RangeIndex: 0, 1, 2, 3...

set_index('column') makes that column the index. The original DataFrame is unchanged unless you pass inplace=True.

reset_index() moves the index back into a column and creates a new default index.

Use drop=True with reset_index() to discard the old index instead of keeping it as a column.

Select rows by index label with loc: df.loc['index_label']

MultiIndex allows hierarchical row labels for complex data.

Common Mistakes

Mistake 1: Forgetting assignment

code.py
df.set_index('Name')  # Doesn't change df!
df = df.set_index('Name')  # Correct
# OR
df.set_index('Name', inplace=True)

Mistake 2: Wrong index length

code.py
df.index = ['A', 'B']  # ValueError (length mismatch) if df has 3 rows
# The number of labels must match the number of rows

Mistake 3: Using iloc with custom index

code.py
df.set_index('Name', inplace=True)
df.iloc['John']  # TypeError: iloc only accepts integer positions; use loc for labels
df.loc['John']  # Correct
df.iloc[0]  # Also works (position)

Mistake 4: Not handling duplicate indexes

code.py
df.set_index('Name', inplace=True)
row = df.loc['John']  # May return multiple rows!
# Check: df.index.has_duplicates

Mistake 5: Expecting a sequential index after sorting

code.py
df_sorted = df.sort_values('Age')
# Index preserved but order changed
df_sorted = df.sort_values('Age').reset_index(drop=True)
# New sequential index

What's Next?

You now know how to work with DataFrame indexes, and you've completed the Pandas DataFrames I module! The next modules cover data aggregation, grouping, merging, and advanced Pandas operations.

SkillsetMaster - AI, Web Development & Data Analytics Courses