Working with Indexes
Learn to set, reset, and manipulate DataFrame indexes
What is an Index?
The index holds the label for each row. By default, pandas numbers the rows 0, 1, 2, 3, ...
import pandas as pd
df = pd.DataFrame({
    'Name': ['John', 'Sarah', 'Mike'],
    'Age': [25, 30, 28]
})
print(df)
Output:
    Name  Age
0   John   25
1  Sarah   30
2   Mike   28
The values 0, 1, 2 in the leftmost column are the index.
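To see what kind of index pandas created, you can inspect `df.index` directly. A minimal sketch, reusing the same three-row DataFrame:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Sarah', 'Mike'],
    'Age': [25, 30, 28]
})

# The default index is a RangeIndex that starts at 0 and stops before len(df)
print(df.index)        # RangeIndex(start=0, stop=3, step=1)
print(list(df.index))  # [0, 1, 2]
```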
Set Index from Column
df_indexed = df.set_index('Name')
print(df_indexed)
Output:
       Age
Name
John    25
Sarah   30
Mike    28
The Name column becomes the index and no longer appears as a regular column.
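If you want the column to serve as the index and also stay a regular column, `set_index` accepts `drop=False`. A small sketch (the name `df_both` is just for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Sarah', 'Mike'],
    'Age': [25, 30, 28]
})

# drop=False keeps 'Name' as a column in addition to using it as the index
df_both = df.set_index('Name', drop=False)
print(df_both)
print(df_both.columns.tolist())  # ['Name', 'Age']
```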
Set Index in Place
df.set_index('Name', inplace=True)
print(df)
This modifies the original DataFrame directly; with inplace=True the method returns None instead of a new DataFrame.
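Because an in-place call returns None, assigning its result is a common slip. A minimal sketch showing the return value:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Sarah', 'Mike'], 'Age': [25, 30, 28]})

# With inplace=True, df itself is changed and the call returns None
result = df.set_index('Name', inplace=True)
print(result)             # None
print(df.index.tolist())  # ['John', 'Sarah', 'Mike']
```

So write either `df = df.set_index('Name')` or `df.set_index('Name', inplace=True)`, but never `df = df.set_index('Name', inplace=True)`.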
Reset Index
Go back to the default 0, 1, 2, ... index.
df_reset = df.reset_index()
print(df_reset)
Name becomes a regular column again.
Reset Index and Drop Old
df_reset = df.reset_index(drop=True)
print(df_reset)
The old index is discarded instead of being saved as a column.
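Comparing the two reset variants side by side makes the effect of `drop` easy to see. A sketch, with `kept` and `dropped` as illustrative names:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Sarah', 'Mike'], 'Age': [25, 30, 28]})
df = df.set_index('Name')

kept = df.reset_index()              # 'Name' comes back as the first column
dropped = df.reset_index(drop=True)  # the 'Name' labels are thrown away

print(kept.columns.tolist())     # ['Name', 'Age']
print(dropped.columns.tolist())  # ['Age']
print(list(dropped.index))       # [0, 1, 2]
```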
Custom Index at Creation
df = pd.DataFrame({
    'Product': ['Laptop', 'Phone', 'Tablet'],
    'Price': [999, 599, 399]
}, index=['A', 'B', 'C'])
print(df)
Output:
   Product  Price
A   Laptop    999
B    Phone    599
C   Tablet    399
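If you also want the index to have a name from the start, one option is to pass a `pd.Index` with a `name`. A sketch; the index name `'Code'` is a hypothetical choice for this example:

```python
import pandas as pd

# Wrapping the labels in pd.Index lets you name the index at creation time
# ('Code' is an illustrative name, not something pandas requires)
df = pd.DataFrame({
    'Product': ['Laptop', 'Phone', 'Tablet'],
    'Price': [999, 599, 399]
}, index=pd.Index(['A', 'B', 'C'], name='Code'))

print(df)
print(df.index.name)           # Code
print(df.loc['B', 'Product'])  # Phone
```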
Set Custom Index
df.index = ['Item1', 'Item2', 'Item3']
print(df)
Assigning to df.index replaces all row labels; the list must have one label per row.
MultiIndex (Multiple Levels)
df = pd.DataFrame({
    'Region': ['North', 'North', 'South', 'South'],
    'City': ['NYC', 'Boston', 'Miami', 'Atlanta'],
    'Sales': [100, 150, 120, 130]
})
df_multi = df.set_index(['Region', 'City'])
print(df_multi)
Output:
                Sales
Region City
North  NYC        100
       Boston     150
South  Miami      120
       Atlanta    130
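With a MultiIndex, `loc` accepts either a tuple of labels (one per level) or just the outer label. A minimal sketch using the same data:

```python
import pandas as pd

df = pd.DataFrame({
    'Region': ['North', 'North', 'South', 'South'],
    'City': ['NYC', 'Boston', 'Miami', 'Atlanta'],
    'Sales': [100, 150, 120, 130]
})
df_multi = df.set_index(['Region', 'City'])

# A tuple selects one row using both levels
print(df_multi.loc[('North', 'Boston'), 'Sales'])  # 150

# A single outer label selects every city in that region
print(df_multi.loc['South'])
```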
Access by Index Label
df = pd.DataFrame({'Name': ['John', 'Sarah', 'Mike'], 'Age': [25, 30, 28]})
df = df.set_index('Name')
print(df.loc['John'])
Returns John's row as a Series.
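`loc` also takes a column label or a list of row labels. A short sketch on the same Name-indexed DataFrame:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Sarah', 'Mike'], 'Age': [25, 30, 28]})
df = df.set_index('Name')

print(df.loc['John', 'Age'])     # 25 (row label, column label)
print(df.loc[['John', 'Mike']])  # a list of labels returns a DataFrame
```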
Sort by Index
df_sorted = df.sort_index()
print(df_sorted)
Rows are sorted by index label; for these names that means alphabetical order: John, Mike, Sarah.
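`sort_index` sorts ascending by default; `ascending=False` reverses it. A minimal sketch:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Sarah', 'Mike'], 'Age': [25, 30, 28]}).set_index('Name')

ascending = df.sort_index()
descending = df.sort_index(ascending=False)

print(ascending.index.tolist())   # ['John', 'Mike', 'Sarah']
print(descending.index.tolist())  # ['Sarah', 'Mike', 'John']
```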
Index Properties
print("Index:", df.index)
print("Index values:", df.index.tolist())
print("Index name:", df.index.name)
print("Has duplicates:", df.index.has_duplicates)
Rename Index
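df.index.name = 'Employee'
print(df)
This sets the index's name (the label printed above the index); the row labels themselves do not change.
Practice Example
Scenario: manage a product catalog with custom indexes.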
import pandas as pd
products = pd.DataFrame({
    'Product_ID': ['P001', 'P002', 'P003', 'P004', 'P005'],
    'Name': ['Laptop', 'Phone', 'Tablet', 'Monitor', 'Keyboard'],
    'Category': ['Electronics', 'Electronics', 'Electronics', 'Electronics', 'Accessories'],
    'Price': [999, 599, 399, 299, 75],
    'Stock': [15, 30, 20, 12, 50]
})
print("Default index:")
print(products)
print()
print("1. Set Product_ID as index:")
products_indexed = products.set_index('Product_ID')
print(products_indexed)
print()
print("2. Access product P002:")
print(products_indexed.loc['P002'])
print()
print("3. Multi-level index (Category, Name):")
products_multi = products.set_index(['Category', 'Name'])
print(products_multi)
print()
print("4. Access Electronics:")
print(products_multi.loc['Electronics'])
print()
print("5. Reset to default index:")
products_reset = products_indexed.reset_index()
print(products_reset)
print()
print("6. Custom alphabetic index:")
products_alpha = products.copy()
products_alpha.index = ['A', 'B', 'C', 'D', 'E']
print(products_alpha)
print()
print("7. Sort by index:")
products_sorted = products_alpha.sort_index(ascending=False)
print(products_sorted)
print()
print("8. Name the index:")
products_named = products_indexed.copy()
products_named.index.name = 'SKU'
print(products_named.head())
Date Index
A DatetimeIndex is common for time series data.
dates = pd.date_range('2024-01-01', periods=5)
df = pd.DataFrame({
    'Sales': [100, 150, 120, 180, 160]
}, index=dates)
print(df)
Output:
            Sales
2024-01-01    100
2024-01-02    150
2024-01-03    120
2024-01-04    180
2024-01-05    160
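With a DatetimeIndex, date strings work as `loc` labels, and label slices include both endpoints. A minimal sketch on the same data:

```python
import pandas as pd

dates = pd.date_range('2024-01-01', periods=5)
df = pd.DataFrame({'Sales': [100, 150, 120, 180, 160]}, index=dates)

# Label slicing is inclusive: Jan 2, Jan 3 and Jan 4 are all returned
middle = df.loc['2024-01-02':'2024-01-04']
print(middle)

# A single date string selects one row
print(df.loc['2024-01-03', 'Sales'])  # 120
```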
Reindex with New Labels
df = pd.DataFrame({
    'Value': [10, 20, 30]
}, index=['A', 'B', 'C'])
df_new = df.reindex(['A', 'B', 'C', 'D', 'E'])
print(df_new)
Rows A, B, C keep their values; the new labels D and E become rows filled with NaN.
Fill Missing After Reindex
df_new = df.reindex(['A', 'B', 'C', 'D'], fill_value=0)
print(df_new)
The new row D gets 0 instead of NaN.
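One side effect worth checking: NaN cannot be stored in an integer column, so a plain `reindex` that adds rows upcasts integers to float, while `fill_value=0` keeps the integer dtype. A sketch comparing both:

```python
import pandas as pd

df = pd.DataFrame({'Value': [10, 20, 30]}, index=['A', 'B', 'C'])

with_nan = df.reindex(['A', 'B', 'C', 'D'])
with_zero = df.reindex(['A', 'B', 'C', 'D'], fill_value=0)

# The NaN in row D forces the column to float64; filling with 0 keeps int64
print(with_nan['Value'].dtype)   # float64
print(with_zero['Value'].dtype)  # int64
```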
Check Index Type
print("Index type:", type(df.index))
print("Is RangeIndex:", isinstance(df.index, pd.RangeIndex))
print("Is DatetimeIndex:", isinstance(df.index, pd.DatetimeIndex))
Duplicate Indexes
pandas allows duplicate index labels, but they make label lookups ambiguous, so avoid them when you can.
df = pd.DataFrame({
    'Value': [10, 20, 30]
}, index=['A', 'A', 'B'])
print(df.loc['A'])  # Returns both rows labeled 'A' as a DataFrame
Check for duplicates:
if df.index.has_duplicates:
    print("Warning: Duplicate indexes found!")
Drop Index Level
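For a MultiIndex, droplevel() removes one level of the row labels.
df = pd.DataFrame({'Region': ['North', 'North', 'South', 'South'],
                   'City': ['NYC', 'Boston', 'Miami', 'Atlanta'],
                   'Sales': [100, 150, 120, 130]})
df_multi = df.set_index(['Region', 'City'])
df_single = df_multi.droplevel('Region')
print(df_single)
This removes the Region level and keeps City as the index.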
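`droplevel` throws the level's labels away. If you want to keep them as data, `reset_index(level=...)` moves that level into a column instead. A sketch comparing the two:

```python
import pandas as pd

df = pd.DataFrame({
    'Region': ['North', 'North', 'South', 'South'],
    'City': ['NYC', 'Boston', 'Miami', 'Atlanta'],
    'Sales': [100, 150, 120, 130]
})
df_multi = df.set_index(['Region', 'City'])

dropped = df_multi.droplevel('Region')        # Region labels are discarded
moved = df_multi.reset_index(level='Region')  # Region becomes a regular column

print(dropped.columns.tolist())  # ['Sales']
print(moved.columns.tolist())    # ['Region', 'Sales']
print(moved.index.name)          # City
```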
Swap Index Levels
df_swapped = df_multi.swaplevel()
print(df_swapped)
City becomes the first level, Region the second.
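`swaplevel` only reorders the levels; it does not reorder the rows. Chaining `sort_index()` is one way to group rows by the new outer level. A minimal sketch:

```python
import pandas as pd

df = pd.DataFrame({
    'Region': ['North', 'North', 'South', 'South'],
    'City': ['NYC', 'Boston', 'Miami', 'Atlanta'],
    'Sales': [100, 150, 120, 130]
})
df_multi = df.set_index(['Region', 'City'])

# Swap the levels, then sort so rows are ordered by City
df_swapped = df_multi.swaplevel().sort_index()
print(df_swapped)
print(list(df_swapped.index.names))  # ['City', 'Region']
```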
Set Index from Range
df = pd.DataFrame({'Name': ['John', 'Sarah', 'Mike'], 'Age': [25, 30, 28]})
df.index = range(100, 103)
print(df)
The index is now 100, 101, 102.
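Once the labels are integers that differ from the positions, `loc` and `iloc` diverge: `loc[100]` means "label 100", while `iloc[0]` means "first row". A sketch on the same DataFrame:

```python
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Sarah', 'Mike'], 'Age': [25, 30, 28]})
df.index = range(100, 103)

print(type(df.index).__name__)  # RangeIndex (assigning a range keeps it a RangeIndex)
print(df.loc[100, 'Name'])      # John  (label 100)
print(df.iloc[0]['Name'])       # John  (position 0)
```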
Preserve Index When Filtering
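df = pd.DataFrame({'Name': ['John', 'Sarah', 'Mike'], 'Age': [25, 30, 28]})
df_filtered = df[df['Age'] > 25]
print(df_filtered.index)
The remaining rows keep their original labels (1 and 2), so the index now has a gap.
Reset if needed:
df_filtered = df_filtered.reset_index(drop=True)
Index as Column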
df['Index_Copy'] = df.index
print(df)
Key Points to Remember
Default index is RangeIndex: 0, 1, 2, 3...
set_index('column') makes that column the index; the original DataFrame is unchanged unless inplace=True.
reset_index() moves the index back into a column and creates a new default index.
Use drop=True with reset_index() to discard the old index.
Select rows by index label with loc: df.loc['index_label']
MultiIndex allows hierarchical row labels for complex data.
Common Mistakes
Mistake 1: Forgetting assignment
df.set_index('Name') # Doesn't change df!
df = df.set_index('Name') # Correct
# OR
df.set_index('Name', inplace=True)
Mistake 2: Wrong index length
df.index = ['A', 'B']  # ValueError (length mismatch) if df has 3 rows!
# The number of labels must match the number of rows
Mistake 3: Using iloc with custom index
df.set_index('Name', inplace=True)
df.iloc['John']  # TypeError! iloc takes integer positions; use loc for labels
df.loc['John']   # Correct
df.iloc[0]       # Also works (position)
Mistake 4: Not handling duplicate indexes
df.set_index('Name', inplace=True)
row = df.loc['John']  # Returns a DataFrame, not a Series, if 'John' is duplicated!
# Check: df.index.has_duplicates
Mistake 5: Expecting a sequential index after sorting
df_sorted = df.sort_values('Age')
# Index labels are preserved but now appear out of order
df_sorted = df.sort_values('Age').reset_index(drop=True)
# New sequential index
What's Next?
You now know how to work with DataFrame indexes. You've completed the Pandas DataFrames I module! Next modules will cover data aggregation, grouping, merging, and advanced Pandas operations.