5 min read
Concatenating DataFrames
Learn to stack DataFrames on top of each other or side by side
What is Concatenation?
Concatenation means joining things together. Think of it like stacking papers.
- Stack papers on top of each other = add more rows
- Put papers side by side = add more columns
Stack DataFrames Vertically (Add Rows)
code.py
import pandas as pd
# First DataFrame - January sales
jan_sales = pd.DataFrame({
    'Product': ['Apple', 'Banana'],
    'Sales': [100, 150]
})
# Second DataFrame - February sales
feb_sales = pd.DataFrame({
    'Product': ['Apple', 'Banana'],
    'Sales': [120, 180]
})
# Stack them on top of each other
all_sales = pd.concat([jan_sales, feb_sales])
print(all_sales)
Output:
Product Sales
0 Apple 100
1 Banana 150
0 Apple 120
1 Banana 180
Notice: The index repeats (0, 1, 0, 1) because each DataFrame keeps its original index. We can fix this.
Fix the Index
code.py
all_sales = pd.concat([jan_sales, feb_sales], ignore_index=True)
print(all_sales)
Output:
Product Sales
0 Apple 100
1 Banana 150
2 Apple 120
3 Banana 180
Now the index is 0, 1, 2, 3.
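Why does a repeated index matter? Here is a small sketch, reusing the January and February data from above. It shows that looking up label 0 returns two rows when the index repeats, but only one row after `ignore_index=True`. The variable names `repeated` and `fresh` are just for illustration.

```python
import pandas as pd

jan_sales = pd.DataFrame({'Product': ['Apple', 'Banana'], 'Sales': [100, 150]})
feb_sales = pd.DataFrame({'Product': ['Apple', 'Banana'], 'Sales': [120, 180]})

# Without ignore_index, the label 0 appears twice
repeated = pd.concat([jan_sales, feb_sales])
print(repeated.index.is_unique)   # False
print(repeated.loc[0])            # two rows: one from each month

# With ignore_index=True, every row gets its own label
fresh = pd.concat([jan_sales, feb_sales], ignore_index=True)
print(fresh.index.is_unique)      # True
print(fresh.loc[0])               # a single row
```

If you plan to look up rows by label later, a unique index usually makes the result easier to predict.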
Stack DataFrames Side by Side (Add Columns)
code.py
# Product names
names = pd.DataFrame({
    'Product': ['Apple', 'Banana', 'Cherry']
})
# Product prices
prices = pd.DataFrame({
    'Price': [1.00, 0.50, 2.00]
})
# Put them side by side
products = pd.concat([names, prices], axis=1)
print(products)
Output:
Product Price
0 Apple 1.00
1 Banana 0.50
2 Cherry 2.00
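axis=1 means "add columns". Rows are matched by index label, not by position. These two DataFrames line up because both use the index 0, 1, 2.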
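What happens when the indexes are different? The sketch below uses a made-up `prices` table whose index starts at 1 instead of 0 (this table is an assumption for illustration, not part of the example above). With `axis=1`, pandas lines rows up by index label, so rows without a partner are filled with NaN.

```python
import pandas as pd

names = pd.DataFrame({'Product': ['Apple', 'Banana', 'Cherry']})  # index 0, 1, 2

# Hypothetical prices table with a shifted index: 1, 2, 3
prices = pd.DataFrame({'Price': [0.50, 2.00, 3.00]}, index=[1, 2, 3])

# Rows are matched by index label, so labels 0 and 3 have no partner
side_by_side = pd.concat([names, prices], axis=1)
print(side_by_side)
print(side_by_side.shape)   # (4, 2)

# To match rows by position instead, reset the index first
by_position = pd.concat([names, prices.reset_index(drop=True)], axis=1)
print(by_position)
print(by_position.shape)    # (3, 2)
```

Resetting the index is only safe when you are sure the rows are already in the same order in both DataFrames.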
Concatenate Multiple DataFrames
code.py
jan = pd.DataFrame({'Month': ['Jan'], 'Sales': [100]})
feb = pd.DataFrame({'Month': ['Feb'], 'Sales': [120]})
mar = pd.DataFrame({'Month': ['Mar'], 'Sales': [150]})
# Join all three
all_months = pd.concat([jan, feb, mar], ignore_index=True)
print(all_months)
Output:
Month Sales
0 Jan 100
1 Feb 120
2 Mar 150
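When there are many DataFrames, a common pattern is to collect them in a list and call `pd.concat` once at the end. Here is a minimal sketch of that idea. The `monthly_sales` dictionary and the `frames` list are made-up names for illustration.

```python
import pandas as pd

# Hypothetical monthly figures, matching the example above
monthly_sales = {'Jan': 100, 'Feb': 120, 'Mar': 150}

# Build one small DataFrame per month and collect them in a list
frames = []
for month, sales in monthly_sales.items():
    frames.append(pd.DataFrame({'Month': [month], 'Sales': [sales]}))

# Join the whole list in a single call
all_months = pd.concat(frames, ignore_index=True)
print(all_months)
```

Calling `pd.concat` once on the full list is generally preferable to concatenating inside the loop, because each call builds a new DataFrame.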
When Columns Don't Match
code.py
df1 = pd.DataFrame({
    'Name': ['John', 'Sarah'],
    'Age': [25, 30]
})
df2 = pd.DataFrame({
    'Name': ['Mike'],
    'City': ['NYC']
})
result = pd.concat([df1, df2], ignore_index=True)
print(result)
Output:
Name Age City
0 John 25.0 NaN
1 Sarah 30.0 NaN
2 Mike NaN NYC
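Missing values become NaN (Not a Number). This is also why Age now shows 25.0 instead of 25: the column is converted to decimals (float) so it can hold NaN.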
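If you only want the columns that every DataFrame has, `pd.concat` accepts `join='inner'`. The sketch below reuses `df1` and `df2` from the example above. The name `shared_only` is just for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({'Name': ['John', 'Sarah'], 'Age': [25, 30]})
df2 = pd.DataFrame({'Name': ['Mike'], 'City': ['NYC']})

# join='inner' keeps only the columns found in both DataFrames
shared_only = pd.concat([df1, df2], join='inner', ignore_index=True)
print(shared_only)
print(list(shared_only.columns))   # ['Name']
```

The default is `join='outer'`, which keeps every column and fills the gaps with NaN. Which one fits depends on whether you would rather drop columns or keep missing values.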
Add Labels to Know Where Data Came From
code.py
jan = pd.DataFrame({'Sales': [100, 150]})
feb = pd.DataFrame({'Sales': [120, 180]})
result = pd.concat([jan, feb], keys=['January', 'February'])
print(result)
Output:
            Sales
January  0    100
         1    150
February 0    120
         1    180
Now you know which data came from which month.
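Once the labels are in place, you can use them to pull one group back out. The sketch below shows two possible ways to work with a keyed result: selecting one month with `.loc`, and turning the label into a regular column. The level names `'Month'` and `'Row'` are made up for this example.

```python
import pandas as pd

jan = pd.DataFrame({'Sales': [100, 150]})
feb = pd.DataFrame({'Sales': [120, 180]})

result = pd.concat([jan, feb], keys=['January', 'February'])

# Select only the rows labeled 'January'
january_only = result.loc['January']
print(january_only)

# Name the index levels, then move the month label into a normal column
labeled = pd.concat(
    [jan, feb],
    keys=['January', 'February'],
    names=['Month', 'Row'],        # hypothetical level names
).reset_index(level='Month')
print(labeled)
```

A regular `Month` column is often easier to filter and group on than a two-level index.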
Practice Example
code.py
import pandas as pd
# Weekly sales reports
week1 = pd.DataFrame({
    'Day': ['Mon', 'Tue', 'Wed'],
    'Sales': [200, 250, 180]
})
week2 = pd.DataFrame({
    'Day': ['Mon', 'Tue', 'Wed'],
    'Sales': [220, 270, 200]
})
# Combine all weeks
all_weeks = pd.concat([week1, week2], ignore_index=True)
print("All sales:")
print(all_weeks)
# Total sales
total = all_weeks['Sales'].sum()
print(f"\nTotal sales: {total}")
Key Points
- pd.concat([df1, df2]) stacks DataFrames vertically (adds rows)
- pd.concat([df1, df2], axis=1) puts them side by side (adds columns)
- ignore_index=True gives fresh index numbers 0, 1, 2, 3...
- Missing columns become NaN
- Use keys to label where data came from
Common Mistakes
Mistake 1: Forgetting the list brackets
code.py
# Wrong! This raises a TypeError
pd.concat(df1, df2)
# Correct: pass the DataFrames as one list
pd.concat([df1, df2])
Mistake 2: Wrong axis for side by side
code.py
# This adds rows (default)
pd.concat([df1, df2])
# This adds columns
pd.concat([df1, df2], axis=1)
What's Next?
You learned to stack DataFrames. Next, you'll learn merging: joining DataFrames based on matching values (like matching customer IDs).