Concatenating DataFrames

Learn to stack DataFrames on top of each other or side by side

What is Concatenation?

Concatenation means joining things together. Think of it like stacking papers.

  • Stack papers on top of each other = add more rows
  • Put papers side by side = add more columns

Stack DataFrames Vertically (Add Rows)

code.py
import pandas as pd

# First DataFrame - January sales
jan_sales = pd.DataFrame({
    'Product': ['Apple', 'Banana'],
    'Sales': [100, 150]
})

# Second DataFrame - February sales
feb_sales = pd.DataFrame({
    'Product': ['Apple', 'Banana'],
    'Sales': [120, 180]
})

# Stack them on top of each other
all_sales = pd.concat([jan_sales, feb_sales])
print(all_sales)

Output:

  Product  Sales
0   Apple    100
1  Banana    150
0   Apple    120
1  Banana    180

Notice: The index repeats (0, 1, 0, 1) because each DataFrame keeps its original index labels. We can fix this.
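Repeated index labels matter once you select rows by label. The sketch below reuses the same January and February data to show one likely surprise: asking for label 0 returns two rows instead of one.

```python
import pandas as pd

jan_sales = pd.DataFrame({'Product': ['Apple', 'Banana'], 'Sales': [100, 150]})
feb_sales = pd.DataFrame({'Product': ['Apple', 'Banana'], 'Sales': [120, 180]})

all_sales = pd.concat([jan_sales, feb_sales])

# The label 0 appears twice, so .loc[0] returns both matching rows
rows_labeled_zero = all_sales.loc[0]
print(rows_labeled_zero)

# is_unique reports whether every index label appears only once
print(all_sales.index.is_unique)
```

Checking index.is_unique is a quick way to spot this situation before it causes confusing results.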

Fix the Index

code.py
all_sales = pd.concat([jan_sales, feb_sales], ignore_index=True)
print(all_sales)

Output:

  Product  Sales
0   Apple    100
1  Banana    150
2   Apple    120
3  Banana    180

Now the index is 0, 1, 2, 3.

Place DataFrames Side by Side (Add Columns)

code.py
# Product names
names = pd.DataFrame({
    'Product': ['Apple', 'Banana', 'Cherry']
})

# Product prices
prices = pd.DataFrame({
    'Price': [1.00, 0.50, 2.00]
})

# Put them side by side
products = pd.concat([names, prices], axis=1)
print(products)

Output:

  Product  Price
0   Apple    1.0
1  Banana    0.5
2  Cherry    2.0

axis=1 means "add columns". Rows are matched by index label, not by position, so both DataFrames should share the same index.
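With axis=1, pandas lines rows up by their index labels. The sketch below assumes a hypothetical prices table whose index starts at 1 instead of 0, to show what happens when the labels do not match and one way to recover.

```python
import pandas as pd

names = pd.DataFrame({'Product': ['Apple', 'Banana', 'Cherry']})

# Hypothetical case: prices has index labels 1, 2, 3 instead of 0, 1, 2
prices = pd.DataFrame({'Price': [1.00, 0.50, 2.00]}, index=[1, 2, 3])

# Rows are matched by label, so labels 0 and 3 have no partner and get NaN
misaligned = pd.concat([names, prices], axis=1)
print(misaligned)

# Resetting the index first lines the rows up by position again
aligned = pd.concat([names, prices.reset_index(drop=True)], axis=1)
print(aligned)
```

If a side-by-side result has more rows than expected, mismatched index labels are a likely cause.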

Concatenate Multiple DataFrames

code.py
jan = pd.DataFrame({'Month': ['Jan'], 'Sales': [100]})
feb = pd.DataFrame({'Month': ['Feb'], 'Sales': [120]})
mar = pd.DataFrame({'Month': ['Mar'], 'Sales': [150]})

# Join all three
all_months = pd.concat([jan, feb, mar], ignore_index=True)
print(all_months)

Output:

  Month  Sales
0   Jan    100
1   Feb    120
2   Mar    150
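When there are many DataFrames, typing each name by hand gets tedious. One common pattern, sketched here with a hypothetical monthly_sales dictionary standing in for real data, is to collect the DataFrames in a list and call pd.concat once at the end.

```python
import pandas as pd

# Hypothetical monthly figures; in practice these might come from files
monthly_sales = {'Jan': 100, 'Feb': 120, 'Mar': 150}

# Build one small DataFrame per month and collect them in a list
frames = [
    pd.DataFrame({'Month': [month], 'Sales': [sales]})
    for month, sales in monthly_sales.items()
]

# Concatenate once at the end rather than inside the loop
all_months = pd.concat(frames, ignore_index=True)
print(all_months)
```

Calling pd.concat once on the full list is generally preferable to concatenating inside a loop, because each call copies the data.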

When Columns Don't Match

code.py
df1 = pd.DataFrame({
    'Name': ['John', 'Sarah'],
    'Age': [25, 30]
})

df2 = pd.DataFrame({
    'Name': ['Mike'],
    'City': ['NYC']
})

result = pd.concat([df1, df2], ignore_index=True)
print(result)

Output:

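    Name   Age City
0   John  25.0  NaN
1  Sarah  30.0  NaN
2   Mike   NaN  NYC

Missing values become NaN (Not a Number). Age shows as 25.0 instead of 25 because a numeric column that contains NaN is stored as floating point.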

Add Labels to Know Where Data Came From

code.py
jan = pd.DataFrame({'Sales': [100, 150]})
feb = pd.DataFrame({'Sales': [120, 180]})

result = pd.concat([jan, feb], keys=['January', 'February'])
print(result)

Output:

            Sales
January  0    100
         1    150
February 0    120
         1    180

Now you know which data came from which month.
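The labels become an extra index level, which you can use to pull one source back out. The sketch below also passes names=, with level names 'Month' and 'Row' chosen here purely for illustration, so the label can be turned into a regular column.

```python
import pandas as pd

jan = pd.DataFrame({'Sales': [100, 150]})
feb = pd.DataFrame({'Sales': [120, 180]})

# names= gives the two index levels readable names (illustrative choices)
result = pd.concat(
    [jan, feb],
    keys=['January', 'February'],
    names=['Month', 'Row'],
)

# Select every row that came from the January DataFrame
january_rows = result.loc['January']
print(january_rows)

# Move the Month label out of the index into a regular column
flat = result.reset_index(level='Month')
print(flat)
```

A regular Month column is often easier to work with later than a two-level index.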

Practice Example

code.py
import pandas as pd

# Weekly sales reports
week1 = pd.DataFrame({
    'Day': ['Mon', 'Tue', 'Wed'],
    'Sales': [200, 250, 180]
})

week2 = pd.DataFrame({
    'Day': ['Mon', 'Tue', 'Wed'],
    'Sales': [220, 270, 200]
})

# Combine all weeks
all_weeks = pd.concat([week1, week2], ignore_index=True)
print("All sales:")
print(all_weeks)

# Total sales
total = all_weeks['Sales'].sum()
print(f"\nTotal sales: {total}")
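As a possible extension of the practice example, the sketch below labels each week with keys so the totals can be compared week by week. The labels 'Week 1' and 'Week 2' are illustrative choices.

```python
import pandas as pd

week1 = pd.DataFrame({'Day': ['Mon', 'Tue', 'Wed'], 'Sales': [200, 250, 180]})
week2 = pd.DataFrame({'Day': ['Mon', 'Tue', 'Wed'], 'Sales': [220, 270, 200]})

# Label each week so its rows can be found again after combining
labeled = pd.concat([week1, week2], keys=['Week 1', 'Week 2'])

# Pull each week back out by its label and total it
week1_total = labeled.loc['Week 1']['Sales'].sum()
week2_total = labeled.loc['Week 2']['Sales'].sum()

print(f"Week 1 total: {week1_total}")
print(f"Week 2 total: {week2_total}")
```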

Key Points

  • pd.concat([df1, df2]) stacks DataFrames vertically (adds rows)
  • pd.concat([df1, df2], axis=1) puts them side by side (adds columns)
  • ignore_index=True gives fresh index numbers 0, 1, 2, 3...
  • Missing columns become NaN
  • Use keys to label where data came from

Common Mistakes

Mistake 1: Forgetting the list brackets

code.py
# Wrong! This raises a TypeError
pd.concat(df1, df2)

# Correct
pd.concat([df1, df2])
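To see what this mistake looks like in practice, the sketch below catches the error instead of letting it stop the script. The two small DataFrames are made up for the demonstration, and the exact error message varies between pandas versions.

```python
import pandas as pd

df1 = pd.DataFrame({'Sales': [100]})
df2 = pd.DataFrame({'Sales': [120]})

error_message = None
try:
    pd.concat(df1, df2)  # missing list brackets
except TypeError as error:
    error_message = str(error)

print(f"TypeError: {error_message}")

# With the brackets in place, the call works
combined = pd.concat([df1, df2], ignore_index=True)
print(combined)
```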

Mistake 2: Wrong axis for side by side

code.py
# This adds rows (default)
pd.concat([df1, df2])

# This adds columns
pd.concat([df1, df2], axis=1)
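A quick way to check which direction you got is to look at .shape, which reports (rows, columns). The sketch below uses two small made-up DataFrames.

```python
import pandas as pd

df1 = pd.DataFrame({'Sales': [100, 150]})
df2 = pd.DataFrame({'Sales': [120, 180]})

stacked = pd.concat([df1, df2])               # default axis=0: more rows
side_by_side = pd.concat([df1, df2], axis=1)  # axis=1: more columns

print(stacked.shape)       # (4, 1)
print(side_by_side.shape)  # (2, 2)
```

Note that the side-by-side result here has two columns both named Sales, which is usually a sign that stacking rows was what you wanted.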

What's Next?

You learned to stack DataFrames. Next, you'll learn merging, which joins DataFrames based on matching values (like matching customer IDs).
