Concatenating DataFrames

What is Concatenation?

Concatenation means joining DataFrames end to end. Think of it like arranging sheets of paper:

Stack papers on top of each other = add more rows
Put papers side by side = add more columns

Stack DataFrames Vertically (Add Rows)

code.py

import pandas as pd

# First DataFrame - January sales
jan_sales = pd.DataFrame({
    'Product': ['Apple', 'Banana'],
    'Sales': [100, 150]
})

# Second DataFrame - February sales
feb_sales = pd.DataFrame({
    'Product': ['Apple', 'Banana'],
    'Sales': [120, 180]
})

# Stack them on top of each other
all_sales = pd.concat([jan_sales, feb_sales])
print(all_sales)

Output:

  Product  Sales
0   Apple    100
1  Banana    150
0   Apple    120
1  Banana    180

Notice: The index repeats (0, 1, 0, 1) because pd.concat keeps each DataFrame's original index by default. We can fix this.
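To see why a repeated index can cause trouble, here is a small sketch. It rebuilds the two sales DataFrames so it runs on its own. Looking up label 0 returns two rows, one from each month.

```python
import pandas as pd

jan_sales = pd.DataFrame({'Product': ['Apple', 'Banana'], 'Sales': [100, 150]})
feb_sales = pd.DataFrame({'Product': ['Apple', 'Banana'], 'Sales': [120, 180]})

all_sales = pd.concat([jan_sales, feb_sales])

# Label 0 appears twice, so .loc[0] returns both Apple rows
rows_at_zero = all_sales.loc[0]
print(rows_at_zero)

# is_unique reports whether every index label appears only once
print(all_sales.index.is_unique)
```

If you expected a single row back, this is usually the sign that the index needs renumbering.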

Fix the Index

code.py

all_sales = pd.concat([jan_sales, feb_sales], ignore_index=True)
print(all_sales)

Output:

  Product  Sales
0   Apple    100
1  Banana    150
2   Apple    120
3  Banana    180

Now the index runs 0, 1, 2, 3 with no repeats.
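If the DataFrames are already concatenated, the index can also be renumbered afterwards. A minimal sketch using reset_index(drop=True), with the same sales data rebuilt so it runs on its own:

```python
import pandas as pd

jan_sales = pd.DataFrame({'Product': ['Apple', 'Banana'], 'Sales': [100, 150]})
feb_sales = pd.DataFrame({'Product': ['Apple', 'Banana'], 'Sales': [120, 180]})

# Concatenate first, keeping the repeated index
stacked = pd.concat([jan_sales, feb_sales])

# drop=True throws away the old index instead of keeping it as a column
renumbered = stacked.reset_index(drop=True)
print(renumbered.index.tolist())
```

Both approaches give the same rows. ignore_index=True is simply shorter when you are calling pd.concat anyway.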

Place DataFrames Side by Side (Add Columns)

code.py

# Product names
names = pd.DataFrame({
    'Product': ['Apple', 'Banana', 'Cherry']
})

# Product prices
prices = pd.DataFrame({
    'Price': [1.00, 0.50, 2.00]
})

# Put them side by side
products = pd.concat([names, prices], axis=1)
print(products)

Output:

  Product  Price
0   Apple   1.00
1  Banana   0.50
2  Cherry   2.00

axis=1 means "add columns". Rows are matched by index label, not by position, so both DataFrames should share the same index.
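Side-by-side concatenation lines rows up by their index labels. The sketch below shows what can happen when the labels differ. The shifted index on prices is made up for illustration.

```python
import pandas as pd

names = pd.DataFrame({'Product': ['Apple', 'Banana', 'Cherry']})  # index 0, 1, 2

# Hypothetical case: prices arrives with index 1, 2, 3 instead of 0, 1, 2
prices = pd.DataFrame({'Price': [1.00, 0.50, 2.00]}, index=[1, 2, 3])

# Rows are matched by label, so labels 0 and 3 each get a NaN
misaligned = pd.concat([names, prices], axis=1)
print(misaligned)

# Resetting the index first pairs the rows by position instead
aligned = pd.concat([names, prices.reset_index(drop=True)], axis=1)
print(aligned)
```

The first result has four rows with gaps. The second has the three rows you probably wanted.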

Concatenate Multiple DataFrames

code.py

jan = pd.DataFrame({'Month': ['Jan'], 'Sales': [100]})
feb = pd.DataFrame({'Month': ['Feb'], 'Sales': [120]})
mar = pd.DataFrame({'Month': ['Mar'], 'Sales': [150]})

# Join all three
all_months = pd.concat([jan, feb, mar], ignore_index=True)
print(all_months)

Output:

  Month  Sales
0   Jan    100
1   Feb    120
2   Mar    150
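When the DataFrames are created in a loop, a common pattern is to collect them in a list and call pd.concat once at the end. Each call to pd.concat builds a new DataFrame, so one call is usually cheaper than many. A sketch, assuming the monthly figures are held in a dictionary (a made-up structure for this example):

```python
import pandas as pd

# Hypothetical input: month name -> sales figure
monthly_figures = {'Jan': 100, 'Feb': 120, 'Mar': 150}

# Build one small DataFrame per month and collect them in a list
frames = []
for month, sales in monthly_figures.items():
    frames.append(pd.DataFrame({'Month': [month], 'Sales': [sales]}))

# Concatenate once, after the loop
all_months = pd.concat(frames, ignore_index=True)
print(all_months)
```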

When Columns Don't Match

code.py

df1 = pd.DataFrame({
    'Name': ['John', 'Sarah'],
    'Age': [25, 30]
})

df2 = pd.DataFrame({
    'Name': ['Mike'],
    'City': ['NYC']
})

result = pd.concat([df1, df2], ignore_index=True)
print(result)

Output:

    Name   Age City
0   John  25.0  NaN
1  Sarah  30.0  NaN
2   Mike   NaN  NYC

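Missing values become NaN (Not a Number). Notice that Age now shows 25.0 and 30.0: NaN is a float, so the whole column is converted to float.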
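If you would rather keep only the columns that every DataFrame has, pd.concat accepts join='inner'. A short sketch with the same two DataFrames:

```python
import pandas as pd

df1 = pd.DataFrame({'Name': ['John', 'Sarah'], 'Age': [25, 30]})
df2 = pd.DataFrame({'Name': ['Mike'], 'City': ['NYC']})

# join='inner' keeps only the columns found in both DataFrames
shared_only = pd.concat([df1, df2], join='inner', ignore_index=True)
print(shared_only)
```

Only Name survives here, because Age and City each appear in just one DataFrame. The default, join='outer', keeps every column and fills the gaps with NaN.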

Add Labels to Know Where Data Came From

code.py

jan = pd.DataFrame({'Sales': [100, 150]})
feb = pd.DataFrame({'Sales': [120, 180]})

result = pd.concat([jan, feb], keys=['January', 'February'])
print(result)

Output:

            Sales
January  0    100
         1    150
February 0    120
         1    180

The keys become an outer index level, so you can tell which rows came from which month.
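Once the rows are labeled, you can pull one month back out with .loc. The sketch below also shows the names parameter, which names the index levels so they can be turned into ordinary columns. 'Month' and 'Row' are names chosen for this example.

```python
import pandas as pd

jan = pd.DataFrame({'Sales': [100, 150]})
feb = pd.DataFrame({'Sales': [120, 180]})

result = pd.concat([jan, feb], keys=['January', 'February'])

# Select every row that came from February
feb_rows = result.loc['February']
print(feb_rows)

# names= labels the two index levels; reset_index turns them into columns
labeled = pd.concat([jan, feb], keys=['January', 'February'], names=['Month', 'Row'])
flat = labeled.reset_index()
print(flat)
```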

Practice Example

code.py

import pandas as pd

# Weekly sales reports
week1 = pd.DataFrame({
    'Day': ['Mon', 'Tue', 'Wed'],
    'Sales': [200, 250, 180]
})

week2 = pd.DataFrame({
    'Day': ['Mon', 'Tue', 'Wed'],
    'Sales': [220, 270, 200]
})

# Combine all weeks
all_weeks = pd.concat([week1, week2], ignore_index=True)
print("All sales:")
print(all_weeks)

# Total sales
total = all_weeks['Sales'].sum()
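print(f"\nTotal sales: {total}")

Output:

All sales:
   Day  Sales
0  Mon    200
1  Tue    250
2  Wed    180
3  Mon    220
4  Tue    270
5  Wed    200

Total sales: 1320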
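After combining the weeks, you can no longer tell which row belongs to which week. One possible extension is to tag each report before concatenating and then total each week separately. This is a sketch: the 'Week' column is an illustrative addition, and groupby is a pandas feature not covered in this lesson.

```python
import pandas as pd

week1 = pd.DataFrame({'Day': ['Mon', 'Tue', 'Wed'], 'Sales': [200, 250, 180]})
week2 = pd.DataFrame({'Day': ['Mon', 'Tue', 'Wed'], 'Sales': [220, 270, 200]})

# assign() returns a copy with an extra column, leaving week1 and week2 unchanged
all_weeks = pd.concat(
    [week1.assign(Week=1), week2.assign(Week=2)],
    ignore_index=True,
)

# Sum the sales separately for each week
weekly_totals = all_weeks.groupby('Week')['Sales'].sum()
print(weekly_totals)
```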

Key Points

pd.concat([df1, df2]) stacks DataFrames vertically (adds rows)
pd.concat([df1, df2], axis=1) puts them side by side (adds columns)
ignore_index=True gives fresh index numbers 0, 1, 2, 3...
Columns missing from one DataFrame are filled with NaN
Use keys to label where data came from

Common Mistakes

Mistake 1: Forgetting the list brackets

code.py

# Wrong! Raises TypeError because the DataFrames are not in a list
# pd.concat(df1, df2)

# Correct
pd.concat([df1, df2])
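To see the error for yourself without stopping the script, you can wrap the wrong call in try/except. The exact wording of the message depends on your pandas version, so this sketch only prints whatever message it gets.

```python
import pandas as pd

df1 = pd.DataFrame({'Name': ['John', 'Sarah'], 'Age': [25, 30]})
df2 = pd.DataFrame({'Name': ['Mike'], 'City': ['NYC']})

error_message = None
try:
    pd.concat(df1, df2)  # missing list brackets
except TypeError as err:
    error_message = str(err)
    print(f"TypeError: {error_message}")

# With the brackets, the same call works
combined = pd.concat([df1, df2], ignore_index=True)
print(len(combined))
```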

Mistake 2: Using the wrong axis for side by side

code.py

# This adds rows (default)
pd.concat([df1, df2])

# This adds columns
pd.concat([df1, df2], axis=1)
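A quick way to check which direction you concatenated in is to look at .shape, which gives (rows, columns). A sketch using the January and February sales data from earlier:

```python
import pandas as pd

jan_sales = pd.DataFrame({'Product': ['Apple', 'Banana'], 'Sales': [100, 150]})
feb_sales = pd.DataFrame({'Product': ['Apple', 'Banana'], 'Sales': [120, 180]})

# Default axis: rows are added, columns stay the same
stacked = pd.concat([jan_sales, feb_sales])
print(stacked.shape)

# axis=1: columns are added, rows stay the same
side_by_side = pd.concat([jan_sales, feb_sales], axis=1)
print(side_by_side.shape)

# Both DataFrames use the same column names, so the names now repeat
print(list(side_by_side.columns))
```

The repeated column names in the last line are a hint that these two DataFrames were meant to be stacked as rows, not placed side by side.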

What's Next?

You learned to stack DataFrames. Next, you'll learn merging - joining DataFrames based on matching values (like matching customer IDs).