Merge vs Join vs Concat

Quick Summary

Method	Use When	How It Works
concat	Stacking data	Adds rows or columns
merge	Matching by column values	Like Excel VLOOKUP
join	Matching by index	Fast index matching

Concat - Stacking Data

Use concat when: You want to stack similar data together.

code.py

import pandas as pd

# Monthly sales - same columns
jan = pd.DataFrame({'Product': ['A', 'B'], 'Sales': [100, 150]})
feb = pd.DataFrame({'Product': ['A', 'B'], 'Sales': [120, 180]})

# Stack them (add rows)
all_sales = pd.concat([jan, feb], ignore_index=True)
print(all_sales)

Output:

  Product  Sales
0       A    100
1       B    150
2       A    120
3       B    180

Think: "I want to combine similar tables"

Merge - Match by Column Values

Use merge when: You need to connect data using a shared column.

code.py

# Different information, same IDs
customers = pd.DataFrame({
    'customer_id': [1, 2, 3],
    'name': ['Alice', 'Bob', 'Charlie']
})

orders = pd.DataFrame({
    'customer_id': [1, 1, 2],
    'amount': [50, 75, 100]
})

# Connect using customer_id
result = pd.merge(customers, orders, on='customer_id')
print(result)

Output:

   customer_id     name  amount
0            1    Alice      50
1            1    Alice      75
2            2      Bob     100

Think: "I want to look up matching values"

Join - Match by Index

Use join when: Your data has meaningful index values.

code.py

# Data with index as identifier
names = pd.DataFrame({'name': ['John', 'Sarah']}, index=['E001', 'E002'])
salaries = pd.DataFrame({'salary': [50000, 60000]}, index=['E001', 'E002'])

result = names.join(salaries)
print(result)

Output:

       name  salary
E001   John   50000
E002  Sarah   60000

Think: "My row labels are important identifiers"

Real World Examples

Example 1: Combining Survey Responses (Use Concat)

code.py

# Survey responses from different days
day1 = pd.DataFrame({
    'response': ['Good', 'Bad', 'Good'],
    'score': [5, 2, 4]
})

day2 = pd.DataFrame({
    'response': ['Good', 'Okay'],
    'score': [5, 3]
})

all_responses = pd.concat([day1, day2], ignore_index=True)
print(all_responses)

Example 2: Adding Product Details to Orders (Use Merge)

code.py

orders = pd.DataFrame({
    'order_id': [1, 2, 3],
    'product_code': ['P1', 'P2', 'P1'],
    'quantity': [2, 1, 3]
})

products = pd.DataFrame({
    'product_code': ['P1', 'P2'],
    'name': ['Laptop', 'Mouse'],
    'price': [1000, 25]
})

# Add product details to orders
result = pd.merge(orders, products, on='product_code')
result['total'] = result['quantity'] * result['price']
print(result)

Example 3: Employee Data in Separate Files (Use Join)

code.py

# Employee ID is the index
personal = pd.DataFrame({
    'name': ['John', 'Sarah'],
    'age': [30, 25]
}, index=['E001', 'E002'])

work = pd.DataFrame({
    'department': ['Sales', 'IT'],
    'salary': [50000, 60000]
}, index=['E001', 'E002'])

employee_data = personal.join(work)
print(employee_data)

Decision Flowchart

Do you want to...

Stack similar data?
  YES → Use concat()

Look up values in another table?
  YES → Use merge()

Match by row labels (index)?
  YES → Use join()

Side by Side Comparison

code.py

import pandas as pd

# Sample data
df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'A': [5, 6], 'B': [7, 8]})

# CONCAT - Stack vertically
concat_result = pd.concat([df1, df2], ignore_index=True)
print("Concat (stack rows):")
print(concat_result)

Output:

code.py

# MERGE - Match by column
left = pd.DataFrame({'key': [1, 2], 'value': ['a', 'b']})
right = pd.DataFrame({'key': [1, 2], 'value2': ['x', 'y']})

merge_result = pd.merge(left, right, on='key')
print("\nMerge (match by column):")
print(merge_result)

Output:

   key value value2
0    1     a      x
1    2     b      y

code.py

# JOIN - Match by index
left = pd.DataFrame({'A': [1, 2]}, index=['x', 'y'])
right = pd.DataFrame({'B': [3, 4]}, index=['x', 'y'])

join_result = left.join(right)
print("\nJoin (match by index):")
print(join_result)

Output:

   A  B
x  1  3
y  2  4

Key Points

Concat: Stack data together (same structure)
Merge: Connect data by matching column values
Join: Connect data by matching index values

Quick Reference

code.py

# Concat
pd.concat([df1, df2])                    # Stack rows
pd.concat([df1, df2], axis=1)            # Add columns

# Merge
pd.merge(df1, df2, on='column')          # Match by column
pd.merge(df1, df2, how='left')           # Keep all left rows

# Join
df1.join(df2)                            # Match by index
df1.join(df2, how='outer')               # Keep all rows

What's Next?

You now understand how to combine DataFrames. Next, you'll learn GroupBy - how to group and summarize data.