5 min read min read
Merge vs Join vs Concat
Learn when to use merge, join, or concat to combine DataFrames
Merge vs Join vs Concat
Quick Summary
| Method | Use When | How It Works |
|---|---|---|
| concat | Stacking data | Adds rows or columns |
| merge | Matching by column values | Like Excel VLOOKUP |
| join | Matching by index | Fast index matching |
Concat - Stacking Data
Use concat when: You want to stack similar data together.
code.py
import pandas as pd
# Monthly sales - same columns
jan = pd.DataFrame({'Product': ['A', 'B'], 'Sales': [100, 150]})
feb = pd.DataFrame({'Product': ['A', 'B'], 'Sales': [120, 180]})
# Stack them (add rows)
all_sales = pd.concat([jan, feb], ignore_index=True)
print(all_sales)Output:
Product Sales
0 A 100
1 B 150
2 A 120
3 B 180
Think: "I want to combine similar tables"
Merge - Match by Column Values
Use merge when: You need to connect data using a shared column.
code.py
# Different information, same IDs
customers = pd.DataFrame({
'customer_id': [1, 2, 3],
'name': ['Alice', 'Bob', 'Charlie']
})
orders = pd.DataFrame({
'customer_id': [1, 1, 2],
'amount': [50, 75, 100]
})
# Connect using customer_id
result = pd.merge(customers, orders, on='customer_id')
print(result)Output:
customer_id name amount
0 1 Alice 50
1 1 Alice 75
2 2 Bob 100
Think: "I want to look up matching values"
Join - Match by Index
Use join when: Your data has meaningful index values.
code.py
# Data with index as identifier
names = pd.DataFrame({'name': ['John', 'Sarah']}, index=['E001', 'E002'])
salaries = pd.DataFrame({'salary': [50000, 60000]}, index=['E001', 'E002'])
result = names.join(salaries)
print(result)Output:
name salary
E001 John 50000
E002 Sarah 60000
Think: "My row labels are important identifiers"
Real World Examples
Example 1: Combining Survey Responses (Use Concat)
code.py
# Survey responses from different days
day1 = pd.DataFrame({
'response': ['Good', 'Bad', 'Good'],
'score': [5, 2, 4]
})
day2 = pd.DataFrame({
'response': ['Good', 'Okay'],
'score': [5, 3]
})
all_responses = pd.concat([day1, day2], ignore_index=True)
print(all_responses)Example 2: Adding Product Details to Orders (Use Merge)
code.py
orders = pd.DataFrame({
'order_id': [1, 2, 3],
'product_code': ['P1', 'P2', 'P1'],
'quantity': [2, 1, 3]
})
products = pd.DataFrame({
'product_code': ['P1', 'P2'],
'name': ['Laptop', 'Mouse'],
'price': [1000, 25]
})
# Add product details to orders
result = pd.merge(orders, products, on='product_code')
result['total'] = result['quantity'] * result['price']
print(result)Example 3: Employee Data in Separate Files (Use Join)
code.py
# Employee ID is the index
personal = pd.DataFrame({
'name': ['John', 'Sarah'],
'age': [30, 25]
}, index=['E001', 'E002'])
work = pd.DataFrame({
'department': ['Sales', 'IT'],
'salary': [50000, 60000]
}, index=['E001', 'E002'])
employee_data = personal.join(work)
print(employee_data)Decision Flowchart
Do you want to...
Stack similar data?
YES → Use concat()
Look up values in another table?
YES → Use merge()
Match by row labels (index)?
YES → Use join()
Side by Side Comparison
code.py
import pandas as pd
# Sample data
df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'A': [5, 6], 'B': [7, 8]})
# CONCAT - Stack vertically
concat_result = pd.concat([df1, df2], ignore_index=True)
print("Concat (stack rows):")
print(concat_result)Output:
A B
0 1 3
1 2 4
2 5 7
3 6 8
code.py
# MERGE - Match by column
left = pd.DataFrame({'key': [1, 2], 'value': ['a', 'b']})
right = pd.DataFrame({'key': [1, 2], 'value2': ['x', 'y']})
merge_result = pd.merge(left, right, on='key')
print("\nMerge (match by column):")
print(merge_result)Output:
key value value2
0 1 a x
1 2 b y
code.py
# JOIN - Match by index
left = pd.DataFrame({'A': [1, 2]}, index=['x', 'y'])
right = pd.DataFrame({'B': [3, 4]}, index=['x', 'y'])
join_result = left.join(right)
print("\nJoin (match by index):")
print(join_result)Output:
A B
x 1 3
y 2 4
Key Points
- Concat: Stack data together (same structure)
- Merge: Connect data by matching column values
- Join: Connect data by matching index values
Quick Reference
code.py
# Concat
pd.concat([df1, df2]) # Stack rows
pd.concat([df1, df2], axis=1) # Add columns
# Merge
pd.merge(df1, df2, on='column') # Match by column
pd.merge(df1, df2, how='left') # Keep all left rows
# Join
df1.join(df2) # Match by index
df1.join(df2, how='outer') # Keep all rowsWhat's Next?
You now understand how to combine DataFrames. Next, you'll learn GroupBy - how to group and summarize data.