#1 Data Analytics Program in India
₹2,499₹1,499Enroll Now
5 min read min read

Merge vs Join vs Concat

Learn when to use merge, join, or concat to combine DataFrames

Merge vs Join vs Concat

Quick Summary

MethodUse WhenHow It Works
concatStacking dataAdds rows or columns
mergeMatching by column valuesLike Excel VLOOKUP
joinMatching by indexFast index matching

Concat - Stacking Data

Use concat when: You want to stack similar data together.

code.py
import pandas as pd

# Monthly sales - same columns
jan = pd.DataFrame({'Product': ['A', 'B'], 'Sales': [100, 150]})
feb = pd.DataFrame({'Product': ['A', 'B'], 'Sales': [120, 180]})

# Stack them (add rows)
all_sales = pd.concat([jan, feb], ignore_index=True)
print(all_sales)

Output:

Product Sales 0 A 100 1 B 150 2 A 120 3 B 180

Think: "I want to combine similar tables"

Merge - Match by Column Values

Use merge when: You need to connect data using a shared column.

code.py
# Different information, same IDs
customers = pd.DataFrame({
    'customer_id': [1, 2, 3],
    'name': ['Alice', 'Bob', 'Charlie']
})

orders = pd.DataFrame({
    'customer_id': [1, 1, 2],
    'amount': [50, 75, 100]
})

# Connect using customer_id
result = pd.merge(customers, orders, on='customer_id')
print(result)

Output:

customer_id name amount 0 1 Alice 50 1 1 Alice 75 2 2 Bob 100

Think: "I want to look up matching values"

Join - Match by Index

Use join when: Your data has meaningful index values.

code.py
# Data with index as identifier
names = pd.DataFrame({'name': ['John', 'Sarah']}, index=['E001', 'E002'])
salaries = pd.DataFrame({'salary': [50000, 60000]}, index=['E001', 'E002'])

result = names.join(salaries)
print(result)

Output:

name salary E001 John 50000 E002 Sarah 60000

Think: "My row labels are important identifiers"

Real World Examples

Example 1: Combining Survey Responses (Use Concat)

code.py
# Survey responses from different days
day1 = pd.DataFrame({
    'response': ['Good', 'Bad', 'Good'],
    'score': [5, 2, 4]
})

day2 = pd.DataFrame({
    'response': ['Good', 'Okay'],
    'score': [5, 3]
})

all_responses = pd.concat([day1, day2], ignore_index=True)
print(all_responses)

Example 2: Adding Product Details to Orders (Use Merge)

code.py
orders = pd.DataFrame({
    'order_id': [1, 2, 3],
    'product_code': ['P1', 'P2', 'P1'],
    'quantity': [2, 1, 3]
})

products = pd.DataFrame({
    'product_code': ['P1', 'P2'],
    'name': ['Laptop', 'Mouse'],
    'price': [1000, 25]
})

# Add product details to orders
result = pd.merge(orders, products, on='product_code')
result['total'] = result['quantity'] * result['price']
print(result)

Example 3: Employee Data in Separate Files (Use Join)

code.py
# Employee ID is the index
personal = pd.DataFrame({
    'name': ['John', 'Sarah'],
    'age': [30, 25]
}, index=['E001', 'E002'])

work = pd.DataFrame({
    'department': ['Sales', 'IT'],
    'salary': [50000, 60000]
}, index=['E001', 'E002'])

employee_data = personal.join(work)
print(employee_data)

Decision Flowchart

Do you want to... Stack similar data? YES → Use concat() Look up values in another table? YES → Use merge() Match by row labels (index)? YES → Use join()

Side by Side Comparison

code.py
import pandas as pd

# Sample data
df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'A': [5, 6], 'B': [7, 8]})

# CONCAT - Stack vertically
concat_result = pd.concat([df1, df2], ignore_index=True)
print("Concat (stack rows):")
print(concat_result)

Output:

A B 0 1 3 1 2 4 2 5 7 3 6 8
code.py
# MERGE - Match by column
left = pd.DataFrame({'key': [1, 2], 'value': ['a', 'b']})
right = pd.DataFrame({'key': [1, 2], 'value2': ['x', 'y']})

merge_result = pd.merge(left, right, on='key')
print("\nMerge (match by column):")
print(merge_result)

Output:

key value value2 0 1 a x 1 2 b y
code.py
# JOIN - Match by index
left = pd.DataFrame({'A': [1, 2]}, index=['x', 'y'])
right = pd.DataFrame({'B': [3, 4]}, index=['x', 'y'])

join_result = left.join(right)
print("\nJoin (match by index):")
print(join_result)

Output:

A B x 1 3 y 2 4

Key Points

  1. Concat: Stack data together (same structure)
  2. Merge: Connect data by matching column values
  3. Join: Connect data by matching index values

Quick Reference

code.py
# Concat
pd.concat([df1, df2])                    # Stack rows
pd.concat([df1, df2], axis=1)            # Add columns

# Merge
pd.merge(df1, df2, on='column')          # Match by column
pd.merge(df1, df2, how='left')           # Keep all left rows

# Join
df1.join(df2)                            # Match by index
df1.join(df2, how='outer')               # Keep all rows

What's Next?

You now understand how to combine DataFrames. Next, you'll learn GroupBy - how to group and summarize data.

SkillsetMaster - AI, Web Development & Data Analytics Courses