5 min read min read
Join Operations
Learn to combine DataFrames using their index with join
Join Operations
What is Join?
Join is like merge, but it uses the index (row labels) to connect DataFrames.
Think of it like matching books by their library code number.
Basic Join Example
code.py
import pandas as pd
# Create DataFrames with named index
students = pd.DataFrame({
'name': ['John', 'Sarah', 'Mike']
}, index=['S1', 'S2', 'S3'])
scores = pd.DataFrame({
'marks': [85, 92, 78]
}, index=['S1', 'S2', 'S3'])
print("Students:")
print(students)
print("\nScores:")
print(scores)Output:
Students:
name
S1 John
S2 Sarah
S3 Mike
Scores:
marks
S1 85
S2 92
S3 78
Now join them:
code.py
result = students.join(scores)
print(result)Output:
name marks
S1 John 85
S2 Sarah 92
S3 Mike 78
Joined using the index (S1, S2, S3).
Types of Joins
Left Join (Default)
Keep all rows from the left DataFrame.
code.py
students = pd.DataFrame({
'name': ['John', 'Sarah', 'Mike', 'Emma']
}, index=['S1', 'S2', 'S3', 'S4'])
scores = pd.DataFrame({
'marks': [85, 92]
}, index=['S1', 'S2'])
result = students.join(scores, how='left')
print(result)Output:
name marks
S1 John 85.0
S2 Sarah 92.0
S3 Mike NaN
S4 Emma NaN
All students kept. Missing scores are NaN.
Right Join
Keep all rows from the right DataFrame.
code.py
result = students.join(scores, how='right')
print(result)Output:
name marks
S1 John 85
S2 Sarah 92
Inner Join
Keep only matching rows.
code.py
result = students.join(scores, how='inner')
print(result)Output:
name marks
S1 John 85
S2 Sarah 92
Outer Join
Keep all rows from both DataFrames.
code.py
students = pd.DataFrame({
'name': ['John', 'Sarah']
}, index=['S1', 'S2'])
scores = pd.DataFrame({
'marks': [85, 78]
}, index=['S1', 'S3'])
result = students.join(scores, how='outer')
print(result)Output:
name marks
S1 John 85.0
S2 Sarah NaN
S3 NaN 78.0
Join Multiple DataFrames
code.py
names = pd.DataFrame({'name': ['John', 'Sarah']}, index=['S1', 'S2'])
ages = pd.DataFrame({'age': [25, 30]}, index=['S1', 'S2'])
cities = pd.DataFrame({'city': ['NYC', 'LA']}, index=['S1', 'S2'])
# Join all three
result = names.join([ages, cities])
print(result)Output:
name age city
S1 John 25 NYC
S2 Sarah 30 LA
Join with Column Name Conflicts
When both DataFrames have same column name:
code.py
df1 = pd.DataFrame({'score': [85, 92]}, index=['S1', 'S2'])
df2 = pd.DataFrame({'score': [90, 88]}, index=['S1', 'S2'])
result = df1.join(df2, lsuffix='_math', rsuffix='_english')
print(result)Output:
score_math score_english
S1 85 90
S2 92 88
lsuffix = left suffix, rsuffix = right suffix
Join on a Column Instead of Index
code.py
students = pd.DataFrame({
'student_id': ['S1', 'S2', 'S3'],
'name': ['John', 'Sarah', 'Mike']
})
scores = pd.DataFrame({
'marks': [85, 92, 78]
}, index=['S1', 'S2', 'S3'])
# Set student_id as index, then join
result = students.set_index('student_id').join(scores)
print(result)Output:
name marks
student_id
S1 John 85
S2 Sarah 92
S3 Mike 78
Practice Example
code.py
import pandas as pd
# Product info
products = pd.DataFrame({
'name': ['Laptop', 'Mouse', 'Keyboard'],
'category': ['Electronics', 'Electronics', 'Electronics']
}, index=['P1', 'P2', 'P3'])
# Product prices
prices = pd.DataFrame({
'price': [1000, 25, 75]
}, index=['P1', 'P2', 'P3'])
# Product stock
stock = pd.DataFrame({
'quantity': [50, 200, 100]
}, index=['P1', 'P2', 'P3'])
# Join all together
result = products.join([prices, stock])
print(result)
# Calculate total value
result['total_value'] = result['price'] * result['quantity']
print("\nWith total value:")
print(result)Key Points
df1.join(df2)joins using the index- Default is left join (keeps all from left)
- Use
how='inner','outer','right'for other types - Use
lsuffixandrsuffixfor same column names - Join multiple DataFrames with a list:
df1.join([df2, df3])
Join vs Merge
| Feature | Join | Merge |
|---|---|---|
| Joins on | Index (default) | Column |
| Syntax | df1.join(df2) | pd.merge(df1, df2) |
| Speed | Faster for index | Better for columns |
Common Mistakes
Mistake 1: Forgetting to set index
code.py
# If you want to join on 'id' column
# First set it as index
df1 = df1.set_index('id')
df2 = df2.set_index('id')
result = df1.join(df2)Mistake 2: Same column names without suffix
code.py
# This will error if both have 'score' column
df1.join(df2)
# Add suffixes
df1.join(df2, lsuffix='_left', rsuffix='_right')What's Next?
You learned concat, merge, and join. Next, you'll learn when to use each one.