5 min read
String Methods and Operations
Learn to split, extract, and search text in pandas
Split Text into Parts
code.py
import pandas as pd
df = pd.DataFrame({
'Full_Name': ['John Doe', 'Sarah Smith', 'Mike Johnson']
})
# Split by space
df[['First', 'Last']] = df['Full_Name'].str.split(' ', expand=True)
print(df)
Output:
      Full_Name  First     Last
0      John Doe   John      Doe
1   Sarah Smith  Sarah    Smith
2  Mike Johnson   Mike  Johnson
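The split above works because every name has exactly two words. A name with three words would produce three parts, and assigning them to two new columns would fail. Here is a minimal sketch of one way around that, assuming everything after the first space should count as the last name. The sample names and the `names` variable are made up for illustration.

```python
import pandas as pd

# Hypothetical names, one of which has three words
names = pd.DataFrame({
    'Full_Name': ['John Doe', 'Mary Ann Lee']
})

# n=1 splits only at the first space, so each name gives at most two parts
names[['First', 'Last']] = names['Full_Name'].str.split(' ', n=1, expand=True)
print(names)
```

Whether "Ann" belongs to the first or last name depends on your data, so treat the `n=1` rule as one possible choice.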
Get Part of Split
code.py
# Get only first name
df['First'] = df['Full_Name'].str.split(' ').str[0]
# Get only last name
df['Last'] = df['Full_Name'].str.split(' ').str[1]
Extract Part of Text
Get specific characters:
code.py
df = pd.DataFrame({
'Code': ['ABC-123', 'DEF-456', 'GHI-789']
})
# Get first 3 characters
df['Letters'] = df['Code'].str[:3]
# Get last 3 characters
df['Numbers'] = df['Code'].str[-3:]
print(df)
Output:
      Code Letters Numbers
0  ABC-123     ABC     123
1  DEF-456     DEF     456
2  GHI-789     GHI     789
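Fixed positions like `[:3]` only work when every value has the same layout. If the part before the dash can vary in length, one option is to split on the dash and then pick a part. This is a small sketch with invented codes, where `codes` and `parts` are illustrative names.

```python
import pandas as pd

# Hypothetical codes where the letter part varies in length
codes = pd.DataFrame({
    'Code': ['AB-123', 'DEFG-45', 'H-6789']
})

# Split once on the dash, then pick each part by its position in the list
parts = codes['Code'].str.split('-')
codes['Letters'] = parts.str[0]
codes['Numbers'] = parts.str[1]
print(codes)
```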
Check Start and End
code.py
df = pd.DataFrame({
'Email': ['john@gmail.com', 'sarah@yahoo.com', 'test@gmail.com']
})
# Check if ends with gmail.com
df['Is_Gmail'] = df['Email'].str.endswith('gmail.com')
# Check if starts with 'john'
df['Is_John'] = df['Email'].str.startswith('john')
print(df)
Output:
             Email  Is_Gmail  Is_John
0   john@gmail.com      True     True
1  sarah@yahoo.com     False    False
2   test@gmail.com      True    False
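Real data often has missing values, and a missing email cannot be checked as text. The sketch below assumes a missing email should count as "not Gmail" and uses the `na=False` argument to say so. The `emails` table is made up for illustration.

```python
import pandas as pd

# Hypothetical data with one missing email
emails = pd.DataFrame({
    'Email': ['john@gmail.com', None, 'test@yahoo.com']
})

# na=False makes a missing email count as "no match"
emails['Is_Gmail'] = emails['Email'].str.endswith('gmail.com', na=False)

# The True/False column can be used directly to filter rows
gmail_only = emails[emails['Is_Gmail']]
print(gmail_only)
```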
Find Position of Text
code.py
df = pd.DataFrame({
'Text': ['Hello World', 'Python is great', 'Data Science']
})
# Find where 'o' first appears (starts from 0)
df['Position'] = df['Text'].str.find('o')
print(df)
Output:
              Text  Position
0      Hello World         4
1  Python is great         4
2     Data Science        -1
A result of -1 means the text was not found.
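If you only need to know whether the text appears at all, you can compare the position with -1, or ask the question directly with `.str.contains()`. A short sketch using the same sample text follows. The column names `Has_o` and `Has_o_contains` are made up for this example.

```python
import pandas as pd

df = pd.DataFrame({
    'Text': ['Hello World', 'Python is great', 'Data Science']
})

# find() gives a position; comparing with -1 turns it into True/False
df['Has_o'] = df['Text'].str.find('o') != -1

# contains() answers the same yes/no question directly
# regex=False treats 'o' as plain text rather than a pattern
df['Has_o_contains'] = df['Text'].str.contains('o', regex=False)
print(df)
```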
Count How Many Times Text Appears
code.py
df = pd.DataFrame({
'Text': ['banana', 'apple', 'pineapple']
})
# Count letter 'a'
df['A_Count'] = df['Text'].str.count('a')
print(df)
Output:
        Text  A_Count
0     banana        3
1      apple        1
2  pineapple        1
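One thing to watch for: `.str.count()` treats its argument as a regular expression pattern, so a character like `.` has a special meaning. The sketch below shows the difference on some invented version strings, using Python's `re.escape()` to count literal dots.

```python
import re
import pandas as pd

# Hypothetical version strings
versions = pd.DataFrame({'Text': ['1.2.3', '10.0', 'v2']})

# As a pattern, '.' matches any character, so this counts every character
versions['Any_Char'] = versions['Text'].str.count('.')

# re.escape() turns '.' into a pattern that matches only a literal dot
versions['Dots'] = versions['Text'].str.count(re.escape('.'))
print(versions)
```

Plain letters such as 'a' have no special meaning, so the banana example above is unaffected.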
Pad Text with Zeros
Good for codes that need fixed length:
code.py
df = pd.DataFrame({
'ID': ['1', '23', '456']
})
# Make all IDs 5 digits with leading zeros
df['ID_Padded'] = df['ID'].str.zfill(5)
print(df)
Output:
    ID ID_Padded
0    1     00001
1   23     00023
2  456     00456
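The IDs above are already stored as text. If your IDs are stored as numbers, `.str` will not work until you convert them. Here is a minimal sketch, assuming an integer `ID` column in a made-up `orders` table.

```python
import pandas as pd

# Hypothetical table where IDs are stored as integers
orders = pd.DataFrame({'ID': [1, 23, 456]})

# .str only works on text, so convert the numbers to strings first
orders['ID_Padded'] = orders['ID'].astype(str).str.zfill(5)
print(orders)
```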
Quick Reference
| Method | What It Does |
|---|---|
| .str.split('x') | Split text by 'x' |
| .str[:3] | First 3 characters |
| .str[-3:] | Last 3 characters |
| .str.startswith('x') | Starts with 'x'? |
| .str.endswith('x') | Ends with 'x'? |
| .str.find('x') | Position of 'x' |
| .str.count('x') | How many 'x'? |
| .str.zfill(5) | Pad with zeros |
Key Points
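- .str.split() breaks text into parts
- .str[start:end] extracts part of the text
- .str.startswith() and .str.endswith() check how text begins or ends
- .str.find() returns the position of the first match (-1 if not found)
- .str.count() counts occurrences
- .str.zfill() pads text with leading zeros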
What's Next?
For more complex text patterns, learn regular expressions, a compact language for describing what to search for.