
Data Type Conversions

Learn to convert between different data types in pandas

Why Convert Types?

Data often arrives in the wrong format:

  • Numbers stored as text: "100" instead of 100
  • Dates stored as text: "2024-01-15" instead of a date
  • Categories stored as text: repeated labels take more memory than they need to
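
To see why the format matters, here is a small sketch (the sample values are made up for illustration). Multiplying a text column by 2 repeats each string, while multiplying the converted column doubles each number.

```python
import pandas as pd

# Illustrative data: prices stored as text
prices_as_text = pd.Series(['100', '200', '150'])

# On text, * 2 repeats each string
doubled_text = prices_as_text * 2

# After conversion, * 2 doubles each number
doubled_numbers = pd.to_numeric(prices_as_text) * 2

print(doubled_text.tolist())     # ['100100', '200200', '150150']
print(doubled_numbers.tolist())  # [200, 400, 300]
```

No error is raised in the text case, so this kind of mistake is easy to miss unless you check df.dtypes first.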

Check Current Data Types

code.py
import pandas as pd

df = pd.DataFrame({
    'Price': ['100', '200', '150'],
    'Quantity': [5, 10, 8],
    'Date': ['2024-01-01', '2024-01-02', '2024-01-03']
})

print(df.dtypes)

Output:

Price       object   <- text (should be number!)
Quantity     int64   <- number (good)
Date        object   <- text (should be date!)

Convert Text to Number

code.py
# Convert Price from text to number
df['Price'] = pd.to_numeric(df['Price'])
print(df.dtypes)

Now Price is int64 (number).

Handle Bad Data in Conversion

code.py
df = pd.DataFrame({
    'Price': ['100', '200', 'unknown', '150']
})

# This raises a ValueError because 'unknown' can't be parsed as a number
# df['Price'] = pd.to_numeric(df['Price'])

# Use errors='coerce' to make bad values NaN
df['Price'] = pd.to_numeric(df['Price'], errors='coerce')
print(df)

Output:

   Price
0  100.0
1  200.0
2    NaN   <- 'unknown' became NaN
3  150.0
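
The values print as 100.0 rather than 100 because NaN is a float, so the whole column becomes float64. The sketch below (variable names are illustrative) keeps the raw text alongside the converted values, so you can see exactly which entries failed instead of losing them silently.

```python
import pandas as pd

raw = pd.Series(['100', '200', 'unknown', '150'])
converted = pd.to_numeric(raw, errors='coerce')

# Entries that had a value before but are NaN now failed to convert
failed = raw[converted.isna() & raw.notna()]

print(converted.dtype)   # float64
print(failed.tolist())   # ['unknown']
```

Checking the failed values before overwriting the column is a cheap way to catch typos or unexpected placeholders such as 'unknown'.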

Convert to Text (String)

code.py
df = pd.DataFrame({
    'ID': [1, 2, 3]
})

df['ID'] = df['ID'].astype(str)
print(df.dtypes)

ID is now object (text). Newer pandas versions may report a dedicated string dtype (str) here instead of object.
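
One reason to convert numbers to text is to combine them with other text. A minimal sketch, assuming a made-up 'ORD-' prefix for the labels:

```python
import pandas as pd

ids = pd.Series([1, 2, 3])

# Convert to text first, then join with a prefix (the prefix is illustrative)
labels = 'ORD-' + ids.astype(str)

print(labels.tolist())  # ['ORD-1', 'ORD-2', 'ORD-3']
```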

Convert Text to Date

code.py
df = pd.DataFrame({
    'Date': ['2024-01-15', '2024-02-20', '2024-03-10']
})

df['Date'] = pd.to_datetime(df['Date'])
print(df.dtypes)
print(df)

Output:

Date    datetime64[ns]

        Date
0 2024-01-15
1 2024-02-20
2 2024-03-10
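
Real date columns are often in other layouts or contain bad entries. The sketch below assumes day/month/year text (the sample values are made up) and uses the format and errors parameters of pd.to_datetime. Once the column is a real date, the .dt accessor exposes parts such as the month.

```python
import pandas as pd

raw_dates = pd.Series(['15/01/2024', '20/02/2024', 'not a date'])

# format tells pandas the layout; errors='coerce' turns bad values into NaT
dates = pd.to_datetime(raw_dates, format='%d/%m/%Y', errors='coerce')

print(dates.isna().tolist())             # [False, False, True]
print(dates.dropna().dt.month.tolist())  # [1, 2]
```

NaT ("not a time") is the date equivalent of NaN.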

Convert to Category

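The category type stores each distinct value once and refers to it with a small integer code, so it uses less memory when the same text repeats many times.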

code.py
df = pd.DataFrame({
    'Status': ['Active', 'Inactive', 'Active', 'Active', 'Inactive']
})

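# deep=True counts the bytes of the strings themselves, not just references to them
print("Before:", df['Status'].memory_usage(deep=True))

df['Status'] = df['Status'].astype('category')

print("After:", df['Status'].memory_usage(deep=True))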

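The column uses less memory after conversion. The savings grow as values repeat more often; a column where almost every value is unique gains little or nothing.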
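
With only five rows the difference is small. A larger sketch makes it easier to see (the 10,000-row column is made up for illustration, and exact byte counts depend on your pandas version, so none are shown here).

```python
import pandas as pd

# Illustrative column: 2 distinct values, 10,000 rows in total
status = pd.Series(['Active', 'Inactive'] * 5000)

as_text = status.memory_usage(deep=True)
as_category = status.astype('category').memory_usage(deep=True)

print("Text bytes:", as_text)
print("Category bytes:", as_category)
print("Category is smaller:", as_category < as_text)
```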

Common Conversions

From    To          Method
Text    Number      pd.to_numeric()
Text    Date        pd.to_datetime()
Any     Text        .astype(str)
Any     Integer     .astype(int)
Any     Float       .astype(float)
Text    Category    .astype('category')
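
One caveat for the integer row: .astype(int) fails if the column contains missing values, because the regular int type cannot hold NaN. A hedged sketch of one way around this, using pandas' nullable 'Int64' type (note the capital I) on made-up sample data:

```python
import pandas as pd

quantities = pd.Series([5.0, float('nan'), 8.0])

try:
    quantities.astype(int)
except ValueError as exc:
    # Regular integers cannot represent a missing value
    print("astype(int) failed:", exc)

# The nullable integer type keeps the gap as a missing value
nullable = quantities.astype('Int64')

print(nullable.dtype)              # Int64
print(int(nullable.isna().sum()))  # 1
```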

Convert Multiple Columns

code.py
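# Sample data so the example runs on its own
df = pd.DataFrame({
    'Price': ['100', '200'], 'Quantity': [5.0, 10.0], 'Status': ['Active', 'Inactive']
})
# Pass a dict that maps each column name to its target type
df = df.astype({
    'Price': float,
    'Quantity': int,
    'Status': 'category'
})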
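
Note that .astype() stops with an error at the first value it cannot convert. When several columns may contain bad values, one option is to loop over them with pd.to_numeric and errors='coerce'. This is a sketch on made-up data, not the only way to do it.

```python
import pandas as pd

df = pd.DataFrame({
    'Price': ['100', 'unknown', '150'],
    'Quantity': ['5', '10', 'n/a']
})

# Column names chosen for this example
numeric_cols = ['Price', 'Quantity']

for col in numeric_cols:
    # Bad values become NaN instead of raising an error
    df[col] = pd.to_numeric(df[col], errors='coerce')

print(df.dtypes)
print(df.isna().sum())
```

Both columns end up as float64, with one NaN each where the text could not be parsed.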

Key Points

  • df.dtypes shows all column types
  • pd.to_numeric() converts to number
  • pd.to_datetime() converts to date
  • .astype() converts to any type
  • errors='coerce' turns bad values to NaN
  • Categories save memory for repeated text

What's Next?

Learn to clean and work with text data using string methods.