Adding and Removing Columns

Learn to add new columns and remove existing ones from DataFrames

Adding Single Column

code.pyPython
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Sarah', 'Mike'],
    'Salary': [50000, 60000, 55000]
})

df['Department'] = 'Sales'
print(df)

Output:

    Name  Salary Department
0   John   50000      Sales
1  Sarah   60000      Sales
2   Mike   55000      Sales

Every row gets the same value.

Adding from List

code.pyPython
df['Age'] = [25, 30, 28]
print(df)

The list length must match the number of rows, otherwise pandas raises a ValueError.
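
As a rough illustration of the length rule, the sketch below tries to assign a list that is one value short and catches the resulting error. The `ages` list and the `error_message` variable are hypothetical names used only for this example.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Sarah', 'Mike'],
    'Salary': [50000, 60000, 55000]
})

ages = [25, 30]        # hypothetical list: one value short for 3 rows
error_message = None

try:
    df['Age'] = ages
except ValueError as exc:
    # pandas refuses the assignment and leaves df unchanged
    error_message = str(exc)
    print("Could not add column:", error_message)

# Simple guard: only assign when the lengths agree
if len(ages) == len(df):
    df['Age'] = ages

print(df.columns.tolist())
```

Checking `len(ages) == len(df)` before assigning is one simple way to avoid the exception.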

Adding from Calculation

code.pyPython
df['Bonus'] = df['Salary'] * 0.1
print(df)

Creates a new column from an existing one.

Adding from Multiple Columns

code.pyPython
df['Total'] = df['Salary'] + df['Bonus']
print(df)

Adding with Conditions

code.pyPython
import numpy as np

df['Level'] = np.where(df['Salary'] > 55000, 'Senior', 'Junior')
print(df)

Adding with apply()

code.pyPython
def calculate_tax(salary):
    return salary * 0.2

df['Tax'] = df['Salary'].apply(calculate_tax)
print(df)

Using lambda:

code.pyPython
df['Tax'] = df['Salary'].apply(lambda x: x * 0.2)
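
For simple arithmetic like this, apply() and a direct column calculation should give the same values. The small sketch below compares the two on the sample salaries; `tax_apply` and `tax_vector` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({'Salary': [50000, 60000, 55000]})

# apply() calls the function once per value
tax_apply = df['Salary'].apply(lambda x: x * 0.2)

# A column calculation works on the whole column in one step
tax_vector = df['Salary'] * 0.2

print(tax_apply.equals(tax_vector))   # True
```

apply() is mainly worth reaching for when the logic cannot be written as a plain column expression.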

Adding Multiple Columns

code.pyPython
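# A DataFrame on the right-hand side maps one Series to each new column
df[['Bonus', 'Tax']] = pd.DataFrame({'Bonus': df['Salary'] * 0.1, 'Tax': df['Salary'] * 0.2})
print(df)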

Or separately:

code.pyPython
df['Bonus'] = df['Salary'] * 0.1
df['Tax'] = df['Salary'] * 0.2

Insert at Specific Position

code.pyPython
df.insert(1, 'ID', [101, 102, 103])
print(df)

Inserts the ID column at position 1. Positions start at 0, so it lands right after the first column.
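
Two details of insert() are easy to miss: it changes the DataFrame in place and returns None, and it refuses to insert a column name that already exists. A minimal sketch using the sample data from this lesson:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Sarah', 'Mike'],
    'Salary': [50000, 60000, 55000]
})

# insert() modifies df directly; there is nothing useful to assign
result = df.insert(1, 'ID', [101, 102, 103])
print(result)                 # None
print(df.columns.tolist())    # ['Name', 'ID', 'Salary']

# Inserting the same name a second time raises ValueError
try:
    df.insert(0, 'ID', [1, 2, 3])
except ValueError as exc:
    print("Insert failed:", exc)
```

So `df = df.insert(...)` would replace df with None, which is almost never what is wanted.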

Removing Single Column

code.pyPython
df_new = df.drop('Age', axis=1)
print(df_new)

Original df unchanged.
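
The sketch below checks that drop() returns a new DataFrame and leaves the original alone. It also shows the `columns=` keyword, which does the same job as `axis=1`. The names `by_axis` and `by_keyword` are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Sarah', 'Mike'],
    'Salary': [50000, 60000, 55000],
    'Age': [25, 30, 28]
})

# Two equivalent ways to say "drop a column"
by_axis = df.drop('Age', axis=1)
by_keyword = df.drop(columns='Age')

print(by_axis.columns.tolist())     # ['Name', 'Salary']
print(by_axis.equals(by_keyword))   # True
print('Age' in df.columns)          # True: the original still has 'Age'
```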

Remove in place (modifies df directly):

code.pyPython
df.drop('Age', axis=1, inplace=True)

Removing Multiple Columns

code.pyPython
df_new = df.drop(['Age', 'Bonus'], axis=1)
print(df_new)

Delete with del

code.pyPython
del df['Tax']
print(df)

Modifies DataFrame immediately!

Pop Column

Remove and return column.

code.pyPython
bonus_column = df.pop('Bonus')
print("Bonus column:", bonus_column)
print("DataFrame now:", df)

Column removed from df.
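
Because pop() hands the column back as a Series, the values stay available after the column is gone. A small sketch, where the `Bonus_2024` column name is made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Sarah', 'Mike'],
    'Salary': [50000, 60000, 55000],
    'Bonus': [5000.0, 6000.0, 5500.0]
})

bonus_column = df.pop('Bonus')        # removes 'Bonus' and returns it

print(type(bonus_column).__name__)    # Series
print(bonus_column.tolist())          # [5000.0, 6000.0, 5500.0]
print(df.columns.tolist())            # ['Name', 'Salary']

# The popped Series can be reused, here under a hypothetical new name
df['Bonus_2024'] = bonus_column
print(df.columns.tolist())            # ['Name', 'Salary', 'Bonus_2024']
```

Use del when the values are no longer needed, and pop() when they are.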

Practice Example

Scenario: build an employee database with calculated columns.

code.pyPython
import pandas as pd
import numpy as np

employees = pd.DataFrame({
    'Name': ['John', 'Sarah', 'Mike', 'Emma', 'David'],
    'Base_Salary': [50000, 65000, 55000, 70000, 60000],
    'Years': [3, 7, 4, 9, 5]
})

print("Initial data:")
print(employees)
print()

print("1. Add Department:")
employees['Department'] = ['Sales', 'IT', 'Sales', 'HR', 'IT']
print(employees)
print()

print("2. Add Employee ID at start:")
employees.insert(0, 'ID', range(101, 106))
print(employees)
print()

print("3. Calculate bonus (10% of base):")
employees['Bonus'] = employees['Base_Salary'] * 0.1
print(employees)
print()

print("4. Calculate tax (20% of base):")
employees['Tax'] = employees['Base_Salary'] * 0.2
print(employees)
print()

print("5. Add experience level:")
employees['Level'] = np.where(
    employees['Years'] >= 7, 'Senior',
    np.where(employees['Years'] >= 4, 'Mid', 'Junior')
)
print(employees)
print()

print("6. Calculate total compensation:")
employees['Total_Comp'] = employees['Base_Salary'] + employees['Bonus']
print(employees)
print()

print("7. Add performance multiplier:")
def get_multiplier(row):
    if row['Level'] == 'Senior':
        return 1.5
    elif row['Level'] == 'Mid':
        return 1.2
    else:
        return 1.0

employees['Multiplier'] = employees.apply(get_multiplier, axis=1)
print(employees)
print()

print("8. Remove Tax column:")
employees = employees.drop('Tax', axis=1)
print(employees)
print()

print("Final summary:")
print("Columns:", employees.columns.tolist())
print("Shape:", employees.shape)
print("Total compensation:", employees['Total_Comp'].sum())

Adding Empty Column

code.pyPython
df['Notes'] = None
print(df)

Or with NaN:

code.pyPython
df['Comments'] = np.nan
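
The two placeholders are not quite interchangeable: a column filled with np.nan is numeric (float64), while a column filled with None typically ends up with a general-purpose object dtype. A quick way to see this:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Sarah', 'Mike']})

df['Notes'] = None        # placeholder, usually stored as object dtype
df['Comments'] = np.nan   # placeholder stored as float64

print(df.dtypes)

# Both columns count as missing values
print(df['Notes'].isna().all())      # True
print(df['Comments'].isna().all())   # True
```

np.nan is usually the better fit when the column will later hold numbers.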

Adding from Series

code.pyPython
new_data = pd.Series([100, 200, 300])
df['Values'] = new_data
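
Unlike a list, a Series is matched to the DataFrame by index label, not by position. The sketch below assumes a DataFrame with non-default row labels (10, 11, 12, chosen only for illustration) to show what happens when the labels do not line up, and two ways around it.

```python
import pandas as pd

df = pd.DataFrame(
    {'Name': ['John', 'Sarah', 'Mike']},
    index=[10, 11, 12]               # hypothetical non-default row labels
)

new_data = pd.Series([100, 200, 300])   # default index: 0, 1, 2

# Labels 0, 1, 2 do not exist in df, so every row gets NaN
df['Values'] = new_data
print(df['Values'].isna().all())     # True

# Option 1: assign the raw values, ignoring the Series index
df['Values'] = new_data.to_numpy()

# Option 2: build the Series with df's own index
df['Values_Aligned'] = pd.Series([100, 200, 300], index=df.index)
print(df)
```

When both objects use the default 0, 1, 2, ... index, as in the example above, the alignment goes unnoticed because labels and positions coincide.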

Conditional Column Addition

code.pyPython
if 'Bonus' not in df.columns:
    df['Bonus'] = 0

Adding with assign()

Creates copy with new column.

code.pyPython
df_new = df.assign(
    Bonus=df['Salary'] * 0.1,
    Tax=df['Salary'] * 0.2
)
print(df_new)

Original df unchanged.

Chain Multiple Operations

code.pyPython
df_result = (df
    .assign(Bonus=df['Salary'] * 0.1)
    .assign(Tax=df['Salary'] * 0.2)
    .assign(Net=lambda x: x['Salary'] - x['Tax'])
)
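
The lambda in the last step matters: a column created earlier in the chain exists only in the intermediate DataFrame, not in df, so it has to be reached through the lambda's argument. A minimal sketch of that point:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Sarah', 'Mike'],
    'Salary': [50000, 60000, 55000]
})

df_result = (df
    .assign(Tax=df['Salary'] * 0.2)
    # 'Tax' is not in df, so df['Tax'] would raise KeyError here;
    # the lambda receives the intermediate DataFrame as x
    .assign(Net=lambda x: x['Salary'] - x['Tax'])
)

print(df_result['Net'].tolist())   # [40000.0, 48000.0, 44000.0]
print('Tax' in df.columns)         # False: df itself is untouched
```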

Removing Columns by Pattern

code.pyPython
cols_to_drop = [col for col in df.columns if 'temp' in col]
df = df.drop(cols_to_drop, axis=1)
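
The same pattern-based cleanup can also be written with the string methods on `df.columns`. This is an alternative sketch, not a replacement for the list comprehension; the `temp_score` and `temp_flag` columns are invented for the example.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Sarah'],
    'temp_score': [1, 2],          # hypothetical scratch columns
    'Salary': [50000, 60000],
    'temp_flag': [True, False]
})

# Boolean mask over the column names: True where the name contains 'temp'
is_temp = df.columns.str.contains('temp')

# Keep every column whose name does NOT match
df_clean = df.loc[:, ~is_temp]
print(df_clean.columns.tolist())   # ['Name', 'Salary']
```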

Keep Only Specific Columns

code.pyPython
df = df[['Name', 'Salary', 'Age']]

Drops all other columns.

Reorder Columns

code.pyPython
df = df[['ID', 'Name', 'Age', 'Salary']]
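
Listing every column works well for small tables. To move just one column without retyping the rest, one possible approach is to combine pop() and insert(), sketched here with the sample data:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Sarah', 'Mike'],
    'Salary': [50000, 60000, 55000],
    'ID': [101, 102, 103]
})

# pop() removes 'ID' and returns it; insert() puts it back at position 0
df.insert(0, 'ID', df.pop('ID'))
print(df.columns.tolist())   # ['ID', 'Name', 'Salary']
```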

Add Prefix to Columns

code.pyPython
df = df.add_prefix('emp_')
print(df.columns.tolist())

Output: ['emp_Name', 'emp_Salary', 'emp_Age']

Add Suffix to Columns

code.pyPython
df = df.add_suffix('_2024')
print(df.columns.tolist())

Copy Column

code.pyPython
df['Salary_Backup'] = df['Salary']

Replace Column

code.pyPython
df['Salary'] = df['Salary'] * 1.1

Overwrites existing column.

Key Points to Remember

Add column with df['NewCol'] = values. Simple and direct.

Remove column with drop('Col', axis=1). Use inplace=True to modify original.

del df['Col'] removes immediately without creating copy.

insert(position, name, values) adds column at specific position.

assign() creates new DataFrame with added columns. Original unchanged.

List length must match number of rows when adding from list.

Common Mistakes

Mistake 1: Wrong list length

code.pyPython
df['Age'] = [25, 30]  # Error if df has 3 rows!
# Check: len(df) must equal len([25, 30])

Mistake 2: Forgetting axis

code.pyPython
df.drop('Age')  # KeyError: without axis=1, drop() looks for a row labeled 'Age'
df.drop('Age', axis=1)  # Correct (df.drop(columns='Age') also works)

Mistake 3: Not assigning result

code.pyPython
df.drop('Age', axis=1)  # Doesn't change df!
df = df.drop('Age', axis=1)  # Correct
# OR
df.drop('Age', axis=1, inplace=True)

Mistake 4: Using del on filtered DataFrame

code.pyPython
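subset = df[df['Age'] > 25]
# df itself keeps 'Name', but subset may still be tied to df internally:
# later assignments to subset can raise SettingWithCopyWarning
del subset['Name']

subset = df[df['Age'] > 25].copy()  # Safe: explicit, independent copy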

Mistake 5: Column name typo

code.pyPython
df.drop('Sallary', axis=1)  # Error if column is 'Salary'
print(df.columns.tolist())  # Check names first
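
A short sketch of what the typo looks like at runtime, along with the `errors='ignore'` option of drop(), which skips labels that are not present. The `unchanged` variable is an illustrative name.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Sarah', 'Mike'],
    'Salary': [50000, 60000, 55000]
})

# A misspelled name raises KeyError by default
try:
    df.drop('Sallary', axis=1)
except KeyError as exc:
    print("Drop failed:", exc)

# errors='ignore' silently skips labels that do not exist
unchanged = df.drop('Sallary', axis=1, errors='ignore')
print(unchanged.columns.tolist())   # ['Name', 'Salary']
```

Note that `errors='ignore'` also hides genuine typos, so it is best kept for cases where a column may legitimately be absent.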

What's Next?

You now know how to add and remove columns. Next, you'll learn how to rename columns so their names are clearer and more descriptive.