Selecting Columns
Learn to select and work with specific columns in DataFrames
Single Column Selection
Bracket notation
import pandas as pd
df = pd.DataFrame({
'Name': ['John', 'Sarah', 'Mike'],
'Age': [25, 30, 28],
'City': ['NYC', 'LA', 'Chicago']
})
names = df['Name']
print(names)
print(type(names))
Output:
0     John
1    Sarah
2     Mike
Name: Name, dtype: object
<class 'pandas.core.series.Series'>
Returns a Series (single column).
Dot notation
names = df.Name
print(names)
Works only if:
- Column name is a valid Python identifier (no spaces, does not start with a digit)
- Name doesn't conflict with DataFrame methods or attributes
Use brackets when:
- Column name has spaces: df['First Name']
- Name conflicts: df['count']
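The two bracket-only cases above can be checked with a small sketch. The DataFrame here is hypothetical (it is not the one used earlier) and exists only to show a column named count colliding with the DataFrame.count method.

```python
import pandas as pd

# Hypothetical frame: one column shadows a DataFrame method, one contains a space
df = pd.DataFrame({
    'count': [3, 5, 8],
    'First Name': ['John', 'Sarah', 'Mike'],
})

# Attribute access finds the DataFrame.count method, not the column
dot_result = df.count
is_method = callable(dot_result)

# Bracket access always returns the column itself
count_col = df['count']
first_names = df['First Name']

print(is_method)
print(count_col.tolist())
print(first_names.tolist())
```

Because brackets work for every column name, they are the safer default in reusable code.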
Multiple Column Selection
subset = df[['Name', 'Age']]
print(subset)
print(type(subset))
Output:
    Name  Age
0   John   25
1  Sarah   30
2   Mike   28
<class 'pandas.core.frame.DataFrame'>
Double brackets [[]] return a DataFrame: the inner brackets are a Python list of column names.
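A quick way to see the difference is to compare the type and shape of each result. This is a minimal sketch using a trimmed copy of the example data.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Sarah', 'Mike'],
    'Age': [25, 30, 28],
})

as_series = df['Name']    # single brackets: one-dimensional Series
as_frame = df[['Name']]   # double brackets: DataFrame with one column

print(type(as_series).__name__, as_series.shape)  # Series (3,)
print(type(as_frame).__name__, as_frame.shape)    # DataFrame (3, 1)
```

The one-column DataFrame keeps its column label and two-dimensional shape, which matters when later code expects a table rather than a single column.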
Reordering Columns
reordered = df[['City', 'Name', 'Age']]
print(reordered)
Order matters: the result's columns follow the order of the list.
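When a DataFrame has many columns, typing the full list is tedious. One possible approach, sketched below, is to build the list from df.columns; the variable names front and rest are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Sarah', 'Mike'],
    'Age': [25, 30, 28],
    'City': ['NYC', 'LA', 'Chicago'],
})

# Put 'City' first, then keep the remaining columns in their existing order
front = ['City']
rest = [col for col in df.columns if col not in front]
reordered = df[front + rest]

print(reordered.columns.tolist())  # ['City', 'Name', 'Age']
```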
Selecting by Position
first_col = df.iloc[:, 0]
print("First column:")
print(first_col)
What this means:
: = all rows
0 = first column
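The sketch below extends the same idea, assuming the three-column example data: a single integer returns a Series, a negative integer counts from the end, and a one-element list returns a DataFrame.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Sarah', 'Mike'],
    'Age': [25, 30, 28],
    'City': ['NYC', 'LA', 'Chicago'],
})

first_col = df.iloc[:, 0]        # Series: the 'Name' column
last_col = df.iloc[:, -1]        # Series: negative positions count from the end
first_as_frame = df.iloc[:, [0]] # DataFrame: a list of positions keeps two dimensions

print(first_col.name)
print(last_col.name)
print(first_as_frame.shape)
```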
Multiple columns by position:
first_two = df.iloc[:, 0:2]
print(first_two)
Or specific positions:
cols = df.iloc[:, [0, 2]]
print(cols)
Selecting by Data Type
numeric = df.select_dtypes(include='number')
print(numeric)
Other types:
text = df.select_dtypes(include='object')
dates = df.select_dtypes(include='datetime')
all_numeric = df.select_dtypes(include=['int64', 'float64'])
Excluding Columns
without_age = df.drop('Age', axis=1)
print(without_age)
Multiple columns:
subset = df.drop(['Age', 'City'], axis=1)
print(subset)
Important: drop() returns a new DataFrame. The original df is unchanged unless you pass inplace=True.
df.drop('Age', axis=1, inplace=True)
Columns with Patterns
df = pd.DataFrame({
'sales_2023': [100, 200],
'sales_2024': [150, 250],
'profit_2023': [20, 40],
'profit_2024': [30, 50]
})
sales_cols = [col for col in df.columns if 'sales' in col]
print("Sales columns:", sales_cols)
print(df[sales_cols])
What this does: finds all columns whose name contains 'sales'.
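The same pattern match can also be written without a list comprehension. As an alternative sketch (not part of the original walkthrough), the string accessor on df.columns builds a boolean mask that loc then applies to the columns.

```python
import pandas as pd

df = pd.DataFrame({
    'sales_2023': [100, 200],
    'sales_2024': [150, 250],
    'profit_2023': [20, 40],
    'profit_2024': [30, 50],
})

# Boolean mask: True for each column whose name starts with 'sales'
mask = df.columns.str.startswith('sales')

# Keep all rows, and only the columns where the mask is True
sales = df.loc[:, mask]

print(sales.columns.tolist())  # ['sales_2023', 'sales_2024']
```

Unlike the 'sales' in col check, startswith would not match a name such as total_sales, so pick whichever test fits the naming scheme.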
Filter by Column Names
cols = df.filter(like='sales')
print(cols)
Or with regex:
cols = df.filter(regex='2024$')
print(cols)
Practice Example
The scenario: Analyze employee data by selecting specific columns.
import pandas as pd
employees = pd.DataFrame({
'emp_id': [101, 102, 103, 104],
'first_name': ['John', 'Sarah', 'Mike', 'Emma'],
'last_name': ['Doe', 'Smith', 'Johnson', 'Williams'],
'department': ['Sales', 'IT', 'Sales', 'HR'],
'salary': [50000, 70000, 55000, 60000],
'bonus': [5000, 10000, 7000, 8000],
'years_exp': [3, 7, 4, 5]
})
print("Full data:")
print(employees.head())
print()
print("Names only:")
names = employees[['first_name', 'last_name']]
print(names)
print()
print("Financial columns:")
financial = employees[['salary', 'bonus']]
print(financial)
print()
print("Calculate total compensation:")
comp = employees[['first_name', 'salary', 'bonus']].copy()
comp['total'] = comp['salary'] + comp['bonus']
print(comp)
print()
print("Numeric columns only:")
numeric = employees.select_dtypes(include='number')
print(numeric)
print()
print("Columns starting with emp or department:")
selected = employees.filter(regex='^(emp|department)')
print(selected)
Column Properties
print("Column count:", len(df.columns))
print("Column names:", df.columns.tolist())
print("Has column 'Age':", 'Age' in df.columns)
Get Values as Array
names_array = df['Name'].values
print(names_array)
print(type(names_array))
Returns a NumPy array.
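The pandas documentation recommends the to_numpy() method over the .values attribute for new code. A minimal sketch of the equivalent call, using the Age column from the example data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Sarah', 'Mike'],
    'Age': [25, 30, 28],
})

# to_numpy() is the method form of the same column-to-array conversion
ages = df['Age'].to_numpy()

print(type(ages).__name__)
print(ages.tolist())
```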
Get Values as List
names_list = df['Name'].tolist()
print(names_list)
print(type(names_list))
Column from Multiple DataFrames
df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
df2 = pd.DataFrame({'C': [5, 6], 'D': [7, 8]})
combined = pd.concat([df1['A'], df2['C']], axis=1)
print(combined)
Conditional Column Selection
# Uses the employees DataFrame from the practice example
high_salary = employees[employees['salary'] > 60000]['first_name']
print("High earners:", high_salary.tolist())
Key Points to Remember
Single brackets df['Name'] return a Series. Double brackets df[['Name']] return a DataFrame.
Dot notation df.Name works only for column names that are valid Python identifiers and don't clash with DataFrame methods.
iloc selects by position: df.iloc[:, 0] gets the first column.
select_dtypes() filters columns by data type. Useful for numeric analysis.
drop() excludes columns. Use axis=1 for columns, axis=0 for rows.
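The drop() point can be checked with a short sketch. It uses the columns= keyword, which is equivalent to passing axis=1, and confirms that the original DataFrame keeps all its columns.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['John', 'Sarah', 'Mike'],
    'Age': [25, 30, 28],
    'City': ['NYC', 'LA', 'Chicago'],
})

# columns='Age' is equivalent to drop('Age', axis=1)
without_age = df.drop(columns='Age')

print(without_age.columns.tolist())  # ['Name', 'City']
print(df.columns.tolist())           # ['Name', 'Age', 'City']
```

Naming the keyword makes the intent explicit and avoids having to remember which axis number means columns.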
Common Mistakes
Mistake 1: Forgetting double brackets
subset = df['Name', 'Age'] # KeyError: pandas looks for one column labeled ('Name', 'Age')
subset = df[['Name', 'Age']] # Correct
Mistake 2: Using dot notation with spaces
df.First Name # SyntaxError!
df['First Name'] # Correct
Mistake 3: Modifying without copy
subset = df[['Name', 'Age']]
subset['Name'] = 'Test' # May trigger SettingWithCopyWarning
subset = df[['Name', 'Age']].copy() # Safe: explicit independent copy
Mistake 4: Wrong axis for drop
df.drop('Age') # KeyError: the default axis=0 looks for a row labeled 'Age'
df.drop('Age', axis=1) # Correct
Mistake 5: Assuming drop changes original
df.drop('Age', axis=1) # df still has Age!
df = df.drop('Age', axis=1) # Correct
# OR
df.drop('Age', axis=1, inplace=True) # Also correct
What's Next?
You now know how to select columns. Next, you'll learn about selecting rows with loc - choosing specific rows by label and condition.