Pandas I: DataFrames
Mastering the DataFrame: Creation, inspection, selection, and filtering
What You'll Learn
- Creating and inspecting DataFrames
- Selecting columns and rows (loc/iloc)
- Conditional filtering
- Sorting and ranking
- Basic DataFrame attributes
The DataFrame Object
A DataFrame is a 2-dimensional labeled data structure with columns of potentially different types. Think of it like a super-powered Excel sheet or SQL table.
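To make "columns of potentially different types" concrete, here is a minimal sketch that builds a small frame and prints the dtype of each column. The column names and values are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data: each column holds a different type
people = pd.DataFrame({
    'Name': ['Alice', 'Bob'],      # strings
    'Age': [25, 30],               # integers
    'Height': [1.65, 1.80],        # floats
    'Member': [True, False],       # booleans
})

# One dtype per column
print(people.dtypes)

# The same kind of structure can also be built from a list of row dicts
rows = [{'Name': 'Alice', 'Age': 25}, {'Name': 'Bob', 'Age': 30}]
people_from_rows = pd.DataFrame(rows)
```

Each column keeps a single dtype, while different columns are free to differ, which is what separates a DataFrame from a plain 2-D array.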
code.py
import pandas as pd
# Creating a DataFrame
data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Paris', 'London']
}
df = pd.DataFrame(data)
Inspecting Data
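As a rough sketch of what the inspection tools return, the snippet below rebuilds the three-row frame from the example above and stores a few results in variables. The variable names are illustrative. Note that `head(n)` and `tail(n)` return at most `n` rows, so on a frame this small `head(5)` returns everything.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Paris', 'London'],
})

n_rows, n_cols = df.shape          # (3, 3)
column_names = list(df.columns)    # ['Name', 'Age', 'City']
first_two = df.head(2)             # first 2 rows only
mean_age = df['Age'].mean()        # 30.0

# describe() summarizes numeric columns by default;
# include='all' also covers the text columns
summary = df.describe(include='all')
```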
code.py
# View first/last rows
df.head(5)
df.tail(5)
# Summary info (types, non-null counts)
df.info()
# Statistical summary (mean, min, max, etc.)
df.describe()
# Dimensions
df.shape # (rows, columns)
df.columns # Column names
df.index # Row labels
Selection and Indexing
Selecting Columns:
code.py
# Single column (returns Series)
ages = df['Age']
# Multiple columns (returns DataFrame)
subset = df[['Name', 'City']]
Selecting Rows (loc vs iloc):
- .loc: Label-based
- .iloc: Integer position-based
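With the default index (0, 1, 2, ...) labels and positions coincide, which hides the difference between the two accessors. The sketch below uses a made-up string index to pull them apart; the labels 'a', 'b', 'c' are assumptions for illustration only. It also shows that a label slice with .loc includes the end label, while a position slice with .iloc excludes the end position.

```python
import pandas as pd

people = pd.DataFrame(
    {'Name': ['Alice', 'Bob', 'Charlie'], 'Age': [25, 30, 35]},
    index=['a', 'b', 'c'],   # hypothetical row labels
)

by_label = people.loc['b']        # row whose label is 'b'
by_position = people.iloc[1]      # second row, whatever its label

label_slice = people.loc['a':'b']   # end label included: 2 rows
position_slice = people.iloc[0:1]   # end position excluded: 1 row
```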
code.py
# Select row by index label
row = df.loc[0]
# Select specific rows and columns by label
val = df.loc[0, 'Name'] # 'Alice'
subset = df.loc[0:2, ['Name', 'Age']] # Label slice includes the end label (rows 0, 1, 2)
# Select by position (integer index)
row = df.iloc[0] # First row
subset = df.iloc[0:2, 0:2] # First 2 rows, first 2 cols
Filtering Data
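A filter is a boolean Series used as a mask: rows where the mask is True are kept. The sketch below builds the masks as named variables first, then combines them, and adds negation with `~`. Inline comparisons need parentheses because `&` and `|` bind more tightly than comparison operators such as `>` and `==`. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'City': ['New York', 'Paris', 'London'],
})

over_25 = df['Age'] > 25          # boolean Series: False, True, True
in_paris = df['City'] == 'Paris'  # boolean Series: False, True, False

both = df[over_25 & in_paris]     # AND: only Bob
either = df[over_25 | in_paris]   # OR: Bob and Charlie
not_paris = df[~in_paris]         # NOT: Alice and Charlie
```

Naming the masks is optional, but it can make longer conditions easier to read and reuse.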
code.py
# Simple condition
adults = df[df['Age'] >= 18]
# Multiple conditions (& for AND, | for OR)
# Note: wrap each condition in parentheses; & and | bind tighter than comparisons
target = df[(df['Age'] > 25) & (df['City'] == 'Paris')]
# Isin (like SQL IN)
cities = df[df['City'].isin(['New York', 'London'])]
# String filtering
j_names = df[df['Name'].str.startswith('J')] # Empty here: no sample name starts with 'J'
Sorting and Ranking
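The examples in this section cover sorting only. For ranking, one option is the `rank` method, sketched below on made-up scores. The choices `method='min'` and `ascending=False` are assumptions for this illustration, not the only sensible settings.

```python
import pandas as pd

# Hypothetical scores, with a tie between Alice and Charlie
scores = pd.DataFrame({
    'Name': ['Alice', 'Bob', 'Charlie', 'Dana'],
    'Score': [90, 85, 90, 70],
})

# Highest score gets rank 1; tied rows share the lowest rank in their group
scores['Rank'] = scores['Score'].rank(method='min', ascending=False).astype(int)

# Ranks can then be used as a sort key
top_first = scores.sort_values('Rank')
```

With `method='min'`, the two tied scores both receive rank 1 and the next score receives rank 3.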
code.py
# Sort by values
df_sorted = df.sort_values(by='Age', ascending=False)
# Sort by multiple columns
df_sorted = df.sort_values(by=['City', 'Age'])
# Sort by index
df_sorted = df.sort_index()
Modifying Data
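Methods such as `drop` return a new DataFrame and leave the original untouched unless the result is assigned back. A minimal sketch of that behavior follows, with `assign` shown as one alternative way to add a column without changing the original frame. The frame and column names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [17, 30]})

# assign() returns a new frame; df itself is unchanged
with_next = df.assign(Age_Next_Year=df['Age'] + 1)

# drop() also returns a new frame
without_age = with_next.drop(columns=['Age'])
```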
code.py
# Adding a new column
df['Age_Next_Year'] = df['Age'] + 1
# Conditional assignment
import numpy as np
df['Status'] = np.where(df['Age'] >= 18, 'Adult', 'Minor')
# Dropping columns
df = df.drop(columns=['Age_Next_Year'])
Practice Exercise
code.py
import pandas as pd
# Load sample data (or create it)
df = pd.DataFrame({
    'product': ['Laptop', 'Mouse', 'Monitor', 'Keyboard'],
    'price': [1000, 25, 200, 50],
    'stock': [10, 100, 20, 50]
})
# 1. Select products with price > 100
expensive = df[df['price'] > 100]
# 2. Calculate total value (price * stock)
df['total_value'] = df['price'] * df['stock']
# 3. Sort by total value descending
df_sorted = df.sort_values('total_value', ascending=False)
print(df_sorted)
Next Steps
Now that you can manipulate a single DataFrame, let's learn how to combine multiple datasets!