6 min read min read
Window Functions
Learn to calculate moving averages, rankings, and cumulative sums
Window Functions
What are Window Functions?
Window functions calculate values based on a group of rows (a "window").
Examples:
- Moving average of last 3 days
- Running total of sales
- Rank employees by salary
Rolling - Moving Window
Calculate based on last N rows:
code.py
import pandas as pd
# Daily sales
sales = pd.DataFrame({
'Day': [1, 2, 3, 4, 5, 6, 7],
'Sales': [100, 120, 90, 150, 130, 140, 160]
})
# 3-day moving average
sales['Moving_Avg'] = sales['Sales'].rolling(window=3).mean()
print(sales)Output:
Day Sales Moving_Avg
0 1 100 NaN
1 2 120 NaN
2 3 90 103.333333
3 4 150 120.000000
4 5 130 123.333333
5 6 140 140.000000
6 7 160 143.333333
First 2 rows are NaN because we need 3 values to calculate.
Rolling with Different Functions
code.py
# 3-day moving sum
sales['Moving_Sum'] = sales['Sales'].rolling(3).sum()
# 3-day moving max
sales['Moving_Max'] = sales['Sales'].rolling(3).max()
# 3-day moving min
sales['Moving_Min'] = sales['Sales'].rolling(3).min()Fill First Values
code.py
# Use min_periods to start calculating earlier
sales['Moving_Avg'] = sales['Sales'].rolling(window=3, min_periods=1).mean()
print(sales)Output:
Day Sales Moving_Avg
0 1 100 100.000000
1 2 120 110.000000
2 3 90 103.333333
3 4 150 120.000000
...
Now it starts from the first row!
Cumulative - Running Total
code.py
sales = pd.DataFrame({
'Day': [1, 2, 3, 4, 5],
'Sales': [100, 120, 90, 150, 130]
})
# Running total
sales['Running_Total'] = sales['Sales'].cumsum()
# Running maximum
sales['Running_Max'] = sales['Sales'].cummax()
# Running minimum
sales['Running_Min'] = sales['Sales'].cummin()
print(sales)Output:
Day Sales Running_Total Running_Max Running_Min
0 1 100 100 100 100
1 2 120 220 120 100
2 3 90 310 120 90
3 4 150 460 150 90
4 5 130 590 150 90
Rank - Order by Value
code.py
employees = pd.DataFrame({
'Name': ['John', 'Sarah', 'Mike', 'Emma'],
'Salary': [50000, 70000, 55000, 70000]
})
# Rank by salary (1 = lowest)
employees['Rank'] = employees['Salary'].rank()
print(employees)Output:
Name Salary Rank
0 John 50000 1.0
1 Sarah 70000 3.5
2 Mike 55000 2.0
3 Emma 70000 3.5
Sarah and Emma tie at 3.5 (average of 3 and 4).
Different Ranking Methods
code.py
# Highest salary = rank 1
employees['Rank_Desc'] = employees['Salary'].rank(ascending=False)
# Ties get same rank, next rank is skipped
employees['Rank_Min'] = employees['Salary'].rank(method='min')
# Ties get sequential ranks
employees['Rank_First'] = employees['Salary'].rank(method='first')Shift - Previous or Next Value
code.py
sales = pd.DataFrame({
'Day': [1, 2, 3, 4, 5],
'Sales': [100, 120, 90, 150, 130]
})
# Previous day's sales
sales['Prev_Sales'] = sales['Sales'].shift(1)
# Next day's sales
sales['Next_Sales'] = sales['Sales'].shift(-1)
print(sales)Output:
Day Sales Prev_Sales Next_Sales
0 1 100 NaN 120.0
1 2 120 100.0 90.0
2 3 90 120.0 150.0
3 4 150 90.0 130.0
4 5 130 150.0 NaN
Calculate Change from Previous
code.py
sales['Change'] = sales['Sales'] - sales['Sales'].shift(1)
sales['Pct_Change'] = sales['Sales'].pct_change() * 100
print(sales)Output:
Day Sales Change Pct_Change
0 1 100 NaN NaN
1 2 120 20.0 20.000000
2 3 90 -30.0 -25.000000
3 4 150 60.0 66.666667
4 5 130 -20.0 -13.333333
Expanding - All Previous Values
code.py
sales = pd.DataFrame({
'Day': [1, 2, 3, 4, 5],
'Sales': [100, 120, 90, 150, 130]
})
# Average of all values up to this point
sales['Expanding_Avg'] = sales['Sales'].expanding().mean()
print(sales)Output:
Day Sales Expanding_Avg
0 1 100 100.0
1 2 120 110.0
2 3 90 103.3
3 4 150 115.0
4 5 130 118.0
Practice Example
code.py
import pandas as pd
# Stock prices
stock = pd.DataFrame({
'Date': pd.date_range('2024-01-01', periods=7),
'Price': [100, 102, 98, 105, 103, 108, 110]
})
print("Stock Data:")
print(stock)
# 3-day moving average
stock['MA_3'] = stock['Price'].rolling(3).mean()
# Daily change
stock['Change'] = stock['Price'].diff()
# Percent change
stock['Pct_Change'] = stock['Price'].pct_change() * 100
# Running maximum (all-time high)
stock['All_Time_High'] = stock['Price'].cummax()
print("\nWith calculations:")
print(stock.round(2))Key Points
- rolling(n): Calculations on last n rows
- cumsum(): Running total
- rank(): Order values by size
- shift(n): Move values up or down
- pct_change(): Percentage change from previous
- expanding(): All values up to current row
Common Functions
| Method | What It Does |
|---|---|
| rolling(3).mean() | 3-row moving average |
| rolling(3).sum() | 3-row moving sum |
| cumsum() | Running total |
| cummax() | Running maximum |
| rank() | Rank by value |
| shift(1) | Previous row value |
| pct_change() | % change from previous |
Common Mistakes
Mistake 1: Forgetting NaN at start of rolling
code.py
# First 2 values are NaN with window=3
df['MA'] = df['Value'].rolling(3).mean()
# Use min_periods to calculate with fewer values
df['MA'] = df['Value'].rolling(3, min_periods=1).mean()Mistake 2: Wrong shift direction
code.py
shift(1) # Previous value (shifts down)
shift(-1) # Next value (shifts up)What's Next?
Congratulations! You've completed Pandas DataFrames II. Next module covers Data Cleaning - handling missing data and text.