#1 Data Analytics Program in India
₹2,499₹1,499Enroll Now
6 min read min read

Window Functions

Learn to calculate moving averages, rankings, and cumulative sums

Window Functions

What are Window Functions?

Window functions calculate values based on a group of rows (a "window").

Examples:

  • Moving average of last 3 days
  • Running total of sales
  • Rank employees by salary

Rolling - Moving Window

Calculate based on last N rows:

code.py
import pandas as pd

# Daily sales
sales = pd.DataFrame({
    'Day': [1, 2, 3, 4, 5, 6, 7],
    'Sales': [100, 120, 90, 150, 130, 140, 160]
})

# 3-day moving average
sales['Moving_Avg'] = sales['Sales'].rolling(window=3).mean()
print(sales)

Output:

Day Sales Moving_Avg 0 1 100 NaN 1 2 120 NaN 2 3 90 103.333333 3 4 150 120.000000 4 5 130 123.333333 5 6 140 140.000000 6 7 160 143.333333

First 2 rows are NaN because we need 3 values to calculate.

Rolling with Different Functions

code.py
# 3-day moving sum
sales['Moving_Sum'] = sales['Sales'].rolling(3).sum()

# 3-day moving max
sales['Moving_Max'] = sales['Sales'].rolling(3).max()

# 3-day moving min
sales['Moving_Min'] = sales['Sales'].rolling(3).min()

Fill First Values

code.py
# Use min_periods to start calculating earlier
sales['Moving_Avg'] = sales['Sales'].rolling(window=3, min_periods=1).mean()
print(sales)

Output:

Day Sales Moving_Avg 0 1 100 100.000000 1 2 120 110.000000 2 3 90 103.333333 3 4 150 120.000000 ...

Now it starts from the first row!

Cumulative - Running Total

code.py
sales = pd.DataFrame({
    'Day': [1, 2, 3, 4, 5],
    'Sales': [100, 120, 90, 150, 130]
})

# Running total
sales['Running_Total'] = sales['Sales'].cumsum()

# Running maximum
sales['Running_Max'] = sales['Sales'].cummax()

# Running minimum
sales['Running_Min'] = sales['Sales'].cummin()

print(sales)

Output:

Day Sales Running_Total Running_Max Running_Min 0 1 100 100 100 100 1 2 120 220 120 100 2 3 90 310 120 90 3 4 150 460 150 90 4 5 130 590 150 90

Rank - Order by Value

code.py
employees = pd.DataFrame({
    'Name': ['John', 'Sarah', 'Mike', 'Emma'],
    'Salary': [50000, 70000, 55000, 70000]
})

# Rank by salary (1 = lowest)
employees['Rank'] = employees['Salary'].rank()
print(employees)

Output:

Name Salary Rank 0 John 50000 1.0 1 Sarah 70000 3.5 2 Mike 55000 2.0 3 Emma 70000 3.5

Sarah and Emma tie at 3.5 (average of 3 and 4).

Different Ranking Methods

code.py
# Highest salary = rank 1
employees['Rank_Desc'] = employees['Salary'].rank(ascending=False)

# Ties get same rank, next rank is skipped
employees['Rank_Min'] = employees['Salary'].rank(method='min')

# Ties get sequential ranks
employees['Rank_First'] = employees['Salary'].rank(method='first')

Shift - Previous or Next Value

code.py
sales = pd.DataFrame({
    'Day': [1, 2, 3, 4, 5],
    'Sales': [100, 120, 90, 150, 130]
})

# Previous day's sales
sales['Prev_Sales'] = sales['Sales'].shift(1)

# Next day's sales
sales['Next_Sales'] = sales['Sales'].shift(-1)

print(sales)

Output:

Day Sales Prev_Sales Next_Sales 0 1 100 NaN 120.0 1 2 120 100.0 90.0 2 3 90 120.0 150.0 3 4 150 90.0 130.0 4 5 130 150.0 NaN

Calculate Change from Previous

code.py
sales['Change'] = sales['Sales'] - sales['Sales'].shift(1)
sales['Pct_Change'] = sales['Sales'].pct_change() * 100
print(sales)

Output:

Day Sales Change Pct_Change 0 1 100 NaN NaN 1 2 120 20.0 20.000000 2 3 90 -30.0 -25.000000 3 4 150 60.0 66.666667 4 5 130 -20.0 -13.333333

Expanding - All Previous Values

code.py
sales = pd.DataFrame({
    'Day': [1, 2, 3, 4, 5],
    'Sales': [100, 120, 90, 150, 130]
})

# Average of all values up to this point
sales['Expanding_Avg'] = sales['Sales'].expanding().mean()
print(sales)

Output:

Day Sales Expanding_Avg 0 1 100 100.0 1 2 120 110.0 2 3 90 103.3 3 4 150 115.0 4 5 130 118.0

Practice Example

code.py
import pandas as pd

# Stock prices
stock = pd.DataFrame({
    'Date': pd.date_range('2024-01-01', periods=7),
    'Price': [100, 102, 98, 105, 103, 108, 110]
})

print("Stock Data:")
print(stock)

# 3-day moving average
stock['MA_3'] = stock['Price'].rolling(3).mean()

# Daily change
stock['Change'] = stock['Price'].diff()

# Percent change
stock['Pct_Change'] = stock['Price'].pct_change() * 100

# Running maximum (all-time high)
stock['All_Time_High'] = stock['Price'].cummax()

print("\nWith calculations:")
print(stock.round(2))

Key Points

  • rolling(n): Calculations on last n rows
  • cumsum(): Running total
  • rank(): Order values by size
  • shift(n): Move values up or down
  • pct_change(): Percentage change from previous
  • expanding(): All values up to current row

Common Functions

MethodWhat It Does
rolling(3).mean()3-row moving average
rolling(3).sum()3-row moving sum
cumsum()Running total
cummax()Running maximum
rank()Rank by value
shift(1)Previous row value
pct_change()% change from previous

Common Mistakes

Mistake 1: Forgetting NaN at start of rolling

code.py
# First 2 values are NaN with window=3
df['MA'] = df['Value'].rolling(3).mean()

# Use min_periods to calculate with fewer values
df['MA'] = df['Value'].rolling(3, min_periods=1).mean()

Mistake 2: Wrong shift direction

code.py
shift(1)   # Previous value (shifts down)
shift(-1)  # Next value (shifts up)

What's Next?

Congratulations! You've completed Pandas DataFrames II. Next module covers Data Cleaning - handling missing data and text.

SkillsetMaster - AI, Web Development & Data Analytics Courses