Module 6
12 min read

Simple Linear Regression

Build predictive models with linear regression

What You'll Learn

  • What linear regression is
  • Understanding the regression line
  • Interpreting slope and intercept
  • R-squared and model fit
  • Making predictions

Linear Regression Basics

[Figure: Linear Regression Line]

Purpose: Model relationship between two variables and make predictions

Goal: Find the best-fit line through data points

Equation: Y = β₀ + β₁X + ε

Where:

  • Y = dependent variable (what we predict)
  • X = independent variable (predictor)
  • β₀ = intercept (Y when X=0)
  • β₁ = slope (change in Y per unit of X)
  • ε = error term

The Regression Line

What it does: Minimizes the sum of squared errors (the squared vertical distances from the points to the line)

Method: Ordinary Least Squares (OLS)

Result: Best-fit line: ŷ = b₀ + b₁x

Example: Sales = 1000 + 50 × (Ad Spend)

  • Intercept: $1000 baseline sales
  • Slope: Each $1 in ads increases sales by $50
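
The slope and intercept can be computed directly from the data with the OLS formulas b₁ = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and b₀ = ȳ − b₁x̄. Below is a minimal NumPy sketch of those two formulas; the small x and y arrays are made-up illustration values, not course data.

# OLS slope and intercept by hand (illustration data only)
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

# b1 = sum((x - x_mean) * (y - y_mean)) / sum((x - x_mean)^2)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
# b0 = y_mean - b1 * x_mean
b0 = y.mean() - b1 * x.mean()

print(f"Best-fit line: y_hat = {b0:.2f} + {b1:.2f}x")   # y_hat = 2.20 + 0.60x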

Interpreting the Slope

Slope (b₁): Change in Y for one-unit increase in X

Examples:

Positive slope: Sales = 100 + 5(Price)
"Each $1 increase in price → $5 more in sales"

Negative slope: Demand = 1000 - 20(Price)
"Each $1 increase in price → 20 fewer units sold"

Zero slope: No relationship between X and Y

Interpreting the Intercept

Intercept (b₀): Predicted Y when X = 0

Example: Test Score = 50 + 10(Study Hours)

  • Intercept: 50 points with zero study
  • Realistic? Maybe not! (extrapolation issue)

Warning: Only meaningful if X=0 makes sense in your context

R-Squared (R²)

[Figure: R-Squared Visualization]

What it measures: How much variance in Y is explained by X

Range: 0 to 1 (or 0% to 100%)

Interpretation:

  • R² = 0.80: "80% of variance explained"
  • R² = 0.30: "30% of variance explained"

Guidelines:

  • R² > 0.7: Strong relationship
  • R² = 0.3-0.7: Moderate
  • R² < 0.3: Weak

Important: High R² doesn't mean causation!
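
R² can also be computed by hand as 1 − SSE/SST, where SSE is the sum of squared residuals and SST is the total sum of squares. A minimal sketch, reusing the fitted line ŷ = 2.2 + 0.6x from the earlier illustration data:

# R-squared = 1 - SSE/SST (illustration data and fit from the OLS sketch above)
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])
y_hat = 2.2 + 0.6 * x                      # predictions from the fitted line

sse = np.sum((y - y_hat) ** 2)             # unexplained (residual) variation
sst = np.sum((y - y.mean()) ** 2)          # total variation in y
r_squared = 1 - sse / sst

print(f"R-squared: {r_squared:.2f}")       # 0.60 -> 60% of variance explained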

Residuals

What they are: Actual Y - Predicted Y

Why important: Show how well model fits

Good residuals:

  • Randomly scattered
  • No pattern
  • Normally distributed

Bad residuals:

  • Curved pattern (nonlinear relationship!)
  • Increasing spread (heteroscedasticity)
  • Outliers
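
A residual plot makes these patterns visible. The sketch below plots residuals against predicted values with matplotlib, again using the illustration data from the earlier sketches; you want a random, patternless cloud around zero.

# Residual plot: actual minus predicted, against predicted values
import numpy as np
import matplotlib.pyplot as plt

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])
y_hat = 2.2 + 0.6 * x

residuals = y - y_hat                      # actual Y - predicted Y

plt.scatter(y_hat, residuals)
plt.axhline(0, color="red", linestyle="--")
plt.xlabel("Predicted values")
plt.ylabel("Residuals")
plt.title("Look for random scatter with no pattern")
plt.show()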

Making Predictions

Process:

  1. Fit regression: ŷ = 50 + 2x
  2. Plug in X value: x = 10
  3. Calculate: ŷ = 50 + 2(10) = 70

Example: Height = 60 + 2.5(Age)
Predict height at age 10: Height = 60 + 2.5(10) = 85 inches

Caution: Don't extrapolate beyond data range!
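
As a sketch, the three steps above fit naturally into a small helper function for the line ŷ = 50 + 2x. The observed range of 0 to 20 used below is an assumed, made-up range purely to illustrate the extrapolation warning.

# Prediction with the fitted line y_hat = 50 + 2x, plus an extrapolation check
def predict(x, b0=50, b1=2, x_min=0, x_max=20):
    # x_min/x_max are an assumed observed range, for illustration only
    if not (x_min <= x <= x_max):
        print(f"Warning: x={x} is outside the observed range [{x_min}, {x_max}] (extrapolation)")
    return b0 + b1 * x

print(predict(10))    # 50 + 2*10 = 70
print(predict(100))   # triggers the extrapolation warning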

Excel Implementation

Steps:

  1. Plot scatter chart
  2. Add trendline
  3. Display equation and R²

Formulas:

  • Slope: =SLOPE(Y_range, X_range)
  • Intercept: =INTERCEPT(Y_range, X_range)
  • R²: =RSQ(Y_range, X_range)
  • Predict: =FORECAST(new_x, Y_range, X_range)

Analysis ToolPak: Data → Data Analysis → Regression

Python Implementation

from sklearn.linear_model import LinearRegression
import numpy as np

# Data
X = np.array([[1], [2], [3], [4], [5]])
y = np.array([2, 4, 5, 4, 5])

# Fit model
model = LinearRegression()
model.fit(X, y)

# Get coefficients
print(f"Intercept: {model.intercept_}")
print(f"Slope: {model.coef_[0]}")
print(f"R-squared: {model.score(X, y)}")

# Predict for a new X value
new_value = model.predict([[6]])
print(f"Prediction for X=6: {new_value[0]}")
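
To visualize the fit (as the tip at the end of this module suggests), you can plot the data with the fitted line. A minimal matplotlib sketch, assuming the model above has already been fit:

# Plot the data and the fitted line (assumes X, y, model from above)
import matplotlib.pyplot as plt

plt.scatter(X, y, label="Data")
plt.plot(X, model.predict(X), color="red", label="Fitted line")
plt.xlabel("X")
plt.ylabel("y")
plt.legend()
plt.show()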

Real-World Applications

Marketing: Sales vs advertising spend

Finance: Stock returns vs market returns (beta)

HR: Salary vs years of experience

Real Estate: House price vs square footage

Healthcare: Blood pressure vs age

Example: Advertising ROI

Data:

  • Ad Spend ($): 100, 200, 300, 400, 500
  • Sales ($): 500, 900, 1200, 1600, 1900

Regression: Sales = 170 + 3.5(Ad Spend), R² ≈ 0.998

Interpretation:

  • $170 baseline sales
  • Each $1 in ads → $3.50 in sales
  • Very strong fit (about 99.8% of variance explained)

Decision: ROI = $3.50 - $1 = $2.50 profit per ad dollar → Keep advertising!
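
As a quick check, the sketch below re-fits this example with scikit-learn from the data above; it should reproduce the slope, intercept, and R² reported here.

# Verify the advertising regression with scikit-learn
import numpy as np
from sklearn.linear_model import LinearRegression

ad_spend = np.array([[100], [200], [300], [400], [500]])
sales = np.array([500, 900, 1200, 1600, 1900])

model = LinearRegression().fit(ad_spend, sales)
print(f"Intercept: {model.intercept_:.0f}")              # ~170
print(f"Slope: {model.coef_[0]:.1f}")                    # ~3.5
print(f"R-squared: {model.score(ad_spend, sales):.3f}")  # ~0.998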

Correlation vs Regression

Correlation (r):

  • Measures strength of relationship
  • No prediction
  • Symmetric (r(X,Y) = r(Y,X))

Regression:

  • Predicts Y from X
  • Has equation
  • Asymmetric (different if you swap X and Y)

Relationship: r² = R² (in simple linear regression)
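
A small sketch confirming this relationship numerically on the earlier illustration data: squaring the Pearson correlation gives the same value as the model's R².

# r squared equals R-squared in simple linear regression
import numpy as np
from sklearn.linear_model import LinearRegression

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

r = np.corrcoef(x, y)[0, 1]                         # Pearson correlation
model = LinearRegression().fit(x.reshape(-1, 1), y)
r2 = model.score(x.reshape(-1, 1), y)               # R-squared of the fit

print(f"r^2 = {r ** 2:.3f}, R-squared = {r2:.3f}")  # both 0.600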

Common Mistakes

1. Assuming causation Correlation ≠ Causation!

2. Extrapolating Don't predict outside data range

3. Ignoring residuals Check assumptions!

4. Using when nonlinear Curved relationship? Use different model

5. Ignoring outliers One point can change entire line

Practice Exercise

Data:

  • Years of Experience: 1, 2, 3, 4, 5
  • Salary ($1000s): 40, 45, 55, 60, 70

Tasks:

  1. Calculate slope and intercept
  2. Interpret the slope
  3. Predict salary at 6 years
  4. Calculate R²

Answers:

  1. Salary = 31.5 + 7.5(Years)
  2. Each additional year of experience → $7,500 increase in salary
  3. Salary = 31.5 + 7.5(6) = 76.5, i.e. about $76,500
  4. R² ≈ 0.99 (strong fit)
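
You can verify these answers with a few lines of scikit-learn; a minimal sketch:

# Check the practice answers
import numpy as np
from sklearn.linear_model import LinearRegression

years = np.array([[1], [2], [3], [4], [5]])
salary = np.array([40, 45, 55, 60, 70])             # in $1000s

model = LinearRegression().fit(years, salary)
print(f"Intercept: {model.intercept_:.1f}")         # ~31.5
print(f"Slope: {model.coef_[0]:.1f}")               # ~7.5
print(f"Salary at 6 years: {model.predict([[6]])[0]:.1f}")  # ~76.5 ($76,500)
print(f"R-squared: {model.score(years, salary):.2f}")       # ~0.99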

Next Steps

Learn about Model Assumptions!

Tip: Always plot your data before fitting regression!
