Simple Linear Regression
Build predictive models with linear regression
What You'll Learn
- What linear regression is
- Understanding the regression line
- Interpreting slope and intercept
- R-squared and model fit
- Making predictions
Linear Regression Basics

Purpose: Model relationship between two variables and make predictions
Goal: Find the best-fit line through data points
Equation: Y = β₀ + β₁X + ε
Where:
- Y = dependent variable (what we predict)
- X = independent variable (predictor)
- β₀ = intercept (Y when X=0)
- β₁ = slope (change in Y per unit of X)
- ε = error term
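To make these pieces concrete, here is a minimal Python sketch that simulates data from this model; the values β₀ = 2 and β₁ = 0.5 are assumed purely for illustration:

import numpy as np

# Simulate Y = β0 + β1·X + ε with illustrative (assumed) values
rng = np.random.default_rng(seed=0)
beta0, beta1 = 2.0, 0.5           # intercept and slope
X = np.linspace(0, 10, 50)        # independent variable (predictor)
epsilon = rng.normal(0, 1, 50)    # error term: random noise around the line
Y = beta0 + beta1 * X + epsilon   # dependent variable (what we predict)
print(Y[:5])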
The Regression Line
What it does: Minimizes the sum of squared errors (the vertical distances from each point to the line)
Method: Ordinary Least Squares (OLS)
Result: Best-fit line: ŷ = b₀ + b₁x
Example: Sales = 1000 + 50 × (Ad Spend)
- Intercept: $1000 baseline sales
- Slope: Each $1 in ads increases sales by $50
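To see how OLS actually produces b₀ and b₁, here is a short Python sketch using made-up ad-spend numbers chosen so the fitted line lands near the example above:

import numpy as np

# Closed-form OLS estimates on hypothetical (assumed) ad-spend data
x = np.array([10, 20, 30, 40, 50])            # ad spend
y = np.array([1480, 2050, 2520, 2980, 3500])  # sales

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)  # slope
b0 = y.mean() - b1 * x.mean()                                               # intercept
print(f"Best-fit line: y_hat = {b0:.0f} + {b1:.1f}x")  # roughly 1015 + 49.7x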
Interpreting the Slope
Slope (b₁): Change in Y for one-unit increase in X
Examples:
Positive slope: Sales = 100 + 5(Price) "Each $1 price increase → $5 more in sales"
Negative slope: Demand = 1000 - 20(Price) "Each $1 price increase → 20 fewer units sold"
Zero slope: No relationship between X and Y
Interpreting the Intercept
Intercept (b₀): Predicted Y when X = 0
Example: Test Score = 50 + 10(Study Hours)
- Intercept: 50 points with zero study
- Realistic? Maybe not! (extrapolation issue)
Warning: Only meaningful if X=0 makes sense in your context
R-Squared (R²)

What it measures: How much variance in Y is explained by X
Range: 0 to 1 (or 0% to 100%)
Interpretation:
- R² = 0.80: "80% of variance explained"
- R² = 0.30: "30% of variance explained"
Guidelines (rules of thumb; what counts as 'strong' varies by field):
- R² > 0.7: Strong relationship
- R² = 0.3-0.7: Moderate
- R² < 0.3: Weak
Important: High R² doesn't mean causation!
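Here is a small Python check of what R² measures, computing 1 − SSE/SST by hand and comparing it to sklearn's score (toy data, the same values used in the Python section below):

import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[1], [2], [3], [4], [5]])
y = np.array([2, 4, 5, 4, 5])

model = LinearRegression().fit(X, y)
y_hat = model.predict(X)

sse = np.sum((y - y_hat) ** 2)     # unexplained variation (residual sum of squares)
sst = np.sum((y - y.mean()) ** 2)  # total variation in y
r_squared = 1 - sse / sst
print(round(r_squared, 3), round(model.score(X, y), 3))  # the two values agree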
Residuals
What they are: Actual Y - Predicted Y
Why important: Show how well model fits
Good residuals:
- Randomly scattered
- No pattern
- Normally distributed
Bad residuals:
- Curved pattern (nonlinear relationship!)
- Increasing spread (heteroscedasticity)
- Outliers
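A quick Python sketch of pulling residuals out of a fitted model (toy data assumed); for a visual check you would plot them against the predicted values:

import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[1], [2], [3], [4], [5]])
y = np.array([2, 4, 5, 4, 5])

model = LinearRegression().fit(X, y)
residuals = y - model.predict(X)     # actual Y minus predicted Y
print(residuals)                     # should look randomly scattered around 0
print(round(residuals.mean(), 6))    # OLS residuals average ~0 by construction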
Making Predictions
Process:
- Fit regression: ŷ = 50 + 2x
- Plug in X value: x = 10
- Calculate: ŷ = 50 + 2(10) = 70
Example: Height = 60 + 2.5(Age) Predict height at age 10: Height = 60 + 2.5(10) = 85 inches
Caution: Don't extrapolate beyond data range!
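The same three steps in a few lines of Python; the predict() helper here is just for illustration:

# Fitted equation from the example above: y_hat = 50 + 2x
b0, b1 = 50, 2

def predict(x):
    return b0 + b1 * x

print(predict(10))  # 70, matching the worked example
# Only trust predictions for x values inside the range the model was fit on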
Excel Implementation
Steps:
- Plot scatter chart
- Add trendline
- Display equation and R²
Formulas:
- Slope: =SLOPE(Y_range, X_range)
- Intercept: =INTERCEPT(Y_range, X_range)
- R²: =RSQ(Y_range, X_range)
- Predict: =FORECAST(new_x, Y_range, X_range)
Analysis ToolPak: Data → Data Analysis → Regression
Python Implementation
from sklearn.linear_model import LinearRegression
import numpy as np
# Data
X = np.array([[1], [2], [3], [4], [5]])
y = np.array([2, 4, 5, 4, 5])
# Fit model
model = LinearRegression()
model.fit(X, y)
# Get coefficients
print(f"Intercept: {model.intercept_}")
print(f"Slope: {model.coef_[0]}")
print(f"R-squared: {model.score(X, y)}")
# Predict for a new X value
new_value = model.predict([[6]])
print(f"Prediction at X=6: {new_value[0]}")
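As an alternative sketch, scipy.stats.linregress fits the same simple regression and returns the slope, intercept, and correlation directly:

import numpy as np
from scipy import stats

x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 5, 4, 5])

result = stats.linregress(x, y)     # expects 1-D arrays
print(f"Intercept: {result.intercept}")
print(f"Slope: {result.slope}")
print(f"R-squared: {result.rvalue ** 2}")  # r² = R² in simple linear regression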
Real-World Applications
Marketing: Sales vs advertising spend
Finance: Stock returns vs market returns (beta)
HR: Salary vs years of experience
Real Estate: House price vs square footage
Healthcare: Blood pressure vs age
Example: Advertising ROI
Data:
- Ad Spend ($): 100, 200, 300, 400, 500
- Sales ($): 500, 900, 1200, 1600, 1900
Regression: Sales = 170 + 3.5(Ad Spend), R² ≈ 0.998
Interpretation:
- $170 baseline sales
- Each $1 in ads → $3.50 in sales
- Very strong fit (≈99.8% of variance explained)
Decision: ROI = $3.50 - $1 = $2.50 profit per ad dollar → Keep advertising!
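These numbers can be reproduced in Python from the five data points above:

import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[100], [200], [300], [400], [500]])  # ad spend
y = np.array([500, 900, 1200, 1600, 1900])         # sales

model = LinearRegression().fit(X, y)
print(f"Intercept: {model.intercept_:.0f}")   # ≈ 170
print(f"Slope: {model.coef_[0]:.2f}")         # ≈ 3.50
print(f"R-squared: {model.score(X, y):.3f}")  # ≈ 0.998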
Correlation vs Regression
Correlation (r):
- Measures strength of relationship
- Does not give a prediction equation
- Symmetric (r(X,Y) = r(Y,X))
Regression:
- Predicts Y from X
- Has equation
- Asymmetric (different if you swap X and Y)
Relationship: r² = R² (in simple linear regression)
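A short Python check of the r² = R² relationship on toy data:

import numpy as np
from sklearn.linear_model import LinearRegression

x = np.array([1, 2, 3, 4, 5])
y = np.array([2, 4, 5, 4, 5])

r = np.corrcoef(x, y)[0, 1]    # correlation (symmetric in x and y)
r2 = LinearRegression().fit(x.reshape(-1, 1), y).score(x.reshape(-1, 1), y)  # regression R²
print(round(r ** 2, 4), round(r2, 4))  # the two numbers match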
Common Mistakes
1. Assuming causation Correlation ≠ Causation!
2. Extrapolating Don't predict outside data range
3. Ignoring residuals Check assumptions!
4. Using when nonlinear Curved relationship? Use different model
5. Ignoring outliers One point can change entire line
Practice Exercise
Data:
- Years Experience: 1, 2, 3, 4, 5
- Salary ($1000s): 40, 45, 55, 60, 70
Tasks:
- Calculate slope and intercept
- Interpret the slope
- Predict salary at 6 years
- Calculate R²
Answers:
- Salary = 31.5 + 7.5(Years)
- Each additional year → $7,500 increase
- Salary = 31.5 + 7.5(6) = $76.5k
- R² ≈ 0.99 (strong fit)
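To check your answers, here is a short Python sketch using the exercise data:

import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[1], [2], [3], [4], [5]])   # years of experience
y = np.array([40, 45, 55, 60, 70])        # salary in $1000s

model = LinearRegression().fit(X, y)
print(f"Salary = {model.intercept_:.1f} + {model.coef_[0]:.1f}(Years)")  # 31.5 + 7.5(Years)
print(f"Prediction at 6 years: {model.predict([[6]])[0]:.1f}")           # 76.5 → about $76,500
print(f"R-squared: {model.score(X, y):.2f}")                             # ≈ 0.99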
Next Steps
Learn about Model Assumptions!
Tip: Always plot your data before fitting regression!