Excel Statistical Analysis
Perform statistical analysis using Excel
What You'll Learn
- Excel Data Analysis ToolPak
- Descriptive statistics
- Regression analysis
- Hypothesis testing in Excel
- Charts and visualization
- Practical workflows
Data Analysis ToolPak
Most important Excel add-in for statistics!
Activating:
- File → Options → Add-ins
- Manage: Excel Add-ins → Go
- Check "Analysis ToolPak"
- Click OK
Once activated: Data tab → Data Analysis button appears
What it includes:
- Descriptive Statistics
- Regression
- t-Tests
- ANOVA
- Correlation
- Histogram
- Moving Average
- Exponential Smoothing
- And more!
Descriptive Statistics
Tool: Data → Data Analysis → Descriptive Statistics
Outputs:
- Mean
- Median
- Mode
- Standard Deviation
- Variance
- Range
- Minimum/Maximum
- Count
- Skewness
- Kurtosis
- Confidence Level
Steps:
- Select your data range
- Choose output location
- Check "Summary statistics"
- Optional: Confidence Level for Mean
Example: Sales data in A1:A100 Output: Complete statistical summary
Built-in Statistical Functions
Measures of Center:
=AVERAGE(A1:A100) Mean
=MEDIAN(A1:A100) Median
=MODE.SNGL(A1:A100) Mode
Measures of Spread:
=STDEV.S(A1:A100) Sample standard deviation
=STDEV.P(A1:A100) Population standard deviation
=VAR.S(A1:A100) Sample variance
=VAR.P(A1:A100) Population variance
Percentiles:
=QUARTILE.INC(A1:A100, 1) Q1
=QUARTILE.INC(A1:A100, 2) Q2 (Median)
=QUARTILE.INC(A1:A100, 3) Q3
=PERCENTILE.INC(A1:A100, 0.95) 95th percentile
Other useful:
=COUNT(A1:A100) Count of numbers
=COUNTA(A1:A100) Count of non-empty cells
=MIN(A1:A100) Minimum
=MAX(A1:A100) Maximum
Correlation Analysis
Tool: Data → Data Analysis → Correlation
Use: Find relationships between multiple variables
Steps:
- Arrange data in columns (variables in columns)
- Data Analysis → Correlation
- Select input range
- Choose output location
Output: Correlation matrix
Example: Columns: Sales, Advertising, Price, Temperature Output: 4×4 correlation matrix
Interpretation:
- Values from -1 to +1
- Close to ±1: Strong relationship
- Close to 0: Weak relationship
Regression Analysis
Tool: Data → Data Analysis → Regression
Simple Linear Regression:
- Y variable: Dependent (e.g., Sales)
- X variable: Independent (e.g., Advertising)
- Select ranges
- Choose output location
- Optional: Residual plots, Normal probability plot
Output includes:
- R-squared
- Adjusted R-squared
- Coefficients
- Standard errors
- t-statistics
- p-values
- ANOVA table
- Residuals
Multiple Regression: Same steps, but X range includes multiple columns
Example: Y: Sales (B1:B100) X: Advertising, Price (C1:D100)
Interpreting Regression Output
Regression Statistics:
- Multiple R: Correlation coefficient
- R Square: Proportion of variance explained
- Adjusted R Square: Adjusted for predictors
- Standard Error: Average prediction error
- Observations: Sample size
ANOVA table:
- F-statistic: Overall model significance
- Significance F: p-value for F-test
Coefficients:
- Intercept: Y when all X = 0
- Slope(s): Effect of each predictor
- p-value: Test if coefficient ≠0
- Lower/Upper 95%: Confidence interval
T-Tests
Three types available:
1. Paired t-test: Data → Data Analysis → t-Test: Paired Two Sample for Means
Use: Before/after comparisons (same subjects)
2. Two-sample equal variance: Data → Data Analysis → t-Test: Two-Sample Assuming Equal Variances
Use: Compare two independent groups
3. Two-sample unequal variance: Data → Data Analysis → t-Test: Two-Sample Assuming Unequal Variances
Use: When variances differ (safer choice)
Manual t-test:
=T.TEST(array1, array2, tails, type)
tails: 1 (one-tailed) or 2 (two-tailed)
type: 1 (paired), 2 (equal variance), 3 (unequal variance)
ANOVA
Tool: Data → Data Analysis → ANOVA: Single Factor
Use: Compare means of 3+ groups
Steps:
- Arrange groups in columns
- Select input range
- Choose grouped by: Columns
- Set alpha (usually 0.05)
Output:
- Summary statistics per group
- ANOVA table
- F-statistic
- p-value
- F critical value
Decision: If p < 0.05: At least one group differs
Histogram
Tool: Data → Data Analysis → Histogram
Creates frequency distribution:
Steps:
- Select data range
- Optional: Bin range (custom intervals)
- Choose output location
- Check "Chart Output" for visual
Alternative (modern): Insert → Charts → Insert Statistic Chart → Histogram
Customization:
- Right-click bars → Format Data Series
- Adjust bin width
- Set number of bins
Charts for Statistics
Scatter Plot (XY Chart): Best for: Correlation, regression visualization Insert → Charts → Scatter
Box Plot: Insert → Insert Statistic Chart → Box and Whisker Shows: Quartiles, median, outliers
Line Chart: Best for: Time series Insert → Charts → Line
Column/Bar Chart: Best for: Comparing groups Insert → Charts → Column
Tips:
- Add trendline: Right-click data → Add Trendline
- Display equation: Trendline Options → Display Equation
- Show R²: Trendline Options → Display R-squared
Moving Average
Tool: Data → Data Analysis → Moving Average
Parameters:
- Input Range: Your time series data
- Interval: Number of periods (e.g., 3, 12)
- Chart Output: Visual of smoothed data
Manual formula:
=AVERAGE(B2:B4) (for 3-period MA)
Copy down
Use:
- Smooth time series
- Identify trends
- Remove noise
Exponential Smoothing
Tool: Data → Data Analysis → Exponential Smoothing
Parameters:
- Damping factor: 1 - α (If α = 0.3, damping = 0.7)
Manual formula:
=0.3*B2 + 0.7*C1
Where B2 = actual, C1 = previous forecast
Forecast next period: Last smoothed value
Confidence Intervals
For mean:
=CONFIDENCE.T(alpha, stdev, size)
Example:
Mean: =AVERAGE(A1:A100)
CI: =CONFIDENCE.T(0.05, STDEV.S(A1:A100), COUNT(A1:A100))
Lower: =Mean - CI
Upper: =Mean + CI
Interpretation: 95% confidence interval for population mean
Pivot Tables for Statistics
Create: Insert → PivotTable
Use cases:
- Summary statistics by group
- Frequency tables
- Cross-tabulations
Setup:
- Rows: Categorical variable
- Values: Metric to summarize
- Value Field Settings → Average, Count, StdDev, etc.
Example: Analyze sales by region:
- Rows: Region
- Values: Average of Sales, Count of Sales, StdDev of Sales
Data Cleaning
Remove duplicates: Data → Remove Duplicates
Find/Replace: Ctrl+H
Text to Columns: Data → Text to Columns Split data by delimiter
TRIM function:
=TRIM(A1) Removes extra spaces
Filter: Data → Filter Click dropdown arrows to filter
Sort: Data → Sort Multiple level sorting
Conditional Formatting
Highlight outliers: Home → Conditional Formatting → Highlight Cells Rules
Data bars: Visual representation of values Home → Conditional Formatting → Data Bars
Color scales: Gradient coloring Home → Conditional Formatting → Color Scales
Custom rules: Home → Conditional Formatting → New Rule Formula-based conditions
Practical Workflow: Sales Analysis
Step 1: Import data Data → Get Data (Power Query)
Step 2: Clean data
- Remove duplicates
- Handle missing values
- Format dates
Step 3: Descriptive stats Data Analysis → Descriptive Statistics
Step 4: Visualize
- Histogram of sales distribution
- Time series line chart
- Scatter plot: Sales vs Advertising
Step 5: Correlation Check relationships between variables
Step 6: Regression Predict sales from advertising spend
Step 7: Forecast
- Moving average for trend
- Seasonal analysis if monthly
Practical Workflow: A/B Test
Step 1: Setup data Columns: Group (A/B), Metric
Step 2: Summary by group PivotTable or AVERAGEIF
Step 3: t-Test Data Analysis → t-Test: Two-Sample
Step 4: Interpret
- Mean difference
- p-value
- Decision: Significant if p < 0.05
Step 5: Visualize Column chart comparing groups Error bars for confidence intervals
Array Formulas
Dynamic ranges:
=AVERAGE(A:A) Average entire column (ignores non-numbers)
FILTER function (Excel 365):
=AVERAGE(FILTER(B:B, A:A="North"))
Average sales where region is North
UNIQUE function:
=UNIQUE(A:A) Get unique values
SORT function:
=SORT(A1:B100, 2, -1) Sort by column 2, descending
Tips and Best Practices
1. Use tables: Ctrl+T to convert range to Table
- Auto-expanding formulas
- Structured references
- Better filtering
2. Name ranges: Formulas → Define Name Makes formulas readable
3. Document assumptions: Add comments (Shift+F2)
4. Separate data from analysis: Use different sheets
5. Version control: Save dated copies before major changes
6. Validate data: Data → Data Validation Prevent entry errors
7. Use shortcuts:
- Ctrl+Shift+L: Toggle filters
- Ctrl+T: Create table
- Alt+N+V: Insert PivotTable
- F2: Edit cell
- Ctrl+`: Show formulas
Common Mistakes
1. Mixing sample and population formulas: Use STDEV.S for sample, STDEV.P for population
2. Circular references: Formula refers to itself
3. Not checking assumptions: Before t-test, check normality and equal variance
4. Deleting source data: Breaks charts and formulas
5. Not using absolute references: Use $ when needed: $A$1
Practice Exercise
Dataset: Monthly sales (12 months) Jan-Dec: 100, 110, 105, 115, 120, 125, 130, 135, 128, 140, 145, 160
Tasks:
- Calculate descriptive statistics
- Create histogram
- Calculate 3-month moving average
- Create line chart with trend
- Forecast next month using exponential smoothing (α=0.3)
Solutions:
- Use Data Analysis → Descriptive Statistics
- Data Analysis → Histogram
- =AVERAGE(B2:B4), copy down
- Insert scatter, add linear trendline
- Build exponential smoothing, forecast = last smoothed value
Advanced Features
Power Query: Data → Get Data → From File Transform and clean data
Power Pivot: More powerful than PivotTables Handle millions of rows
Solver: Optimization problems Tools → Solver
Goal Seek: Find input to achieve target Data → What-If Analysis → Goal Seek
Next Steps
Learn about Python Statistical Analysis!
Tip: Master Excel first - it's the most widely used analytics tool in business!