Stock Price Analysis
Analyze stock market data and build visualizations
Stock Price Analysis Project

Project Overview
In this project, you will analyze stock market data to:
- Understand stock price movements
- Calculate key financial metrics
- Identify trends and patterns
- Compare multiple stocks
- Build visualizations of prices, returns, and risk
Skills you'll practice:
- Working with time series data
- Financial calculations (returns, moving averages)
- Data visualization
- Comparative analysis
Step 1: Setup and Import Libraries
```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime

# Settings
plt.style.use('seaborn-v0_8-whitegrid')  # seaborn style name used by Matplotlib 3.6+
sns.set_palette('husl')
np.random.seed(42)  # fixed seed so the simulated data is reproducible

print("Libraries imported successfully!")
```
Step 2: Create Sample Stock Data
We'll simulate price data for 3 fictional companies. Each day's price is the previous price multiplied by (1 + r), where the daily return r is drawn from a normal distribution whose mean is the trend and whose standard deviation is the volatility:
```python
# Generate ~2 years of business days (252 trading days per year x 2 = 504)
start_date = datetime(2022, 1, 1)
dates = pd.date_range(start=start_date, periods=504, freq='B')  # 'B' = Monday-Friday

# Function to generate a simulated price path
def generate_stock_data(start_price, volatility, trend, dates):
    prices = [start_price]
    for i in range(1, len(dates)):
        change = np.random.normal(trend, volatility)  # daily return as a fraction
        new_price = prices[-1] * (1 + change)
        prices.append(max(new_price, 1))  # floor the price at $1
    return prices

# Generate data for 3 stocks
stocks = {
    'Date': dates,
    'TECH': generate_stock_data(150, 0.02, 0.001, dates),     # Tech stock - high growth
    'BANK': generate_stock_data(45, 0.015, 0.0005, dates),    # Bank stock - stable
    'ENERGY': generate_stock_data(80, 0.025, -0.0002, dates)  # Energy - volatile
}
df = pd.DataFrame(stocks)
df = df.round(2)

# Add random daily trading volume
df['TECH_Volume'] = np.random.randint(1000000, 5000000, len(df))
df['BANK_Volume'] = np.random.randint(500000, 2000000, len(df))
df['ENERGY_Volume'] = np.random.randint(800000, 3000000, len(df))

print("Stock data created!")
print(f"Date range: {df['Date'].min().date()} to {df['Date'].max().date()}")
print(f"Total trading days: {len(df)}")
```
Step 3: Explore the Data
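```python
# First look
print("=== First 5 Rows ===")
print(df.head())
```
Note: the output tables in this project are illustrative. Because the prices are simulated, the values you get will not match them exactly. This first table is also truncated to the first volume column.

| Date | TECH | BANK | ENERGY | TECH_Volume |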
|---|---|---|---|---|
| 2022-01-03 | 150.00 | 45.00 | 80.00 | 2,345,678 |
| 2022-01-04 | 152.34 | 45.23 | 78.45 | 3,123,456 |
| 2022-01-05 | 149.87 | 45.67 | 79.12 | 2,876,543 |
| 2022-01-06 | 153.21 | 44.98 | 77.89 | 1,987,654 |
| 2022-01-07 | 155.45 | 45.34 | 80.23 | 4,234,567 |
```python
# Basic statistics
print("\n=== Price Statistics ===")
print(df[['TECH', 'BANK', 'ENERGY']].describe())
```
| Stat | TECH | BANK | ENERGY |
|---|---|---|---|
| count | 504 | 504 | 504 |
| mean | 185.50 | 52.30 | 72.45 |
| std | 45.20 | 8.15 | 15.80 |
| min | 128.45 | 40.12 | 48.90 |
| max | 298.75 | 68.90 | 102.30 |
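
One thing worth checking while exploring: freq='B' in Step 2 counts Monday to Friday only and does not skip market holidays. The sketch below rebuilds the same date index (under the illustrative name `check_dates`, so the tutorial's `dates` is left alone) and confirms that it starts on Monday 2022-01-03, contains only weekdays, still includes a US market holiday such as 2022-07-04, and ends in early December 2023 rather than at year-end.

```python
import pandas as pd

# Same index as Step 2: 504 business days (Mon-Fri) starting from 2022-01-01
check_dates = pd.date_range(start='2022-01-01', periods=504, freq='B')

# 2022-01-01 is a Saturday, so the range rolls forward to the next Monday
print("First date:", check_dates[0].date())
print("Last date: ", check_dates[-1].date())

# dayofweek: Monday=0 ... Sunday=6, so every value should be below 5
print("Only weekdays:", bool((check_dates.dayofweek < 5).all()))

# Holidays are not removed: July 4, 2022 fell on a Monday
print("Includes 2022-07-04:", pd.Timestamp('2022-07-04') in check_dates)
```

If you need dates that follow a real exchange calendar, pandas' `CustomBusinessDay` offset accepts a holiday calendar (for example `USFederalHolidayCalendar`, which is close to, but not the same as, the NYSE schedule). For simulated prices, plain business days are enough.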
Step 4: Visualize Stock Prices
Price History
```python
# Plot all stocks
plt.figure(figsize=(14, 6))
plt.plot(df['Date'], df['TECH'], label='TECH', linewidth=2)
plt.plot(df['Date'], df['BANK'], label='BANK', linewidth=2)
plt.plot(df['Date'], df['ENERGY'], label='ENERGY', linewidth=2)
plt.title('Stock Price History (2022-2023)', fontsize=14, fontweight='bold')
plt.xlabel('Date')
plt.ylabel('Price ($)')
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
```
Individual Stock Charts
```python
# Subplots for each stock
fig, axes = plt.subplots(3, 1, figsize=(14, 10), sharex=True)
stocks_list = ['TECH', 'BANK', 'ENERGY']
colors = ['#2ecc71', '#3498db', '#e74c3c']

for i, (stock, color) in enumerate(zip(stocks_list, colors)):
    axes[i].plot(df['Date'], df[stock], color=color, linewidth=1.5)
    axes[i].fill_between(df['Date'], df[stock], alpha=0.3, color=color)
    axes[i].set_title(f'{stock} Stock Price', fontweight='bold')
    axes[i].set_ylabel('Price ($)')
    axes[i].grid(True, alpha=0.3)

axes[-1].set_xlabel('Date')  # shared x-axis: label only the bottom subplot
plt.tight_layout()
plt.show()
```
Step 5: Calculate Daily Returns
Daily return = the percentage change in price from the previous trading day: (today's price - yesterday's price) / yesterday's price x 100. The first row is NaN because it has no previous day.
```python
# Calculate daily returns (%)
df['TECH_Return'] = df['TECH'].pct_change() * 100
df['BANK_Return'] = df['BANK'].pct_change() * 100
df['ENERGY_Return'] = df['ENERGY'].pct_change() * 100

print("=== Daily Returns (Last 5 Days) ===")
print(df[['Date', 'TECH_Return', 'BANK_Return', 'ENERGY_Return']].tail())
```
| Date | TECH_Return | BANK_Return | ENERGY_Return |
|---|---|---|---|
| 2023-12-22 | 1.25% | 0.45% | -0.89% |
| 2023-12-26 | -0.67% | 0.12% | 1.34% |
| 2023-12-27 | 0.89% | -0.34% | 0.56% |
| 2023-12-28 | 1.45% | 0.67% | -1.12% |
| 2023-12-29 | 0.34% | 0.23% | 0.78% |
```python
# Return statistics
print("\n=== Return Statistics ===")
return_cols = ['TECH_Return', 'BANK_Return', 'ENERGY_Return']
print(df[return_cols].describe())
```
| Stat | TECH | BANK | ENERGY |
|---|---|---|---|
| mean | 0.12% | 0.05% | -0.02% |
| std | 2.01% | 1.52% | 2.48% |
| min | -6.45% | -4.89% | -7.23% |
| max | 5.89% | 4.12% | 6.78% |

Insight: TECH has the highest average daily return, and ENERGY is the most volatile (largest standard deviation of returns).
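pct_change() computes the simple return (P_t - P_{t-1}) / P_{t-1}. As a minimal check, the sketch below uses a made-up five-day price series (`demo_prices` is hypothetical) and compares pct_change() with the same formula written out by hand using shift(1). The first value is NaN in both because there is no previous day.

```python
import numpy as np
import pandas as pd

# Hypothetical 5-day price series
demo_prices = pd.Series([100.0, 102.0, 99.96, 101.0, 101.0])

# Built-in: change relative to the previous row, scaled to percent
auto = demo_prices.pct_change() * 100

# By hand: (today - yesterday) / yesterday * 100
manual = (demo_prices - demo_prices.shift(1)) / demo_prices.shift(1) * 100

print(pd.DataFrame({'price': demo_prices, 'pct_change': auto, 'manual': manual}).round(2))
```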
Step 6: Visualize Returns Distribution
```python
# Histogram of daily returns (reuses the colors list from Step 4)
fig, axes = plt.subplots(1, 3, figsize=(15, 4))
for i, (stock, color) in enumerate(zip(['TECH', 'BANK', 'ENERGY'], colors)):
    col = f'{stock}_Return'
    axes[i].hist(df[col].dropna(), bins=50, color=color, edgecolor='black', alpha=0.7)
    axes[i].axvline(x=0, color='red', linestyle='--', linewidth=2)  # zero-return reference
    axes[i].set_title(f'{stock} Daily Returns', fontweight='bold')
    axes[i].set_xlabel('Return (%)')
    axes[i].set_ylabel('Frequency')
plt.tight_layout()
plt.show()
```
Step 7: Calculate Moving Averages
Moving average (MA) = the average price over the last N trading days, which smooths out day-to-day noise. The first N-1 rows are NaN because the window is not yet full.
```python
# Calculate moving averages for TECH stock
df['TECH_MA20'] = df['TECH'].rolling(window=20).mean()    # 20-day MA
df['TECH_MA50'] = df['TECH'].rolling(window=50).mean()    # 50-day MA
df['TECH_MA200'] = df['TECH'].rolling(window=200).mean()  # 200-day MA

print("=== TECH Moving Averages (Last 5 Days) ===")
print(df[['Date', 'TECH', 'TECH_MA20', 'TECH_MA50', 'TECH_MA200']].tail())
```
| Date | TECH | MA20 | MA50 | MA200 |
|---|---|---|---|---|
| 2023-12-25 | 285.45 | 278.90 | 265.30 | 220.50 |
| 2023-12-26 | 283.54 | 279.45 | 266.10 | 221.20 |
| 2023-12-27 | 286.07 | 280.12 | 267.00 | 221.90 |
| 2023-12-28 | 290.22 | 281.34 | 268.20 | 222.60 |
| 2023-12-29 | 291.21 | 282.50 | 269.40 | 223.30 |
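
rolling(window=N).mean() averages the current row and the N-1 rows before it, and returns NaN until N values are available. A small check on a hypothetical price series with N = 3, comparing pandas with a hand-written window mean:

```python
import numpy as np
import pandas as pd

demo_prices = pd.Series([10.0, 11.0, 12.0, 13.0, 14.0, 15.0])  # hypothetical prices

ma3 = demo_prices.rolling(window=3).mean()

# By hand: mean of the 3 values ending at each row (NaN until the window is full)
manual = [np.nan, np.nan] + [
    demo_prices.iloc[i - 2:i + 1].mean() for i in range(2, len(demo_prices))
]

print(pd.DataFrame({'price': demo_prices, 'MA3': ma3, 'manual': manual}))
```

The same rule explains why TECH_MA200 is NaN for the first 199 rows of the tutorial data.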
```python
# Plot price with moving averages
plt.figure(figsize=(14, 6))
plt.plot(df['Date'], df['TECH'], label='TECH Price', linewidth=1, alpha=0.7)
plt.plot(df['Date'], df['TECH_MA20'], label='20-Day MA', linewidth=2)
plt.plot(df['Date'], df['TECH_MA50'], label='50-Day MA', linewidth=2)
plt.plot(df['Date'], df['TECH_MA200'], label='200-Day MA', linewidth=2)
plt.title('TECH Stock with Moving Averages', fontsize=14, fontweight='bold')
plt.xlabel('Date')
plt.ylabel('Price ($)')
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
```
Common moving-average trading signals (rules of thumb):
- Price > MA200 = Bullish (uptrend)
- Price < MA200 = Bearish (downtrend)
- MA20 crosses above MA50 = Buy signal
- MA20 crosses below MA50 = Sell signal
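A crossover happens on the day the fast MA moves from below the slow MA to above it (or the reverse), so it can be found by comparing each day's "fast above slow" flag with the previous day's. The sketch below is one way to do that on a hypothetical price series, using short windows (3 and 5) so the crossings are easy to see; the DataFrame name `sig` and its columns `fast`, `slow`, and `signal` are illustrative, and the rules are just the heuristics listed above.

```python
import numpy as np
import pandas as pd

# Hypothetical prices: a dip, a recovery, then a decline
demo_prices = pd.Series([10, 9, 8, 7, 6, 7, 8, 9, 10, 11, 10, 9, 8, 7, 6], dtype=float)

sig = pd.DataFrame({'price': demo_prices})
sig['fast'] = sig['price'].rolling(3).mean()  # stands in for MA20
sig['slow'] = sig['price'].rolling(5).mean()  # stands in for MA50

# True where the fast MA is above the slow MA (False while either is NaN)
above = sig['fast'] > sig['slow']
prev_above = above.shift(1, fill_value=False)

# Only compare days where the slow MA exists today and yesterday
valid = sig['slow'].notna() & sig['slow'].shift(1).notna()

# BUY when the fast MA crosses above, SELL when it crosses below
sig['signal'] = np.where(valid & above & ~prev_above, 'BUY',
                         np.where(valid & ~above & prev_above, 'SELL', ''))

print(sig.round(2))
```

On the tutorial data you would use df['TECH_MA20'] and df['TECH_MA50'] in place of fast and slow; the MA200 rule is a plain comparison such as df['TECH'] > df['TECH_MA200'].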
Step 8: Calculate Volatility
Volatility = How much the price swings (standard deviation of returns)
```python
# 30-day rolling volatility (standard deviation of daily returns)
df['TECH_Volatility'] = df['TECH_Return'].rolling(window=30).std()
df['BANK_Volatility'] = df['BANK_Return'].rolling(window=30).std()
df['ENERGY_Volatility'] = df['ENERGY_Return'].rolling(window=30).std()

# Average volatility
print("=== Average 30-Day Volatility ===")
vol_summary = pd.DataFrame({
    'Stock': ['TECH', 'BANK', 'ENERGY'],
    'Avg Volatility': [
        df['TECH_Volatility'].mean(),
        df['BANK_Volatility'].mean(),
        df['ENERGY_Volatility'].mean()
    ]
}).round(2)
print(vol_summary)
```
| Stock | Avg Volatility |
|---|---|
| TECH | 2.01% |
| BANK | 1.52% |
| ENERGY | 2.48% |
```python
# Plot volatility over time
plt.figure(figsize=(14, 5))
plt.plot(df['Date'], df['TECH_Volatility'], label='TECH', linewidth=1.5)
plt.plot(df['Date'], df['BANK_Volatility'], label='BANK', linewidth=1.5)
plt.plot(df['Date'], df['ENERGY_Volatility'], label='ENERGY', linewidth=1.5)
plt.title('30-Day Rolling Volatility', fontsize=14, fontweight='bold')
plt.xlabel('Date')
plt.ylabel('Volatility (%)')
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
```
Insight: ENERGY's rolling volatility is generally the highest of the three, which matches the larger daily volatility (2.5%, versus 2% for TECH and 1.5% for BANK) used to simulate it.
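rolling(window=30).std() is the sample standard deviation of the last 30 daily returns: pandas defaults to ddof=1, while NumPy's np.std defaults to ddof=0, so the two disagree slightly unless you pass ddof=1. A quick check on hypothetical simulated returns (the seed and the name `demo_returns` are arbitrary):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
demo_returns = pd.Series(rng.normal(0, 2, size=100))  # hypothetical daily returns in %

rolling_vol = demo_returns.rolling(window=30).std()

# The last rolling value is the standard deviation of the final 30 returns
last_window = demo_returns.iloc[-30:].to_numpy()
print("pandas rolling std:", rolling_vol.iloc[-1])
print("numpy, ddof=1:     ", np.std(last_window, ddof=1))
print("numpy, ddof=0:     ", np.std(last_window))
```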
Step 9: Calculate Cumulative Returns
Cumulative return = the total return on an investment held since the first day, found by compounding the daily returns: value_t = initial × (1 + r_1)(1 + r_2)...(1 + r_t).
```python
# Calculate cumulative returns (starting with $1000)
initial_investment = 1000
# fillna(0) treats the first day's missing return as 0%, so each series starts at $1000
df['TECH_Cumulative'] = initial_investment * (1 + df['TECH_Return'].fillna(0) / 100).cumprod()
df['BANK_Cumulative'] = initial_investment * (1 + df['BANK_Return'].fillna(0) / 100).cumprod()
df['ENERGY_Cumulative'] = initial_investment * (1 + df['ENERGY_Return'].fillna(0) / 100).cumprod()

# Final values
print("=== Investment Growth ($1000 Initial) ===")
final_values = pd.DataFrame({
    'Stock': ['TECH', 'BANK', 'ENERGY'],
    'Final Value': [
        df['TECH_Cumulative'].iloc[-1],
        df['BANK_Cumulative'].iloc[-1],
        df['ENERGY_Cumulative'].iloc[-1]
    ],
    'Total Return': [
        (df['TECH_Cumulative'].iloc[-1] / initial_investment - 1) * 100,
        (df['BANK_Cumulative'].iloc[-1] / initial_investment - 1) * 100,
        (df['ENERGY_Cumulative'].iloc[-1] / initial_investment - 1) * 100
    ]
}).round(2)
print(final_values)
```
| Stock | Final Value | Total Return |
|---|---|---|
| TECH | $1,941.40 | 94.14% |
| BANK | $1,165.56 | 16.56% |
| ENERGY | $903.75 | -9.63% |
```python
# Plot cumulative returns
plt.figure(figsize=(14, 6))
plt.plot(df['Date'], df['TECH_Cumulative'], label='TECH', linewidth=2)
plt.plot(df['Date'], df['BANK_Cumulative'], label='BANK', linewidth=2)
plt.plot(df['Date'], df['ENERGY_Cumulative'], label='ENERGY', linewidth=2)
plt.axhline(y=initial_investment, color='gray', linestyle='--', label='Initial Investment')
plt.title('$1000 Investment Growth', fontsize=14, fontweight='bold')
plt.xlabel('Date')
plt.ylabel('Portfolio Value ($)')
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
```
Insight: TECH nearly doubled your money, ENERGY lost ~10%!
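Compounding simple daily returns telescopes: (1 + r_1)(1 + r_2)...(1 + r_t) = P_t / P_0, so the cumulative value is just the initial investment scaled by the price ratio. A check on a hypothetical price series (`demo_prices`, with returns kept as fractions rather than percentages):

```python
import numpy as np
import pandas as pd

demo_prices = pd.Series([50.0, 52.0, 49.4, 51.0, 55.0])  # hypothetical prices
initial = 1000

daily = demo_prices.pct_change().fillna(0)            # first day: 0% return
compounded = initial * (1 + daily).cumprod()           # grow $1000 day by day
ratio = initial * demo_prices / demo_prices.iloc[0]    # shortcut: price ratio

print(pd.DataFrame({'compounded': compounded, 'price_ratio': ratio}).round(2))
```

This is also why the Total Return in Step 9 should agree with the Total Return in the Step 12 dashboard, which is computed directly from the first and last prices.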
Step 10: Correlation Analysis
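Correlation = how closely two stocks' daily returns move together, from -1 (opposite directions) through 0 (no linear relationship) to +1 (in lockstep).
```python
# Correlation matrix of daily returns
returns_df = df[['TECH_Return', 'BANK_Return', 'ENERGY_Return']].dropna()
correlation = returns_df.corr().round(2)
print("=== Correlation Matrix ===")
print(correlation)
```
| | TECH | BANK | ENERGY |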
|---|---|---|---|
| TECH | 1.00 | 0.35 | 0.22 |
| BANK | 0.35 | 1.00 | 0.28 |
| ENERGY | 0.22 | 0.28 | 1.00 |
```python
# Heatmap
plt.figure(figsize=(8, 6))
sns.heatmap(correlation, annot=True, cmap='RdYlGn', center=0,
            vmin=-1, vmax=1, square=True, linewidths=2)
plt.title('Stock Returns Correlation', fontsize=14, fontweight='bold')
plt.tight_layout()
plt.show()
```
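Insight: All three stocks have low correlations, which is good for diversification. Keep in mind that each simulated price series comes from its own independent random draws, so low correlation is built into this data; real stocks in the same market tend to be positively correlated.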
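DataFrame.corr() defaults to the Pearson coefficient: the covariance of two series divided by the product of their standard deviations. The sketch below computes it by hand with a hypothetical helper `pearson` and compares it with pandas on three made-up return series: `b` shares a common component with `a`, while `c` is drawn independently of both.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
a = rng.normal(0, 1, size=500)
b = 0.8 * a + 0.6 * rng.normal(0, 1, size=500)  # partly driven by a
c = rng.normal(0, 1, size=500)                   # independent of a

def pearson(x, y):
    # Sum of co-deviations divided by the root of the product of squared deviations
    x_dev, y_dev = x - x.mean(), y - y.mean()
    return (x_dev * y_dev).sum() / np.sqrt((x_dev ** 2).sum() * (y_dev ** 2).sum())

demo = pd.DataFrame({'a': a, 'b': b, 'c': c})
print(demo.corr().round(2))
print("by hand, a vs b:", round(pearson(a, b), 2))
```

The shared component pushes corr(a, b) toward 0.8, while the independent pair stays close to 0.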
Step 11: Risk vs Return Analysis
```python
# Calculate annualized metrics (252 trading days per year)
trading_days = 252
risk_return = pd.DataFrame({
    'Stock': ['TECH', 'BANK', 'ENERGY'],
    # Simple (non-compounded) annualization: mean daily return x 252
    'Annual Return (%)': [
        df['TECH_Return'].mean() * trading_days,
        df['BANK_Return'].mean() * trading_days,
        df['ENERGY_Return'].mean() * trading_days
    ],
    # Volatility scales with the square root of time: daily std x sqrt(252)
    'Annual Volatility (%)': [
        df['TECH_Return'].std() * np.sqrt(trading_days),
        df['BANK_Return'].std() * np.sqrt(trading_days),
        df['ENERGY_Return'].std() * np.sqrt(trading_days)
    ]
}).round(2)

# Sharpe Ratio = (annual return - risk-free rate) / annual volatility, assuming a 2% risk-free rate
risk_free = 2
risk_return['Sharpe Ratio'] = (
    (risk_return['Annual Return (%)'] - risk_free) / risk_return['Annual Volatility (%)']
).round(2)

print("=== Risk vs Return Analysis ===")
print(risk_return)
```
| Stock | Annual Return | Annual Volatility | Sharpe Ratio |
|---|---|---|---|
| TECH | 30.24% | 31.92% | 0.88 |
| BANK | 12.60% | 24.12% | 0.44 |
| ENERGY | -5.04% | 39.36% | -0.18 |
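
The same calculation can be wrapped in a small helper. The sketch below (the name `sharpe_from_daily` is hypothetical) takes the daily mean and standard deviation in percent, annualizes them as in the code above, and applies a 2% risk-free rate. Fed the rounded TECH figures from Step 5 (mean 0.12%, std 2.01%), it gives approximately the TECH row above; small differences come from rounding the inputs.

```python
import numpy as np

TRADING_DAYS = 252

def sharpe_from_daily(mean_daily_pct, std_daily_pct, risk_free_pct=2.0):
    """Annualize daily return statistics (in %) and compute the Sharpe ratio."""
    annual_return = mean_daily_pct * TRADING_DAYS        # simple scaling
    annual_vol = std_daily_pct * np.sqrt(TRADING_DAYS)   # sqrt-of-time scaling
    sharpe = (annual_return - risk_free_pct) / annual_vol
    return annual_return, annual_vol, sharpe

ret, vol, sharpe = sharpe_from_daily(0.12, 2.01)
print(f"Annual return {ret:.2f}%, volatility {vol:.2f}%, Sharpe {sharpe:.2f}")
```

To apply it to the tutorial data, pass df['TECH_Return'].mean() and df['TECH_Return'].std().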
```python
# Scatter plot
plt.figure(figsize=(10, 6))
colors = ['#2ecc71', '#3498db', '#e74c3c']
for i, row in risk_return.iterrows():
    plt.scatter(row['Annual Volatility (%)'], row['Annual Return (%)'],
                s=200, c=colors[i], label=row['Stock'], edgecolor='black')
    plt.annotate(row['Stock'], (row['Annual Volatility (%)'] + 0.5, row['Annual Return (%)'] + 1))
plt.axhline(y=0, color='gray', linestyle='--')
plt.title('Risk vs Return', fontsize=14, fontweight='bold')
plt.xlabel('Annual Volatility (Risk) %')
plt.ylabel('Annual Return %')
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()
```
Sharpe Ratio Interpretation (a common rule of thumb):
- > 1.0 = Great
- 0.5 to 1.0 = Good
- < 0.5 = Poor
Step 12: Key Metrics Dashboard
```python
# Calculate key metrics for one stock
def calculate_metrics(stock_name, df):
    current_price = df[stock_name].iloc[-1]
    start_price = df[stock_name].iloc[0]
    high_52w = df[stock_name].tail(252).max()  # last 252 trading days ~ 52 weeks
    low_52w = df[stock_name].tail(252).min()
    avg_volume = df[f'{stock_name}_Volume'].mean()
    total_return = (current_price / start_price - 1) * 100
    return {
        'Current Price': f'${current_price:.2f}',
        '52-Week High': f'${high_52w:.2f}',
        '52-Week Low': f'${low_52w:.2f}',
        'Total Return': f'{total_return:.1f}%',
        'Avg Daily Volume': f'{avg_volume/1e6:.1f}M'
    }

print("=" * 60)
print("        STOCK ANALYSIS DASHBOARD")
print("=" * 60)
for stock in ['TECH', 'BANK', 'ENERGY']:
    metrics = calculate_metrics(stock, df)
    print(f"\n{stock}:")
    for key, value in metrics.items():
        print(f"  {key}: {value}")
print("=" * 60)
```
Dashboard output, summarized as a table:

| Metric | TECH | BANK | ENERGY |
|---|---|---|---|
| Current Price | $291.21 | $52.45 | $72.30 |
| 52-Week High | $298.75 | $58.90 | $85.40 |
| 52-Week Low | $185.30 | $42.10 | $55.20 |
| Total Return | 94.1% | 16.6% | -9.6% |
| Avg Volume | 3.0M | 1.3M | 1.9M |
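
tail(252) treats the last 252 rows as one year of trading days. If you'd rather define the window in calendar time, one option is to filter on the Date column. A sketch on hypothetical data (the helper name `high_low_52w` and the frame `demo` are illustrative):

```python
import numpy as np
import pandas as pd

def high_low_52w(frame, price_col, date_col='Date'):
    """Return (high, low) of price_col over the 52 weeks ending at the last date."""
    end = frame[date_col].max()
    window = frame[frame[date_col] > end - pd.DateOffset(weeks=52)]
    return window[price_col].max(), window[price_col].min()

# Hypothetical data: 504 business days with a steadily rising price
demo_dates = pd.date_range('2022-01-03', periods=504, freq='B')
demo = pd.DataFrame({'Date': demo_dates, 'PRICE': np.linspace(100, 200, 504)})

high, low = high_low_52w(demo, 'PRICE')
print(f"52-week high: {high:.2f}, 52-week low: {low:.2f}")
```

With business-day data the two approaches differ slightly: 52 calendar weeks span 260 weekdays here, while tail(252) takes 252 rows.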
Step 13: Save Analysis Results
```python
# Save processed data
df.to_csv('stock_analysis_results.csv', index=False)

# Save summary report
with open('stock_report.txt', 'w') as f:
    f.write("STOCK ANALYSIS REPORT\n")
    f.write("=" * 50 + "\n\n")
    # Take the period from the data instead of hard-coding it
    f.write(f"PERIOD: {df['Date'].min().date()} to {df['Date'].max().date()}\n\n")
    f.write("PERFORMANCE SUMMARY:\n")
    for stock in ['TECH', 'BANK', 'ENERGY']:
        total_ret = (df[stock].iloc[-1] / df[stock].iloc[0] - 1) * 100
        f.write(f"  {stock}: {total_ret:.1f}%\n")
    f.write("\nRECOMMENDATION:\n")
    f.write("  Best Performer: TECH (highest return, good Sharpe)\n")
    f.write("  Most Stable: BANK (lowest volatility)\n")
    f.write("  Avoid: ENERGY (negative return, high volatility)\n")

print("Files saved successfully!")
```
Complete Code
```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Setup
plt.style.use('seaborn-v0_8-whitegrid')
np.random.seed(42)

# Generate ~2 years of business days
dates = pd.date_range(start='2022-01-01', periods=504, freq='B')

def generate_stock(start, vol, trend, n):
    """Simulated price path: price_t = price_{t-1} * (1 + N(trend, vol))."""
    prices = [start]
    for _ in range(n - 1):
        prices.append(prices[-1] * (1 + np.random.normal(trend, vol)))
    return prices

df = pd.DataFrame({
    'Date': dates,
    'TECH': generate_stock(150, 0.02, 0.001, 504),
    'BANK': generate_stock(45, 0.015, 0.0005, 504),
    'ENERGY': generate_stock(80, 0.025, -0.0002, 504)
}).round(2)

# Calculate daily returns (%)
for stock in ['TECH', 'BANK', 'ENERGY']:
    df[f'{stock}_Return'] = df[stock].pct_change() * 100

# Moving averages
df['TECH_MA50'] = df['TECH'].rolling(50).mean()

# Cumulative value of $1000 (first day's missing return treated as 0%)
for stock in ['TECH', 'BANK', 'ENERGY']:
    df[f'{stock}_Cum'] = 1000 * (1 + df[f'{stock}_Return'].fillna(0) / 100).cumprod()

# Visualizations
fig, axes = plt.subplots(2, 2, figsize=(14, 10))

# Price history
axes[0, 0].plot(df['Date'], df['TECH'], label='TECH')
axes[0, 0].plot(df['Date'], df['BANK'], label='BANK')
axes[0, 0].plot(df['Date'], df['ENERGY'], label='ENERGY')
axes[0, 0].set_title('Stock Prices')
axes[0, 0].legend()

# Returns distribution
axes[0, 1].hist(df['TECH_Return'].dropna(), bins=50, alpha=0.7, label='TECH')
axes[0, 1].hist(df['BANK_Return'].dropna(), bins=50, alpha=0.7, label='BANK')
axes[0, 1].set_title('Returns Distribution')
axes[0, 1].legend()

# Cumulative returns
axes[1, 0].plot(df['Date'], df['TECH_Cum'], label='TECH')
axes[1, 0].plot(df['Date'], df['BANK_Cum'], label='BANK')
axes[1, 0].plot(df['Date'], df['ENERGY_Cum'], label='ENERGY')
axes[1, 0].axhline(1000, color='gray', linestyle='--')
axes[1, 0].set_title('$1000 Investment Growth')
axes[1, 0].legend()

# Correlation
corr = df[['TECH_Return', 'BANK_Return', 'ENERGY_Return']].corr()
sns.heatmap(corr, annot=True, ax=axes[1, 1], cmap='RdYlGn', center=0)
axes[1, 1].set_title('Correlation')

plt.tight_layout()
plt.savefig('stock_dashboard.png', dpi=300)
plt.show()

# Summary
print("\n=== ANALYSIS SUMMARY ===")
for stock in ['TECH', 'BANK', 'ENERGY']:
    ret = (df[stock].iloc[-1] / df[stock].iloc[0] - 1) * 100
    vol = df[f'{stock}_Return'].std() * np.sqrt(252)  # annualized volatility (%)
    print(f"{stock}: Return={ret:.1f}%, Volatility={vol:.1f}%")
print("\nProject Complete!")
```
What You Learned
- Working with time series financial data
- Calculating daily and cumulative returns
- Computing moving averages
- Measuring volatility and risk
- Analyzing correlations between stocks
- Risk-adjusted returns (Sharpe Ratio)
- Building financial visualizations
- Comparing investment performance
Congratulations! You've completed the Stock Price Analysis project!
Investment Recommendations
| Stock | Recommendation | Reason |
|---|---|---|
| TECH | Buy | High return, good Sharpe ratio |
| BANK | Hold | Stable, low risk, moderate return |
| ENERGY | Avoid | Negative return, high volatility |
Course Complete!
You've finished all 12 modules of Python for Data Analysis!
What you've learned:
- Python fundamentals
- NumPy and Pandas
- Data cleaning
- Exploratory data analysis
- Data visualization (Matplotlib, Seaborn, Plotly)
- Statistics and machine learning basics
- Real-world projects
Next Steps:
- Practice with real datasets (Kaggle)
- Build your own portfolio projects
- Learn more advanced ML techniques
- Explore deep learning
Good luck on your data science journey!