Stock Price Analysis Project

Project Overview

In this project, you will analyze stock market data to:

Understand stock price movements
Calculate key financial metrics
Identify trends and patterns
Compare multiple stocks
Build visualizations of prices, returns, and risk

Skills you'll practice:

Working with time series data
Financial calculations (returns, moving averages)
Data visualization
Comparative analysis

Step 1: Setup and Import Libraries

code.py

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime, timedelta

# Settings
plt.style.use('seaborn-v0_8-whitegrid')
sns.set_palette('husl')
np.random.seed(42)

print("Libraries imported successfully!")

Step 2: Create Sample Stock Data

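We'll simulate two years of daily prices for three fictional companies. Because the prices are randomly generated, the sample outputs shown throughout this project are illustrative, and the exact numbers from your run may differ.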

code.py

# Generate 504 business days (about two trading years of ~252 days each)
start_date = datetime(2022, 1, 1)
dates = pd.date_range(start=start_date, periods=504, freq='B')  # 'B' = weekdays; market holidays are not excluded

# Simulate a price path: each day's price = previous price * (1 + random daily return)
def generate_stock_data(start_price, volatility, trend, dates):
    prices = [start_price]
    for i in range(1, len(dates)):
        change = np.random.normal(trend, volatility)
        new_price = prices[-1] * (1 + change)
        prices.append(max(new_price, 1))  # Price can't go below 1
    return prices

# Generate data for 3 stocks
stocks = {
    'Date': dates,
    'TECH': generate_stock_data(150, 0.02, 0.001, dates),   # Tech stock - high growth
    'BANK': generate_stock_data(45, 0.015, 0.0005, dates),  # Bank stock - stable
    'ENERGY': generate_stock_data(80, 0.025, -0.0002, dates) # Energy - volatile
}

df = pd.DataFrame(stocks)
df = df.round(2)

# Add volume (random)
df['TECH_Volume'] = np.random.randint(1000000, 5000000, len(df))
df['BANK_Volume'] = np.random.randint(500000, 2000000, len(df))
df['ENERGY_Volume'] = np.random.randint(800000, 3000000, len(df))

print("Stock data created!")
print(f"Date range: {df['Date'].min().date()} to {df['Date'].max().date()}")
print(f"Total trading days: {len(df)}")
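The loop in generate_stock_data compounds one random daily return at a time, so each price path is a geometric random walk. As a minimal sketch, the same compounding can be done in one vectorized step with NumPy's cumprod. The helper names below are hypothetical, and the $1 floor from Step 2 is left out. The sketch also uses NumPy's newer default_rng generator, so its numbers will not match the np.random.seed(42) data above.

```python
import numpy as np

def prices_from_returns_loop(start_price, changes):
    """Compound daily returns one at a time, like the loop in Step 2 (no $1 floor)."""
    prices = [start_price]
    for change in changes:
        prices.append(prices[-1] * (1 + change))
    return np.array(prices)

def prices_from_returns_vectorized(start_price, changes):
    """Same compounding in one step: price_t = start * (1 + r_1) * ... * (1 + r_t)."""
    growth = np.cumprod(1 + changes)
    return start_price * np.concatenate(([1.0], growth))

# 503 random daily returns -> 504 prices, using TECH's parameters (trend 0.001, volatility 0.02)
rng = np.random.default_rng(42)
changes = rng.normal(loc=0.001, scale=0.02, size=503)

loop_prices = prices_from_returns_loop(150.0, changes)
vec_prices = prices_from_returns_vectorized(150.0, changes)

print(len(vec_prices), np.allclose(loop_prices, vec_prices))
```

The vectorized form avoids a Python-level loop, which helps when simulating many stocks or many paths. The two results agree only up to floating-point rounding, because the multiplications are grouped differently.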

Step 3: Explore the Data

code.py

# First look
print("=== First 5 Rows ===")
print(df.head())

Date	TECH	BANK	ENERGY	TECH_Volume
2022-01-03	150.00	45.00	80.00	2,345,678
2022-01-04	152.34	45.23	78.45	3,123,456
2022-01-05	149.87	45.67	79.12	2,876,543
2022-01-06	153.21	44.98	77.89	1,987,654
2022-01-07	155.45	45.34	80.23	4,234,567

code.py

# Basic statistics
print("\n=== Price Statistics ===")
print(df[['TECH', 'BANK', 'ENERGY']].describe())

Stat	TECH	BANK	ENERGY
count	504	504	504
mean	185.50	52.30	72.45
std	45.20	8.15	15.80
min	128.45	40.12	48.90
max	298.75	68.90	102.30

Step 4: Visualize Stock Prices

Price History

code.py

# Plot all stocks
plt.figure(figsize=(14, 6))

plt.plot(df['Date'], df['TECH'], label='TECH', linewidth=2)
plt.plot(df['Date'], df['BANK'], label='BANK', linewidth=2)
plt.plot(df['Date'], df['ENERGY'], label='ENERGY', linewidth=2)

plt.title('Stock Price History (2022-2023)', fontsize=14, fontweight='bold')
plt.xlabel('Date')
plt.ylabel('Price ($)')
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

Individual Stock Charts

code.py

# Subplots for each stock
fig, axes = plt.subplots(3, 1, figsize=(14, 10), sharex=True)

stocks_list = ['TECH', 'BANK', 'ENERGY']
colors = ['#2ecc71', '#3498db', '#e74c3c']

for i, (stock, color) in enumerate(zip(stocks_list, colors)):
    axes[i].plot(df['Date'], df[stock], color=color, linewidth=1.5)
    axes[i].fill_between(df['Date'], df[stock], alpha=0.3, color=color)
    axes[i].set_title(f'{stock} Stock Price', fontweight='bold')
    axes[i].set_ylabel('Price ($)')
    axes[i].grid(True, alpha=0.3)

axes[-1].set_xlabel('Date')  # Label only the bottom subplot (the x-axis is shared)
plt.tight_layout()
plt.show()

[Figure: Stock Price History]

Step 5: Calculate Daily Returns

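Daily return = the percentage change in price from the previous trading day: (today's price / yesterday's price - 1) × 100. The first row has no previous day, so its return is NaN.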
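To see the formula in action, here is pct_change() next to the same calculation written out by hand with shift(), on a tiny made-up price series.

```python
import pandas as pd

prices = pd.Series([100.0, 102.0, 99.96, 101.0])  # made-up closing prices

# pandas built-in: (P_t / P_{t-1} - 1) * 100
auto = prices.pct_change() * 100

# The same formula by hand: shift(1) lines each price up with the previous day's price
manual = (prices / prices.shift(1) - 1) * 100

print(pd.DataFrame({'price': prices, 'pct_change': auto.round(2), 'manual': manual.round(2)}))
```

The first return is NaN in both versions. This is why later steps call dropna() before plotting or correlating returns.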

code.py

# Calculate daily returns
df['TECH_Return'] = df['TECH'].pct_change() * 100
df['BANK_Return'] = df['BANK'].pct_change() * 100
df['ENERGY_Return'] = df['ENERGY'].pct_change() * 100

print("=== Daily Returns (Last 5 Days) ===")
print(df[['Date', 'TECH_Return', 'BANK_Return', 'ENERGY_Return']].tail())

Date	TECH_Return	BANK_Return	ENERGY_Return
2023-12-22	1.25%	0.45%	-0.89%
2023-12-26	-0.67%	0.12%	1.34%
2023-12-27	0.89%	-0.34%	0.56%
2023-12-28	1.45%	0.67%	-1.12%
2023-12-29	0.34%	0.23%	0.78%

code.py

# Return statistics
print("\n=== Return Statistics ===")
return_cols = ['TECH_Return', 'BANK_Return', 'ENERGY_Return']
print(df[return_cols].describe())

Stat	TECH	BANK	ENERGY
mean	0.12%	0.05%	-0.02%
std	2.01%	1.52%	2.48%
min	-6.45%	-4.89%	-7.23%
max	5.89%	4.12%	6.78%

Insight: TECH has the highest average daily return, while ENERGY is the most volatile (largest standard deviation of returns).

Step 6: Visualize Returns Distribution

code.py

# Histogram of returns
fig, axes = plt.subplots(1, 3, figsize=(15, 4))

for i, (stock, color) in enumerate(zip(['TECH', 'BANK', 'ENERGY'], colors)):
    col = f'{stock}_Return'
    axes[i].hist(df[col].dropna(), bins=50, color=color, edgecolor='black', alpha=0.7)
    axes[i].axvline(x=0, color='red', linestyle='--', linewidth=2)
    axes[i].set_title(f'{stock} Daily Returns', fontweight='bold')
    axes[i].set_xlabel('Return (%)')
    axes[i].set_ylabel('Frequency')

plt.tight_layout()
plt.show()

[Figure: Returns Distribution]

Step 7: Calculate Moving Averages

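Moving average (MA) = the average closing price over the last N trading days, including today. It smooths out day-to-day noise. The first N-1 rows are NaN because a full window is not yet available.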
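The following example uses a short made-up series. It shows that rolling(window).mean() averages the current value and the previous window-1 values, and that the result is NaN until a full window exists. For TECH_MA200 this means the first 199 rows are NaN, so the 200-day line starts partway through the chart.

```python
import pandas as pd

prices = pd.Series([10.0, 11.0, 12.0, 13.0, 14.0, 15.0])  # made-up closing prices
window = 3

sma = prices.rolling(window=window).mean()
print(sma.tolist())  # [nan, nan, 11.0, 12.0, 13.0, 14.0]

# The last value is just the plain mean of the last `window` prices
print(prices.iloc[-window:].mean())  # 14.0
```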

code.py

# Calculate moving averages for TECH stock
df['TECH_MA20'] = df['TECH'].rolling(window=20).mean()   # 20-day MA
df['TECH_MA50'] = df['TECH'].rolling(window=50).mean()   # 50-day MA
df['TECH_MA200'] = df['TECH'].rolling(window=200).mean() # 200-day MA

print("=== TECH Moving Averages (Last 5 Days) ===")
print(df[['Date', 'TECH', 'TECH_MA20', 'TECH_MA50', 'TECH_MA200']].tail())

Date	TECH	MA20	MA50	MA200
2023-12-25	285.45	278.90	265.30	220.50
2023-12-26	283.54	279.45	266.10	221.20
2023-12-27	286.07	280.12	267.00	221.90
2023-12-28	290.22	281.34	268.20	222.60
2023-12-29	291.21	282.50	269.40	223.30

code.py

# Plot price with moving averages
plt.figure(figsize=(14, 6))

plt.plot(df['Date'], df['TECH'], label='TECH Price', linewidth=1, alpha=0.7)
plt.plot(df['Date'], df['TECH_MA20'], label='20-Day MA', linewidth=2)
plt.plot(df['Date'], df['TECH_MA50'], label='50-Day MA', linewidth=2)
plt.plot(df['Date'], df['TECH_MA200'], label='200-Day MA', linewidth=2)

plt.title('TECH Stock with Moving Averages', fontsize=14, fontweight='bold')
plt.xlabel('Date')
plt.ylabel('Price ($)')
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

[Figure: Moving Averages]

Trading Signals (common technical-analysis rules of thumb):

Price > MA200 = Bullish (uptrend)
Price < MA200 = Bearish (downtrend)
MA20 crosses above MA50 = Buy signal
MA20 crosses below MA50 = Sell signal
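The MA20/MA50 crossover rules above can be expressed in code by checking whether the fast average is above the slow one today versus yesterday. What follows is a minimal sketch. find_crossovers is a hypothetical helper that flags only the day a cross happens. It also skips days where either moving average is still NaN, so the warm-up period does not produce a false signal.

```python
import pandas as pd

def find_crossovers(fast: pd.Series, slow: pd.Series) -> pd.Series:
    """Label each day 'buy' (fast crosses above slow), 'sell' (crosses below), or NaN (no signal)."""
    above = fast > slow                              # comparisons involving NaN evaluate to False
    prev_above = above.shift(1, fill_value=False)    # yesterday's relationship

    # Only trust days where both averages exist today and yesterday
    valid = fast.notna() & slow.notna()
    valid = valid & valid.shift(1, fill_value=False)

    signal = pd.Series(index=fast.index, dtype=object)  # starts as all NaN
    signal.loc[valid & above & ~prev_above] = 'buy'
    signal.loc[valid & ~above & prev_above] = 'sell'
    return signal

# Made-up moving averages: the fast MA rises through 2.5, then falls back below it
fast = pd.Series([1.0, 2.0, 3.0, 4.0, 3.0, 2.0, 1.0])
slow = pd.Series([2.5] * 7)
signals = find_crossovers(fast, slow)
print(signals.dropna())
```

With the tutorial's DataFrame, find_crossovers(df['TECH_MA20'], df['TECH_MA50']).dropna() would return the rows where a TECH crossover occurs.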

Step 8: Calculate Volatility

Volatility = how much returns swing around their average, measured as the standard deviation of daily returns. Here it is computed over a rolling 30-day window.
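One detail worth knowing: pandas' std() uses the sample standard deviation (ddof=1) by default, while NumPy's np.std defaults to ddof=0. The sketch below uses made-up daily returns. It checks the last rolling value by hand, passing ddof=1 so the two results agree.

```python
import numpy as np
import pandas as pd

returns = pd.Series([0.5, -1.2, 0.8, 1.5, -0.3, -0.9, 0.4])  # made-up daily returns (%)
window = 5

# pandas rolling std: sample standard deviation (ddof=1) over each 5-day window
rolling_vol = returns.rolling(window=window).std()

# Recompute the last window by hand; ddof=1 is needed to match pandas
last_window = returns.iloc[-window:].to_numpy()
manual = np.std(last_window, ddof=1)

print(rolling_vol.round(4).tolist())
print(round(manual, 4))
```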

code.py

# 30-day rolling volatility
df['TECH_Volatility'] = df['TECH_Return'].rolling(window=30).std()
df['BANK_Volatility'] = df['BANK_Return'].rolling(window=30).std()
df['ENERGY_Volatility'] = df['ENERGY_Return'].rolling(window=30).std()

# Average volatility
print("=== Average 30-Day Volatility ===")
vol_summary = pd.DataFrame({
    'Stock': ['TECH', 'BANK', 'ENERGY'],
    'Avg Volatility': [
        df['TECH_Volatility'].mean(),
        df['BANK_Volatility'].mean(),
        df['ENERGY_Volatility'].mean()
    ]
}).round(2)
print(vol_summary)

Stock	Avg Volatility
TECH	2.01%
BANK	1.52%
ENERGY	2.48%

code.py

# Plot volatility over time
plt.figure(figsize=(14, 5))

plt.plot(df['Date'], df['TECH_Volatility'], label='TECH', linewidth=1.5)
plt.plot(df['Date'], df['BANK_Volatility'], label='BANK', linewidth=1.5)
plt.plot(df['Date'], df['ENERGY_Volatility'], label='ENERGY', linewidth=1.5)

plt.title('30-Day Rolling Volatility', fontsize=14, fontweight='bold')
plt.xlabel('Date')
plt.ylabel('Volatility (%)')
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

Insight: ENERGY is consistently more volatile than the other two stocks.

Step 9: Calculate Cumulative Returns

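Cumulative return = the total growth of an investment held from the first day, found by compounding daily returns: value = initial investment × (1 + r1) × (1 + r2) × ... × (1 + rt).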
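Compounding daily returns telescopes: (1 + r1)(1 + r2)...(1 + rt) = P_t / P_0. The final value of the investment is therefore the initial amount times the ratio of the last price to the first. The sketch below checks this on made-up prices. Note that pct_change() leaves the first return as NaN. Filling it with 0 makes the series start exactly at the initial investment instead of NaN.

```python
import pandas as pd

prices = pd.Series([100.0, 105.0, 99.75, 110.0])  # made-up closing prices
initial_investment = 1000

daily_return = prices.pct_change()  # fractional returns; the first value is NaN

# Treat day 1 as a 0% return so the series starts at the initial investment
portfolio_value = initial_investment * (1 + daily_return.fillna(0)).cumprod()

# Compounding telescopes to the simple price ratio P_last / P_first
shortcut = initial_investment * prices.iloc[-1] / prices.iloc[0]

print(portfolio_value.round(2).tolist())
print(round(shortcut, 2))
```

This is also why the Total Return computed from cumulative returns in Step 9 should match the price-based Total Return in Step 12, up to rounding.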

code.py

# Calculate cumulative returns (starting with $1000)
initial_investment = 1000

# fillna(0) treats the first day as a 0% return, so each series starts at $1000 instead of NaN
df['TECH_Cumulative'] = initial_investment * (1 + df['TECH_Return'].fillna(0)/100).cumprod()
df['BANK_Cumulative'] = initial_investment * (1 + df['BANK_Return'].fillna(0)/100).cumprod()
df['ENERGY_Cumulative'] = initial_investment * (1 + df['ENERGY_Return'].fillna(0)/100).cumprod()

# Final values
print("=== Investment Growth ($1000 Initial) ===")
final_values = pd.DataFrame({
    'Stock': ['TECH', 'BANK', 'ENERGY'],
    'Final Value': [
        df['TECH_Cumulative'].iloc[-1],
        df['BANK_Cumulative'].iloc[-1],
        df['ENERGY_Cumulative'].iloc[-1]
    ],
    'Total Return': [
        (df['TECH_Cumulative'].iloc[-1] / initial_investment - 1) * 100,
        (df['BANK_Cumulative'].iloc[-1] / initial_investment - 1) * 100,
        (df['ENERGY_Cumulative'].iloc[-1] / initial_investment - 1) * 100
    ]
}).round(2)
print(final_values)

Stock	Final Value	Total Return
TECH	$1,945.60	94.56%
BANK	$1,162.30	16.23%
ENERGY	$905.40	-9.46%

code.py

# Plot cumulative returns
plt.figure(figsize=(14, 6))

plt.plot(df['Date'], df['TECH_Cumulative'], label='TECH', linewidth=2)
plt.plot(df['Date'], df['BANK_Cumulative'], label='BANK', linewidth=2)
plt.plot(df['Date'], df['ENERGY_Cumulative'], label='ENERGY', linewidth=2)
plt.axhline(y=initial_investment, color='gray', linestyle='--', label='Initial Investment')

plt.title('$1000 Investment Growth', fontsize=14, fontweight='bold')
plt.xlabel('Date')
plt.ylabel('Portfolio Value ($)')
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

[Figure: Cumulative Returns Growth]

Insight: TECH nearly doubled the initial investment, while ENERGY lost about 10%.

Step 10: Correlation Analysis

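Correlation = how closely two stocks' daily returns move together. It ranges from -1 (opposite directions) through 0 (no linear relationship) to +1 (perfectly in step).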
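Pearson correlation is what pandas' corr() computes by default. It is the covariance of two return series divided by the product of their standard deviations. The sketch below computes it by hand for two made-up return series and compares the result with Series.corr().

```python
import numpy as np
import pandas as pd

# Made-up daily returns (%) for two stocks
a = pd.Series([1.0, -0.5, 0.8, -1.2, 0.3])
b = pd.Series([0.6, -0.2, 0.9, -0.7, 0.1])

# Pearson correlation: sum of co-deviations / sqrt(product of summed squared deviations)
da, db = a - a.mean(), b - b.mean()
manual = (da * db).sum() / np.sqrt((da ** 2).sum() * (db ** 2).sum())

print(round(manual, 4), round(a.corr(b), 4))
```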

code.py

# Correlation matrix
returns_df = df[['TECH_Return', 'BANK_Return', 'ENERGY_Return']].dropna()
correlation = returns_df.corr().round(2)

print("=== Correlation Matrix ===")
print(correlation)

	TECH	BANK	ENERGY
TECH	1.00	0.35	0.22
BANK	0.35	1.00	0.28
ENERGY	0.22	0.28	1.00

code.py

# Heatmap
plt.figure(figsize=(8, 6))
sns.heatmap(correlation, annot=True, cmap='RdYlGn', center=0,
            vmin=-1, vmax=1, square=True, linewidths=2)
plt.title('Stock Returns Correlation', fontsize=14, fontweight='bold')
plt.tight_layout()
plt.show()

[Figure: Correlation Heatmap]

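Insight: All three stocks have low correlation, which is good for diversification. (Because the three price series are simulated independently, correlations near 0 are expected here.)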

Step 11: Risk vs Return Analysis

code.py

# Annualize daily metrics: mean daily return x 252, daily standard deviation x sqrt(252)
trading_days = 252

risk_return = pd.DataFrame({
    'Stock': ['TECH', 'BANK', 'ENERGY'],
    'Annual Return (%)': [
        df['TECH_Return'].mean() * trading_days,
        df['BANK_Return'].mean() * trading_days,
        df['ENERGY_Return'].mean() * trading_days
    ],
    'Annual Volatility (%)': [
        df['TECH_Return'].std() * np.sqrt(trading_days),
        df['BANK_Return'].std() * np.sqrt(trading_days),
        df['ENERGY_Return'].std() * np.sqrt(trading_days)
    ]
}).round(2)

# Sharpe ratio = (annual return - risk-free rate) / annual volatility, assuming a 2% annual risk-free rate
risk_free = 2
risk_return['Sharpe Ratio'] = (
    (risk_return['Annual Return (%)'] - risk_free) / risk_return['Annual Volatility (%)']
).round(2)

print("=== Risk vs Return Analysis ===")
print(risk_return)

Stock	Annual Return	Annual Volatility	Sharpe Ratio
TECH	30.24%	31.92%	0.88
BANK	12.60%	24.12%	0.44
ENERGY	-5.04%	39.36%	-0.18

code.py

# Scatter plot
plt.figure(figsize=(10, 6))

colors = ['#2ecc71', '#3498db', '#e74c3c']
for i, row in risk_return.iterrows():
    plt.scatter(row['Annual Volatility (%)'], row['Annual Return (%)'],
                s=200, c=colors[i], label=row['Stock'], edgecolor='black')
    plt.annotate(row['Stock'], (row['Annual Volatility (%)']+0.5, row['Annual Return (%)']+1))

plt.axhline(y=0, color='gray', linestyle='--')
plt.title('Risk vs Return', fontsize=14, fontweight='bold')
plt.xlabel('Annual Volatility (Risk) %')
plt.ylabel('Annual Return %')
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

[Figure: Risk vs Return]

Sharpe Ratio Interpretation (common rule of thumb):

Above 1.0 = Great
0.5 to 1.0 = Good
Below 0.5 = Poor
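The annualization and Sharpe formula from Step 11 can be wrapped in a single helper. This sketch makes the same assumptions as Step 11: 252 trading days, a 2% annual risk-free rate, and annualizing the mean daily return by simple multiplication. The sqrt(252) volatility scaling assumes daily returns are independent. sharpe_from_daily_stats is a hypothetical name. Plugging in TECH's sample daily statistics (0.12% mean, 2.01% standard deviation) gives a Sharpe ratio of about 0.885, close to the 0.88 in the table (the difference comes from rounding).

```python
import math

def sharpe_from_daily_stats(mean_daily_pct, std_daily_pct, risk_free_pct=2.0, trading_days=252):
    """Annualize daily return statistics (in %) and compute the Sharpe ratio, as in Step 11."""
    annual_return = mean_daily_pct * trading_days           # the mean scales with time
    annual_vol = std_daily_pct * math.sqrt(trading_days)    # the std scales with sqrt(time)
    sharpe = (annual_return - risk_free_pct) / annual_vol
    return annual_return, annual_vol, sharpe

annual_return, annual_vol, sharpe = sharpe_from_daily_stats(0.12, 2.01)
print(f"Return {annual_return:.2f}%, volatility {annual_vol:.2f}%, Sharpe {sharpe:.3f}")
```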

Step 12: Key Metrics Dashboard

code.py

# Calculate all key metrics
def calculate_metrics(stock_name, df):
    price_col = stock_name
    return_col = f'{stock_name}_Return'

    current_price = df[price_col].iloc[-1]
    start_price = df[price_col].iloc[0]
    high_52w = df[price_col].tail(252).max()
    low_52w = df[price_col].tail(252).min()
    avg_volume = df[f'{stock_name}_Volume'].mean()
    total_return = (current_price / start_price - 1) * 100

    return {
        'Current Price': f'${current_price:.2f}',
        '52-Week High': f'${high_52w:.2f}',
        '52-Week Low': f'${low_52w:.2f}',
        'Total Return': f'{total_return:.1f}%',
        'Avg Daily Volume': f'{avg_volume/1e6:.1f}M'
    }

print("=" * 60)
print("              STOCK ANALYSIS DASHBOARD")
print("=" * 60)

for stock in ['TECH', 'BANK', 'ENERGY']:
    metrics = calculate_metrics(stock, df)
    print(f"\n{stock}:")
    for key, value in metrics.items():
        print(f"  {key}: {value}")

print("=" * 60)

Metric	TECH	BANK	ENERGY
Current Price	$291.21	$52.45	$72.30
52-Week High	$298.75	$58.90	$85.40
52-Week Low	$185.30	$42.10	$55.20
Total Return	94.1%	16.6%	-9.6%
Avg Daily Volume	3.0M	1.3M	1.9M

Step 13: Save Analysis Results

code.py

# Save processed data
df.to_csv('stock_analysis_results.csv', index=False)

# Save summary report
with open('stock_report.txt', 'w') as f:
    f.write("STOCK ANALYSIS REPORT\n")
    f.write("=" * 50 + "\n\n")

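    # Use the actual date range of the data rather than hard-coded dates
    f.write(f"PERIOD: {df['Date'].min().date()} to {df['Date'].max().date()}\n\n")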

    f.write("PERFORMANCE SUMMARY:\n")
    for stock in ['TECH', 'BANK', 'ENERGY']:
        total_ret = (df[stock].iloc[-1] / df[stock].iloc[0] - 1) * 100
        f.write(f"  {stock}: {total_ret:.1f}%\n")

    f.write("\nRECOMMENDATION:\n")
    f.write("  Best Performer: TECH (highest return, good Sharpe)\n")
    f.write("  Most Stable: BANK (lowest volatility)\n")
    f.write("  Avoid: ENERGY (negative return, high volatility)\n")

print("Files saved successfully!")

Complete Code

code.py

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime

# Setup
plt.style.use('seaborn-v0_8-whitegrid')
np.random.seed(42)

# Generate stock data
dates = pd.date_range(start='2022-01-01', periods=504, freq='B')

def generate_stock(start, vol, trend, n):
    prices = [start]
    for _ in range(n-1):
        prices.append(prices[-1] * (1 + np.random.normal(trend, vol)))
    return prices

df = pd.DataFrame({
    'Date': dates,
    'TECH': generate_stock(150, 0.02, 0.001, 504),
    'BANK': generate_stock(45, 0.015, 0.0005, 504),
    'ENERGY': generate_stock(80, 0.025, -0.0002, 504)
}).round(2)

# Calculate returns
for stock in ['TECH', 'BANK', 'ENERGY']:
    df[f'{stock}_Return'] = df[stock].pct_change() * 100

# Moving averages
df['TECH_MA50'] = df['TECH'].rolling(50).mean()

# Cumulative returns (fillna(0) so each series starts at $1000 instead of NaN)
for stock in ['TECH', 'BANK', 'ENERGY']:
    df[f'{stock}_Cum'] = 1000 * (1 + df[f'{stock}_Return'].fillna(0)/100).cumprod()

# Visualizations
fig, axes = plt.subplots(2, 2, figsize=(14, 10))

# Price history
axes[0,0].plot(df['Date'], df['TECH'], label='TECH')
axes[0,0].plot(df['Date'], df['BANK'], label='BANK')
axes[0,0].plot(df['Date'], df['ENERGY'], label='ENERGY')
axes[0,0].set_title('Stock Prices')
axes[0,0].legend()

# Returns distribution
axes[0,1].hist(df['TECH_Return'].dropna(), bins=50, alpha=0.7, label='TECH')
axes[0,1].hist(df['BANK_Return'].dropna(), bins=50, alpha=0.7, label='BANK')
axes[0,1].set_title('Returns Distribution')
axes[0,1].legend()

# Cumulative returns
axes[1,0].plot(df['Date'], df['TECH_Cum'], label='TECH')
axes[1,0].plot(df['Date'], df['BANK_Cum'], label='BANK')
axes[1,0].plot(df['Date'], df['ENERGY_Cum'], label='ENERGY')
axes[1,0].axhline(1000, color='gray', linestyle='--')
axes[1,0].set_title('$1000 Investment Growth')
axes[1,0].legend()

# Correlation
corr = df[['TECH_Return', 'BANK_Return', 'ENERGY_Return']].corr()
sns.heatmap(corr, annot=True, ax=axes[1,1], cmap='RdYlGn', center=0)
axes[1,1].set_title('Correlation')

plt.tight_layout()
plt.savefig('stock_dashboard.png', dpi=300)
plt.show()

# Summary
print("\n=== ANALYSIS SUMMARY ===")
for stock in ['TECH', 'BANK', 'ENERGY']:
    ret = (df[stock].iloc[-1] / df[stock].iloc[0] - 1) * 100
    vol = df[f'{stock}_Return'].std() * np.sqrt(252)
    print(f"{stock}: Return={ret:.1f}%, Volatility={vol:.1f}%")

print("\nProject Complete!")

What You Learned

Working with time series financial data
Calculating daily and cumulative returns
Computing moving averages
Measuring volatility and risk
Analyzing correlations between stocks
Risk-adjusted returns (Sharpe Ratio)
Building financial visualizations
Comparing investment performance

Congratulations! You've completed the Stock Price Analysis project!

Investment Recommendations

Stock	Recommendation	Reason
TECH	Buy	High return, good Sharpe ratio
BANK	Hold	Stable, low risk, moderate return
ENERGY	Avoid	Negative return, high volatility

Course Complete!

You've finished all 12 modules of Python for Data Analysis!

What you've learned:

Python fundamentals
NumPy and Pandas
Data cleaning
Exploratory data analysis
Data visualization (Matplotlib, Seaborn, Plotly)
Statistics and machine learning basics
Real-world projects

Next Steps:

Practice with real datasets (Kaggle)
Build your own portfolio projects
Learn more advanced ML techniques
Explore deep learning

Good luck on your data science journey!