Stock Price Analysis

Analyze stock market data and build visualizations

Project Overview

In this project, you will analyze stock market data to:

  • Understand stock price movements
  • Calculate key financial metrics
  • Identify trends and patterns
  • Compare multiple stocks
  • Build visualizations with Matplotlib and Seaborn

Skills you'll practice:

  • Working with time series data
  • Financial calculations (returns, moving averages)
  • Data visualization
  • Comparative analysis

Step 1: Setup and Import Libraries

code.py
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime, timedelta

# Settings
plt.style.use('seaborn-v0_8-whitegrid')
sns.set_palette('husl')
np.random.seed(42)

print("Libraries imported successfully!")

Step 2: Create Sample Stock Data

We'll simulate two years of daily prices for three fictional companies (TECH, BANK, and ENERGY) using a random walk:

code.py
# Generate dates for 2 years
start_date = datetime(2022, 1, 1)
dates = pd.date_range(start=start_date, periods=504, freq='B')  # Business days

# Function to generate realistic stock prices
def generate_stock_data(start_price, volatility, trend, dates):
    prices = [start_price]
    for i in range(1, len(dates)):
        change = np.random.normal(trend, volatility)
        new_price = prices[-1] * (1 + change)
        prices.append(max(new_price, 1))  # Price can't go below 1
    return prices

# Generate data for 3 stocks
stocks = {
    'Date': dates,
    'TECH': generate_stock_data(150, 0.02, 0.001, dates),   # Tech stock - high growth
    'BANK': generate_stock_data(45, 0.015, 0.0005, dates),  # Bank stock - stable
    'ENERGY': generate_stock_data(80, 0.025, -0.0002, dates) # Energy - volatile
}

df = pd.DataFrame(stocks)
df = df.round(2)

# Add volume (random)
df['TECH_Volume'] = np.random.randint(1000000, 5000000, len(df))
df['BANK_Volume'] = np.random.randint(500000, 2000000, len(df))
df['ENERGY_Volume'] = np.random.randint(800000, 3000000, len(df))

print("Stock data created!")
print(f"Date range: {df['Date'].min().date()} to {df['Date'].max().date()}")
print(f"Total trading days: {len(df)}")
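
The loop above builds each price from the previous day's price. As a rough sketch, the same kind of random walk can be written in vectorized form with a cumulative product. The helper name `simulate_prices` and its `seed` parameter are hypothetical, and it uses NumPy's `default_rng` instead of the global seed, so it will not reproduce the exact numbers from the loop version.

```python
import numpy as np
import pandas as pd

def simulate_prices(start_price, volatility, trend, n_days, seed=0):
    """Vectorized random walk: each price is the previous price times (1 + random change)."""
    rng = np.random.default_rng(seed)
    # One random daily change for every day after the first
    changes = rng.normal(loc=trend, scale=volatility, size=n_days - 1)
    # Growth factor relative to the start price; day 0 has factor 1
    growth = np.concatenate(([1.0], np.cumprod(1 + changes)))
    # Floor at 1 like the loop version, but applied after the fact (not step by step)
    return np.maximum(start_price * growth, 1.0)

dates = pd.date_range(start="2022-01-01", periods=504, freq="B")
tech = pd.Series(simulate_prices(150, 0.02, 0.001, len(dates)), index=dates)
print(tech.head())
```

Because 2022-01-01 falls on a Saturday, the first business day in `dates` is 2022-01-03.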

Step 3: Explore the Data

code.py
# First look
print("=== First 5 Rows ===")
print(df.head())

Sample output:

Date        TECH    BANK   ENERGY  TECH_Volume
2022-01-03  150.00  45.00  80.00   2,345,678
2022-01-04  152.34  45.23  78.45   3,123,456
2022-01-05  149.87  45.67  79.12   2,876,543
2022-01-06  153.21  44.98  77.89   1,987,654
2022-01-07  155.45  45.34  80.23   4,234,567

code.py
# Basic statistics
print("\n=== Price Statistics ===")
print(df[['TECH', 'BANK', 'ENERGY']].describe())

Sample output:

Stat   TECH    BANK   ENERGY
count  504     504    504
mean   185.50  52.30  72.45
std    45.20   8.15   15.80
min    128.45  40.12  48.90
max    298.75  68.90  102.30
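
Before charting, a few quick checks help confirm the table is well formed. The sketch below runs them on a small hypothetical frame named `sample` that reuses the first three rows shown above; on the project data the same checks would be applied to `df`.

```python
import pandas as pd

sample = pd.DataFrame({
    "Date": pd.to_datetime(["2022-01-03", "2022-01-04", "2022-01-05"]),
    "TECH": [150.00, 152.34, 149.87],
    "BANK": [45.00, 45.23, 45.67],
    "ENERGY": [80.00, 78.45, 79.12],
})

missing = sample.isna().sum()                           # missing values per column
dates_sorted = sample["Date"].is_monotonic_increasing   # are dates in ascending order?
no_duplicates = not sample["Date"].duplicated().any()   # does each date appear once?

print(missing)
print(f"Dates sorted: {dates_sorted}, no duplicate dates: {no_duplicates}")
```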

Step 4: Visualize Stock Prices

Price History

code.py
# Plot all stocks
plt.figure(figsize=(14, 6))

plt.plot(df['Date'], df['TECH'], label='TECH', linewidth=2)
plt.plot(df['Date'], df['BANK'], label='BANK', linewidth=2)
plt.plot(df['Date'], df['ENERGY'], label='ENERGY', linewidth=2)

plt.title('Stock Price History (2022-2023)', fontsize=14, fontweight='bold')
plt.xlabel('Date')
plt.ylabel('Price ($)')
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

Individual Stock Charts

code.py
# Subplots for each stock
fig, axes = plt.subplots(3, 1, figsize=(14, 10), sharex=True)

stocks_list = ['TECH', 'BANK', 'ENERGY']
colors = ['#2ecc71', '#3498db', '#e74c3c']

for i, (stock, color) in enumerate(zip(stocks_list, colors)):
    axes[i].plot(df['Date'], df[stock], color=color, linewidth=1.5)
    axes[i].fill_between(df['Date'], df[stock], alpha=0.3, color=color)
    axes[i].set_title(f'{stock} Stock Price', fontweight='bold')
    axes[i].set_ylabel('Price ($)')
    axes[i].grid(True, alpha=0.3)

plt.xlabel('Date')
plt.tight_layout()
plt.show()

[Figure: Stock Price History]
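
The three stocks start at very different price levels, which can make the lower-priced ones look flat on a shared axis. One common way to compare them is to rebase every series to 100 on the first day. This is a minimal sketch on a hypothetical frame named `prices`; the same division should work on `df[['TECH', 'BANK', 'ENERGY']]`.

```python
import pandas as pd

prices = pd.DataFrame({
    "TECH": [150.00, 152.34, 149.87],
    "BANK": [45.00, 45.23, 45.67],
    "ENERGY": [80.00, 78.45, 79.12],
})

# Divide every row by the first row, then scale so each stock starts at 100
rebased = prices / prices.iloc[0] * 100
print(rebased.round(2))
```

After rebasing, a value of 101.56 means the stock is 1.56% above its starting price, regardless of what that price was.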


Step 5: Calculate Daily Returns

Return = The percentage change in price from one trading day to the next

code.py
# Calculate daily returns
df['TECH_Return'] = df['TECH'].pct_change() * 100
df['BANK_Return'] = df['BANK'].pct_change() * 100
df['ENERGY_Return'] = df['ENERGY'].pct_change() * 100

print("=== Daily Returns (Last 5 Days) ===")
print(df[['Date', 'TECH_Return', 'BANK_Return', 'ENERGY_Return']].tail())

Sample output:

Date        TECH_Return  BANK_Return  ENERGY_Return
2023-12-22  1.25%        0.45%        -0.89%
2023-12-26  -0.67%       0.12%        1.34%
2023-12-27  0.89%        -0.34%       0.56%
2023-12-28  1.45%        0.67%        -1.12%
2023-12-29  0.34%        0.23%        0.78%

code.py
# Return statistics
print("\n=== Return Statistics ===")
return_cols = ['TECH_Return', 'BANK_Return', 'ENERGY_Return']
print(df[return_cols].describe())

Sample output:

Stat  TECH    BANK    ENERGY
mean  0.12%   0.05%   -0.02%
std   2.01%   1.52%   2.48%
min   -6.45%  -4.89%  -7.23%
max   5.89%   4.12%   6.78%

Insight: TECH has the highest average daily return, while ENERGY is the most volatile (largest standard deviation).
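
`pct_change()` is shorthand for (today's price - yesterday's price) / yesterday's price. The sketch below checks that by hand on three TECH prices taken from the sample rows in Step 3; the variable names are illustrative only.

```python
import pandas as pd

tech = pd.Series([150.00, 152.34, 149.87])

# Manual formula: (P_t - P_{t-1}) / P_{t-1} * 100
previous = tech.shift(1)
manual = (tech - previous) / previous * 100

# Built-in equivalent
builtin = tech.pct_change() * 100

print(manual.round(2).tolist())   # first entry is NaN: day 1 has no previous price
print(builtin.round(2).tolist())
```

The second value works out to (152.34 - 150.00) / 150.00 = 1.56%, and the third to about -1.62%.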


Step 6: Visualize Returns Distribution

code.py
# Histogram of returns
fig, axes = plt.subplots(1, 3, figsize=(15, 4))

for i, (stock, color) in enumerate(zip(['TECH', 'BANK', 'ENERGY'], colors)):
    col = f'{stock}_Return'
    axes[i].hist(df[col].dropna(), bins=50, color=color, edgecolor='black', alpha=0.7)
    axes[i].axvline(x=0, color='red', linestyle='--', linewidth=2)
    axes[i].set_title(f'{stock} Daily Returns', fontweight='bold')
    axes[i].set_xlabel('Return (%)')
    axes[i].set_ylabel('Frequency')

plt.tight_layout()
plt.show()

[Figure: Returns Distribution]


Step 7: Calculate Moving Averages

Moving Average = Average price over the last N days (smooths out day-to-day noise). The first N-1 rows are NaN because the window is not yet full.

code.py
# Calculate moving averages for TECH stock
df['TECH_MA20'] = df['TECH'].rolling(window=20).mean()   # 20-day MA
df['TECH_MA50'] = df['TECH'].rolling(window=50).mean()   # 50-day MA
df['TECH_MA200'] = df['TECH'].rolling(window=200).mean() # 200-day MA

print("=== TECH Moving Averages (Last 5 Days) ===")
print(df[['Date', 'TECH', 'TECH_MA20', 'TECH_MA50', 'TECH_MA200']].tail())

Sample output:

Date        TECH    MA20    MA50    MA200
2023-12-25  285.45  278.90  265.30  220.50
2023-12-26  283.54  279.45  266.10  221.20
2023-12-27  286.07  280.12  267.00  221.90
2023-12-28  290.22  281.34  268.20  222.60
2023-12-29  291.21  282.50  269.40  223.30

code.py
# Plot price with moving averages
plt.figure(figsize=(14, 6))

plt.plot(df['Date'], df['TECH'], label='TECH Price', linewidth=1, alpha=0.7)
plt.plot(df['Date'], df['TECH_MA20'], label='20-Day MA', linewidth=2)
plt.plot(df['Date'], df['TECH_MA50'], label='50-Day MA', linewidth=2)
plt.plot(df['Date'], df['TECH_MA200'], label='200-Day MA', linewidth=2)

plt.title('TECH Stock with Moving Averages', fontsize=14, fontweight='bold')
plt.xlabel('Date')
plt.ylabel('Price ($)')
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

[Figure: Moving Averages]

Trading Signal:

  • Price > MA200 = Bullish (uptrend)
  • Price < MA200 = Bearish (downtrend)
  • MA20 crosses above MA50 = Buy signal
  • MA20 crosses below MA50 = Sell signal
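
The crossover rules above can be sketched in code by comparing each day's fast-versus-slow relationship with the previous day's. The function `crossover_signals` is a hypothetical helper, and the toy example uses 2-day and 4-day windows as stand-ins for MA20 and MA50 so the result is easy to check by hand.

```python
import pandas as pd

def crossover_signals(fast, slow):
    """Mark 'buy' where fast crosses above slow and 'sell' where it crosses below."""
    valid = fast.notna() & slow.notna()            # both averages exist on this day
    above = fast > slow                            # fast MA is above slow MA
    prev_above = above.shift(1, fill_value=False)
    prev_valid = valid.shift(1, fill_value=False)
    comparable = valid & prev_valid                # ignore warm-up rows with NaN
    signals = pd.Series("", index=fast.index, dtype=object)
    signals[comparable & above & ~prev_above] = "buy"
    signals[comparable & ~above & prev_above] = "sell"
    return signals

# Toy price path: falls, rises, then falls again
prices = pd.Series([10, 9, 8, 7, 8, 9, 10, 11, 10, 9, 8, 7], dtype=float)
fast_ma = prices.rolling(window=2).mean()
slow_ma = prices.rolling(window=4).mean()

signals = crossover_signals(fast_ma, slow_ma)
print(signals[signals != ""])
```

On this toy series the fast average crosses above the slow one at index 5 and back below at index 9. On the project data the call would be `crossover_signals(df['TECH_MA20'], df['TECH_MA50'])`.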

Step 8: Calculate Volatility

Volatility = How much the price swings (standard deviation of returns)

code.py
# 30-day rolling volatility
df['TECH_Volatility'] = df['TECH_Return'].rolling(window=30).std()
df['BANK_Volatility'] = df['BANK_Return'].rolling(window=30).std()
df['ENERGY_Volatility'] = df['ENERGY_Return'].rolling(window=30).std()

# Average volatility
print("=== Average 30-Day Volatility ===")
vol_summary = pd.DataFrame({
    'Stock': ['TECH', 'BANK', 'ENERGY'],
    'Avg Volatility': [
        df['TECH_Volatility'].mean(),
        df['BANK_Volatility'].mean(),
        df['ENERGY_Volatility'].mean()
    ]
}).round(2)
print(vol_summary)

Sample output:

Stock   Avg Volatility
TECH    2.01%
BANK    1.52%
ENERGY  2.48%

code.py
# Plot volatility over time
plt.figure(figsize=(14, 5))

plt.plot(df['Date'], df['TECH_Volatility'], label='TECH', linewidth=1.5)
plt.plot(df['Date'], df['BANK_Volatility'], label='BANK', linewidth=1.5)
plt.plot(df['Date'], df['ENERGY_Volatility'], label='ENERGY', linewidth=1.5)

plt.title('30-Day Rolling Volatility', fontsize=14, fontweight='bold')
plt.xlabel('Date')
plt.ylabel('Volatility (%)')
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

Insight: ENERGY is consistently more volatile than other stocks.
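
Rolling volatility is the sample standard deviation of the most recent N returns, recomputed every day. The sketch below uses five hypothetical daily returns and a 3-day window to show that the last rolling value matches a direct calculation on the last three returns.

```python
import numpy as np
import pandas as pd

returns = pd.Series([1.56, -1.62, 2.23, 1.46, -0.80])  # hypothetical daily returns in %

window = 3
rolling_vol = returns.rolling(window=window).std()      # pandas uses ddof=1 (sample std)

# Direct calculation on the last `window` returns, with the same ddof
manual_last = np.std(returns.iloc[-window:].to_numpy(), ddof=1)

print(rolling_vol.round(3).tolist())   # first window-1 entries are NaN
print(round(float(manual_last), 3))
```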


Step 9: Calculate Cumulative Returns

Cumulative Return = Total return if you invested from the start

code.py
# Calculate cumulative returns (starting with $1000)
initial_investment = 1000

# fillna(0) treats the first day (no previous price) as a 0% return,
# so each series starts at exactly $1000 instead of NaN
df['TECH_Cumulative'] = initial_investment * (1 + df['TECH_Return'].fillna(0)/100).cumprod()
df['BANK_Cumulative'] = initial_investment * (1 + df['BANK_Return'].fillna(0)/100).cumprod()
df['ENERGY_Cumulative'] = initial_investment * (1 + df['ENERGY_Return'].fillna(0)/100).cumprod()

# Final values
print("=== Investment Growth ($1000 Initial) ===")
final_values = pd.DataFrame({
    'Stock': ['TECH', 'BANK', 'ENERGY'],
    'Final Value': [
        df['TECH_Cumulative'].iloc[-1],
        df['BANK_Cumulative'].iloc[-1],
        df['ENERGY_Cumulative'].iloc[-1]
    ],
    'Total Return': [
        (df['TECH_Cumulative'].iloc[-1] / initial_investment - 1) * 100,
        (df['BANK_Cumulative'].iloc[-1] / initial_investment - 1) * 100,
        (df['ENERGY_Cumulative'].iloc[-1] / initial_investment - 1) * 100
    ]
}).round(2)
print(final_values)

Sample output:

Stock   Final Value  Total Return
TECH    $1,945.60    94.56%
BANK    $1,162.30    16.23%
ENERGY  $905.40      -9.46%

code.py
# Plot cumulative returns
plt.figure(figsize=(14, 6))

plt.plot(df['Date'], df['TECH_Cumulative'], label='TECH', linewidth=2)
plt.plot(df['Date'], df['BANK_Cumulative'], label='BANK', linewidth=2)
plt.plot(df['Date'], df['ENERGY_Cumulative'], label='ENERGY', linewidth=2)
plt.axhline(y=initial_investment, color='gray', linestyle='--', label='Initial Investment')

plt.title('$1000 Investment Growth', fontsize=14, fontweight='bold')
plt.xlabel('Date')
plt.ylabel('Portfolio Value ($)')
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

[Figure: Cumulative Returns Growth]

Insight: TECH nearly doubled the initial investment, while ENERGY lost about 10%.
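
Compounding daily returns with `cumprod()` is equivalent to dividing each price by the first price. The sketch below shows this on five TECH prices from the sample rows in Step 3; filling the first (undefined) return with 0 makes the series start at exactly the initial investment.

```python
import pandas as pd

prices = pd.Series([150.00, 152.34, 149.87, 153.21, 155.45])
initial_investment = 1000

# Route 1: compound the daily returns
daily_return = prices.pct_change().fillna(0)     # first day counts as a 0% return
via_cumprod = initial_investment * (1 + daily_return).cumprod()

# Route 2: scale by the ratio of each price to the first price
via_ratio = initial_investment * prices / prices.iloc[0]

print(via_cumprod.round(2).tolist())
print(via_ratio.round(2).tolist())
```

Both routes end at about $1,036.33, since 155.45 / 150.00 is roughly 1.0363.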


Step 10: Correlation Analysis

Correlation = How stocks move together (-1 to 1)

code.py
# Correlation matrix
returns_df = df[['TECH_Return', 'BANK_Return', 'ENERGY_Return']].dropna()
correlation = returns_df.corr().round(2)

print("=== Correlation Matrix ===")
print(correlation)

Sample output:

        TECH  BANK  ENERGY
TECH    1.00  0.35  0.22
BANK    0.35  1.00  0.28
ENERGY  0.22  0.28  1.00

code.py
# Heatmap
plt.figure(figsize=(8, 6))
sns.heatmap(correlation, annot=True, cmap='RdYlGn', center=0,
            vmin=-1, vmax=1, square=True, linewidths=2)
plt.title('Stock Returns Correlation', fontsize=14, fontweight='bold')
plt.tight_layout()
plt.show()

[Figure: Correlation Heatmap]

Insight: All three pairs show low positive correlation (0.22 to 0.35), which is good for diversification.
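
`.corr()` computes the Pearson correlation by default: the co-movement of two series divided by the product of their spreads. As an illustration, the sketch below computes it by hand for two short return series copied from the sample rows in Step 5 and compares the result with pandas. Five rows are far too few for a meaningful estimate; they only keep the arithmetic visible.

```python
import numpy as np
import pandas as pd

returns = pd.DataFrame({
    "TECH_Return": [1.25, -0.67, 0.89, 1.45, 0.34],
    "BANK_Return": [0.45, 0.12, -0.34, 0.67, 0.23],
})

x = returns["TECH_Return"]
y = returns["BANK_Return"]

# Pearson correlation: sum of co-deviations over the root of the product of squared deviations
dx = x - x.mean()
dy = y - y.mean()
manual = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

builtin = x.corr(y)
print(round(float(manual), 4), round(float(builtin), 4))
```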


Step 11: Risk vs Return Analysis

code.py
# Calculate annualized metrics
trading_days = 252

risk_return = pd.DataFrame({
    'Stock': ['TECH', 'BANK', 'ENERGY'],
    'Annual Return (%)': [
        df['TECH_Return'].mean() * trading_days,
        df['BANK_Return'].mean() * trading_days,
        df['ENERGY_Return'].mean() * trading_days
    ],
    'Annual Volatility (%)': [
        df['TECH_Return'].std() * np.sqrt(trading_days),
        df['BANK_Return'].std() * np.sqrt(trading_days),
        df['ENERGY_Return'].std() * np.sqrt(trading_days)
    ]
}).round(2)

# Sharpe Ratio (assuming 2% risk-free rate)
risk_free = 2
risk_return['Sharpe Ratio'] = (
    (risk_return['Annual Return (%)'] - risk_free) / risk_return['Annual Volatility (%)']
).round(2)

print("=== Risk vs Return Analysis ===")
print(risk_return)

Sample output:

Stock   Annual Return  Annual Volatility  Sharpe Ratio
TECH    30.24%         31.92%             0.88
BANK    12.60%         24.12%             0.44
ENERGY  -5.04%         39.36%             -0.18

code.py
# Scatter plot
plt.figure(figsize=(10, 6))

colors = ['#2ecc71', '#3498db', '#e74c3c']
for i, row in risk_return.iterrows():
    plt.scatter(row['Annual Volatility (%)'], row['Annual Return (%)'],
                s=200, c=colors[i], label=row['Stock'], edgecolor='black')
    plt.annotate(row['Stock'], (row['Annual Volatility (%)']+0.5, row['Annual Return (%)']+1))

plt.axhline(y=0, color='gray', linestyle='--')
plt.title('Risk vs Return', fontsize=14, fontweight='bold')
plt.xlabel('Annual Volatility (Risk) %')
plt.ylabel('Annual Return %')
plt.legend()
plt.grid(True, alpha=0.3)
plt.tight_layout()
plt.show()

[Figure: Risk vs Return]

Sharpe Ratio Interpretation:

  • > 1.0 = Great
  • 0.5 - 1.0 = Good
  • < 0.5 = Poor
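
The calculation and the rule-of-thumb bands can be wrapped in two small functions. Both `sharpe_ratio` and `sharpe_label` are hypothetical helper names, and the sketch assumes values above 1.0 count as "Great" and exactly 1.0 counts as "Good". The inputs are the annualized figures from the sample table above.

```python
def sharpe_ratio(annual_return_pct, annual_volatility_pct, risk_free_pct=2.0):
    """Excess return per unit of volatility; all inputs are in percent per year."""
    if annual_volatility_pct <= 0:
        raise ValueError("volatility must be positive")
    return (annual_return_pct - risk_free_pct) / annual_volatility_pct

def sharpe_label(sharpe):
    """Map a Sharpe ratio to the tutorial's rule-of-thumb bands."""
    if sharpe > 1.0:
        return "Great"
    if sharpe >= 0.5:
        return "Good"
    return "Poor"

# (annual return %, annual volatility %) from the sample risk/return table
sample_table = {"TECH": (30.24, 31.92), "BANK": (12.60, 24.12), "ENERGY": (-5.04, 39.36)}

results = {}
for stock, (ret, vol) in sample_table.items():
    ratio = round(sharpe_ratio(ret, vol), 2)
    results[stock] = (ratio, sharpe_label(ratio))
    print(f"{stock}: Sharpe={ratio}, {results[stock][1]}")
```

With these inputs TECH lands in the "Good" band (0.88), while BANK (0.44) and ENERGY (-0.18) fall under "Poor".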

Step 12: Key Metrics Dashboard

code.py
# Calculate all key metrics
def calculate_metrics(stock_name, df):
    price_col = stock_name
    return_col = f'{stock_name}_Return'

    current_price = df[price_col].iloc[-1]
    start_price = df[price_col].iloc[0]
    high_52w = df[price_col].tail(252).max()
    low_52w = df[price_col].tail(252).min()
    avg_volume = df[f'{stock_name}_Volume'].mean()
    total_return = (current_price / start_price - 1) * 100

    return {
        'Current Price': f'${current_price:.2f}',
        '52-Week High': f'${high_52w:.2f}',
        '52-Week Low': f'${low_52w:.2f}',
        'Total Return': f'{total_return:.1f}%',
        'Avg Daily Volume': f'{avg_volume/1e6:.1f}M'
    }

print("=" * 60)
print("              STOCK ANALYSIS DASHBOARD")
print("=" * 60)

for stock in ['TECH', 'BANK', 'ENERGY']:
    metrics = calculate_metrics(stock, df)
    print(f"\n{stock}:")
    for key, value in metrics.items():
        print(f"  {key}: {value}")

print("=" * 60)

Sample output:

Metric         TECH     BANK    ENERGY
Current Price  $291.21  $52.45  $72.30
52-Week High   $298.75  $58.90  $85.40
52-Week Low    $185.30  $42.10  $55.20
Total Return   94.1%    16.6%   -9.6%
Avg Volume     3.0M     1.3M    1.9M

Step 13: Save Analysis Results

code.py
# Save processed data
df.to_csv('stock_analysis_results.csv', index=False)

# Save summary report
with open('stock_report.txt', 'w') as f:
    f.write("STOCK ANALYSIS REPORT\n")
    f.write("=" * 50 + "\n\n")

    # Take the period from the data instead of hard-coding it
    f.write(f"PERIOD: {df['Date'].min().date()} to {df['Date'].max().date()}\n\n")

    f.write("PERFORMANCE SUMMARY:\n")
    for stock in ['TECH', 'BANK', 'ENERGY']:
        total_ret = (df[stock].iloc[-1] / df[stock].iloc[0] - 1) * 100
        f.write(f"  {stock}: {total_ret:.1f}%\n")

    f.write("\nRECOMMENDATION:\n")
    f.write("  Best Performer: TECH (highest return, good Sharpe)\n")
    f.write("  Most Stable: BANK (lowest volatility)\n")
    f.write("  Avoid: ENERGY (negative return, high volatility)\n")

print("Files saved successfully!")
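
When the CSV is loaded again later, the Date column comes back as plain text unless pandas is asked to parse it. The sketch below shows a round trip on a small hypothetical frame, written to a temporary directory so it does not touch the project files.

```python
import os
import tempfile

import pandas as pd

original = pd.DataFrame({
    "Date": pd.to_datetime(["2022-01-03", "2022-01-04", "2022-01-05"]),
    "TECH": [150.00, 152.34, 149.87],
})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "stock_analysis_results.csv")
    original.to_csv(path, index=False)
    # parse_dates restores the Date column as datetime instead of strings
    reloaded = pd.read_csv(path, parse_dates=["Date"])

print(reloaded.dtypes)
```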

Complete Code

code.py
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime

# Setup
plt.style.use('seaborn-v0_8-whitegrid')
np.random.seed(42)

# Generate stock data
dates = pd.date_range(start='2022-01-01', periods=504, freq='B')

def generate_stock(start, vol, trend, n):
    prices = [start]
    for _ in range(n-1):
        prices.append(prices[-1] * (1 + np.random.normal(trend, vol)))
    return prices

df = pd.DataFrame({
    'Date': dates,
    'TECH': generate_stock(150, 0.02, 0.001, 504),
    'BANK': generate_stock(45, 0.015, 0.0005, 504),
    'ENERGY': generate_stock(80, 0.025, -0.0002, 504)
}).round(2)

# Calculate returns
for stock in ['TECH', 'BANK', 'ENERGY']:
    df[f'{stock}_Return'] = df[stock].pct_change() * 100

# Moving averages
df['TECH_MA50'] = df['TECH'].rolling(50).mean()

# Cumulative returns
for stock in ['TECH', 'BANK', 'ENERGY']:
    df[f'{stock}_Cum'] = 1000 * (1 + df[f'{stock}_Return']/100).cumprod()

# Visualizations
fig, axes = plt.subplots(2, 2, figsize=(14, 10))

# Price history
axes[0,0].plot(df['Date'], df['TECH'], label='TECH')
axes[0,0].plot(df['Date'], df['BANK'], label='BANK')
axes[0,0].plot(df['Date'], df['ENERGY'], label='ENERGY')
axes[0,0].set_title('Stock Prices')
axes[0,0].legend()

# Returns distribution
axes[0,1].hist(df['TECH_Return'].dropna(), bins=50, alpha=0.7, label='TECH')
axes[0,1].hist(df['BANK_Return'].dropna(), bins=50, alpha=0.7, label='BANK')
axes[0,1].set_title('Returns Distribution')
axes[0,1].legend()

# Cumulative returns
axes[1,0].plot(df['Date'], df['TECH_Cum'], label='TECH')
axes[1,0].plot(df['Date'], df['BANK_Cum'], label='BANK')
axes[1,0].plot(df['Date'], df['ENERGY_Cum'], label='ENERGY')
axes[1,0].axhline(1000, color='gray', linestyle='--')
axes[1,0].set_title('$1000 Investment Growth')
axes[1,0].legend()

# Correlation
corr = df[['TECH_Return', 'BANK_Return', 'ENERGY_Return']].corr()
sns.heatmap(corr, annot=True, ax=axes[1,1], cmap='RdYlGn', center=0)
axes[1,1].set_title('Correlation')

plt.tight_layout()
plt.savefig('stock_dashboard.png', dpi=300)
plt.show()

# Summary
print("\n=== ANALYSIS SUMMARY ===")
for stock in ['TECH', 'BANK', 'ENERGY']:
    ret = (df[stock].iloc[-1] / df[stock].iloc[0] - 1) * 100
    vol = df[f'{stock}_Return'].std() * np.sqrt(252)
    print(f"{stock}: Return={ret:.1f}%, Volatility={vol:.1f}%")

print("\nProject Complete!")

What You Learned

  • Working with time series financial data
  • Calculating daily and cumulative returns
  • Computing moving averages
  • Measuring volatility and risk
  • Analyzing correlations between stocks
  • Risk-adjusted returns (Sharpe Ratio)
  • Building financial visualizations
  • Comparing investment performance

Congratulations! You've completed the Stock Price Analysis project!


Investment Recommendations

Stock   Recommendation  Reason
TECH    Buy             High return, good Sharpe ratio
BANK    Hold            Stable, low risk, moderate return
ENERGY  Avoid           Negative return, high volatility

Course Complete!

You've finished all 12 modules of Python for Data Analysis!

What you've learned:

  • Python fundamentals
  • NumPy and Pandas
  • Data cleaning
  • Exploratory data analysis
  • Data visualization (Matplotlib, Seaborn, Plotly)
  • Statistics and machine learning basics
  • Real-world projects

Next Steps:

  • Practice with real datasets (Kaggle)
  • Build your own portfolio projects
  • Learn more advanced ML techniques
  • Explore deep learning

Good luck on your data science journey!
