This project explores historical stock market data to understand how prices move over time, where volatility clusters, and how different sectors compare in terms of risk and return. By applying statistical techniques and visual analysis, you can uncover patterns that are invisible in raw price tables — from moving average crossovers that signal momentum shifts to periods of elevated volatility that coincide with macroeconomic events.

Dataset overview

The dataset consists of historical OHLCV (open, high, low, close, volume) records for a selection of publicly traded tickers across multiple sectors, downloaded with the yfinance library. Each row represents one trading day for one ticker and includes the following fields:
| Column | Description |
|---|---|
| Date | Trading date (index) |
| Open | Opening price |
| High | Intraday high price |
| Low | Intraday low price |
| Close | Closing price |
| Volume | Number of shares traded |
The dataset spans several years to capture both bull and bear market conditions, providing enough history to compute meaningful rolling statistics and cross-sector comparisons.
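Before computing any statistics, it can help to confirm that each row satisfies the basic OHLCV relationships implied by the fields above: the high is at least the open and the close, the low is at most the open and the close, and volume is non-negative. The helper `check_ohlcv` below is a hypothetical sketch, not part of the original project; it returns a boolean mask of violating rows, and a missing High, Low or Volume also causes a row to be flagged.

```python
import pandas as pd


def check_ohlcv(df: pd.DataFrame) -> pd.Series:
    """Return a boolean mask marking rows that violate basic OHLCV invariants.

    Assumes columns named Open, High, Low, Close and Volume. A row is flagged
    when the high is below the open or close, the low is above the open or
    close, or the volume is negative (or when High, Low or Volume is missing,
    because comparisons against NaN evaluate to False).
    """
    high_ok = df["High"] >= df[["Open", "Close"]].max(axis=1)
    low_ok = df["Low"] <= df[["Open", "Close"]].min(axis=1)
    volume_ok = df["Volume"] >= 0
    return ~(high_ok & low_ok & volume_ok)


# Toy example: the second row has a High below its Close, so it is flagged
sample = pd.DataFrame(
    {
        "Open": [100.0, 101.0],
        "High": [102.0, 101.5],
        "Low": [99.0, 100.0],
        "Close": [101.0, 102.0],
        "Volume": [1_000_000, 900_000],
    },
    index=pd.to_datetime(["2023-01-03", "2023-01-04"]),
)
bad_rows = check_ohlcv(sample)
print(bad_rows.tolist())  # [False, True]
```

On the real data you could run it on the tidy frame, for example `data[check_ohlcv(data)]`, to inspect suspicious rows before they feed into returns and rolling statistics.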

Methodology

1. Data acquisition

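You pull historical price data with yfinance, specifying a list of ticker symbols and a date range. With group_by="ticker", the download returns a pandas DataFrame whose columns form a two-level (ticker, field) index. You then reshape it into a tidy long format with one row per ticker per date, which makes filtering and grouping by ticker straightforward in later steps. Note that yfinance treats the end date as exclusive, so the range below runs through the last trading day of 2023.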
import yfinance as yf
import pandas as pd

tickers = ["AAPL", "MSFT", "XOM", "JPM", "AMZN"]
raw = yf.download(tickers, start="2019-01-01", end="2024-01-01", group_by="ticker")

# Stack into tidy format
frames = []
for ticker in tickers:
    df = raw[ticker].copy()
    df["Ticker"] = ticker
    frames.append(df)

data = pd.concat(frames).reset_index()
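The per-ticker loop can also be written as a single `pd.concat` call: passing a dict of per-ticker frames makes the dict keys a new outer index level. The sketch below runs offline on a small synthetic frame that assumes the layout yfinance returns with `group_by="ticker"` (ticker on the outer column level, OHLCV fields on the inner level, a DatetimeIndex named Date). The values are placeholders, and it is worth confirming the layout against your installed yfinance version.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for yf.download(tickers, ..., group_by="ticker"):
# columns are a two-level MultiIndex of (ticker, field) and the index is a
# DatetimeIndex named "Date". Values are placeholders, not market data.
tickers = ["AAPL", "MSFT"]
fields = ["Open", "High", "Low", "Close", "Volume"]
dates = pd.date_range("2023-01-02", periods=3, freq="B", name="Date")
columns = pd.MultiIndex.from_product([tickers, fields])
raw = pd.DataFrame(
    np.arange(len(dates) * len(columns), dtype=float).reshape(len(dates), len(columns)),
    index=dates,
    columns=columns,
)

# The dict keys become an outer "Ticker" index level; reset_index then turns
# (Ticker, Date) into ordinary columns, giving one row per ticker per date.
data = pd.concat({t: raw[t] for t in tickers}, names=["Ticker"]).reset_index()
print(data[["Ticker", "Date", "Close"]])
```

The result holds the same columns as the loop-based version (in a different order), with rows ordered by ticker and then date.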

2. Cleaning and preprocessing

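You parse the Date column as datetime and sort rows by ticker and date so that per-ticker calculations run in chronological order. Isolated missing closing prices (single-day gaps, typically caused by trading halts or data gaps) are forward-filled within each ticker, and any rows still lacking a close are dropped. You then compute a Daily_Return column as the day-over-day change in closing price, expressed as a fraction (0.01 = 1%); this is the foundation for all downstream risk metrics.
# Parse dates and sort so per-ticker operations run in chronological order
data["Date"] = pd.to_datetime(data["Date"])
data.sort_values(["Ticker", "Date"], inplace=True)
# Forward-fill single-day gaps in Close per ticker, then drop rows still missing it
data["Close"] = data.groupby("Ticker")["Close"].ffill(limit=1)
data.dropna(subset=["Close"], inplace=True)
# Day-over-day fractional change in Close (0.01 = 1%)
data["Daily_Return"] = data.groupby("Ticker")["Close"].pct_change(fill_method=None)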
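A toy example makes the cleaning rules concrete. It assumes an "isolated" gap means a single missing day, which is what `ffill(limit=1)` fills; in a two-day gap only the first day is filled and the second is then dropped. The ticker name and prices are made up.

```python
import numpy as np
import pandas as pd

# Toy data for one hypothetical ticker with an isolated gap (row 1) and a
# two-day gap (rows 3 and 4)
toy = pd.DataFrame(
    {
        "Ticker": ["AAA"] * 6,
        "Date": pd.date_range("2023-01-02", periods=6, freq="B"),
        "Close": [100.0, np.nan, 102.0, np.nan, np.nan, 105.0],
    }
)

# Fill at most one consecutive missing close per ticker; the second day of
# the two-day gap stays NaN and is dropped on the next line
toy["Close"] = toy.groupby("Ticker")["Close"].ffill(limit=1)
toy = toy.dropna(subset=["Close"])

# Fractional day-over-day change; fill_method=None disables implicit padding
toy["Daily_Return"] = toy.groupby("Ticker")["Close"].pct_change(fill_method=None)
print(toy)
```

Filled days produce a return of exactly 0, which slightly understates volatility; limiting the fill to one day keeps that effect small.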

3. Trend analysis

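You compute short-term (20-day) and long-term (50-day) simple moving averages (SMA) for each ticker. These rolling averages smooth out daily noise and reveal the underlying trend direction; the first 19 and 49 rows of each ticker are NaN because the window is not yet full. In this analysis, a golden cross is the 20-day SMA crossing above the 50-day SMA, a commonly watched bullish signal, and a death cross is the opposite move. (Many practitioners reserve those names for the 50-day and 200-day averages; the 20/50 pair used here is a faster variant.)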
data["SMA_20"] = data.groupby("Ticker")["Close"].transform(
    lambda x: x.rolling(20).mean()
)
data["SMA_50"] = data.groupby("Ticker")["Close"].transform(
    lambda x: x.rolling(50).mean()
)
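To turn the two SMA columns into discrete crossover events, you can record whether SMA_20 is above SMA_50 on each day and compare that with the previous day for the same ticker. The helper `find_crossovers` below is a hypothetical sketch: it assumes the frame is already sorted by ticker and date (as in the cleaning step) and ignores rows where either average is still warming up. The SMA values in the demo are invented so the expected signals can be checked by hand.

```python
import numpy as np
import pandas as pd


def find_crossovers(df: pd.DataFrame, fast: str = "SMA_20", slow: str = "SMA_50") -> pd.DataFrame:
    """Add a Signal column marking SMA crossovers within each ticker.

    Signal is 1 on the first day the fast SMA is above the slow SMA after
    being at or below it (bullish), -1 on the first day it is back at or
    below (bearish), and 0 otherwise. Assumes rows are sorted by Ticker, Date.
    """
    out = df.copy()
    valid = out[fast].notna() & out[slow].notna()
    # 1.0 if fast > slow, 0.0 otherwise, NaN while either SMA is warming up
    above = (out[fast] > out[slow]).astype(float).where(valid)
    # Previous day's state, shifted within each ticker only
    prev = above.groupby(out["Ticker"]).shift(1)
    out["Signal"] = 0
    out.loc[(above == 1) & (prev == 0), "Signal"] = 1
    out.loc[(above == 0) & (prev == 1), "Signal"] = -1
    return out


# Hand-made SMA values for two hypothetical tickers
toy = pd.DataFrame(
    {
        "Ticker": ["AAA"] * 5 + ["BBB"] * 3,
        "SMA_20": [np.nan, 9.0, 11.0, 12.0, 10.0, 6.0, 7.0, 4.0],
        "SMA_50": [np.nan, 10.0, 10.0, 11.0, 11.0, 5.0, 5.0, 5.0],
    }
)
signals = find_crossovers(toy)
print(signals["Signal"].tolist())  # [0, 0, 1, 0, -1, 0, 0, -1]
```

Grouping the shift by ticker matters: without it, the last row of AAA would be compared with the first row of BBB, producing a spurious bullish signal at the boundary in this example.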

4. Volatility and risk metrics

You measure volatility using a 30-day rolling standard deviation of daily returns, then annualize it by multiplying by the square root of 252 (trading days per year). You also compute the Sharpe ratio per ticker to compare risk-adjusted returns, and calculate maximum drawdown to identify the worst peak-to-trough decline in the observation window.
data["Volatility_30d"] = data.groupby("Ticker")["Daily_Return"].transform(
    lambda x: x.rolling(30).std() * (252 ** 0.5)
)
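The Sharpe ratio and maximum drawdown described above are not shown in code. The sketch below uses one common formulation under stated assumptions: the Sharpe ratio is the mean daily excess return divided by its sample standard deviation, multiplied by √252, with a constant annual risk-free rate that defaults to 0 and is converted to a daily rate by dividing by 252; maximum drawdown is the most negative value of Close divided by its running maximum, minus 1. The helper names and toy prices are hypothetical.

```python
import numpy as np
import pandas as pd

TRADING_DAYS = 252  # trading days per year, as in the volatility calculation


def sharpe_ratio(daily_returns: pd.Series, risk_free_annual: float = 0.0) -> float:
    """Annualized Sharpe ratio from simple daily returns.

    Assumes a constant annual risk-free rate, converted to a daily rate by
    dividing by 252 (a simple approximation).
    """
    excess = daily_returns.dropna() - risk_free_annual / TRADING_DAYS
    return float(excess.mean() / excess.std() * np.sqrt(TRADING_DAYS))


def max_drawdown(close: pd.Series) -> float:
    """Worst peak-to-trough decline as a negative fraction (-0.25 means -25%)."""
    running_peak = close.cummax()
    return float((close / running_peak - 1.0).min())


def risk_summary(data: pd.DataFrame) -> pd.DataFrame:
    """Per-ticker Sharpe ratio and maximum drawdown over the whole window."""
    grouped = data.groupby("Ticker")
    return pd.DataFrame(
        {
            "Sharpe": grouped["Daily_Return"].apply(sharpe_ratio),
            "Max_Drawdown": grouped["Close"].apply(max_drawdown),
        }
    )


# Toy prices for one hypothetical ticker: the worst decline runs from the
# 120 peak to the 80 trough, so the maximum drawdown is 80 / 120 - 1
toy = pd.DataFrame({"Ticker": ["AAA"] * 6, "Close": [100.0, 120.0, 90.0, 110.0, 80.0, 130.0]})
toy["Daily_Return"] = toy.groupby("Ticker")["Close"].pct_change(fill_method=None)
summary = risk_summary(toy)
print(summary)
```

A nonzero `risk_free_annual` (for example, a short-term Treasury yield for the period) subtracts the same daily rate from every return before the ratio is taken.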

5. Visualization

You produce a suite of charts: candlestick charts with overlaid moving averages for individual tickers, a volume bar chart aligned beneath the price chart, a rolling volatility line chart comparing all tickers, and a correlation heatmap of daily returns across the selected stocks. Each chart is saved as a high-resolution PNG for inclusion in the portfolio. The snippet below builds the correlation heatmap from a date-by-ticker pivot of daily returns.
import matplotlib.pyplot as plt
import seaborn as sns

# Correlation heatmap
pivot = data.pivot_table(index="Date", columns="Ticker", values="Daily_Return")
corr = pivot.corr()

plt.figure(figsize=(8, 6))
sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm", center=0)
plt.title("Return Correlation Matrix")
plt.tight_layout()
plt.savefig("correlation_matrix.png", dpi=150)
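The rolling volatility comparison chart is described but not included in the code. A minimal matplotlib sketch, assuming the tidy frame has Date, Ticker and Volatility_30d columns as computed in the volatility step: pivot to one column per ticker and draw each as a line. The function name, output filename and synthetic demo data are illustrative assumptions.

```python
import matplotlib

matplotlib.use("Agg")  # render to files without needing a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib.ticker import PercentFormatter
from pathlib import Path


def plot_rolling_volatility(data: pd.DataFrame, path: str = "rolling_volatility.png") -> Path:
    """Plot one line per ticker of annualized 30-day volatility over time."""
    wide = data.pivot_table(index="Date", columns="Ticker", values="Volatility_30d")
    fig, ax = plt.subplots(figsize=(10, 5))
    wide.plot(ax=ax, linewidth=1)
    ax.set_title("30-Day Rolling Volatility (Annualized)")
    ax.set_xlabel("Date")
    ax.set_ylabel("Annualized volatility")
    ax.yaxis.set_major_formatter(PercentFormatter(xmax=1.0))  # 0.2 -> 20%
    fig.tight_layout()
    fig.savefig(path, dpi=150)
    plt.close(fig)
    return Path(path)


# Synthetic demo: two hypothetical tickers with different daily volatility
rng = np.random.default_rng(0)
dates = pd.date_range("2023-01-02", periods=120, freq="B")
frames = []
for ticker, daily_sd in [("AAA", 0.01), ("BBB", 0.02)]:
    returns = pd.Series(rng.normal(0.0, daily_sd, len(dates)))
    vol = returns.rolling(30).std() * np.sqrt(252)
    frames.append(pd.DataFrame({"Date": dates, "Ticker": ticker, "Volatility_30d": vol.to_numpy()}))
demo = pd.concat(frames, ignore_index=True)
out_path = plot_rolling_volatility(demo, "rolling_volatility_demo.png")
```

Using `pivot_table` keeps the plotting code independent of how many tickers are in the frame; each new ticker simply becomes another line.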

Key findings

  • Moving average crossovers: Across all tickers, golden cross events (the 20-day SMA crossing above the 50-day SMA) preceded sustained price appreciation in roughly 60% of instances, suggesting some value as a momentum indicator. In sideways markets, however, the signal generated false positives, which highlights the importance of combining it with volume confirmation.
  • Volatility clustering: Daily return volatility is not uniformly distributed over time. High-volatility periods cluster together, especially around earnings announcements and macroeconomic news events, and are followed by mean reversion to lower volatility. This behavior is consistent with the ARCH effects well documented in financial time series.
  • Sector comparisons: Energy sector tickers (e.g., XOM) exhibited significantly higher annualized volatility than large-cap technology names over the same period, despite lower average daily returns. Correlation between technology tickers was high (≥ 0.75), suggesting limited diversification benefit within a single-sector allocation.
  • Volume and price relationship: Breakouts accompanied by above-average volume showed stronger follow-through than low-volume breakouts, reinforcing the principle that volume confirms price moves.
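The volatility-clustering finding can be checked with a simple diagnostic: under ARCH-type behavior, daily returns themselves show little autocorrelation, while absolute returns (a rough proxy for volatility) are clearly positively autocorrelated. The sketch below uses synthetic returns that alternate between calm and turbulent 50-day regimes, an assumption chosen so it runs offline; on the project data you would apply `abs_return_autocorr` (a hypothetical helper) to each ticker's Daily_Return.

```python
import numpy as np
import pandas as pd


def abs_return_autocorr(returns: pd.Series, lag: int = 1) -> float:
    """Lag-k autocorrelation of absolute returns, a simple volatility-clustering check."""
    return float(returns.dropna().abs().autocorr(lag=lag))


# Synthetic returns alternating between calm (0.5% daily sd) and turbulent
# (3% daily sd) regimes of 50 trading days each: 40 regimes, 2,000 days
rng = np.random.default_rng(42)
regime_sd = np.repeat(np.tile([0.005, 0.03], 20), 50)
returns = pd.Series(rng.normal(0.0, regime_sd))

print("returns:", round(returns.autocorr(lag=1), 3))  # near zero
print("absolute returns:", round(abs_return_autocorr(returns), 3))  # clearly positive
```

Squared returns are another common choice for the same check; absolute returns are less dominated by a handful of extreme days.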

Visualizations

The following charts are produced by this analysis:
  • Candlestick chart with SMAs: Shows open/high/low/close bars alongside 20-day and 50-day moving averages, making trend direction and crossover events immediately visible.
  • Volume bar chart: Displayed below the candlestick chart using a shared x-axis, allowing you to correlate price moves with trading activity.
  • Rolling volatility comparison: A multi-line chart plotting 30-day annualized volatility for each ticker over time, revealing when and which stocks experienced stress.
  • Correlation matrix heatmap: A symmetric heatmap of pairwise return correlations, color-coded from negative (blue) to positive (red), useful for understanding portfolio diversification.
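For the price and volume panels, mplfinance can draw candlesticks with moving averages and a volume pane in one call on a DatetimeIndex-indexed OHLCV frame, for example `mpf.plot(df, type="candle", mav=(20, 50), volume=True)`. The sketch below is a matplotlib-only stand-in, assuming a single-ticker frame with Date, Close and Volume columns: it plots a close-price line instead of candles, overlays the two SMAs, and places volume bars in a second axis that shares the x-axis. The function name, filename and synthetic data are hypothetical.

```python
import matplotlib

matplotlib.use("Agg")  # render to files without needing a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from pathlib import Path


def plot_price_and_volume(df: pd.DataFrame, ticker: str, path: str) -> Path:
    """Close price with 20/50-day SMAs on top, daily volume bars beneath.

    Expects a single-ticker frame with Date, Close and Volume columns.
    Draws a close-price line rather than candlesticks.
    """
    df = df.sort_values("Date")
    sma_20 = df["Close"].rolling(20).mean()
    sma_50 = df["Close"].rolling(50).mean()

    # Two stacked axes sharing the x-axis: price panel 3x taller than volume
    fig, (ax_price, ax_vol) = plt.subplots(
        2, 1, sharex=True, figsize=(10, 6), gridspec_kw={"height_ratios": [3, 1]}
    )
    ax_price.plot(df["Date"], df["Close"], label="Close", linewidth=1)
    ax_price.plot(df["Date"], sma_20, label="SMA 20")
    ax_price.plot(df["Date"], sma_50, label="SMA 50")
    ax_price.set_title(f"{ticker}: price and moving averages")
    ax_price.legend()

    ax_vol.bar(df["Date"], df["Volume"], width=1.0, color="gray")
    ax_vol.set_ylabel("Volume")

    fig.tight_layout()
    fig.savefig(path, dpi=150)
    plt.close(fig)
    return Path(path)


# Synthetic demo: a random-walk price path and random volumes
rng = np.random.default_rng(1)
dates = pd.date_range("2023-01-02", periods=150, freq="B")
close = 100 * np.exp(np.cumsum(rng.normal(0.0, 0.01, len(dates))))
volume = rng.integers(1_000_000, 5_000_000, len(dates))
demo = pd.DataFrame({"Date": dates, "Close": close, "Volume": volume})
out_path = plot_price_and_volume(demo, "DEMO", "price_volume_demo.png")
```

Sharing the x-axis keeps zooming and tick labels aligned between the panels, so a volume spike sits directly under the price move it accompanies.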

Technologies

| Tool | Purpose |
|---|---|
| Python 3.10+ | Primary programming language |
| Pandas | Data manipulation and rolling statistics |
| Matplotlib | Candlestick, volume, and volatility charts |
| Seaborn | Correlation heatmap |
| yfinance | Historical market data download |
| mplfinance | Candlestick chart rendering |
Explore other analyses in this portfolio:

IMDB Movie Analysis

Exploratory data analysis of film ratings, genres, and box office performance.

Bank Loan Case Study

Risk analysis and default prediction using lending data.