Module 2 — Python for Quants
Loading and Cleaning Market Data
Pandas, NumPy, and the data-wrangling muscle memory you'll use every day.
Learning objectives
- Parse a CSV with proper date handling.
- Detect and handle missing bars and bad ticks.
- Split data into train, validation, and test windows.
The boring-but-critical loader
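import pandas as pd

# Load daily bars, parse the 'date' column, and use it as a sorted index.
df = pd.read_csv(
    'AAPL.csv',
    parse_dates=['date'],
    index_col='date',
).sort_index()

# Reindex to business days and forward-fill at most 2 consecutive missing bars.
# Longer gaps (e.g. around corporate actions) stay NaN instead of being bridged.
df = df.asfreq('B').ffill(limit=2)

# Detect outliers (bad ticks): flag daily moves larger than 40%.
ret = df['close'].pct_change(fill_method=None)  # do not pad over remaining NaNs
bad = ret.abs() > 0.4
print(df.index[bad])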
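To see what the capped forward-fill and the 40% return filter actually do, here is a minimal sketch on synthetic data. The helper name `clean_daily_bars`, its parameters, and the prices are made up for illustration; only the `close` column and the thresholds come from the loader above.

```python
import pandas as pd


def clean_daily_bars(df, max_fill=2, max_abs_ret=0.4):
    """Hypothetical helper: reindex to business days, cap the forward-fill, flag bad ticks."""
    full = df.sort_index().asfreq('B')
    # Bars absent from the raw data show up as NaN after reindexing.
    missing = full['close'].isna()
    filled = full.ffill(limit=max_fill)
    # fill_method=None keeps pct_change from padding over the remaining NaNs.
    ret = filled['close'].pct_change(fill_method=None)
    bad = ret.abs() > max_abs_ret
    return filled, missing, bad


# Synthetic example (assumed data): 10 business days with one bad print.
idx = pd.bdate_range('2024-01-01', periods=10)
close = pd.Series(
    [100.0, 101.0, 102.0, 103.0, 104.0, 105.0, 10.5, 106.0, 107.0, 108.0],
    index=idx,
)
# Drop 3 consecutive bars to simulate a gap longer than the fill limit.
raw = pd.DataFrame({'close': close}).drop(idx[2:5])

filled, missing, bad = clean_daily_bars(raw)
n_missing = int(missing.sum())                   # bars missing before filling
n_still_nan = int(filled['close'].isna().sum())  # bars left unfilled by the cap
bad_dates = list(filled.index[bad])
print(n_missing, n_still_nan, bad_dates)
```

Two things are worth noticing. The third bar of the gap stays NaN because the fill is capped at two bars. A single bad print is flagged twice, once on the drop and once on the recovery, so review both dates before deleting anything.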
Train / validation / test
Never tune parameters on the same window you report performance on. A standard split for daily data:
- 2010–2018 → train (fit models)
- 2019–2020 → validation (pick hyperparameters)
- 2021–today → out-of-sample test (single evaluation)
Using test data for parameter selection is the single most common source of fake backtests.
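The split above can be sketched with plain date slicing on a sorted `DatetimeIndex`. The frame below is synthetic stand-in data (an assumption, not real prices); with a real loader you would slice your own `df` the same way.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for a cleaned daily price frame (hypothetical data).
idx = pd.bdate_range('2010-01-01', '2023-12-29')
df = pd.DataFrame({'close': np.linspace(100.0, 200.0, len(idx))}, index=idx)

# Partial-string slicing on a sorted DatetimeIndex includes both endpoints.
train = df.loc['2010':'2018']   # fit models
valid = df.loc['2019':'2020']   # pick hyperparameters
test = df.loc['2021':]          # single out-of-sample evaluation

# Sanity checks: windows are ordered in time, disjoint, and cover every row.
assert train.index.max() < valid.index.min()
assert valid.index.max() < test.index.min()
assert len(train) + len(valid) + len(test) == len(df)
```

Splitting by calendar date rather than by random sampling keeps future information out of the training window, which is the point of the rule above.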