The Epistemological Crisis
A single historical backtest is just one realization of a stochastic process. It is a sample size of one. Reliance on the specific sequence of historical returns is the primary cause of live trading failure.
The Robustness Goal
We do not seek to predict the future. We seek to characterize the distribution of possible outcomes. A robust strategy is one that survives the 5th percentile of generated alternate histories.
The Mathematical Edge
By generating N synthetic equity curves, we can calculate the Probability of Backtest Overfitting (PBO) and adjust performance metrics (Deflated Sharpe Ratio) to account for selection bias.
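As an illustration of the Deflated Sharpe Ratio side of this, here is a minimal sketch following the Bailey and López de Prado formulation; the function name and inputs are illustrative, and NumPy/SciPy are assumed:

```python
import numpy as np
from scipy.stats import norm

def deflated_sharpe_ratio(observed_sr, sr_variance, n_trials, n_obs, skew, kurt):
    """Deflated Sharpe Ratio (Bailey & Lopez de Prado).

    observed_sr : best per-period Sharpe ratio found across all trials
    sr_variance : variance of Sharpe ratios across the trials
    n_trials    : number of strategy configurations tested
    n_obs       : number of return observations behind observed_sr
    skew, kurt  : sample skewness and (non-excess) kurtosis of returns
    """
    gamma = 0.5772156649  # Euler-Mascheroni constant
    # Expected maximum Sharpe ratio across n_trials under the null of zero skill
    sr0 = np.sqrt(sr_variance) * (
        (1 - gamma) * norm.ppf(1 - 1 / n_trials)
        + gamma * norm.ppf(1 - 1 / (n_trials * np.e))
    )
    # Probabilistic Sharpe Ratio evaluated against that selection-bias benchmark
    z = ((observed_sr - sr0) * np.sqrt(n_obs - 1)) / np.sqrt(
        1 - skew * observed_sr + (kurt - 1) / 4 * observed_sr**2
    )
    return norm.cdf(z)  # probability the observed SR reflects skill, not selection
```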
Taxonomy of Methods
Selecting the correct simulation kernel based on strategy characteristics.
IID Bootstrap
Resampling with Replacement
Treats returns as independent and identically distributed: draws from history with replacement to create new sequences (see the sketch after the lists below).
Advantages
- Tests sensitivity to outlier trades.
- Captures fat tails better than normal-distribution assumptions.
Limitations
- Destroys all serial correlation and volatility clustering.
- Invalid for Trend Following.
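A minimal sketch of the IID bootstrap, assuming daily strategy returns in a NumPy array; the function name and defaults are illustrative:

```python
import numpy as np

def iid_bootstrap_curves(returns, n_sims=10_000, rng=None):
    """Resample daily returns with replacement to build synthetic equity curves.

    Note: this deliberately destroys serial correlation, so it is only
    appropriate when the edge does not depend on return ordering.
    """
    rng = np.random.default_rng(rng)
    n = len(returns)
    # Each row is one alternate history drawn i.i.d. from the empirical distribution
    samples = rng.choice(returns, size=(n_sims, n), replace=True)
    return np.cumprod(1 + samples, axis=1)  # compounded equity paths
```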
Permutation
Trade Shuffling
Rearranges the order of existing trades without replacement. The total P&L remains identical, but the path changes (see the sketch after the lists below).
Advantages
- Preserves the exact realized return distribution.
- Excellent for testing Max Drawdown variability.
Limitations
- Cannot test for events that never happened.
- Destroys autocorrelation structure.
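A minimal sketch of trade shuffling applied to drawdown variability, assuming per-trade returns in a NumPy array; the helper names are illustrative:

```python
import numpy as np

def max_drawdown(equity):
    """Worst peak-to-trough decline of an equity curve (a negative number)."""
    peaks = np.maximum.accumulate(equity)
    return ((equity - peaks) / peaks).min()

def permutation_drawdowns(trade_returns, n_sims=10_000, rng=None):
    """Shuffle trade order (no replacement): total P&L is fixed, the path varies."""
    rng = np.random.default_rng(rng)
    dds = np.empty(n_sims)
    for i in range(n_sims):
        shuffled = rng.permutation(trade_returns)
        dds[i] = max_drawdown(np.cumprod(1 + shuffled))
    return dds  # distribution of max drawdowns over alternate orderings
```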
Block Bootstrap
Stationary / Circular
Samples 'blocks' of L consecutive days/trades to preserve local correlation structure and volatility regimes (see the sketch after the lists below).
Advantages
- Preserves market memory within blocks.
- Maintains regime dependency.
Limitations
- Sensitive to block length selection.
- Introduces noise at block seams.
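A minimal sketch of the stationary variant (Politis and Romano), assuming daily returns in a NumPy array; mean_block is the sensitive tuning knob noted above:

```python
import numpy as np

def stationary_bootstrap(returns, n_sims=1_000, mean_block=20, rng=None):
    """Stationary bootstrap: geometric block lengths with circular indexing.

    Preserves local autocorrelation and volatility clustering within blocks;
    results are sensitive to the choice of mean_block.
    """
    rng = np.random.default_rng(rng)
    n = len(returns)
    p = 1.0 / mean_block  # probability of starting a new block at each step
    sims = np.empty((n_sims, n))
    for s in range(n_sims):
        idx = np.empty(n, dtype=int)
        idx[0] = rng.integers(n)
        for t in range(1, n):
            # start a fresh block with probability p, else continue the current one
            idx[t] = rng.integers(n) if rng.random() < p else (idx[t - 1] + 1) % n
        sims[s] = returns[idx]
    return sims
```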
Surrogate Data
AAFT / Phase Shuffling
Modifies the underlying OHLC data (e.g., randomizing Fourier phases) to test whether the signal is distinct from noise (see the sketch after the lists below).
Advantages
- Acts as a 'Truth Serum' for alpha existence.
- Tests if patterns are statistically significant.
Limitations
- Computationally expensive.
- Complex implementation (Fourier Transforms).
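A minimal sketch of Fourier phase randomization, the core of these methods (full AAFT adds an amplitude-adjustment step on top), assuming a 1-D NumPy return series:

```python
import numpy as np

def phase_shuffle(returns, rng=None):
    """Fourier phase randomization: keeps the power spectrum (and hence the
    linear autocorrelation) of the series, destroys any nonlinear structure."""
    rng = np.random.default_rng(rng)
    spectrum = np.fft.rfft(returns)
    # Randomize phases, keep magnitudes; DC and Nyquist terms are left unchanged
    phases = rng.uniform(0, 2 * np.pi, size=spectrum.shape)
    phases[0] = np.angle(spectrum[0])
    if len(returns) % 2 == 0:
        phases[-1] = np.angle(spectrum[-1])
    surrogate = np.abs(spectrum) * np.exp(1j * phases)
    return np.fft.irfft(surrogate, n=len(returns))
```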
Interpreting the Output
Moving from simulation to actionable decision metrics.
The Cone of Uncertainty
When plotting 10,000 Monte Carlo simulations starting from t=0, the resulting equity curves fan out into a cone shape. This visualizes the stochastic nature of future performance (see the sketch after this list).
- Median Path (50%): Expected performance if the future resembles the past average.
- The 5th Percentile: The "Bad Luck" boundary. If your live strategy falls below this line, it is likely broken, not just unlucky.
- The 99th Percentile Drawdown: Your true capital requirement. Backtests show the historical max drawdown; MC shows the *potential* max drawdown.
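A minimal sketch of collapsing simulations into cone percentiles, assuming an (n_sims, n_days) array of equity curves such as those produced by the bootstrap sketches above:

```python
import numpy as np

def monte_carlo_cone(simulated_curves, percentiles=(5, 50, 95)):
    """Collapse an (n_sims, n_days) array of equity curves into per-day
    percentile bands for plotting the cone of uncertainty."""
    bands = np.percentile(simulated_curves, percentiles, axis=0)
    return dict(zip(percentiles, bands))  # e.g., cone[5] is the 5th-percentile path
```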
Sequence Risk Analysis
Sequence risk is the danger that the timing of withdrawals (or losses) disproportionately damages overall portfolio value: the same set of returns, arriving in an unlucky order, can maximize the damage of a drawdown.
The "Start Date" Hazard
A strategy starting in 2010 might show a Sharpe of 2.0. The same strategy starting in 2008 might blow up. MC Shuffling exposes this by simulating thousands of start dates.
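A minimal sketch of start-date sensitivity, assuming daily strategy returns in a NumPy array; the window length and function name are illustrative:

```python
import numpy as np

def start_date_sensitivity(returns, n_starts=1_000, window=252 * 3, rng=None):
    """Launch the same strategy at random historical start dates and collect
    the resulting annualized Sharpe ratios over a fixed horizon."""
    rng = np.random.default_rng(rng)
    starts = rng.integers(0, len(returns) - window, size=n_starts)
    sharpes = np.array([
        np.sqrt(252) * returns[s:s + window].mean() / returns[s:s + window].std()
        for s in starts
    ])
    return sharpes  # wide dispersion here means heavy start-date/sequence risk
```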
Quantitative Pitfalls
Common errors that invalidate Monte Carlo results.
Look-Ahead Bias in Blocks
When using the Block Bootstrap, it is critical to ensure that blocks do not contain future information relative to the trade decision point.
Breaking Serial Correlation
Applying a simple IID Bootstrap to a Trend Following strategy destroys the very alpha you are trying to test (the trend itself). Because losing streaks are shuffled away along with the trends, this leads to a massive underestimation of risk.
Distribution Mismatch
Assuming returns are Gaussian when generating synthetic data. Financial returns have fat tails (kurtosis). Using Normal distribution generators will hide "Black Swan" risks.
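A quick demonstration of the mismatch: comparing extreme quantiles of a fat-tailed Student-t (df=3, a common stand-in for equity returns) against a Gaussian of equal variance:

```python
import numpy as np

rng = np.random.default_rng(0)
t_draws = rng.standard_t(df=3, size=1_000_000)
t_draws /= t_draws.std()                      # normalize to unit variance
normal_draws = rng.standard_normal(1_000_000)

for q in (0.001, 0.0001):
    print(q, np.quantile(t_draws, q), np.quantile(normal_draws, q))
# The t-distribution's extreme quantiles are far deeper: Gaussian generators
# systematically understate the "Black Swan" losses a strategy can face.
```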
The "Truth Serum" Test
Before trusting a strategy, run it on Phase Shuffled data (noise with the same autocorrelation structure).
If performance on the noise is still greater than 1.0 (e.g., a Sharpe ratio above 1.0), your strategy is overfitting.
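A minimal sketch of the test expressed as a p-value, reusing phase_shuffle from the Surrogate Data sketch; backtest_fn is a hypothetical callable that runs your backtest on a return series and reports its Sharpe ratio:

```python
import numpy as np

def truth_serum_pvalue(strategy_sharpe, price_returns, backtest_fn, n_surrogates=200):
    """Run the backtest on phase-shuffled surrogates and estimate how often
    pure noise matches or beats the real result."""
    noise_sharpes = np.array([
        backtest_fn(phase_shuffle(price_returns)) for _ in range(n_surrogates)
    ])
    # Fraction of surrogates performing at least as well as the real data
    return (noise_sharpes >= strategy_sharpe).mean()
```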
The Researcher's Routine
A rigorous validation workflow from hypothesis to live deployment.
Hypothesis & In-Sample Dev
Start with an economic hypothesis. Code the logic. Establish a baseline on training data. Ensure the logic is robust to dirty data.
The 'Sanity' Monte Carlo
Prove the signal exists and is not random drift. The strategy must significantly outperform the surrogate-data distribution.
Robustness Stress Test
Test stability against execution realities. Jitter parameters to ensure you aren't on a 'local optimum' peak.
Capitalization Estimation
Determine your 'UNCLE' point (the drawdown at which you would abandon the strategy). Use the 99th percentile drawdown from the Stationary Block Bootstrap for risk management.
Out-of-Sample & PBO
Final validation using Combinatorial Purged Cross-Validation (CPCV) to calculate Probability of Backtest Overfitting.
Live Monitoring
Project Monte Carlo cones forward. Set kill-switches if live performance deviates into the bottom 5th percentile.
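A minimal sketch of such a kill-switch check, assuming the (n_sims, n_days) simulated curves from the bootstrap sketches above and a live equity series:

```python
import numpy as np

def kill_switch(live_equity, simulated_curves, level=5):
    """True if live performance has fallen below the Monte Carlo cone's
    bottom percentile at any point so far: likely broken, not just unlucky."""
    live = np.asarray(live_equity)
    boundary = np.percentile(simulated_curves[:, :len(live)], level, axis=0)
    return bool(np.any(live < boundary))
```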
Strategy-Specific Configurations
One size does not fit all. Tailoring the simulation to the alpha source.
Trend Following
Serial Correlation Critical: Trend strategies rely on "streaks". Standard shuffling breaks these streaks, leading to unrealistically benign drawdown estimates.
HFT / Mean Reversion
Execution Critical: Alpha is often small per trade. Risk comes from microstructure noise and fill probability, not necessarily large moves.
Multi-Asset Portfolio
Correlation Critical: Testing a basket of strategies (e.g., Long/Short Equity + CTA). The risk is that correlations converge to 1.0 during crashes.
- Sample blocks of time across ALL assets simultaneously to preserve cross-asset correlations during stress periods.
- Manually force correlations to 0.8+ in simulation to test portfolio survival during a liquidity crisis.
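A minimal sketch of the cross-asset block sampler, assuming a (days x assets) NumPy return matrix; the block length and names are illustrative:

```python
import numpy as np

def multi_asset_block_bootstrap(returns_matrix, n_sims=1_000, block=20, rng=None):
    """Circular block bootstrap over ROWS of a (days x assets) return matrix:
    every asset is sampled on the same dates, so cross-asset correlations
    (including crisis convergence) are preserved within each block."""
    rng = np.random.default_rng(rng)
    n, k = returns_matrix.shape
    n_blocks = int(np.ceil(n / block))
    sims = np.empty((n_sims, n, k))
    for s in range(n_sims):
        starts = rng.integers(0, n, size=n_blocks)
        idx = np.concatenate([(st + np.arange(block)) % n for st in starts])[:n]
        sims[s] = returns_matrix[idx]
    return sims
```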
