Avoiding Overfitting
=====================

Overfitting is the #1 reason strategies that look great in backtesting fail in live trading. Understanding it will save you from expensive mistakes.

----

.. contents:: On this page
   :local:
   :depth: 1

----

What is overfitting?
---------------------

Overfitting means your strategy's parameters have been tuned so precisely to historical data that they capture *noise* rather than real market patterns.

Think of it this way: if you flip a coin 10 times and get 7 heads, you wouldn't conclude the coin is biased, because there's too little data. The same applies to backtesting: a strategy with 15 trades and a Sharpe Ratio of 3.0 tells you almost nothing. The parameters just happened to fit those 15 specific dates.

**The symptom:** Excellent backtested performance that evaporates completely in live trading.

----

In-sample vs. out-of-sample
-----------------------------

These two concepts are the foundation of honest strategy evaluation:

.. list-table::
   :header-rows: 1
   :widths: 30 70

   * - Term
     - Meaning
   * - **In-sample (IS)**
     - The date range used to *develop and optimise* the strategy. Results here are biased upward.
   * - **Out-of-sample (OOS)**
     - A date range *never seen* during development. Results here are honest.

**The golden rule:** Never use out-of-sample data to make any parameter decision. The moment you look at OOS results and adjust your strategy, that data becomes in-sample.

**Recommended split:**

.. code-block:: text

   Full data:     2000 ─────────────────────────────── 2024
   In-sample:     2000 ──────────────── 2018
   Out-of-sample:                       2018 ───────── 2024

Hold back the most recent 20–30% of your data as OOS (the split above reserves 6 of 24 years, or 25%). Recent data is the most relevant to future performance because it reflects the market conditions closest to today's.

----

Warning signs of overfitting
------------------------------

Watch for these red flags in your backtest results:

**1. Too few trades**

A strategy needs enough trades for its statistics to be meaningful. As a minimum guideline:

.. list-table::
   :header-rows: 1
   :widths: 25 75

   * - Number of trades
     - Reliability of metrics
   * - < 20
     - Not reliable at all. Results are mostly noise.
   * - 20–50
     - Weak. Use with caution.
   * - 50–100
     - Acceptable.
   * - > 100
     - Good statistical basis.

**2. Sharpe Ratio too high**

A Sharpe Ratio above 2.5 on a diversified equity strategy is highly suspicious. Real-world systematic strategies run by professionals typically achieve 0.5–1.5. If your backtest shows a Sharpe of 4 or more, check your logic carefully for look-ahead bias.

**3. Many parameters, few trades**

If your strategy has 8 optimisable parameters and only 40 trades, that is just 5 trades per parameter: the optimiser has so many "degrees of freedom" that it can fit the historical data without finding a real edge. Rule of thumb: you need at least **10 trades per free parameter**.

.. list-table::
   :header-rows: 1
   :widths: 30 70

   * - Parameters
     - Minimum trades needed
   * - 2
     - 20
   * - 4
     - 40
   * - 8
     - 80

**4. Equity curve only rises in one specific period**

If the backtest covers 20 years but 90% of the profit came in a single 2-year window, the strategy may have accidentally captured one bull-market run rather than a repeatable edge.

**5. Isolated best parameter**

In an optimisation run, if the best parameter set is surrounded by much worse results, it is an overfit spike:

.. code-block:: text

   Period 12:  Sharpe 0.8
   Period 13:  Sharpe 0.9
   Period 14:  Sharpe 2.6   ← overfit spike
   Period 15:  Sharpe 1.0
   Period 16:  Sharpe 0.9

A robust parameter sits inside a *plateau* of good results, not on a spike.

----

Look-ahead bias
----------------

Look-ahead bias is closely related to overfitting: your strategy inadvertently uses future data, meaning information that would not have been available when the trade was placed. It produces unrealistically good results.

**Common causes in block strategies:**

- Using ``Offset = 0`` when you should use ``Offset = 1``. If you check a condition and enter a trade on the *same* bar, you're using closing-price information that wasn't available at the moment the trade would have been placed.
- Calculating indicators on the full dataset before slicing by date, when the calculation uses statistics of the whole series (for example, normalising against the full-period mean). Values early in the test period then depend on data from later dates.

.. tip::

   Use ``Execution = sod`` (start of day) for entries to simulate executing at the *next* day's open, which is more realistic than executing at the signal bar's close.

----

Degrees of freedom
-------------------

Each parameter you add to your strategy "uses up" a degree of freedom: it gives the optimiser another dimension along which it can accidentally fit the data. A simple strategy with 2 parameters and 200 trades is usually far more robust than a complex strategy with 10 parameters and the same 200 trades.

**Principles:**

1. Start simple: get the core idea working with 2–3 parameters.
2. Only add complexity if there is a logical reason for it, not just because it improves backtest results.
3. Every indicator you add should have an *economic explanation* for why it should predict returns.

----

The out-of-sample test
------------------------

The only honest final test is a single run on data you have never touched:

1. Develop and optimise on in-sample data (e.g. 2000–2018).
2. **Lock** your parameters and do not change them again.
3. Run exactly **one** backtest on out-of-sample data (2018–2024).
4. Report both results side by side.

A healthy strategy shows:

.. list-table::
   :header-rows: 1
   :widths: 35 30 30

   * - Metric
     - In-sample
     - Out-of-sample
   * - Sharpe Ratio
     - 1.40
     - 1.15 ✓ (within ~30%)
   * - Max Drawdown
     - -22%
     - -28% ✓ (similar scale)
   * - Total Return
     - 185%
     - 67% ✓ (shorter period, so compare annualised returns)

An overfit strategy shows:

.. list-table::
   :header-rows: 1
   :widths: 35 30 30

   * - Metric
     - In-sample
     - Out-of-sample
   * - Sharpe Ratio
     - 2.80
     - 0.15 ✗ (collapsed)
   * - Max Drawdown
     - -8%
     - -55% ✗ (much worse)

----

Practical checklist
--------------------

Before trusting any strategy result, run through this checklist:

.. list-table::
   :header-rows: 1
   :widths: 10 90

   * - ✓
     - Check
   * -
     - The strategy has at least **50 completed trades** in the test period.
   * -
     - The strategy was **not** optimised on the final 20% of the date range.
   * -
     - The Sharpe Ratio is **below 2.5** (or you have a clear explanation if it's higher).
   * -
     - The number of parameters is **≤ trades / 10**.
   * -
     - The best parameter is **not an isolated spike** in the optimisation table.
   * -
     - The strategy's logic has a **clear economic rationale** (not just "these numbers worked").
   * -
     - Performance was tested in both **bull and bear** market periods.
   * -
     - Commissions and slippage are included in the results.
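
The numeric items in the checklist above can be scripted so they are applied the same way to every backtest. The following is a minimal Python sketch, assuming you can export a backtest's trade count, Sharpe Ratio and number of optimised parameters, plus the Sharpe results of a one-parameter optimisation sweep. ``BacktestSummary``, ``checklist``, ``is_isolated_spike`` and ``oos_sharpe_holds_up`` are hypothetical names, not part of any platform API, and the 1.5× neighbour ratio used to flag a spike is an illustrative threshold rather than a standard.

.. code-block:: python

   from __future__ import annotations

   from dataclasses import dataclass

   # Hypothetical helpers; thresholds mirror the rules of thumb on this page.


   @dataclass
   class BacktestSummary:
       """Hypothetical container for the figures one backtest reports."""

       n_trades: int   # completed trades in the test period
       sharpe: float   # Sharpe Ratio of the backtest
       n_params: int   # number of optimised (free) parameters


   def is_isolated_spike(sharpes: list[float], ratio: float = 1.5) -> bool:
       """Flag the best result if it towers over its immediate neighbours.

       ``sharpes`` holds results for consecutive values of ONE parameter,
       e.g. Period 12, 13, 14, ... The 1.5x ratio is illustrative only.
       """
       if len(sharpes) < 2:
           return False  # need at least one neighbour to compare against
       best = max(range(len(sharpes)), key=lambda i: sharpes[i])
       neighbours = [sharpes[j] for j in (best - 1, best + 1) if 0 <= j < len(sharpes)]
       neighbour_mean = sum(neighbours) / len(neighbours)
       if neighbour_mean <= 0:
           return sharpes[best] > 0  # positive peak among non-positive neighbours
       return sharpes[best] > ratio * neighbour_mean


   def checklist(summary: BacktestSummary, sweep: list[float]) -> dict[str, bool]:
       """Apply the numeric checklist items; True means the check passes."""
       return {
           "at least 50 trades": summary.n_trades >= 50,
           "Sharpe below 2.5": summary.sharpe < 2.5,
           "params <= trades / 10": summary.n_params <= summary.n_trades / 10,
           "best parameter not a spike": not is_isolated_spike(sweep),
       }


   def oos_sharpe_holds_up(is_sharpe: float, oos_sharpe: float, max_drop: float = 0.30) -> bool:
       """True if the OOS Sharpe is no more than ``max_drop`` (about 30%) below IS."""
       return oos_sharpe >= (1.0 - max_drop) * is_sharpe


   # Usage with example numbers taken from this page.
   period_sweep = [0.8, 0.9, 2.6, 1.0, 0.9]  # Period 12..16
   overfit = BacktestSummary(n_trades=40, sharpe=2.6, n_params=8)
   for name, passed in checklist(overfit, period_sweep).items():
       print(f"{'PASS' if passed else 'FAIL'}  {name}")

   print(oos_sharpe_holds_up(1.40, 1.15))  # healthy table -> True
   print(oos_sharpe_holds_up(2.80, 0.15))  # overfit table -> False

Here the hypothetical 40-trade, 8-parameter strategy with the Period 14 spike fails every check, while the two out-of-sample tables above give ``True`` and ``False`` respectively. Comparing a spike only with its immediate neighbours keeps the test simple; a wider window would be stricter about what counts as a plateau. The qualitative items (economic rationale, bull and bear coverage, commissions and slippage) still need human judgement.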