Avoiding Overfitting

Overfitting is the #1 reason why strategies that look great in backtesting fail in live trading. Understanding it will save you from making expensive mistakes.



What is overfitting?

Overfitting means your strategy’s parameters have been tuned so precisely to historical data that they capture noise rather than real market patterns.

Think of it this way: if you flip a coin 10 times and get 7 heads, you wouldn’t conclude the coin is biased — there’s too little data. The same applies to backtesting: a strategy with 15 trades and Sharpe 3.0 is meaningless. The parameters just happened to fit those 15 specific dates.
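The coin-flip intuition is easy to check numerically. Here is a minimal sketch (plain Python, standard library only) of how often a fair coin produces 7 or more heads in 10 flips:

```python
from math import comb

def prob_at_least(heads: int, flips: int, p: float = 0.5) -> float:
    """Probability of seeing `heads` or more heads in `flips` tosses of a coin with bias `p`."""
    return sum(comb(flips, k) * p**k * (1 - p)**(flips - k)
               for k in range(heads, flips + 1))

# A fair coin gives 7+ heads in 10 flips about 17% of the time,
# far too common to conclude the coin is biased.
print(round(prob_at_least(7, 10), 3))
```

The same logic is why a handful of winning trades proves nothing: small samples produce "impressive" streaks by chance alone.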

The symptom: Excellent backtested performance that evaporates completely in live trading.


In-sample vs. out-of-sample

These two concepts are the foundation of honest strategy evaluation:

Term                  Meaning
In-sample (IS)        The date range used to develop and optimise the strategy. Results here are biased upward.
Out-of-sample (OOS)   A date range never seen during development. Results here are honest.

The golden rule: Never use out-of-sample data to make any parameter decisions. The moment you look at OOS results and adjust your strategy, that data becomes in-sample.

Recommended split:

Full data:     2000 ─────────────────────────────── 2024
In-sample:     2000 ──────────────── 2018
Out-of-sample:                       2018 ─────── 2024

Hold back the most recent 20–30% of your data as OOS. Recent data is most relevant to future performance.
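If you manage the split yourself rather than inside a backtesting tool, it can be as simple as a single date cutoff. A minimal sketch (the `(date, value)` row format and the cutoff are illustrative):

```python
from datetime import date

def split_is_oos(rows, cutoff):
    """Split (date, value) rows at a cutoff: before it is in-sample, from it on is out-of-sample."""
    in_sample = [row for row in rows if row[0] < cutoff]
    out_of_sample = [row for row in rows if row[0] >= cutoff]
    return in_sample, out_of_sample

rows = [
    (date(2000, 1, 3), 100.0),
    (date(2017, 12, 29), 250.0),
    (date(2018, 1, 2), 251.0),   # first out-of-sample bar
    (date(2024, 6, 28), 400.0),
]
is_rows, oos_rows = split_is_oos(rows, cutoff=date(2018, 1, 1))
print(len(is_rows), len(oos_rows))  # 2 2
```

Doing the split by date once, up front, makes it much harder to accidentally peek at OOS bars during development.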


Warning signs of overfitting

Watch for these red flags in your backtest results:

1. Too few trades

A strategy needs enough trades for statistics to be meaningful. As a minimum guideline:

# Trades    Reliability of metrics
< 20        Not reliable at all. Results are mostly noise.
20–50       Weak. Use with caution.
50–100      Acceptable.
> 100       Good statistical basis.

2. Sharpe Ratio too high

A Sharpe above 2.5 on a diversified equity strategy is highly suspicious. Real-world systematic strategies run by professionals typically achieve 0.5–1.5. If your backtest shows Sharpe 4+, check your logic carefully for look-ahead bias.
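As a sanity check, it helps to compute the annualized Sharpe from per-bar returns yourself rather than trusting a single reported number. A sketch assuming daily returns, 252 trading days per year, and a zero risk-free rate:

```python
import math
import statistics

def annualized_sharpe(daily_returns, periods_per_year=252):
    """Annualized Sharpe ratio of per-bar returns (risk-free rate assumed zero)."""
    mu = statistics.mean(daily_returns)
    sigma = statistics.stdev(daily_returns)
    return mu / sigma * math.sqrt(periods_per_year)

def sharpe_is_suspicious(sharpe, threshold=2.5):
    """Flag backtests whose Sharpe exceeds the plausibility threshold from the text."""
    return sharpe > threshold

# Five hand-picked "good" days annualize to an implausible Sharpe of ~6.9.
daily = [0.01, -0.005, 0.002, 0.007, -0.001]
print(sharpe_is_suspicious(annualized_sharpe(daily)))  # True
```

Note how quickly a short, lucky stretch annualizes into a headline number no professional strategy sustains.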

3. Many parameters, few trades

If your strategy has 8 optimisable parameters and only 40 trades, the optimiser had more "degrees of freedom" to fit the data than there were data points to constrain them. Rule of thumb: you need at least 10 trades per free parameter.

Parameters    Minimum trades needed
2             20
4             40
8             80
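The rule of thumb above is trivial to encode as a guard before you trust an optimisation run (the function name is illustrative):

```python
def enough_trades(n_trades, n_params, trades_per_param=10):
    """Rule of thumb from the text: at least 10 trades per free parameter."""
    return n_trades >= n_params * trades_per_param

print(enough_trades(40, 8))   # False: 8 parameters need at least 80 trades
print(enough_trades(200, 2))  # True
```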

4. Equity curve only rises in one specific period

If the backtest covers 20 years but 90% of the profit came in a single 2-year window, the strategy may have just accidentally captured one bull market run — not a repeatable edge.
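This concentration is easy to measure: compute how much of the total profit the best contiguous window accounts for. A sketch assuming a list of per-period profits and a positive total (the 90% cutoff and the toy data are illustrative):

```python
def max_window_share(period_profits, window):
    """Fraction of total profit earned in the best contiguous run of `window` periods.
    Assumes the total profit is positive."""
    total = sum(period_profits)
    best = max(sum(period_profits[i:i + window])
               for i in range(len(period_profits) - window + 1))
    return best / total

# 20 "years" of profit: mostly flat, with one 2-year burst in years 9-10.
profits = [1.0] * 20
profits[8] = profits[9] = 50.0
print(round(max_window_share(profits, 2), 2))  # 0.85, one window dominates
```

A share near 1.0 for a short window is the quantitative version of the red flag above.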

5. Isolated best parameter

In an optimisation run, if the best parameter set is surrounded by much worse results, it’s an overfit spike:

Period 12:  Sharpe 0.8
Period 13:  Sharpe 0.9
Period 14:  Sharpe 2.6   ← overfit spike
Period 15:  Sharpe 1.0
Period 16:  Sharpe 0.9

A robust parameter sits inside a plateau of good results, not a spike.
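One way to flag such spikes automatically is to compare each candidate against the average of its immediate neighbours. A sketch (the 1.5x ratio is an illustrative threshold, not a standard):

```python
def is_spike(sharpes, i, ratio=1.5):
    """Flag parameter i when its Sharpe beats the average of its neighbours by `ratio`."""
    neighbours = [sharpes[j] for j in (i - 1, i + 1) if 0 <= j < len(sharpes)]
    return sharpes[i] > ratio * (sum(neighbours) / len(neighbours))

sharpes = [0.8, 0.9, 2.6, 1.0, 0.9]   # the optimisation run shown above
print(is_spike(sharpes, 2))  # True: 2.6 vs a neighbour average of 0.95
print(is_spike(sharpes, 1))  # False: 0.9 sits in a plateau
```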


Look-ahead bias

Look-ahead bias is a specific type of overfitting where your strategy inadvertently uses future data. It produces unrealistically perfect results.

Common causes in block strategies:

  • Using Offset = 0 when you should use Offset = 1. If you check a condition and enter a trade on the same bar, you’re using closing price information that wasn’t available at the moment the trade would have been placed.

  • Calculating indicators on the full dataset before slicing by date.

Tip

Use Execution = sod (start of day) for entries to simulate executing at the next day’s open, which is more realistic than executing at the bar’s close.
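The same discipline applies if you ever hand-roll a backtest: the equivalent of Offset = 1 is lagging the signal by one bar before multiplying it into returns. A minimal sketch (the long/flat signals and per-bar returns are illustrative):

```python
def backtest_pnl(signals, returns, lag=1):
    """Multiply each bar's return by the signal from `lag` bars earlier.
    lag=1 mirrors Offset = 1; lag=0 trades on information not yet available."""
    shifted = [0] * lag + signals[:len(signals) - lag]
    return [s * r for s, r in zip(shifted, returns)]

signals = [1, 1, 0, 1]               # illustrative long/flat signals
returns = [0.02, -0.01, 0.03, 0.01]  # illustrative per-bar returns

print(backtest_pnl(signals, returns))         # [0.0, -0.01, 0.03, 0.0]
print(backtest_pnl(signals, returns, lag=0))  # look-ahead: [0.02, -0.01, 0.0, 0.01]
```

Note how the biased lag=0 run captures the first bar's gain even though the signal was computed from that bar's close.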


Degrees of freedom

Each parameter you add to your strategy "uses up" a degree of freedom: another dimension along which the optimiser can accidentally fit the data.

A simple strategy with 2 parameters and 200 trades is far more robust than a complex strategy with 10 parameters and the same 200 trades.

Principles:

  1. Start simple — get the core idea working with 2–3 parameters.

  2. Only add complexity if there is a logical reason for it, not just because it improves backtest results.

  3. Every indicator you add should have an economic explanation for why it should predict returns.


The out-of-sample test

The only honest final test is a single run on data you have never touched:

  1. Develop and optimise on in-sample data (e.g. 2000–2018).

  2. Lock your parameters — do not change them again.

  3. Run exactly one backtest on out-of-sample data (2018–2024).

  4. Report both results side-by-side.
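The four steps above can be encoded so that OOS data physically cannot influence parameter choice. A sketch with hypothetical `optimise` and `backtest` callables (both are placeholders for your own routines, not a real API):

```python
def oos_protocol(rows, cutoff, optimise, backtest):
    """Develop on in-sample only, then run out-of-sample exactly once with locked parameters."""
    in_sample = [row for row in rows if row[0] < cutoff]
    out_of_sample = [row for row in rows if row[0] >= cutoff]
    params = optimise(in_sample)                  # every parameter decision happens here
    is_result = backtest(in_sample, params)
    oos_result = backtest(out_of_sample, params)  # the single, final OOS run
    return is_result, oos_result

# Toy stand-ins: rows are (year, price); "optimise" and "backtest" just count bars.
rows = [(year, float(year)) for year in range(2000, 2025)]
result = oos_protocol(rows, 2018, optimise=lambda d: {}, backtest=lambda d, p: len(d))
print(result)  # (18, 7)
```

The key property is structural: `optimise` never receives `out_of_sample`, so there is nothing to peek at.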

A healthy strategy shows:

Metric          In-sample    Out-of-sample
Sharpe Ratio    1.40         1.15 ✓ (within ~30%)
Max Drawdown    -22%         -28% ✓ (similar scale)
Total Return    185%         67% ✓ (shorter period)

An overfit strategy shows:

Metric          In-sample    Out-of-sample
Sharpe Ratio    2.80         0.15 ✗ (collapsed)
Max Drawdown    -8%          -55% ✗ (much worse)


Practical checklist

Before trusting any strategy result, run through this checklist:

☐ The strategy has at least 50 completed trades in the test period.
☐ The strategy was not optimised on the final 20% of the date range.
☐ The Sharpe Ratio is below 2.5 (or you have a clear explanation if it’s higher).
☐ The number of parameters is ≤ trades / 10.
☐ The best parameter is not an isolated spike in the optimisation table.
☐ The strategy’s logic has a clear economic rationale (not just “these numbers worked”).
☐ Performance was tested in both bull and bear market periods.
☐ Commissions and slippage are included in the results.
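If you run this checklist often, the mechanical parts can be automated. A sketch (check names and inputs are illustrative; the judgment calls, such as economic rationale, still need a human answer):

```python
def overfit_checklist(n_trades, n_params, sharpe, best_is_spike,
                      tested_bull_and_bear, costs_included):
    """Return the names of the mechanical checks that fail."""
    checks = {
        "at least 50 completed trades": n_trades >= 50,
        "Sharpe at or below 2.5": sharpe <= 2.5,
        "parameters <= trades / 10": n_params <= n_trades / 10,
        "best parameter is not an isolated spike": not best_is_spike,
        "tested in bull and bear periods": tested_bull_and_bear,
        "commissions and slippage included": costs_included,
    }
    return [name for name, ok in checks.items() if not ok]

print(overfit_checklist(200, 2, 1.2, False, True, True))       # []: all pass
print(len(overfit_checklist(40, 8, 2.8, True, False, True)))   # 5 checks fail
```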