# Track record

Walk-forward backtest, 1999–2025. Each region-year is predicted by a model
trained only on data available before that year: no in-sample fit and no
look-ahead.
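
The walk-forward rule can be illustrated with a minimal sketch. The row layout, the `walk_forward` helper, and the `prior_mean` placeholder model are all hypothetical, chosen only to show the train-on-earlier-years split; they are not the production model.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical row layout: (year, region, yield anomaly)
Row = Tuple[int, str, float]


def walk_forward(
    rows: Sequence[Row],
    fit_predict: Callable[[List[Row], int, str], float],
    first_test_year: int,
) -> Dict[Tuple[int, str], float]:
    """Predict each region-year using only rows from strictly earlier years."""
    predictions: Dict[Tuple[int, str], float] = {}
    test_years = sorted({year for year, _, _ in rows if year >= first_test_year})
    for test_year in test_years:
        # Training data stops before the test year, so there is no look-ahead.
        train = [row for row in rows if row[0] < test_year]
        for year, region, _ in rows:
            if year == test_year:
                predictions[(year, region)] = fit_predict(train, test_year, region)
    return predictions


def prior_mean(train: List[Row], year: int, region: str) -> float:
    """Placeholder model: mean of the region's earlier anomalies (0.0 if none)."""
    history = [anomaly for _, reg, anomaly in train if reg == region]
    return sum(history) / len(history) if history else 0.0


rows = [(2000, "Eastern", 1.0), (2001, "Eastern", 3.0), (2002, "Eastern", -10.0)]
preds = walk_forward(rows, prior_mean, first_test_year=2001)
print(preds)
```

In this toy run the 2002 prediction is built from 2000 and 2001 only; the 2002 outcome never reaches the model that predicts it.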

**How to read this page**

**Tradeable** means the model made a medium- or high-confidence call (i.e. a
call a desk would actually take). Across the 231 scored region-years, the
predicted yield anomaly correlates with the actual anomaly at r=0.316
(p<0.0001). Year-by-year and case-study breakdowns follow below.
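
As a rough sanity check on the significance figure, the sketch below converts r and n into a t-statistic. It assumes r is a Pearson correlation and treats the 231 region-years as independent observations, which the page does not state; the p-value uses a normal approximation rather than the exact t-distribution.

```python
import math

r = 0.316  # reported correlation
n = 231    # scored region-years

# t-statistic for the null hypothesis of zero correlation (n - 2 degrees of freedom)
t_stat = r * math.sqrt(n - 2) / math.sqrt(1 - r * r)

# Two-sided p-value via the normal approximation (assumption: close enough at 229 d.o.f.)
p_approx = math.erfc(abs(t_stat) / math.sqrt(2))

print(f"t = {t_stat:.2f}, approximate p = {p_approx:.1e}")
```

Under those assumptions the t-statistic comes out at roughly 5, which is consistent with a p-value well below 0.0001.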

## Backtest summary

| Metric | Result |
|---|---|
| Tradeable hit rate | 62.3% (38/61) |
| "Bad year" calls | 92% (24/26) |
| High-confidence | 72% (13/18) |
| All 231 region-years | 46% |

The discipline is the product. Forced to call every region-year, the model
would be right 46% of the time. Instead it stays silent unless
its components agree, and on the 61 calls it actually makes, the hit rate
rises to 62.3%. The bad-year calls, the ones that matter most
for risk, run at 92%.
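
The headline percentages follow directly from the counts in the summary. A minimal sketch of that arithmetic, where the `hit_rate` helper is illustrative and the counts are copied from this page:

```python
def hit_rate(correct: int, total: int) -> float:
    """Return the percentage of calls that were direction-correct."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * correct / total


# (correct, total) pairs taken from the backtest summary
summary = {
    "Tradeable": (38, 61),
    "Bad-year calls": (24, 26),
    "High-confidence": (13, 18),
}

for label, (correct, total) in summary.items():
    print(f"{label}: {correct}/{total} = {hit_rate(correct, total):.1f}%")
```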

## Predicted vs actual

Every region-year in the walk-forward backtest plotted as one point.
Top-right and bottom-left quadrants are direction-correct; top-left and
bottom-right are direction-wrong. Dashed diagonal is perfect prediction.
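
The quadrant reading can be sketched as a small classifier. It assumes the predicted anomaly is on the horizontal axis and the actual anomaly on the vertical axis; the function names and the "on-axis" label for exact zeros are illustrative choices, not part of the page.

```python
def quadrant(predicted: float, actual: float) -> str:
    """Name the quadrant of a (predicted, actual) point, or 'on-axis' if either is 0."""
    if predicted == 0 or actual == 0:
        return "on-axis"
    vertical = "top" if actual > 0 else "bottom"
    horizontal = "right" if predicted > 0 else "left"
    return f"{vertical}-{horizontal}"


def is_direction_correct(predicted: float, actual: float) -> bool:
    """True when the point lies in the top-right or bottom-left quadrant."""
    return quadrant(predicted, actual) in ("top-right", "bottom-left")


print(quadrant(0.3, 0.5), is_direction_correct(0.3, 0.5))
print(quadrant(-0.2, 1.5), is_direction_correct(-0.2, 1.5))
```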

## Year-by-year

Years with 0 tradeable calls are years in which the model said “no
high-conviction view” for every region. That is a feature, not a missed run.

| Crop year | Regions | Tradeable calls | Correct | Hit rate | Avg predicted anomaly | Avg actual anomaly |
|---|---|---|---|---|---|---|
| 2003/04 | 11 | 7 | 3 | 43% | +0.36 | +0.23 |
| 2004/05 | 11 | 0 | 0 | no tradeable | +0.14 | +0.15 |
| 2005/06 | 11 | 2 | 1 | 50% | -0.01 | +0.22 |
| 2006/07 | 11 | 0 | 0 | no tradeable | +0.06 | -0.74 |
| 2007/08 | 11 | 2 | 1 | 50% | +0.42 | +0.46 |
| 2008/09 | 11 | 0 | 0 | no tradeable | +0.06 | -0.05 |
| 2009/10 | 11 | 2 | 0 | 0% | +0.15 | +0.02 |
| 2010/11 | 11 | 3 | 1 | 33% | +0.25 | +0.28 |
| 2011/12 | 11 | 1 | 1 | 100% | +0.06 | -1.96 |
| 2012/13 | 11 | 4 | 4 | 100% | -0.46 | -0.84 |
| 2013/14 | 11 | 2 | 2 | 100% | +0.23 | +0.89 |
| 2014/15 | 11 | 3 | 3 | 100% | +0.27 | +1.70 |
| 2015/16 | 11 | 2 | 1 | 50% | +0.21 | +0.02 |
| 2016/17 | 11 | 3 | 1 | 33% | -0.21 | +0.57 |
| 2017/18 | 11 | 1 | 1 | 100% | -0.04 | -0.32 |
| 2018/19 | 11 | 6 | 2 | 33% | -0.23 | +1.52 |
| 2019/20 | 11 | 9 | 8 | 89% | -0.73 | -1.12 |
| 2020/21 | 11 | 2 | 0 | 0% | -0.08 | +0.36 |
| 2021/22 | 11 | 0 | 0 | no tradeable | +0.14 | n/a |
| 2022/23 | 11 | 3 | 1 | 33% | -0.18 | +0.38 |
| 2023/24 | 11 | 6 | 6 | 100% | -0.33 | -0.77 |
| 2024/25 | 11 | 3 | 2 | 67% | -0.17 | -0.44 |
| 2025/26 | 11 | 0 | 0 | no tradeable | -0.00 | n/a |
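
The yearly rows can be summed to cross-check the headline figures. The sketch below copies the tradeable-call and correct counts from the table and treats the two rows with no actual anomaly (2021/22 and 2025/26) as unscored.

```python
# (crop year, tradeable calls, correct), copied from the year-by-year table
rows = [
    ("2003/04", 7, 3), ("2004/05", 0, 0), ("2005/06", 2, 1), ("2006/07", 0, 0),
    ("2007/08", 2, 1), ("2008/09", 0, 0), ("2009/10", 2, 0), ("2010/11", 3, 1),
    ("2011/12", 1, 1), ("2012/13", 4, 4), ("2013/14", 2, 2), ("2014/15", 3, 3),
    ("2015/16", 2, 1), ("2016/17", 3, 1), ("2017/18", 1, 1), ("2018/19", 6, 2),
    ("2019/20", 9, 8), ("2020/21", 2, 0), ("2021/22", 0, 0), ("2022/23", 3, 1),
    ("2023/24", 6, 6), ("2024/25", 3, 2), ("2025/26", 0, 0),
]
unscored = {"2021/22", "2025/26"}  # rows whose actual anomaly is n/a
regions_per_year = 11

total_calls = sum(calls for _, calls, _ in rows)
total_correct = sum(correct for _, _, correct in rows)
scored_region_years = regions_per_year * sum(
    1 for year, _, _ in rows if year not in unscored
)

print(f"{total_correct}/{total_calls} = {100 * total_correct / total_calls:.1f}%")
print(f"scored region-years: {scored_region_years}")
```

The totals match the summary: 38 correct out of 61 tradeable calls, across 231 scored region-years.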

## Case studies

**[2019, disaster year, 8 of 9 correct](/case-studies/2019).**
The model called the wet autumn drilling and hot June flowering stress
correctly across the wheat belt. UK average yield fell to 7.0 t/ha, the
worst since 2012.

**[2023, wet harvest, 6 of 6 correct](/case-studies/2023).**
Compound flowering and ripening stress flagged the below-average yield,
with the strongest signal in Eastern.

**[2018, where we got it wrong, 2 of 6 correct](/case-studies/2018).**
The model called bearish on heat stress; UK wheat actually benefited from
unusually low disease pressure that year. The miss is the strongest
argument for the sentiment layer that catches farmer reports of
disease pressure in real time.

## Methodology disclaimers

- Walk-forward only. Each year's call uses a model trained on prior years only.
- 2022 yield data is missing from the source DEFRA dataset, so that year is excluded from the hit-rate calculation rather than counted as a miss.
- Hit rate is direction-only (above / below average). Magnitude correlation is reported separately (r=0.316, p<0.0001).
- The sentiment overlay currently affects displayed confidence, not the direction call itself. Future architecture iterations will integrate sentiment as a feature column once forward sentiment data accumulates.
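
The exclusion and direction-only rules above can be sketched as follows. The `Call` layout and `direction_hit_rate` helper are hypothetical, and treating an anomaly of exactly zero as "below" is an arbitrary choice made for this illustration.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical layout: (predicted anomaly, actual anomaly or None if unpublished)
Call = Tuple[float, Optional[float]]


def direction_hit_rate(calls: Sequence[Call]) -> Optional[float]:
    """Share of scored calls whose predicted and actual signs agree.

    Calls with a missing actual are dropped from the denominator
    rather than counted as misses.
    """
    scored = [(pred, actual) for pred, actual in calls if actual is not None]
    if not scored:
        return None  # nothing to score yet
    hits = sum((pred > 0) == (actual > 0) for pred, actual in scored)
    return hits / len(scored)


example = [(0.5, 1.0), (-0.2, 0.3), (0.1, None)]
print(direction_hit_rate(example))
```

Here the third call has no published outcome, so the rate is computed over two calls, one of which is direction-correct.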

## Live forward calls

This is the forward-prediction log: the call CropIntel is making *now*,
before the harvest confirms it. Unlike the backtest above, nobody knows the
outcome yet. Harvest outcomes populate against each crop year as DEFRA confirms
them (typically late August). This is the artefact to watch: a public,
timestamped record of calls made ahead of the event. The public forward log
began on 29 April 2026; the 2026 harvest is
its first live, independently verifiable validation milestone (the backtest
above covers 2003–2025). It accrues one harvest at a time, which is precisely
the part a new entrant cannot compress.

| As of | Crop year | National call | Confidence | Compound stress |
|---|---|---|---|---|
| 2026-05-26 | 2025/26 | Average | low | -0.04 |
| 2026-05-24 | 2025/26 | Above average | medium | +0.05 |
| 2026-05-17 | 2025/26 | Above average | high | +0.28 |
| 2026-05-10 | 2025/26 | Above average | high | +0.33 |
| 2026-05-03 | 2025/26 | Above average | medium | +0.42 |

One row per week, newest first. Today's full per-region breakdown:
[today's call](/today).

Related reading: [Methodology](/methodology) ·
[Glossary](/glossary) ·
[Case studies index](/case-studies/)
