Track record

Walk-forward backtest, 2003-2025. Each region-year is predicted by a model trained only on data available before that year: no in-sample fit, no look-ahead.
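A minimal sketch of the expanding-window split described above, assuming one model fit per test year on all strictly earlier years. The function name, the `min_train_years` parameter and the years in the usage line are illustrative, not CropIntel's code.

```python
from typing import Iterator, List, Sequence, Tuple


def walk_forward_splits(
    years: Sequence[int], min_train_years: int = 1
) -> Iterator[Tuple[List[int], int]]:
    """Yield (training_years, test_year) pairs for an expanding-window walk-forward.

    Each test year is paired only with years strictly before it, so nothing
    from the test year (or any later year) can reach the model.
    """
    ordered = sorted(years)
    for i, test_year in enumerate(ordered):
        train = ordered[:i]
        if len(train) < min_train_years:
            continue  # not enough history to fit a model yet
        yield train, test_year


# Illustrative years only: a hypothetical fit(train) would be evaluated on test.
for train, test in walk_forward_splits(range(2000, 2004), min_train_years=2):
    print(test, train)
```

The expanding window means later years are predicted from more history than earlier ones, which is the price of never looking ahead.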

How to read this page

Tradeable means the model rated the call medium or high confidence (i.e. a call a desk would actually take). Pooled across all 231 scored region-years, predicted and actual yield anomalies correlate at r=0.307 (p<0.0001). Those region-years come from only 21 independent seasons, and regions within one season share national weather, so the pooled test treats as independent many points that are not; the year-by-year table below is the sterner view of the same record. Case studies follow the table.
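A back-of-envelope way to see why the season count matters. The sketch approximates a two-sided p-value for a correlation with the Fisher z-transform (a normal approximation, not necessarily the test used for the figure on this page). It assumes, purely for illustration, that the same r=0.307 held over only 21 independent observations; in practice a season-level correlation is a different statistic and need not equal the pooled one.

```python
import math


def fisher_p_value(r: float, n: int) -> float:
    """Approximate two-sided p-value for a correlation r over n independent points.

    Fisher z-transform: z = atanh(r) * sqrt(n - 3) is roughly standard normal
    under the null hypothesis of zero correlation.
    """
    if n <= 3:
        raise ValueError("need more than 3 observations")
    z = math.atanh(r) * math.sqrt(n - 3)
    # P(|Z| > |z|) for a standard normal Z.
    return math.erfc(abs(z) / math.sqrt(2))


R = 0.307  # pooled correlation reported on this page
for label, n in [("231 region-years", 231), ("21 seasons (illustrative)", 21)]:
    print(f"{label}: p ≈ {fisher_p_value(R, n):.2g}")
```

Under that assumption the pooled figure stays far below 0.0001, while the 21-observation version lands well above 0.05, which is why the year-by-year view is the sterner test.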

Backtest summary

62.7%: tradeable hit rate (37/59)
92%: bad years caught (23/25)
71%: high-confidence hit rate (12/17)
46%: hit rate across all 231 region-years

The discipline is the product. Forced to call every region-year, the model would be right 46% of the time; instead it stays silent unless its components agree, and on the 59 calls it does make, the hit rate rises to 62.7%. On the below-average years that matter most for risk, conviction calls caught 23 of 25 (92%).
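The arithmetic behind calling selectively can be sketched as a filter: score every call, then score only the medium- and high-confidence subset. The `Call` record, its fields and the toy rows are hypothetical and are not backtest data; how the model decides its confidence is not shown here.

```python
from dataclasses import dataclass
from typing import List, Optional

TRADEABLE = {"medium", "high"}


@dataclass
class Call:
    confidence: str        # "low", "medium" or "high"
    predicted_above: bool  # the model's direction call
    actual_above: bool     # the realised direction


def hit_rate(calls: List[Call]) -> Optional[float]:
    """Share of calls whose predicted direction matched the outcome (None if no calls)."""
    if not calls:
        return None
    return sum(c.predicted_above == c.actual_above for c in calls) / len(calls)


def summarise(calls: List[Call]) -> dict:
    tradeable = [c for c in calls if c.confidence in TRADEABLE]
    return {
        "all": hit_rate(calls),
        "tradeable": hit_rate(tradeable),
        "coverage": len(tradeable) / len(calls),  # share of region-years actually called
    }


# Hypothetical toy records, not backtest data.
calls = [
    Call("high", True, True),
    Call("medium", False, False),
    Call("medium", True, False),
    Call("low", True, False),
    Call("low", False, True),
]
print(summarise(calls))
```

Applied to this page's counts, the same ratio gives 37/59 ≈ 62.7% for the tradeable subset.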

Predicted vs actual

Each region-year in the walk-forward backtest is plotted as one point. The top-right and bottom-left quadrants are direction-correct; the top-left and bottom-right are direction-wrong. The dashed diagonal marks perfect prediction.
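The quadrant rule reduces to a sign comparison, and the distance from the diagonal is just actual minus predicted. A minimal sketch; the year-average pairs for 2019/20 and 2018/19 from the year-by-year table are used only as example points, since the chart itself plots individual region-years.

```python
from typing import Tuple


def classify_point(predicted: float, actual: float) -> Tuple[str, float]:
    """Return the point's quadrant label and its residual (actual - predicted).

    The residual is zero exactly on the dashed perfect-prediction diagonal.
    """
    if predicted == 0 or actual == 0:
        quadrant = "on an axis"
    elif (predicted > 0) == (actual > 0):
        quadrant = "direction-correct"
    else:
        quadrant = "direction-wrong"
    return quadrant, actual - predicted


print(classify_point(-0.75, -1.12))  # 2019/20 year averages
print(classify_point(-0.23, 1.52))   # 2018/19 year averages
```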

Year-by-year

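Years with 0 tradeable calls are years in which the model gave “no high-conviction view” for every region: a feature, not a missed run. An actual anomaly of n/a means there is no outcome to score: DEFRA data are missing for 2021/22, and the 2025/26 harvest is still pending.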

Crop year	Regions	Tradeable calls	Correct	Hit rate	Avg predicted anomaly	Avg actual anomaly
2003/04	11	7	3	43%	+0.36	+0.23
2004/05	11	0	0	no tradeable	+0.13	+0.15
2005/06	11	2	1	50%	-0.03	+0.22
2006/07	11	0	0	no tradeable	+0.07	-0.74
2007/08	11	2	1	50%	+0.42	+0.46
2008/09	11	0	0	no tradeable	+0.05	-0.05
2009/10	11	1	0	0%	+0.15	+0.02
2010/11	11	3	1	33%	+0.25	+0.28
2011/12	11	1	1	100%	+0.06	-1.96
2012/13	11	4	4	100%	-0.46	-0.84
2013/14	11	2	2	100%	+0.22	+0.89
2014/15	11	3	3	100%	+0.26	+1.70
2015/16	11	2	1	50%	+0.21	+0.02
2016/17	11	3	1	33%	-0.21	+0.57
2017/18	11	1	1	100%	-0.05	-0.32
2018/19	11	6	2	33%	-0.23	+1.52
2019/20	11	9	8	89%	-0.75	-1.12
2020/21	11	2	0	0%	-0.08	+0.36
2021/22	11	0	0	no tradeable	+0.12	n/a
2022/23	11	3	1	33%	-0.21	+0.38
2023/24	11	5	5	100%	-0.32	-0.77
2024/25	11	3	2	67%	-0.17	-0.44
2025/26	11	0	0	no tradeable	+0.01	n/a
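The headline 37/59 and the per-year hit-rate column can be re-derived from the table. A short check, with the rows transcribed by hand from the table above:

```python
# (crop year, tradeable calls, correct) transcribed from the year-by-year table.
rows = [
    ("2003/04", 7, 3), ("2004/05", 0, 0), ("2005/06", 2, 1), ("2006/07", 0, 0),
    ("2007/08", 2, 1), ("2008/09", 0, 0), ("2009/10", 1, 0), ("2010/11", 3, 1),
    ("2011/12", 1, 1), ("2012/13", 4, 4), ("2013/14", 2, 2), ("2014/15", 3, 3),
    ("2015/16", 2, 1), ("2016/17", 3, 1), ("2017/18", 1, 1), ("2018/19", 6, 2),
    ("2019/20", 9, 8), ("2020/21", 2, 0), ("2021/22", 0, 0), ("2022/23", 3, 1),
    ("2023/24", 5, 5), ("2024/25", 3, 2), ("2025/26", 0, 0),
]

calls = sum(n for _, n, _ in rows)
correct = sum(k for _, _, k in rows)
print(f"{correct}/{calls} = {correct / calls:.1%}")  # 37/59 = 62.7%

# Per-year hit rate, matching the table's rounding and its "no tradeable" label.
rates = {year: (f"{k / n:.0%}" if n else "no tradeable") for year, n, k in rows}
```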

Case studies

2019/20, disaster year, 8 of 9 correct. The model correctly called the combination of wet autumn drilling and hot June flowering stress across the wheat belt. UK average yield fell to 7.0 t/ha, the worst since 2012.

2023/24, wet harvest, 5 of 5 correct. Compound flowering and ripening stress flagged the below-average yield, with the strongest signal in the Eastern region.

2018/19, where we got it wrong, 2 of 6 correct. The model made a bearish call on heat stress, but UK wheat actually benefited from unusually low disease pressure that year. This miss is the strongest argument for the sentiment layer, which picks up farmer reports of disease pressure in real time.

Methodology disclaimers

Walk-forward only. Each year's call uses a model trained on prior years alone.
Yield data for harvest 2022 (crop year 2021/22) are missing from the source DEFRA dataset, so that year is excluded from the hit-rate calculation rather than counted as a miss.
Hit rate is direction-only (above / below average). Magnitude agreement is reported separately as a correlation (r=0.307, p<0.0001).
The sentiment overlay currently affects the displayed confidence, not the direction call itself. Future iterations of the architecture will integrate sentiment as a feature column once enough forward sentiment data has accumulated.
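The two metrics in the list measure different things, and a missing outcome is dropped rather than scored as a miss. A minimal sketch on hypothetical toy numbers (not backtest data), assuming the magnitude correlation is a Pearson r. Treating an anomaly of exactly zero as below average is an arbitrary choice here, and the page's scored hit rates also apply the tradeable filter and near-average band, which this sketch omits.

```python
import math
from typing import List, Optional, Tuple


def pearson_r(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def direction_and_magnitude(
    pairs: List[Tuple[float, Optional[float]]]
) -> Tuple[float, float]:
    """Direction-only hit rate and magnitude correlation.

    Pairs whose actual anomaly is None (missing data) are excluded, not counted as misses.
    """
    scored = [(p, a) for p, a in pairs if a is not None]
    hits = sum((p > 0) == (a > 0) for p, a in scored)  # zero counts as "below"
    preds = [p for p, _ in scored]
    actuals = [a for _, a in scored]
    return hits / len(scored), pearson_r(preds, actuals)


# Hypothetical (predicted, actual) anomalies; None marks a missing outcome.
toy = [(0.4, 0.9), (-0.3, -0.1), (0.2, -0.5), (0.1, None), (-0.6, -1.2)]
hit, r = direction_and_magnitude(toy)
print(f"hit rate {hit:.0%}, r = {r:.2f}")
```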

Live forward calls

This is the forward-prediction log: the call CropIntel is making now, before the harvest confirms it. Unlike the backtest above, nobody knows the outcome yet. Harvest outcomes are filled in against each crop year as DEFRA confirms them (typically late August). This is the artefact to watch: a public, timestamped record of calls made ahead of the event. The public forward log began on 29 April 2026, and the 2026 harvest is its first live, independently verifiable validation milestone (the backtest above covers 2003-2025). The log accrues one harvest at a time, which is precisely the part a new entrant cannot compress.

As of	Crop year	National call	Confidence	Compound stress
2026-07-10	2025/26	Below average	medium	+1.03
2026-07-05	2025/26	Average	low	+0.49
2026-06-28	2025/26	Above average	high	+0.38
2026-06-21	2025/26	Average	low	-0.78
2026-06-14	2025/26	Average	low	-1.09
2026-06-07	2025/26	Above average	medium	-0.17
2026-05-31	2025/26	Above average	medium	-0.16
2026-05-24	2025/26	Above average	medium	+0.05
2026-05-17	2025/26	Above average	high	+0.28
2026-05-10	2025/26	Above average	high	+0.33
2026-05-03	2025/26	Above average	medium	+0.42

One row per week, newest first. For today's full per-region breakdown, see today's call.

How the 2026 season will be scored

Pre-registered 8 July 2026, before any outcome is known.

The forward calls above will be scored when DEFRA publishes regional wheat yields for harvest 2026. The rules, fixed now:

The calls scored are exactly those published in the log above: the final standing call for each region, and the national call, as of 31 July 2026 (the end of the pre-harvest window). No later revision counts.
A directional call is correct if the region's actual yield anomaly falls on the called side, using the same near-average band convention as the backtest; outcomes inside the band score as near average, not as hits.
Medium- and high-confidence calls count toward the tradeable record. Low-confidence and near-average calls are reported but not counted as conviction calls.
Every result will be published on this page, hit or miss, alongside the full weekly evolution shown in the log.
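The rules above can be expressed as a small scoring function. This is a sketch under stated assumptions: the half-width of the near-average band is not given on this page, so `BAND = 0.25` is a placeholder, and the (date, call, confidence) log format simply mirrors the national table. Function names, the post-cutoff row and the actual-anomaly values in the usage are hypothetical.

```python
from datetime import date
from typing import List, Optional, Tuple

Entry = Tuple[date, str, str]  # (as-of date, call, confidence), as in the log table

CUTOFF = date(2026, 7, 31)  # end of the pre-harvest window
BAND = 0.25  # PLACEHOLDER half-width of the near-average band, not the real value


def standing_call(log: List[Entry], cutoff: date = CUTOFF) -> Optional[Entry]:
    """Latest entry dated on or before the cutoff; later revisions are ignored."""
    eligible = [e for e in log if e[0] <= cutoff]
    return max(eligible, key=lambda e: e[0]) if eligible else None


def outcome_side(actual_anomaly: float, band: float = BAND) -> str:
    """Map an actual yield anomaly to a side, with a near-average band around zero."""
    if abs(actual_anomaly) <= band:
        return "Average"
    return "Above average" if actual_anomaly > 0 else "Below average"


def score(log: List[Entry], actual_anomaly: float) -> str:
    entry = standing_call(log)
    if entry is None:
        return "no call"
    _, call, confidence = entry
    call = call.capitalize()  # normalise capitalisation of the call label
    if confidence == "low" or call == "Average":
        return "reported, not counted"  # not a conviction call
    side = outcome_side(actual_anomaly)
    if side == "Average":
        return "near average"  # inside the band: not scored as a hit
    return "hit" if side == call else "miss"


log = [
    (date(2026, 7, 5), "Average", "low"),
    (date(2026, 7, 10), "Below average", "medium"),
    (date(2026, 8, 3), "Above average", "high"),  # hypothetical post-cutoff revision: ignored
]
print(standing_call(log))
```

The anomaly values used to exercise `score` are made up; no 2026 outcome is implied.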

Methodology note, 8 July 2026: the normalisation baseline behind the stress scores was pinned to completed seasons only, so the backtest above is reproducible on any day of the year rather than drifting as in-season weather arrives. All backtest figures on this site were re-derived on the same date. The frozen model itself is unchanged.
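A minimal sketch of what pinning the baseline to completed seasons means, assuming the stress scores are z-scores against a per-season history (the page does not state the exact normalisation). Names and toy values are hypothetical.

```python
import statistics
from typing import Dict, Tuple


def pinned_baseline(history: Dict[str, float], current: str) -> Tuple[float, float]:
    """Mean and sample standard deviation over completed seasons only.

    Seasons are "YYYY/YY" labels, which sort chronologically as strings.
    The in-progress season (and anything later) is excluded, so the baseline
    does not move as in-season weather arrives.
    """
    completed = [v for season, v in history.items() if season < current]
    return statistics.mean(completed), statistics.stdev(completed)


def stress_score(raw: float, baseline: Tuple[float, float]) -> float:
    """Standardise a raw stress value against a fixed baseline (z-score)."""
    mean, sd = baseline
    return (raw - mean) / sd


# Hypothetical raw stress values per season.
history = {"2022/23": 1.0, "2023/24": 2.0, "2024/25": 3.0, "2025/26": 9.0}
before = pinned_baseline(history, "2025/26")
history["2025/26"] = 40.0  # more in-season data arrives
after = pinned_baseline(history, "2025/26")
print(before, after, stress_score(4.0, before))
```

Because the in-progress season never enters the baseline, re-running the backtest on any day of the year yields the same scores.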

Related reading: Methodology · Glossary · Case studies index