# Methodology

How CropIntel turns weather, satellite, and farmer sentiment data into a daily UK wheat yield forecast.

## In one paragraph

CropIntel uses a multi-component [ensemble model](/glossary#ensemble-model) over a deep historical baseline of UK regional weather, soil, and satellite data, augmented by a hybrid lexicon + LLM scoring of UK farmer commentary. The model emits a directional call only when its components agree, producing a **62.3% tradeable hit rate** on the walk-forward backtest. Sentiment from real-time farmer commentary serves as a bounded confidence overlay: it does not change the model's directional call, only the displayed confidence label.

## Data layers

### Weather (Open-Meteo / ERA5)

Daily temperature, rainfall, sunshine, and frost-day counts at a representative point per [DEFRA region](/glossary#defra-region), 1999–present. Source: ECMWF ERA5 reanalysis via Open-Meteo's archive API. Multi-point sampling (3–5 points per region) is in progress.

The model's headline weather feature is the [compound stress score](/glossary#compound-stress), a multi-stage aggregation across the wheat growth cycle that captures how each stage's weather deviated from the 1999–present baseline. Unlike single-event weather models that flag heatwaves or droughts in isolation, compound stress aggregates pressure across the full crop year, because a yield outcome usually has several contributing weather episodes.
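The exact aggregation is not published, so the following is only an illustrative sketch of the idea. It assumes each growth stage contributes a z-score against its own historical baseline and that the stage scores are combined by averaging their absolute values. The function names, the stage names, and the averaging rule are all hypothetical, not CropIntel's actual formula.

```python
from statistics import mean, pstdev


def stage_z_score(value, baseline):
    """Deviation of one stage's weather value from its historical baseline."""
    sigma = pstdev(baseline)
    if sigma == 0:
        return 0.0  # a flat baseline carries no deviation signal
    return (value - mean(baseline)) / sigma


def compound_stress(stage_values, stage_baselines):
    """Hypothetical aggregation: mean absolute z-score across growth stages."""
    z_scores = [
        stage_z_score(value, stage_baselines[stage])
        for stage, value in stage_values.items()
    ]
    return sum(abs(z) for z in z_scores) / len(z_scores)


# Illustrative numbers only: two stages, each with a tiny made-up baseline.
baselines = {"stem_extension": [0.0, 2.0], "flowering": [10.0, 14.0]}
this_year = {"stem_extension": 3.0, "flowering": 10.0}
score = compound_stress(this_year, baselines)
```

The point of the sketch is that two moderate deviations in different stages add up, whereas a single-event model would evaluate each in isolation.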

### Soil moisture (COSMOS-UK)

Daily volumetric water content, 2013–present. Used as a contextual variable in z-score calculation only. The pre-2013 gap means it can't be a direct model feature without introducing a structural bias against earlier years.

### Satellite NDVI (Sentinel-2)

Mean and median NDVI per region, 2015–present. Tested both unmasked and SCL-masked (cropland pixels only). Neither correlates meaningfully with yield at flowering or stem-extension stages: the SCL mask excludes forest and water but cannot distinguish wheat from grass or other crops. Crop-type classification (e.g. UKCEH Land Cover Plus: Crops) would be needed to isolate wheat specifically; this is pending licensing.

### Farmer sentiment (TFF)

Daily ingest of posts from active arable sub-forums on [The Farming Forum](https://thefarmingforum.co.uk/). Ingest respects the publisher's rate limits and sends an identifying user-agent. The corpus is actively curated: practitioners are mapped to DEFRA regions, and off-topic and market-wire content is filtered out at scoring time.

Each post is scored twice. The [UK-agri lexicon](/glossary#agri-lexicon) (a hand-curated dictionary spanning the major arable signal categories: disease, weather, drilling progress, market mood, etc.) produces a deterministic score in [-1, +1] with a [small-sample-stable normalisation](/glossary#score-normalisation). A frontier LLM then scores the post independently with a one-line rationale, catching sarcasm and context the lexicon misses (e.g. "the best septoria fungicide is dry weather" reads as positive sentiment about dry conditions, not a disease complaint). The two scores are blended, with more weight on the LLM score. Posts with no agronomic signal at all are excluded from daily aggregates rather than averaged in as zero. This is the [out-of-season filter](/glossary#out-of-season-filter), which keeps the daily aggregate uncontaminated by off-topic chat.
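A minimal sketch of the blend and the exclusion rule described above. The 0.7 LLM weight, the `has_signal` flag, and the function names are assumptions for illustration; the real weights are not published.

```python
LLM_WEIGHT = 0.7  # hypothetical: the text says only "more weight on the LLM"


def blend(lexicon_score, llm_score, llm_weight=LLM_WEIGHT):
    """Weighted blend of the two per-post scores, clamped to [-1, +1]."""
    blended = llm_weight * llm_score + (1.0 - llm_weight) * lexicon_score
    return max(-1.0, min(1.0, blended))


def daily_aggregate(posts):
    """Mean blended score over posts that carry agronomic signal.

    Posts without signal are dropped, not counted as zero, so off-topic
    chat cannot dilute the day's reading. Returns None on a silent day.
    """
    scored = [blend(p["lexicon"], p["llm"]) for p in posts if p["has_signal"]]
    if not scored:
        return None
    return sum(scored) / len(scored)


posts = [
    {"lexicon": 0.5, "llm": 1.0, "has_signal": True},
    {"lexicon": 0.0, "llm": 0.0, "has_signal": False},  # off-topic, excluded
]
today = daily_aggregate(posts)  # 0.85, not the 0.425 that averaging in zero would give
```

Returning `None` rather than `0.0` for a silent day keeps "no data" distinguishable from "neutral sentiment".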

## The ensemble model

A multi-component model with consensus filtering. Several complementary
modelling components, each with a different inductive bias, vote on the
directional yield call. The ensemble emits a tradeable call only when the
components agree, hence the tradeable subset is smaller than the full
population but cleaner.
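As a rough illustration of consensus filtering, the sketch below treats
"agree" as unanimity among component votes. That threshold, the +1/-1 vote
encoding, and the function name are assumptions; the actual agreement rule is
not published.

```python
def consensus_call(votes):
    """Return +1 or -1 when every component votes the same way, else None.

    votes: iterable of +1 (above-trend yield) or -1 (below-trend yield).
    None means "no tradeable call" for that region-year.
    """
    votes = list(votes)
    if not votes:
        return None
    first = votes[0]
    if first in (1, -1) and all(v == first for v in votes):
        return first
    return None


calls = [
    consensus_call([1, 1, 1]),    # unanimous: tradeable
    consensus_call([1, -1, 1]),   # split: filtered out
    consensus_call([-1, -1, -1]), # unanimous: tradeable
]
tradeable = [c for c in calls if c is not None]
```

The filter trades coverage for precision: split votes produce no call at all
rather than a low-confidence one.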

Cross-validated walk-forward: 62.3% hit rate on tradeable calls (38 of 61),
r=0.316 (p<0.0001) on predicted-vs-actual yield anomaly, and 92% on the
bad-year calls the model made with conviction (24 of 26).
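The two percentages follow directly from the counts quoted above; a short
check of the arithmetic (the helper name is illustrative):

```python
def hit_rate(hits, calls):
    """Fraction of calls that were directionally correct."""
    if calls <= 0:
        raise ValueError("calls must be positive")
    return hits / calls


print(f"{hit_rate(38, 61):.1%}")  # prints 62.3%
print(f"{hit_rate(24, 26):.1%}")  # prints 92.3%
```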

**When it's sharpest.** The call firms up as the crop develops
and is at its most reliable from stem extension onward (late spring), still
weeks ahead of DEFRA harvest figures and AHDB condition reports. This is the
window in which the trade has not yet repriced. That is the edge: not a
nine-months-out guess, but a confident regional read while the decision is
still open.

## The sentiment confidence overlay

The sentiment confidence overlay sits over the directional model as a
bounded multiplier, never wide enough to flip the directional call. It
sits at 1.0 when sentiment is silent, edges up when sentiment direction
matches the ensemble's call, and edges down when it contradicts.
**The overlay does not change the directional call**; it
modifies the displayed confidence label only. Future architecture
iterations will land as forward sentiment data accumulates, allowing
the overlay to graduate into a model feature.
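A minimal sketch of a bounded multiplier with the behaviour described above.
The ±0.2 bound, the linear scaling, and the function name are assumptions for
illustration, not the production values.

```python
MAX_ADJUST = 0.2  # hypothetical bound: multiplier stays within [0.8, 1.2]


def confidence_multiplier(model_direction, sentiment_score, max_adjust=MAX_ADJUST):
    """Scale displayed confidence by agreement between sentiment and the call.

    model_direction: +1 or -1 from the ensemble.
    sentiment_score: daily aggregate in [-1, +1], or None when silent.
    """
    if model_direction not in (1, -1):
        raise ValueError("model_direction must be +1 or -1")
    if sentiment_score is None or sentiment_score == 0:
        return 1.0  # silent sentiment leaves confidence unchanged
    agrees = (sentiment_score > 0) == (model_direction > 0)
    strength = min(1.0, abs(sentiment_score))
    return 1.0 + (max_adjust if agrees else -max_adjust) * strength


# The multiplier is applied to the confidence value only; the direction
# returned by the ensemble is passed through untouched.
direction = -1
confidence = 0.6 * confidence_multiplier(direction, -0.5)  # agreeing sentiment
```

Because the multiplier is always positive and touches only the confidence
value, it cannot flip the sign of the call in this sketch.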

## Walk-forward validation

Every claim about historical accuracy uses walk-forward validation. Each
region-year is predicted by a model trained only on data available before that
year: no in-sample fit, no look-ahead. The 62.3% hit rate is therefore a fair
estimate of how the model would have performed if deployed in real time at any
point in the 1999–2025 window. See the full [Track Record](/track-record)
for the year-by-year breakdown.
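The procedure can be sketched as a simple loop over years. The record layout,
the `fit`/`predict` callables, and the `min_train_years` warm-up are
assumptions made for this example rather than details of the production
pipeline.

```python
def walk_forward(records, fit, predict, min_train_years=5):
    """Predict each region-year using only data from strictly earlier years.

    records: list of dicts, each with at least a "year" key.
    fit: callable taking the training records and returning a model.
    predict: callable taking (model, record) and returning a prediction.
    """
    results = []
    for year in sorted({r["year"] for r in records}):
        train = [r for r in records if r["year"] < year]  # no look-ahead
        if len({r["year"] for r in train}) < min_train_years:
            continue  # not enough history yet to train on
        model = fit(train)
        for record in records:
            if record["year"] == year:
                results.append((record, predict(model, record)))
    return results


# Toy check: the "model" is simply the latest year it was allowed to see.
records = [{"year": y, "region": "East"} for y in range(2000, 2004)]
results = walk_forward(
    records,
    fit=lambda train: max(r["year"] for r in train),
    predict=lambda model, record: model,
    min_train_years=2,
)
```

In the toy check every prediction comes from a model whose latest training
year is earlier than the year being predicted, which is the property
walk-forward validation is meant to guarantee.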

## What we do not publish

For competitive reasons, the following are not on the public site:

- The lexicon's term list and weights.
- Hyperparameters and feature-importance weights inside the ensemble.
- The curated practitioner-region map.
- The historical sentiment corpus, an accumulating, scored record that compounds daily and can't be back-filled.

The methodology is explainable by design: an acquirer's analyst can audit
the decisions in plain English. The defensible IP is the curation and the
public track record, which accrue only in calendar time, so a replicator
starting today is years behind by definition.

Related reading: [Track Record](/track-record) ·
[Glossary](/glossary) ·
[2019 case study](/case-studies/2019) ·
[2018 miss](/case-studies/2018)
