Improving UK-CAT accuracy with machine learning

Background

UK-CAT — UK Charity Activity Tags — is an open classification system developed by NCVO Research, Dr Christopher Damm (Sheffield Hallam University), and David Kane, with funding from the Esmée Fairbairn Foundation. It assigns activity tags to every registered charity in the United Kingdom, covering around 200,000 organisations across all three UK charity regulators.

The challenge

The stakes for accurate classification are high. Funders use UK-CAT to find charities working in specific areas. Researchers use it as a sampling frame for sector analysis. Infrastructure bodies use it to map local and regional provision. A misclassified charity can be overlooked for funding or excluded from research, and in aggregate such errors distort the apparent scale of entire service areas.

The original classification engine used regular expression matching — keyword rules applied against charity names, activity statements, and objects. Regex is fast, transparent, and easy to audit, but it is brittle. Charities describe themselves in wildly different ways: one might say "supporting people with dementia", another "specialist memory care", another simply name a condition-specific service.
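That brittleness is easy to demonstrate. Below is a minimal sketch of a keyword rule in Python, using a hypothetical "Dementia" pattern rather than an actual UK-CAT rule: it catches the obvious phrasings but silently misses a description that never uses the keyword.

```python
import re

# Hypothetical keyword rule for a "Dementia" tag (not a real UK-CAT rule):
# it matches the obvious phrasings but cannot generalise beyond them.
DEMENTIA_RULE = re.compile(r"\b(dementia|alzheimer)", re.IGNORECASE)

descriptions = [
    "Supporting people with dementia and their carers",  # matched
    "Specialist memory care for older residents",        # missed: no keyword
    "Day services for those living with Alzheimer's",    # matched
]

tagged = [d for d in descriptions if DEMENTIA_RULE.search(d)]
```

The middle description plainly belongs under the same tag, but no keyword list can anticipate every way a charity might phrase it.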

Worse, keyword matching has no understanding of context. A charity describing its mission as "educating the public on mental health" would be tagged as both an Education charity and a Mental Health charity — when only the latter applies. The word "educating" fires the Education rule regardless of whether education is actually what the charity does. At 200,000 charities across a taxonomy of 230-odd tags, those false positives accumulate into a classification system that is unreliable for exactly the high-stakes decisions it is meant to support.
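The "educating the public" case can be sketched the same way, with two hypothetical rules standing in for the real taxonomy. Both patterns fire on the same sentence, even though only one tag describes what the charity actually does:

```python
import re

# Two hypothetical keyword rules (illustrative, not the real UK-CAT patterns).
RULES = {
    "Education": re.compile(r"\beducat", re.IGNORECASE),
    "Mental Health": re.compile(r"\bmental health\b", re.IGNORECASE),
}

activities = "Educating the public on mental health"

# Keyword matching has no notion of context: "Educating" fires the
# Education rule even though education is the means, not the mission.
tags = [tag for tag, rule in RULES.items() if rule.search(activities)]
```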

Our approach

We built a hybrid classification system that trains multi-label machine learning models on top of the existing keyword infrastructure. Rather than replace the rules-based approach outright, we layered the two: the ML layer takes priority when it has sufficient confidence, and defers back to pattern matching when it doesn't. This preserves everything that worked well about the existing system while addressing its weaknesses.
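That routing logic can be sketched as follows. The function names, the per-charity fallback, and the 0.5 threshold are illustrative assumptions, not the production design:

```python
# Sketch of the hybrid routing described above. THRESHOLD and the
# predict_proba interface are assumptions made for illustration.
THRESHOLD = 0.5  # assumed confidence cut-off

def classify(charity_text, ml_model, regex_tagger):
    """Prefer ML predictions when confident; fall back to keyword rules."""
    # For this sketch, ml_model.predict_proba returns {tag: probability}.
    probabilities = ml_model.predict_proba(charity_text)
    confident = {t: p for t, p in probabilities.items() if p >= THRESHOLD}
    if confident:
        return sorted(confident)       # ML layer takes priority
    return regex_tagger(charity_text)  # defer to pattern matching
```

The key design point is that the fallback is explicit: when the model has nothing confident to say, the system behaves exactly as it did before, rather than guessing.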

The models were trained to handle the multi-label nature of the problem: a single charity can correctly receive many tags simultaneously, so standard single-label classification methods don't apply. Feature engineering drew on the same text fields used by the keyword rules — charity names, activity descriptions, and objects — ensuring the ML layer and the rules layer operated on common ground.
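One common way to set up such a multi-label text classifier is a one-vs-rest arrangement: one binary classifier per tag, so each charity can receive any combination of tags. The sketch below uses scikit-learn and toy data purely for illustration; the actual features, model family, and training set behind UK-CAT are not specified here.

```python
# Minimal multi-label sketch, assuming scikit-learn. The tags, texts,
# and model choice are illustrative, not the production configuration.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MultiLabelBinarizer

# Toy training data: text fields with UK-CAT-style tag lists.
texts = [
    "Supporting people with dementia and their carers",
    "Free school tuition for disadvantaged children",
    "Counselling and mental health support for young people",
    "Adult education classes in mental health awareness",
]
tags = [
    ["Health"],
    ["Education"],
    ["Mental Health"],
    ["Education", "Mental Health"],
]

binarizer = MultiLabelBinarizer()
y = binarizer.fit_transform(tags)  # one indicator column per tag

# One binary classifier per tag handles the multi-label structure.
model = make_pipeline(
    TfidfVectorizer(),
    OneVsRestClassifier(LogisticRegression(max_iter=1000)),
)
model.fit(texts, y)
```

Predictions come back as an indicator matrix, one column per tag, which maps directly onto the "many tags per charity" shape of the problem.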

The outcome

The hybrid approach delivered a more than threefold improvement in classification accuracy over the keyword-only baseline — a substantial gain for a system operating at this scale. Precision improved alongside recall, meaning fewer false positives as well as fewer missed classifications. In practice, this means charities are less likely to be misfiled under categories they don't belong to, and less likely to be missed from categories they do. For a funder searching for charities working in a specific area, or a researcher trying to size a slice of the sector, that reliability is what makes the data usable.

Why the hybrid design matters

Replacing keyword matching with an ML model outright would have solved some problems while creating others. Machine learning generalises well across varied phrasing, but it can be overconfident in areas where training data is sparse, and it loses the hard-won precision of carefully crafted rules for well-defined categories. A pure ML system would trade one set of errors for another.

For the people who rely on UK-CAT, this matters because the data underpins real decisions. A funder allocating grants to underserved areas needs to know that the charities surfaced by a search actually work in that area — not that they merely mentioned a relevant word somewhere. A researcher mapping provision across a region needs counts they can trust. A policy team making the case for investment needs evidence that won't unravel under scrutiny. Classification errors don't stay in the data; they propagate into the decisions built on top of it. A more accurate system means decisions made on better ground.