There are two aspects of the cognitive automation problem upon which management can have significant impact: (1) measuring whether your organization's decisions are as accurate as you believe them to be, and (2) understanding whether the variance in how your people work represents valuable expertise or expensive entropy.
The first thing to recognize is that most organizations implement automation to make their current process faster. What they discover later is that they have made a flawed process faster at great expense. The question worth asking before you automate anything is: What percentage of our decisions are correct? Not what percentage of our documents are processed, but what percentage of our decisions are right.
This is harder to measure than throughput. It is also more important. Yet most organizations skip this measurement entirely, automating their existing process rather than redesigning it. This means they accelerate decision errors rather than eliminate them. Understanding what accuracy gaps exist before automation begins is essential to any implementation that compounds over time.
The Manufacturing Parallel That Knowledge Work Has Ignored
Manufacturing learned this lesson decades ago. Toyota did not become Toyota by making their existing production process faster. They became Toyota by measuring what was actually happening on the factory floor, discovering that much of what looked like necessary variation was waste, and systematically removing it.
Six Sigma, Lean Production, Total Quality Management: these are all methodologies for measuring actual performance versus assumed performance, then standardizing what works and eliminating what does not.
Knowledge work has largely ignored this lesson. We assume that because work involves judgment rather than physical assembly, it cannot be standardized. We assume that variance in how people work reflects expertise rather than inconsistency.
In our experience, this assumption is usually wrong.
When you measure how knowledge workers actually make decisions (in contract review, compliance assessment, customer onboarding, payment approvals) you typically find that what looks like individualized expertise is actually a small number of core decision patterns with extensive local variations that add no value.
Ask 100 employees how they do their work and you will receive 101 opinions. Measure how they actually do their work and you will find perhaps 12 patterns that account for 90% of decisions, obscured by variance that makes those patterns invisible to the people executing them.
This variance is expensive. Every unnecessary variation in process introduces opportunities for decisions that should be correct but are not. These inaccuracies compound over time and across volume in ways that are rarely measured and almost never attributed to process design.
The opportunity in cognitive automation is not to automate your current process. The opportunity is to use AI to measure what your current process actually produces, discover where the variance is noise rather than signal, and design a process that produces higher decision accuracy before you automate anything.
This takes longer than buying software and pointing it at your documents. It is politically harder because it requires discovering uncomfortable truths about current performance. It does not produce impressive demos in 90 days.
But it appears to be the only approach that produces results that compound over time rather than plateau after initial enthusiasm fades.
Why Decision Accuracy Is Systematically Overestimated
When management asks departments about decision accuracy, the typical answer is: "We're running at about 98%, maybe 97% in a bad quarter."
This estimate is almost always high, not because people are dishonest but because the measurement approach has structural biases.
It measures only detected inaccuracies. If a payment is approved for the wrong amount but no one complains, it counts as correct. If a compliance check is skipped but the case is not audited, it counts as correct. If a contract clause is misinterpreted but no dispute arises, it counts as correct.
It is self-reported by the people making the decisions. An employee who believes their decisions are correct 98% of the time will report 98% even if measurement would reveal 89%.
It conflates process compliance with decision quality. "We followed the procedure" is treated as equivalent to "We made the right decision." But if 800 people interpret the procedure 800 different ways, compliance tells you nothing about correctness.
It averages across populations without understanding the distribution. If 70% of employees achieve 97% accuracy and 30% achieve 80% accuracy, the average is 92%. But management typically estimates based on the 70% they interact with most.
These biases compound. Our experience suggests management estimates of decision accuracy are typically optimistic by 5 to 15 percentage points. Not occasionally. Systematically.
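A toy simulation makes the first of these biases concrete. The rates below are assumptions for illustration, not client data: true accuracy of 89%, with only one in four wrong decisions ever surfacing and being counted.

```python
import random

# Assumed rates for illustration only: true accuracy is 89%, but just
# one in four wrong decisions is ever detected and counted as an error.
random.seed(1)
decisions = 100_000
true_error_rate = 0.11
detection_rate = 0.25

errors = sum(random.random() < true_error_rate for _ in range(decisions))
detected = sum(random.random() < detection_rate for _ in range(errors))

print(f"true accuracy:     {1 - errors / decisions:.1%}")    # ~89%
print(f"reported accuracy: {1 - detected / decisions:.1%}")  # ~97%
```

One bias alone is enough to produce the familiar "97%, maybe 98%" answer from an 89% reality.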
The parallel to pension fund management is exact. Pension actuaries are very good at calculating the consequences of assumptions. They are systematically poor at making the assumptions, particularly on the factors that matter most. Decision accuracy estimation suffers from the same structural problem: the people doing the estimating are rewarded for consensus, not for uncomfortable accuracy.
You cannot improve what you have not measured honestly. And you cannot measure honestly using self-reported estimates from the people whose performance is being measured.
The Irreversible Nature of Process Automation
Once an organization automates a process without first measuring decision accuracy, reversing course is extraordinarily difficult.
The project team has invested months. The vendor has delivered a system meeting technical specifications. Processing speeds have improved. The system extracts data from documents with 98% accuracy. The project is declared successful.
But suppose decision accuracy (paying the right amount, onboarding the right customers, approving the right claims) has not improved. How would you know? The measurement framework captured document processing, not decision quality. The baseline was never established.
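Part of the gap between the two metrics is simple arithmetic. A back-of-the-envelope sketch, assuming a decision depends on several extracted fields and that field errors occur independently:

```python
# If a decision depends on k extracted fields, each correct 98% of the
# time with errors occurring independently, the decision is right 0.98^k.
field_accuracy = 0.98
for k in (1, 3, 5, 8):
    print(f"{k} fields per decision -> {field_accuracy ** k:.1%} decision accuracy")
# 5 fields -> 90.4%: a "98% accurate" system, wrong on roughly 1 in 10 decisions.
```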
You could stop and measure now, but this requires acknowledging that the initial implementation may have optimized the wrong thing. Project teams do not volunteer this analysis. Vendors do not suggest it. Department heads will not advocate for measurement that might reveal their initial estimates were wrong.
So the system continues. It becomes embedded. The organization moves on. And the decision accuracy, which may be far lower than believed, continues producing losses that disperse across departments and never get attributed to the automation project.
This is not because anyone acted in bad faith. It is because automation decisions, once made, create constituencies that have invested in their success and face political costs from admitting the approach was flawed.
The practical lesson: Measure decision accuracy before automating. Understand what you are actually getting right and wrong. Then design a process that produces high accuracy, and automate that.
The Vendor Selection Problem
Suppose 1,000 companies implement cognitive automation in 2025. After one year, perhaps 250 show strong results: processing speeds increased, accuracy metrics met, efficiency targets exceeded. These become the case studies vendors present at conferences.
But if you could measure five-year outcomes, you might find that only a small fraction (perhaps 25 of the original 1,000) achieved results that compounded. The rest plateaued or regressed once initial enthusiasm faded.
The difficulty is that in year one, the 25 that will succeed long-term look identical to the 225 that succeeded short-term due to favorable conditions. Both have positive metrics. Both have satisfied management.
The differentiator is not visible in the technology, the implementation timeline, or the initial results. It is visible in what was measured before implementation began.
The organizations that succeed long-term almost invariably started by measuring actual decision accuracy and discovering it was lower than believed. This discovery, uncomfortable as it was, allowed them to redesign the process before automating.
The organizations that succeeded short-term typically started by automating their current process and measuring efficiency gains. They optimized what they had. When you optimize a process that makes correct decisions 89% of the time, you end up with a very efficient system for making wrong decisions.
The survivorship arithmetic is familiar: if 1,000 managers each flipped a coin five times, you would expect about 31 with uniformly successful records who, their abilities confirmed in the marketplace, would write about their methodology. This is not cynicism. It is arithmetic.
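The arithmetic, spelled out with the numbers above:

```python
# Survivorship arithmetic: 1,000 firms, five consecutive periods in which
# "success" is pure chance with probability 1/2.
firms, p_lucky, periods = 1000, 0.5, 5
print(firms * p_lucky ** periods)  # 31.25 -> about 31 flawless records by luck alone
```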
By the time the difference becomes apparent (usually 18 to 36 months in) the initial decisions are effectively irreversible. You cannot go back and measure the baseline you did not measure at the start. You cannot redesign the process you have already automated.
This is why vendor selection based on case studies and year-one results is so difficult. The sample is contaminated with lucky short-term winners who will regress. And the methodology that produces long-term success (start by measuring decision accuracy) does not produce impressive demos quickly.
I do not have a solution to this problem. I am pointing out that it exists.
The Coming Verification Crisis
In Germany in the early 1920s, the great inflation made almost all past investments worthless. Contractual promises that seemed solid when made became impossible to honor.
I see a parallel risk in cognitive automation, though the mechanism is different. The risk is not that AI becomes too powerful. The risk is that AI makes everything unverifiable.
When AI-generated content becomes indistinguishable from authentic content, how do you verify anything? A supplier sends an invoice: is it authentic? You receive a contract amendment: was it really sent by your counterparty or fabricated by someone with access to your communication patterns? A regulatory audit requires documentation of decisions made three years ago: how do you prove your records are authentic rather than retroactively reconstructed?
In a world where perfect forgery costs almost nothing, authentic documentation with verifiable provenance becomes extremely valuable. But only if the provenance was built in from the beginning.
Retrofitting verification into systems designed without it is like adding earthquake protection after construction. Possible in theory. Prohibitively expensive in practice. Never as reliable as building it in from the start.
Some organizations are building verification chains into their systems now: records of document origin, processing steps, decision points. This adds perhaps 10 to 15% to development cost. Not building it in, then discovering you need it later, typically costs two to four times the original implementation.
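In its simplest form, such a verification chain is an append-only log in which every processing step is hashed together with the hash of the step before it, so any later alteration invalidates everything downstream. The sketch below illustrates the general technique with invented document names; it is not any specific product's implementation:

```python
import hashlib
import json
import time

def record_step(chain, step, payload):
    """Append one processing step, chained to the hash of the previous one."""
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    entry = {
        "step": step,
        "payload": payload,
        "timestamp": time.time(),
        "prev_hash": prev_hash,
    }
    entry["hash"] = hashlib.sha256(
        json.dumps(entry, sort_keys=True).encode()
    ).hexdigest()
    chain.append(entry)

def verify(chain):
    """Recompute every hash; a single edited entry breaks the chain after it."""
    prev_hash = "0" * 64
    for entry in chain:
        body = {k: v for k, v in entry.items() if k != "hash"}
        recomputed = hashlib.sha256(
            json.dumps(body, sort_keys=True).encode()
        ).hexdigest()
        if entry["prev_hash"] != prev_hash or recomputed != entry["hash"]:
            return False
        prev_hash = entry["hash"]
    return True

chain = []
record_step(chain, "ingest", {"document": "invoice_4711.pdf"})  # hypothetical document
record_step(chain, "extract", {"amount": "1250.00", "currency": "EUR"})
record_step(chain, "decide", {"action": "approve_payment"})
print(verify(chain))  # True; edit any field above and it becomes False
```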
I may be wrong about this risk. My views on the verification crisis are more pessimistic than most. The reasoning depends on institutional judgments more than technical analysis. But the cost of being wrong seems asymmetric. Building verification in when it turns out unnecessary wastes modest resources. Not building it in when it turns out necessary makes your systems legally indefensible.
Like the inflation scenario for pensions, this earthquake risk is easier to dismiss than to prepare for.
An Unconventional Approach
Over ten years we have developed an approach that differs from that of typical vendors, not because we are smarter but because we operate under different constraints.
We are a German Mittelstand company, bootstrapped, with no venture capital pressure for rapid growth or near-term exits. This creates different incentives.
We decline projects where clients are not ready to measure actual decision accuracy first. The likely outcome would be implementation of a system that processes documents faster while the underlying decision accuracy (and the losses it produces) remains unchanged. Everyone declares success based on speed metrics. Nothing fundamental improves.
We invest in education disproportionate to immediate return. We have published over 900 articles and built more than 20 free tools. Roughly 300,000 people used them in 2025. The ROI is difficult to measure. But clients who find us this way arrive with better questions; most have already attempted an implementation that disappointed them and want to understand why.
We focus on problems requiring both technical depth and domain knowledge: vehicle documentation archiving where compliance requirements extend decades, construction compliance where regulations span jurisdictions, order-to-pay verification where the question is not whether documents are processed but whether decisions are correct. These are not large markets individually. They are profitable markets with clients who understand that cheap solutions are expensive when measured over time.
We maintain analysis of the cognitive automation market at idp-software.com: approximately 300 vendors, something that would have been impossible before AI made systematic comparison feasible. The primary insight from this analysis: vendors publish accuracy metrics, but "accuracy" has no standard definition. Comparing vendors by claimed accuracy is like comparing investment returns without knowing whether they are measuring pre-tax, post-tax, nominal, or real.
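A toy example with invented numbers shows how one and the same output supports three different "accuracy" claims, depending on the definition:

```python
# Invented toy data: four documents, three extracted fields each, plus the
# business decision each document triggers.
docs = [
    {"fields_correct": 3, "fields_total": 3, "decision_correct": True},
    {"fields_correct": 3, "fields_total": 3, "decision_correct": True},
    {"fields_correct": 2, "fields_total": 3, "decision_correct": False},
    {"fields_correct": 3, "fields_total": 3, "decision_correct": False},
]

field_acc = sum(d["fields_correct"] for d in docs) / sum(d["fields_total"] for d in docs)
doc_acc = sum(d["fields_correct"] == d["fields_total"] for d in docs) / len(docs)
decision_acc = sum(d["decision_correct"] for d in docs) / len(docs)

print(f"field-level accuracy:    {field_acc:.0%}")     # 92% -- often what is quoted
print(f"document-level accuracy: {doc_acc:.0%}")       # 75% -- every field correct
print(f"decision-level accuracy: {decision_acc:.0%}")  # 50% -- what costs money
```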
The more useful question is the inverse: who is this approach not suited for?
If your procurement requires responses to 147 specifications, we will not respond. If you need 80 slides and a demo by Friday, we cannot help. If your timeline is "by quarter-end because we promised the board," we are the wrong choice.
We work with organizations that begin by measuring what they are getting wrong, not organizations that begin by specifying what they want to buy.
What We Have Learned About Expertise
In analyzing decision patterns over the past decade, we have noticed something about expertise that surprised us.
The highest-paid specialists often have similar decision accuracy to junior staff. They are simply more confident. We have seen this across legal review, compliance assessment, underwriting, and payment approvals. The correlation between seniority and accuracy is weak. The correlation between confidence and accuracy is nearly zero. The correlation between willingness to measure and accuracy is strong.
Real expertise is not certainty about being right. It is comfort with measurement that might show you are wrong.
This describes our organization as much as our clients. Until recently we measured our success by documents processed and extraction accuracy. We believed clients processing millions of documents with 97% extraction accuracy represented success.
Then a client noted their contract disputes had not decreased despite automation. When we measured, disputes had increased slightly. The system processed documents faster and extracted text accurately, but made wrong decisions faster. Nobody had measured whether the right decisions were being made.
This was uncomfortable to discover. But it changed what we measure. We now start by asking: What decisions does this document trigger, and what percentage are currently correct? Technology questions come second.
Looking Forward
As 2025 closes, several things have become clear.
Text extraction technology is commoditized. Multiple vendors offer equivalent capabilities for standard use cases. The differentiator is not the AI. It is whether you measure actual decision accuracy and process variance before automating, or whether you automate your current process and hope for the best.
For 2026, what matters:
What Management Should Focus On
- Measure decision accuracy before automating anything, not document processing accuracy
- Understand that process variance compounds over time; standardize to remove noise, not judgment
- Build verification into systems now, not later; retrofitting costs 2-4x the original implementation
- Measure success over years, not quarters; standardization compounds like earnings compound
First, measure decision accuracy before automating anything. Not document processing accuracy. Decision accuracy. This is uncomfortable because the answer is usually worse than believed. But you cannot improve what you do not measure honestly.
Second, understand that process variance compounds over time. If thousands of employees make decisions using thousands of methods, you pay the cost of that variance daily in decisions that should be correct but are not. Standardizing what can be standardized is not about removing judgment. It is about removing noise so judgment can be applied where it adds value.
Third, build verification into systems now, not later. Whether the verification crisis arrives soon or never, the cost of building it in is modest and the cost of retrofitting is prohibitive.
Fourth, measure success over years, not quarters. Process standardization compounds like earnings compound. Organizations that achieve high standardization over five years will have fundamentally different capabilities. But this will not be visible in first-quarter metrics.
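To make the fourth point concrete, a sketch with assumed rates: a standardization program that removes a fixed share of remaining errors each year looks unremarkable in any single quarter and decisive after five.

```python
# Assumed starting point and improvement rate, for illustration only.
error_rate = 0.11        # 89% decision accuracy at the outset
yearly_reduction = 0.20  # standardization removes 20% of remaining errors per year

for year in range(1, 6):
    error_rate *= 1 - yearly_reduction
    print(f"year {year}: {1 - error_rate:.1%} decision accuracy")
# year 5: 96.4% -- invisible in any single quarter, decisive over five years.
```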
The most expensive decisions are the ones you believe are correct but are not. The most valuable technology is the kind that shows you the difference.
We spent ten years learning this. Perhaps you can learn it faster.
Best regards for a successful 2026,
Christopher Helm
CEO, Helm & Nagel GmbH