Insight · AI reporting

Why your AI keeps handing you confident, wrong numbers

By Barry Middlebrook · Middlebrook Data & AI Governance

Ask a modern AI tool "what was Q3 margin in the Western region?" and you'll get an answer in seconds — fluent, formatted, and delivered with total confidence. The unnerving part isn't when it says "I don't know." It's when it gives you a number that looks exactly right and isn't.

For thirty years, a flawed report had a human circuit-breaker: an analyst built it, a manager reviewed it, and a wrong number usually got caught before it reached a decision. AI removes that circuit-breaker. It produces answers instantly, at scale, with no analyst in the loop, which means ungoverned data no longer produces just one bad report. It produces a thousand confident, wrong answers before lunch.

It's almost never bad math. It's the wrong definition.

When AI reporting goes wrong, people assume the model "hallucinated." Usually it didn't. The model did exactly what it was asked; it just pulled from data that meant something different from what the leader assumed.

The classic failure: three teams define "revenue" three different ways. Finance nets out refunds; sales doesn't; the data warehouse uses a third cut. A human analyst knows which one the CFO means. The AI doesn't. It grabs whichever table it can reach, or blends two sources that define "customer" differently, or uses a stale extract, and hands you a number that is confidently, untraceably wrong.
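To make the definition problem concrete, here is a minimal sketch in Python. The order rows, the "pending" status, and the exact rules behind each team's cut are assumptions invented for illustration; only the idea that finance nets out refunds and sales does not comes from the scenario above.

```python
from decimal import Decimal

# Hypothetical order rows; every figure here is invented for illustration.
ORDERS = [
    {"amount": Decimal("1000"), "refund": Decimal("0"), "status": "booked"},
    {"amount": Decimal("500"), "refund": Decimal("200"), "status": "booked"},
    {"amount": Decimal("300"), "refund": Decimal("0"), "status": "pending"},
]


def revenue_finance(orders):
    """Finance's cut (assumed): booked orders, net of refunds."""
    return sum(
        (o["amount"] - o["refund"] for o in orders if o["status"] == "booked"),
        Decimal("0"),
    )


def revenue_sales(orders):
    """Sales' cut (assumed): every order, gross, refunds ignored."""
    return sum((o["amount"] for o in orders), Decimal("0"))


def revenue_warehouse(orders):
    """Warehouse's cut (assumed): booked orders only, gross."""
    return sum(
        (o["amount"] for o in orders if o["status"] == "booked"),
        Decimal("0"),
    )
```

On these three rows the functions return 1300, 1800, and 1500. None of them contains an arithmetic mistake; each is a correct answer to a different question, which is why the error is so hard to spot in the output.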

AI reporting is only as trustworthy as the data governance beneath it. AI doesn't reduce the need for governance — it makes governance the foundation everything stands on.

The one control that fixes most of it

If you do nothing else, do this: define each metric once, in a governed semantic layer — and point the AI only at that layer, never at raw tables.

A semantic (or metrics) layer is where "revenue," "active customer," and "churn" are defined a single time, in writing, with the formula and an owner. When the AI can only request defined metrics instead of writing SQL against raw tables, it has no way to invent its own definition. This single move eliminates the most common class of AI-reporting error.
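As a rough sketch of the idea, the Python below models a tiny in-memory metrics registry. The names `Metric`, `SemanticLayer`, `define`, and `query` are hypothetical and do not correspond to any particular product's API, and the "revenue" formula is an assumed example. The point is the shape of the control: a metric is defined once with an owner, and anything not defined is refused rather than improvised.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Metric:
    name: str
    definition: str                    # the written, human-readable definition
    owner: str                         # who is accountable for the definition
    formula: Callable[[list], float]   # the single governed calculation


class SemanticLayer:
    """Hypothetical registry: the only entry point the AI is allowed to call."""

    def __init__(self):
        self._metrics = {}

    def define(self, metric):
        # Each metric may be defined exactly once.
        if metric.name in self._metrics:
            raise ValueError(f"{metric.name!r} is already defined")
        self._metrics[metric.name] = metric

    def query(self, name, rows):
        # Unknown metrics are rejected instead of being guessed at.
        metric = self._metrics.get(name)
        if metric is None:
            raise LookupError(f"{name!r} is not a governed metric")
        return metric.formula(rows)


layer = SemanticLayer()
layer.define(Metric(
    name="revenue",
    definition="Booked order amounts, net of refunds",  # assumed example
    owner="Finance",
    formula=lambda rows: sum(
        r["amount"] - r["refund"] for r in rows if r["status"] == "booked"
    ),
))
```

A request for `layer.query("revenue", rows)` always runs the one governed formula, while a request for a metric nobody has defined raises an error. In a real deployment the same restriction would typically be enforced through the tool's permissions and the metrics layer itself rather than application code like this.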

Around that, you add the rest of the foundation: lineage so every number can be traced to its source, data-quality controls, and output governance so each answer carries its provenance (which sources, which definitions, as of when). In regulated finance, that provenance is also the kind of evidence auditors and regulators expect under regimes such as SOX and the EU AI Act, where "the AI said so" is not a defense.
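One way to picture output governance is an answer object that cannot be released without its provenance. The sketch below is illustrative only: `Answer`, `provenance_problems`, and the one-day freshness threshold are assumptions, not a standard or a regulatory requirement.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Answer:
    value: float
    metric: str                # governed metric name that produced the value
    definition: str            # which definition was applied
    sources: tuple             # which sources were read
    as_of: datetime            # how fresh the underlying data is


def provenance_problems(answer, now, max_age=timedelta(days=1)):
    """Return a list of provenance gaps; an empty list means the answer may ship."""
    problems = []
    if not answer.metric:
        problems.append("no governed metric named")
    if not answer.definition.strip():
        problems.append("no definition attached")
    if not answer.sources:
        problems.append("no sources listed")
    if now - answer.as_of > max_age:
        problems.append("data older than allowed")
    return problems
```

A high-stakes report would call `provenance_problems` before showing a number and hold back any answer for which the list is not empty. The freshness limit is a per-metric policy decision; one day is only a placeholder.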

Where to start

You don't boil the ocean. Pick one high-value reporting domain, document its top metrics with canonical definitions, point the AI only at that governed layer, and add a provenance check on the highest-stakes output. Then show a leader an AI answer that cites its sources. That demo sells the whole program.

The first step, though, is knowing where you actually stand — which gaps are quietly waiting to produce that first confident, wrong number.

Is your data ready for AI reporting?

Take the free 4-minute readiness assessment and get your maturity level with prioritized fixes — instantly.

Take the free assessment, or request a full, expert-led assessment.