Why Statisticians and Control Engineers Disagree About AI Hallucination
Wiley Jones, CEO & Co-Founder at Doss
When Wiley Jones, CEO and co-founder of Doss — the $55M-backed company rebuilding enterprise operations with AI-native architecture — told me hallucination is just a selection error, my first reaction was: that doesn’t make the problem go away. I’m a statistician by training. Prediction is a continuous spectrum. There’s always a probability distribution. Reframing an error doesn’t eliminate it.
But the more I sat with his argument, the more I realized the disagreement isn’t about whether errors happen. It’s about what you do about them. And that distinction changes how you build.
The hardest version of the pushback.
Here’s the challenge that kept nagging me: what about open-ended generation? You can’t always drop outputs into discrete buckets. You can’t always build a control system around them.
Take an example my advisor raised: if an AI is writing something statistical and says “the difference between two binomial random variables is normally distributed,” you need domain expertise to catch that. The difference is approximately normal by the central limit theorem for large n, but the statement as written is strictly wrong: the difference is a discrete variable with finite support. A generic LLM grading another LLM likely misses it because the grader shares the generator’s weaknesses.
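To make the gap between "approximately normal" and "normal" concrete, here is a minimal Python sketch using only the standard library. It computes the exact distribution of X - Y for independent binomials and measures how far its CDF sits from the matching normal CDF. The function names and parameter values are illustrative assumptions, not anything from the conversation.

```python
from math import comb, erf, sqrt

def binom_pmf(n, p):
    """Exact probabilities P(X = k) for k = 0..n, with X ~ Binomial(n, p)."""
    return [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

def diff_pmf(n1, p1, n2, p2):
    """Exact distribution of D = X - Y for independent binomials X and Y."""
    pmf = {}
    for x, fx in enumerate(binom_pmf(n1, p1)):
        for y, fy in enumerate(binom_pmf(n2, p2)):
            pmf[x - y] = pmf.get(x - y, 0.0) + fx * fy
    return pmf

def normal_cdf(z, mean, var):
    """CDF of a normal distribution with the given mean and variance."""
    return 0.5 * (1 + erf((z - mean) / sqrt(2 * var)))

def max_cdf_gap(n1, p1, n2, p2):
    """Largest gap, over the integer support, between the exact CDF of D
    and the CDF of the normal with the same mean and variance."""
    pmf = diff_pmf(n1, p1, n2, p2)
    mean = n1 * p1 - n2 * p2
    var = n1 * p1 * (1 - p1) + n2 * p2 * (1 - p2)
    gap, running = 0.0, 0.0
    for d in sorted(pmf):
        running += pmf[d]
        gap = max(gap, abs(running - normal_cdf(d, mean, var)))
    return gap

# D only takes integer values between -n2 and n1, so it cannot be exactly normal.
print("support:", min(diff_pmf(5, 0.3, 5, 0.5)), "to", max(diff_pmf(5, 0.3, 5, 0.5)))
print("CDF gap, n = 5:  ", max_cdf_gap(5, 0.3, 5, 0.5))
print("CDF gap, n = 200:", max_cdf_gap(200, 0.3, 200, 0.5))
```

The gap shrinks as n grows but never reaches zero, which is the distinction a domain expert would flag and a generic grader might not.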
So where does the selection error framing help here? This is where Jones’ claim turns out to be more specific — and more useful — than it sounds on first listen.
Two disciplines, two intervention strategies.
Statistics and control theory aren’t competing explanations of error. They’re competing intervention strategies.
The statistician sees a 5% error rate and asks: how do I reduce it? Better training data, better model, better alignment. The goal is minimizing expected error. The control theorist sees the same 5% and asks: how do I build a system that works despite it?
Jones was blunt about this on our recording: “It’s been really funny watching software engineers re-derive control theory from first principles. They’re like, ‘what if we took the errors and fed them back in?’ Yes, we discovered this 60 years ago. 200 years ago, the Russian scientists figured out stable control systems and robustness and how you can create bounded outputs on unbounded inputs.”
That phrase, “bounded outputs on unbounded inputs,” points at a formal concept in control theory: BIBO (bounded-input, bounded-output) stability. Strictly, a BIBO-stable system guarantees a bounded output for every bounded input, so the system stays in an acceptable range even when the inputs are noisy. He’s not borrowing a metaphor. He’s using the actual math.
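As a rough illustration of BIBO stability, consider the textbook first-order system y[k+1] = a * y[k] + u[k]. When |a| < 1 and every input satisfies |u| <= M, the output can never exceed M / (1 - |a|). When |a| > 1, bounded inputs can push the output past any limit. The system, the gain values, and the function name below are assumptions chosen for this sketch.

```python
import random

def simulate(a, inputs, y0=0.0):
    """Run y[k+1] = a * y[k] + u[k] and return the largest |y| observed."""
    y, peak = y0, abs(y0)
    for u in inputs:
        y = a * y + u
        peak = max(peak, abs(y))
    return peak

rng = random.Random(0)
noisy_inputs = [rng.uniform(-1.0, 1.0) for _ in range(200)]  # bounded: |u| <= 1

# |a| < 1: the peak is guaranteed to stay within 1 / (1 - 0.5) = 2.
stable_peak = simulate(0.5, noisy_inputs)
# |a| > 1: each step multiplies the existing deviation, so it compounds.
unstable_peak = simulate(1.1, noisy_inputs)

print(f"stable peak:   {stable_peak:.3f}")
print(f"unstable peak: {unstable_peak:.3e}")
```

Both runs see the same noisy inputs. Only the structure of the system decides whether the noise stays contained.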
The decomposition move.
So what about my advisor’s statistics example? A control theorist wouldn’t architect “help me reason about statistical distributions” as “prompt an LLM to write about statistics.” They’d decompose it.
Step 1: Classify the question (“what’s the distribution of X - Y where X, Y are binomial?”). Bounded output. LLM is fine here.
Step 2: Retrieve or compute the answer from a symbolic system such as SymPy, Mathematica, or a theorem database. Deterministic.
Step 3: Use the LLM only to wrap the deterministic result in natural language. The factual claim is pinned to the symbolic output, not the LLM’s memory.
Each step’s output is bounded. The LLM is present but isn’t the truth source. The hallucination my advisor worried about can’t occur, not because you detected it, but because you removed the degree of freedom that allowed it.
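The three steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions: `classify` is a keyword stand-in for an LLM constrained to a fixed label set, `compute_fact` uses plain arithmetic where a real system might call SymPy or a theorem database, and `render` is a template standing in for the LLM that writes the final prose. All names are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum

class QuestionType(Enum):
    BINOMIAL_DIFFERENCE = "binomial_difference"
    UNKNOWN = "unknown"

def classify(question: str) -> QuestionType:
    """Step 1 (stand-in for an LLM classifier): output is limited to the enum."""
    text = question.lower()
    if "binomial" in text and ("difference" in text or "x - y" in text):
        return QuestionType.BINOMIAL_DIFFERENCE
    return QuestionType.UNKNOWN

@dataclass(frozen=True)
class Fact:
    mean: float
    variance: float
    support: tuple  # (lowest possible value, highest possible value)

def compute_fact(n1: int, p1: float, n2: int, p2: float) -> Fact:
    """Step 2: deterministic facts about X - Y for independent binomials."""
    mean = n1 * p1 - n2 * p2
    variance = n1 * p1 * (1 - p1) + n2 * p2 * (1 - p2)
    return Fact(mean=mean, variance=variance, support=(-n2, n1))

def render(fact: Fact) -> str:
    """Step 3 (stand-in for LLM phrasing): wording only, facts come from step 2."""
    low, high = fact.support
    return (
        f"X - Y is a discrete variable on the integers {low} to {high}, "
        f"with mean {fact.mean:.2f} and variance {fact.variance:.2f}. "
        "It is approximately, not exactly, normal for large n."
    )

def answer(question: str, n1: int, p1: float, n2: int, p2: float) -> str:
    """Route the question through the bounded steps, or hand it to a person."""
    if classify(question) is QuestionType.UNKNOWN:
        return "ESCALATE: question is outside the bounded set, route to a human"
    return render(compute_fact(n1, p1, n2, p2))

print(answer("What's the distribution of X - Y where X, Y are binomial?", 10, 0.5, 20, 0.25))
```

The design choice worth noticing is that the numbers in the final sentence never pass through a generative step, so there is nothing for the model to misremember.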
Jones’ actual claim is narrower than it sounds: most “hallucination problems” are self-inflicted by using open-ended generation where structured decomposition would work. And for genuinely irreducible open-ended generation where you can’t decompose? The honest answer: it’s a human-in-the-loop tool, not an autonomous system. That’s the frame drawing its own boundary.
Why the 0.95^10 calculation is incomplete.
Researchers often cite compounding error: ten steps at 95% accuracy gives you ~60% end-to-end success (0.95^10 ≈ 0.60). That math is correct, but it’s a statistician’s calculation that assumes independence. In agentic workflows, errors aren’t independent. Step 2’s wrong output becomes step 3’s input. Step 3 isn’t in a 5% error regime anymore. It’s operating off-manifold, on inputs unlike those it was validated on, where its actual error rate might be 50%.
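The arithmetic can be checked with a short Python sketch. It computes the independent-steps figure, then the expected number of wrong step outputs under a simple cascade model in which every step after the first error runs at a higher error rate. The cascade model and the function names are assumptions made for illustration, and the 50% figure is the hypothetical one from the paragraph above.

```python
def independent_success(step_accuracy: float, steps: int) -> float:
    """Probability that every step succeeds, assuming independent errors."""
    return step_accuracy ** steps

def expected_bad_steps(steps: int, clean_error: float, corrupted_error: float) -> float:
    """Expected number of wrong step outputs when a step fed corrupted input
    fails at corrupted_error instead of clean_error.

    Assumed model: the pipeline is 'clean' until its first error and stays
    'corrupted' afterwards (open loop, nothing repairs it)."""
    p_clean = 1.0
    expected = 0.0
    for _ in range(steps):
        expected += p_clean * clean_error + (1 - p_clean) * corrupted_error
        p_clean *= 1 - clean_error
    return expected

print(f"independent success over 10 steps: {independent_success(0.95, 10):.4f}")
print(f"expected bad steps, flat 5% error: {expected_bad_steps(10, 0.05, 0.05):.3f}")
print(f"expected bad steps, 50% once off:  {expected_bad_steps(10, 0.05, 0.50):.3f}")
```

Under this model the chance of a fully clean run is unchanged, but the amount of damage in a failed run grows sharply, which is the part the independence calculation hides.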
Control theory asks a different question: is the system stable or unstable? Two pipelines with identical per-step error rates can have radically different end-to-end behavior. A stable system detects deviation and corrects. An unstable system amplifies deviation until it diverges.
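A small sketch shows how two pipelines with the same per-step error rate can diverge. It tracks the probability that a pipeline is still on track after each step, under an assumed two-state model: an on-track pipeline goes off track with probability `step_error`, and an off-track pipeline is detected and repaired with probability `repair_prob`. The model, parameter values, and names are assumptions for illustration, not a description of any particular product.

```python
def p_on_track(steps: int, step_error: float, repair_prob: float) -> float:
    """Probability the pipeline is on track after the given number of steps.

    repair_prob = 0 models an open-loop pipeline: once off track, always off.
    repair_prob > 0 models a feedback check that catches and fixes deviations."""
    p = 1.0
    for _ in range(steps):
        p = p * (1 - step_error) + (1 - p) * repair_prob
    return p

for steps in (10, 50):
    open_loop = p_on_track(steps, step_error=0.05, repair_prob=0.0)
    closed_loop = p_on_track(steps, step_error=0.05, repair_prob=0.9)
    print(f"{steps:>2} steps  open-loop: {open_loop:.3f}  closed-loop: {closed_loop:.3f}")
```

With no repair, the on-track probability decays toward zero as the pipeline gets longer. With repair, it settles near repair_prob / (step_error + repair_prob) regardless of length, which is what "stable" means in this toy setting.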
“If you break these possible set of options apart and separate them across the state space,” Jones described, “you’ve now created a bunch of very finely guided systems. And I think this system will eventually converge towards governed optimal behavior. And again, this is control theory.”
The one-line version: statistics optimizes a model. Control theory designs a system that tolerates its model. Mechanical engineers build reliable things out of unreliable parts every day. Every resistor has tolerance, every bearing wears, turbulence is unpredictable. But a jet engine works because the system is designed for component failure, not against it.
The question for agent builders isn’t how to eliminate hallucination. It’s whether your architecture is self-correcting or open-loop.
FAQ
How do statisticians and control engineers differ on AI hallucination?
Statisticians treat hallucination as a model accuracy problem — reduce error rate through better data, training, and alignment. Control engineers treat it as a system design problem — assume permanent error and build detection, correction, and feedback loops around the model. Both are valid at different layers; most AI teams over-index on the statistical approach.
What is BIBO stability and how does it apply to AI systems?
BIBO (Bounded-Input, Bounded-Output) stability means every bounded input produces a bounded output, so the system stays within an acceptable range even when inputs are noisy or unpredictable. Applied to AI, it means designing agent architectures where component-level errors (hallucinations) don’t cause system-level failure, through output constraints, feedback loops, and competing heuristics that converge.
How does Doss handle AI reliability in enterprise operations?
Doss constrains every AI decision point to discrete, bounded outputs with measurable feedback. The system sets competing optimization targets — accuracy baselines plus resolution time limits — and converges toward governed optimal behavior through control-theoretic feedback loops. High-risk operations like order processing flag changes for human review automatically.
What is the decomposition approach to preventing hallucination?
Instead of asking an LLM to generate open-ended answers, break tasks into bounded steps: classify the question (LLM is fine), compute or retrieve the factual answer from a deterministic source (database, symbolic solver), then use the LLM only to present the result in natural language. The factual claim is pinned to a non-LLM truth source.
Can you build reliable AI agents if each step has a 5% error rate?
The standard calculation (0.95^10 ≈ 0.60, or about 60% success) assumes independent errors, but agent pipeline errors are correlated: each step’s output becomes the next step’s input. Control theory reframes this: the question isn’t per-step accuracy but whether the system architecture is stable (self-correcting) or unstable (error-amplifying). Architecture determines reliability more than component accuracy.
What industries does Doss serve with its AI-native operations platform?
Physical operations companies — retailers, food and beverage brands, consumer goods manufacturers — managing inventory, procurement, orders, and multiple sales channels. Customers include Verve Coffee, Eight Sleep, and Mezcla. The platform targets mid-market companies with $20M-$250M revenue, implementing in 3-4 months versus 12-24 months for traditional ERP.
What is the sandwich theory of determinism in AI system design?
Concrete inputs on one end, concrete outputs on the other, and freedom for the AI to work however it needs in the middle. You optimize the middle for speed and cost, not correctness. Correctness is enforced at the boundaries. This approach builds deterministic system behavior from probabilistic model components.
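One way to picture the sandwich is a wrapper that validates the input, lets an unconstrained middle run, and accepts the result only if it lands in a fixed set of outputs. The sketch below is a hypothetical illustration: `run_bounded`, the `ALLOWED_ACTIONS` set, the order fields, and the retry count are all assumptions, and `middle` stands in for whatever model call a real system would make.

```python
from typing import Callable

ALLOWED_ACTIONS = {"approve", "reject", "flag_for_review"}

def run_bounded(middle: Callable[[dict], str], order: dict, max_attempts: int = 3) -> str:
    """Enforce correctness at the boundaries and leave the middle free."""
    # Input boundary: refuse anything that is not a well-formed order.
    quantity = order.get("quantity")
    if not isinstance(quantity, int) or quantity <= 0:
        raise ValueError("input rejected at boundary: quantity must be a positive int")

    # Middle: the model may work however it likes. Output boundary: only a
    # value from the allowed set is ever returned to the caller.
    for _ in range(max_attempts):
        candidate = middle(order).strip().lower()
        if candidate in ALLOWED_ACTIONS:
            return candidate

    # Nothing valid after the retries: fall back to human review.
    return "flag_for_review"

def chatty_model(order: dict) -> str:
    """Hypothetical stand-in for a model that ignores the output format."""
    return "Looks fine to me, I would probably approve this one."

print(run_bounded(lambda order: " Approve ", {"quantity": 3}))
print(run_bounded(chatty_model, {"quantity": 3}))
```

Whatever the middle produces, callers only ever see one of three values, so downstream code can be written against a closed set.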
Why do software engineers keep re-deriving control theory for AI?
Most AI engineers come from ML and statistics backgrounds, not electrical or mechanical engineering. Control theory, which has solved bounded-output problems for over 200 years, isn’t in their training. Concepts like feedback loops, error propagation, convergence proofs, and BIBO stability are well-established in engineering disciplines but are being rediscovered ad hoc in the AI agent community.
When should you use a human in the loop instead of autonomous AI?
When the output space is genuinely unbounded and you can’t decompose the task into discrete bounded steps. This covers open-ended generation in high-stakes domains (statistical writing, legal analysis, medical summaries) where errors require domain expertise to catch. The control-theoretic frame’s honest boundary: if you can’t bound the output, don’t pretend the system is reliable.