Safety & Reliability

Hallucination Detection and Mitigation

Implement controls to detect, reduce, and manage AI-generated factual errors and fabrications before they reach end users or inform decisions.

Objective

Reduce the risk of harm from AI-generated misinformation by applying both preventive and detective controls for hallucinated content.

Maturity Levels

  1. Initial: No hallucination controls exist; outputs are used without verification.
  2. Developing: Users are warned about hallucination risk, but no technical controls are in place.
  3. Defined: Retrieval-augmented generation or factual verification steps are applied to high-risk use cases; human review is required for factual claims.
  4. Managed: Hallucination rates are tracked through sampling; high-hallucination use cases are flagged for additional controls.
  5. Optimizing: Automated factual consistency checking is applied at inference time; hallucination reduction is a quantified model selection criterion.

Evidence Requirements

What an auditor or assessor would expect to see for this control.

  • Hallucination detection configuration and evaluation dataset documenting test cases and pass thresholds (an illustrative sketch follows this list)
  • Periodic hallucination rate reports showing metric values over time for production outputs
  • Human fact-checking spot-check records for a sample of high-stakes outputs in a defined period
  • Citation or source attribution records for RAG systems showing retrieved evidence links to each claim
  • User feedback or correction records showing hallucinations detected post-deployment and how they were handled
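
An illustrative sketch of how the first evidence item (detection configuration, evaluation dataset, and pass thresholds) might be recorded. All field names and the dataset path are hypothetical, not mandated by this control; the threshold values mirror the example implementation below.

    # Hypothetical configuration record for the hallucination evaluation evidence item.
    # Field names and the dataset path are illustrative; the thresholds mirror the
    # example implementation (NLI threshold 0.7, target rate < 2%, weekly sample of 50).
    EVAL_CONFIG = {
        "use_case": "legal_research_assistant",
        "detectors": {
            "citation_verification": {"enabled": True, "on_mismatch": "block"},
            "nli_consistency": {"model": "fine-tuned-entailment-classifier", "review_threshold": 0.7},
        },
        "evaluation_dataset": "eval/hallucination_test_cases.jsonl",  # question, sources, expected answer
        "pass_thresholds": {"max_hallucination_rate": 0.02},
        "sampling": {"frequency": "weekly", "sample_size": 50, "reviewer_role": "paralegal"},
    }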

Implementation Notes

Key steps

  • Use retrieval-augmented generation (RAG) for factual use cases: ground model responses in verified source documents rather than relying on parametric knowledge.
  • Implement citation requirements: require the model to cite sources for factual claims, making hallucinations easier to detect.
  • Apply verification steps for high-stakes factual outputs: legal documents, medical information, financial data — these require human expert review, not just automated checking.
  • Measure hallucination rates per use case using sampling; the rate varies significantly across domains and prompt types (a minimal sketch follows this list).
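
A minimal sketch of the sampling step above: compute per-use-case hallucination rates from human review records. The record shape and the normal-approximation confidence interval are assumptions for illustration, not a prescribed format.

    # Minimal sketch: per-use-case hallucination rate from human-reviewed samples.
    # Assumes each review is recorded as (use_case, hallucinated: bool).
    import math
    from collections import defaultdict

    def hallucination_rates(reviews, z=1.96):
        """Return {use_case: {"rate", "ci", "n"}} from (use_case, hallucinated) pairs."""
        counts = defaultdict(lambda: [0, 0])  # use_case -> [hallucinated, reviewed]
        for use_case, hallucinated in reviews:
            counts[use_case][0] += int(hallucinated)
            counts[use_case][1] += 1
        stats = {}
        for use_case, (bad, n) in counts.items():
            rate = bad / n
            margin = z * math.sqrt(rate * (1 - rate) / n)  # rough interval; fine for weekly n of ~50
            stats[use_case] = {"rate": rate, "ci": (max(0.0, rate - margin), min(1.0, rate + margin)), "n": n}
        return stats

    # Example: a weekly sample of 50 legal-research responses with 1 confirmed hallucination.
    weekly_sample = [("legal_research", i == 0) for i in range(50)]
    print(hallucination_rates(weekly_sample))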

Example Implementation

Legal research tool using RAG to answer questions about case law and statutes

Hallucination Controls — Legal Research Assistant

Prevention (RAG architecture):

  • All responses grounded in retrieved source documents from verified legal database (Westlaw + internal case files)
  • Model instructed to answer only from retrieved context; if context is insufficient, respond with "Insufficient sources found" rather than generating from parametric knowledge
  • Maximum of 3 source documents per response; each claim attributed to a specific source (see the prompt-construction sketch after this list)
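
A minimal sketch of the prevention pattern in this list: build the prompt only from retrieved sources, cap them at three, and refuse rather than fall back to parametric knowledge. The retrieval interface and chunk field names are assumptions for illustration.

    # Minimal sketch of the grounded-prompt pattern; `retrieved_chunks` is assumed to
    # come from the verified legal database, with illustrative field names.
    MAX_SOURCES = 3
    REFUSAL = "Insufficient sources found"

    def build_grounded_prompt(question: str, retrieved_chunks: list[dict]) -> str | None:
        """Return a prompt grounded in at most MAX_SOURCES chunks, or None if the
        caller should return the refusal string without querying the model."""
        chunks = retrieved_chunks[:MAX_SOURCES]
        if not chunks:
            return None
        context = "\n\n".join(f"[{c['source_id']}] {c['text']}" for c in chunks)
        return (
            "Answer ONLY from the sources below and attribute every claim to a source id in brackets.\n"
            f"If the sources do not contain the answer, reply exactly: {REFUSAL}.\n\n"
            f"Sources:\n{context}\n\nQuestion: {question}"
        )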

Detection (applied to every response):

  1. Citation verification: cited case names and statute numbers checked against database before delivery — mismatched citations block the response
  2. NLI-based consistency check: claims in the response are compared to source chunks using a fine-tuned entailment classifier; an entailment score below 0.7 triggers a human review flag (see the sketch after this list)
  3. Human review required for: responses to questions about specific case outcomes, regulatory deadlines, and monetary thresholds
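
A minimal sketch of the per-response detection gate in steps 1 and 2: block delivery on a mismatched citation, and flag low entailment scores for human review. `lookup_citation` and `entailment_score` are assumed interfaces standing in for the legal-database lookup and the fine-tuned NLI classifier; they are not real library APIs.

    # Minimal sketch of the per-response detection gate (steps 1 and 2 above).
    ENTAILMENT_THRESHOLD = 0.7  # scores below this flag the response for human review

    def detection_gate(claims, lookup_citation, entailment_score):
        """claims: [{"text", "citation", "source_chunk"}]. Returns (decision, reasons),
        where decision is "block", "human_review", or "deliver"."""
        reasons = []
        for claim in claims:
            # Step 1: citation verification; a mismatched citation blocks delivery.
            if not lookup_citation(claim["citation"]):
                return "block", [f"citation not found: {claim['citation']}"]
            # Step 2: NLI consistency; the claim must be entailed by its source chunk.
            score = entailment_score(premise=claim["source_chunk"], hypothesis=claim["text"])
            if score < ENTAILMENT_THRESHOLD:
                reasons.append(f"low entailment ({score:.2f}): {claim['text'][:60]}")
        return ("human_review", reasons) if reasons else ("deliver", [])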

Hallucination rate tracking: weekly sampling of 50 responses reviewed by a paralegal; target: < 2% hallucination rate; current rate: 1.4%

User disclosure: Every response includes "Verify with primary sources before relying on this output for legal advice."

Control Details

Control ID: SAF-001
Typical owner: AI Engineering / AI Governance Team
Implementation effort: High
Agent-relevant: Yes

Tags

hallucination, factual accuracy, RAG, generative AI safety