Hallucination Detection and Mitigation
Implement controls to detect, reduce, and manage AI-generated factual errors and fabrications before they reach end users or inform decisions.
Objective
Reduce the risk of harm from AI-generated misinformation by applying both preventive and detective controls for hallucinated content.
Maturity Levels
Initial
No hallucination controls exist; outputs are used without verification.
Developing
Users are warned about hallucination risk but no technical controls are in place.
Defined
Retrieval-augmented generation or factual verification steps are applied to high-risk use cases; human review is required for factual claims.
Managed
Hallucination rates are tracked through sampling; high-hallucination use cases are flagged for additional controls.
Optimizing
Automated factual consistency checking is applied at inference time; hallucination reduction is a quantified model selection criterion.
Evidence Requirements
What an auditor or assessor would expect to see for this control.
- Hallucination detection configuration and evaluation dataset documenting test cases and pass thresholds
- Periodic hallucination rate reports showing metric values over time for production outputs
- Human fact-checking spot-check records for a sample of high-stakes outputs in a defined period
- Citation or source attribution records for RAG systems showing retrieved evidence linked to each claim
- User feedback or correction records showing hallucinations detected post-deployment and how they were handled
Implementation Notes
Key steps
- Use retrieval-augmented generation (RAG) for factual use cases: ground model responses in verified source documents rather than relying on parametric knowledge.
- Implement citation requirements: require the model to cite sources for factual claims, making hallucinations easier to detect.
- Apply verification steps for high-stakes factual outputs (legal documents, medical information, financial data); these require human expert review, not just automated checking.
- Measure hallucination rates per use case using sampling; rates vary significantly across domains and prompt types (see the sampling sketch after this list).
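The last step can be illustrated with a minimal sketch. It assumes responses are logged with a `use_case` tag and that hallucination labels come from human reviewers; the field names and the 50-per-use-case sample size are illustrative, not a required schema.

```python
# Illustrative sketch: draw a fixed-size random sample of logged responses per
# use case for human review, then compute the hallucination rate from the labels.
# Field names ("use_case", "hallucinated") are assumptions, not a defined schema.
import random
from collections import defaultdict

def sample_for_review(logged_responses, per_use_case=50, seed=0):
    """Return up to `per_use_case` randomly chosen responses for each use case."""
    rng = random.Random(seed)
    by_use_case = defaultdict(list)
    for record in logged_responses:
        by_use_case[record["use_case"]].append(record)
    return {
        use_case: rng.sample(records, min(per_use_case, len(records)))
        for use_case, records in by_use_case.items()
    }

def hallucination_rate(labeled_sample):
    """Fraction of reviewed responses a human reviewer marked as hallucinated."""
    if not labeled_sample:
        return 0.0
    return sum(1 for r in labeled_sample if r["hallucinated"]) / len(labeled_sample)
```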
Example Implementation
Legal research tool using RAG to answer questions about case law and statutes
Hallucination Controls — Legal Research Assistant
Prevention (RAG architecture):
- All responses are grounded in source documents retrieved from a verified legal database (Westlaw + internal case files)
- Model instructed to answer only from retrieved context; if context is insufficient, respond with "Insufficient sources found" rather than generating from parametric knowledge
- Maximum of 3 source documents per response; each claim is attributed to a specific source (see the grounding sketch below)
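As a concrete illustration of the grounding rules above, the sketch below assumes a `retrieve(question)` function returning ranked source documents and a `complete(system, user)` call into the model; both are placeholders rather than a specific vendor API, and the refusal string and three-source cap mirror the policy stated in this list.

```python
# Hypothetical grounding step: answer only from retrieved context, cap the number
# of sources, and refuse rather than fall back to parametric knowledge.
INSUFFICIENT = "Insufficient sources found"

SYSTEM_PROMPT = (
    "Answer using only the numbered sources provided. "
    "Attribute every factual claim to a source number, e.g. [1]. "
    f'If the sources do not answer the question, reply exactly: "{INSUFFICIENT}"'
)

def grounded_answer(question, retrieve, complete, max_sources=3):
    """Build a source-limited prompt; refuse when retrieval returns nothing."""
    sources = retrieve(question)[:max_sources]
    if not sources:
        return INSUFFICIENT, []
    context = "\n".join(f"[{i + 1}] {doc['text']}" for i, doc in enumerate(sources))
    user_prompt = f"Sources:\n{context}\n\nQuestion: {question}"
    return complete(system=SYSTEM_PROMPT, user=user_prompt), sources
```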
Detection (applied to every response):
- Citation verification: cited case names and statute numbers are checked against the database before delivery; mismatched citations block the response
- NLI-based consistency check: claims in the response are compared to source chunks using a fine-tuned entailment classifier; an entailment score below 0.7 triggers a human review flag
- Human review required for responses to questions about specific case outcomes, regulatory deadlines, and monetary thresholds (see the detection sketch below)
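A sketch of the per-response gate described in this Detection list follows. It assumes an `entail_score(premise, hypothesis)` wrapper around the fine-tuned entailment classifier and a `known_citations` lookup backed by the legal database; claim extraction from the response is out of scope here, and the routing labels are illustrative.

```python
# Hypothetical detection gate combining citation verification and the NLI-based
# consistency check; the 0.7 threshold and routing outcomes follow the policy above.
ENTAILMENT_THRESHOLD = 0.7

def unverified_citations(cited, known_citations):
    """Citations (case names, statute numbers) not found in the legal database."""
    return [c for c in cited if c not in known_citations]

def low_entailment_claims(claims, source_chunks, entail_score):
    """Claims whose best-supporting source chunk scores below the threshold."""
    flagged = []
    for claim in claims:
        best = max((entail_score(chunk, claim) for chunk in source_chunks), default=0.0)
        if best < ENTAILMENT_THRESHOLD:
            flagged.append(claim)
    return flagged

def gate_response(claims, cited, source_chunks, known_citations, entail_score):
    """Route a response: block on bad citations, flag low entailment, else deliver."""
    if unverified_citations(cited, known_citations):
        return "block"
    if low_entailment_claims(claims, source_chunks, entail_score):
        return "human_review"
    return "deliver"
```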
Hallucination rate tracking: weekly sampling of 50 responses reviewed by a paralegal; target hallucination rate below 2%; current rate 1.4% (see the tracking sketch below)
User disclosure: Every response includes "Verify with primary sources before relying on this output for legal advice."
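The weekly tracking line above reduces to a simple check. This sketch assumes the reviewer labels come from the sampling step shown earlier, and the 2% target is taken from this example.

```python
# Illustrative weekly tracking check against the example's 2% target rate.
TARGET_RATE = 0.02

def weekly_tracking_report(labeled_sample, target=TARGET_RATE):
    """Summarize one week's paralegal review and flag breaches of the target."""
    reviewed = len(labeled_sample)
    flagged = sum(1 for r in labeled_sample if r["hallucinated"])
    rate = flagged / reviewed if reviewed else 0.0
    return {
        "reviewed": reviewed,
        "hallucinated": flagged,
        "rate": round(rate, 4),
        "within_target": rate < target,
    }
```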
Control Details
- Control ID: SAF-001
- Domain: Safety & Reliability
- Typical owner: AI Engineering / AI Governance Team
- Implementation effort: High
- Agent-relevant: Yes
