Hallucination Detection and Mitigation
Implement controls to detect, reduce, and manage AI-generated factual errors and fabrications before they reach end users or inform decisions.
Objective
Reduce the risk of harm from AI-generated misinformation by applying both preventive and detective controls for hallucinated content.
Maturity Levels
Initial
No hallucination controls exist; outputs are used without verification.
Developing
Users are warned about hallucination risk but no technical controls are in place.
Defined
Retrieval-augmented generation or factual verification steps are applied to high-risk use cases; human review is required for factual claims.
Managed
Hallucination rates are tracked through sampling; high-hallucination use cases are flagged for additional controls.
Optimizing
Automated factual consistency checking is applied at inference time; hallucination reduction is a quantified model selection criterion.
Evidence Requirements
What an auditor or assessor would expect to see for this control.
- Hallucination detection configuration and evaluation dataset documenting test cases and pass thresholds
- Periodic hallucination rate reports showing metric values over time for production outputs
- Human fact-checking spot-check records for a sample of high-stakes outputs in a defined period
- Citation or source attribution records for RAG systems showing retrieved evidence linked to each claim
- User feedback or correction records showing hallucinations detected post-deployment and how they were handled
Implementation Notes
Key steps
- Use retrieval-augmented generation (RAG) for factual use cases: ground model responses in verified source documents rather than relying on parametric knowledge.
- Implement citation requirements: require the model to cite sources for factual claims, making hallucinations easier to detect.
- Apply verification steps for high-stakes factual outputs (legal documents, medical information, financial data); these require human expert review, not just automated checking.
- Measure hallucination rates per use case using sampling; rates vary significantly across domains and prompt types (see the sampling sketch after this list).
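The last step can be illustrated with a minimal sketch. It assumes responses are logged with a `use_case` tag and that hallucination labels come from human reviewers; the field names and the 50-per-use-case sample size are illustrative, not a required schema.

```python
# Illustrative sketch: draw a fixed-size random sample of logged responses per
# use case for human review, then compute the hallucination rate from the labels.
# Field names ("use_case", "hallucinated") are assumptions, not a defined schema.
import random
from collections import defaultdict

def sample_for_review(logged_responses, per_use_case=50, seed=0):
    """Return up to `per_use_case` randomly chosen responses for each use case."""
    rng = random.Random(seed)
    by_use_case = defaultdict(list)
    for record in logged_responses:
        by_use_case[record["use_case"]].append(record)
    return {
        use_case: rng.sample(records, min(per_use_case, len(records)))
        for use_case, records in by_use_case.items()
    }

def hallucination_rate(labeled_sample):
    """Fraction of reviewed responses a human reviewer marked as hallucinated."""
    if not labeled_sample:
        return 0.0
    return sum(1 for r in labeled_sample if r["hallucinated"]) / len(labeled_sample)
```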
Example Implementation
Legal research tool using RAG to answer questions about case law and statutes
Hallucination Controls — Legal Research Assistant
Prevention (RAG architecture):
- All responses are grounded in source documents retrieved from a verified legal database (Westlaw + internal case files)
- Model instructed to answer only from retrieved context; if context is insufficient, respond with "Insufficient sources found" rather than generating from parametric knowledge
- Maximum of 3 source documents per response; each claim is attributed to a specific source (see the grounding sketch below)
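As a concrete illustration of the grounding rules above, the sketch below assumes a `retrieve(question)` function returning ranked source documents and a `complete(system, user)` call into the model; both are placeholders rather than a specific vendor API, and the refusal string and three-source cap mirror the policy stated in this list.

```python
# Hypothetical grounding step: answer only from retrieved context, cap the number
# of sources, and refuse rather than fall back to parametric knowledge.
INSUFFICIENT = "Insufficient sources found"

SYSTEM_PROMPT = (
    "Answer using only the numbered sources provided. "
    "Attribute every factual claim to a source number, e.g. [1]. "
    f'If the sources do not answer the question, reply exactly: "{INSUFFICIENT}"'
)

def grounded_answer(question, retrieve, complete, max_sources=3):
    """Build a source-limited prompt; refuse when retrieval returns nothing."""
    sources = retrieve(question)[:max_sources]
    if not sources:
        return INSUFFICIENT, []
    context = "\n".join(f"[{i + 1}] {doc['text']}" for i, doc in enumerate(sources))
    user_prompt = f"Sources:\n{context}\n\nQuestion: {question}"
    return complete(system=SYSTEM_PROMPT, user=user_prompt), sources
```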
Detection (applied to every response):
- Citation verification: cited case names and statute numbers are checked against the database before delivery; mismatched citations block the response
- NLI-based consistency check: claims in the response are compared to source chunks using a fine-tuned entailment classifier; an entailment score below 0.7 triggers a human review flag
- Human review required for responses to questions about specific case outcomes, regulatory deadlines, and monetary thresholds (see the detection sketch below)
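A sketch of the per-response gate described in this Detection list follows. It assumes an `entail_score(premise, hypothesis)` wrapper around the fine-tuned entailment classifier and a `known_citations` lookup backed by the legal database; claim extraction from the response is out of scope here, and the routing labels are illustrative.

```python
# Hypothetical detection gate combining citation verification and the NLI-based
# consistency check; the 0.7 threshold and routing outcomes follow the policy above.
ENTAILMENT_THRESHOLD = 0.7

def unverified_citations(cited, known_citations):
    """Citations (case names, statute numbers) not found in the legal database."""
    return [c for c in cited if c not in known_citations]

def low_entailment_claims(claims, source_chunks, entail_score):
    """Claims whose best-supporting source chunk scores below the threshold."""
    flagged = []
    for claim in claims:
        best = max((entail_score(chunk, claim) for chunk in source_chunks), default=0.0)
        if best < ENTAILMENT_THRESHOLD:
            flagged.append(claim)
    return flagged

def gate_response(claims, cited, source_chunks, known_citations, entail_score):
    """Route a response: block on bad citations, flag low entailment, else deliver."""
    if unverified_citations(cited, known_citations):
        return "block"
    if low_entailment_claims(claims, source_chunks, entail_score):
        return "human_review"
    return "deliver"
```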
Hallucination rate tracking: weekly sampling of 50 responses reviewed by a paralegal; target hallucination rate below 2%; current rate 1.4% (see the tracking sketch below)
User disclosure: Every response includes "Verify with primary sources before relying on this output for legal advice."
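The weekly tracking line above reduces to a simple check. This sketch assumes the reviewer labels come from the sampling step shown earlier, and the 2% target is taken from this example.

```python
# Illustrative weekly tracking check against the example's 2% target rate.
TARGET_RATE = 0.02

def weekly_tracking_report(labeled_sample, target=TARGET_RATE):
    """Summarize one week's paralegal review and flag breaches of the target."""
    reviewed = len(labeled_sample)
    flagged = sum(1 for r in labeled_sample if r["hallucinated"])
    rate = flagged / reviewed if reviewed else 0.0
    return {
        "reviewed": reviewed,
        "hallucinated": flagged,
        "rate": round(rate, 4),
        "within_target": rate < target,
    }
```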
Control Details
- Control ID: SAF-001
- Domain: Safety & Reliability
- Typical owner: AI Engineering / AI Governance Team
- Implementation effort: High
- Agent-relevant: Yes
