AI Governance Institute

Practical Governance for Enterprise AI

Agentic AI
AGT · Agentic AI | AGT-010 | Medium effort | Agent-relevant

Agent Knowledge Source Integrity

Validate that documents, databases, and external sources retrieved by AI agents during task execution have not been tampered with, poisoned, or substituted with adversarial content.

Objective

Prevent agents from acting on manipulated or unauthorized information retrieved from knowledge bases, vector stores, the web, or third-party document sources.

Maturity Levels

  • Level 1 (Initial): Agents retrieve from any available source without validation; no controls exist on retrieval inputs.
  • Level 2 (Developing): Retrieval sources are informally restricted but not validated for integrity; no poisoning detection.
  • Level 3 (Defined): Approved source lists are defined per agent; retrieved documents pass a provenance check before being passed to the model.
  • Level 4 (Managed): Retrieval logs are audited periodically; anomalous retrieval patterns trigger investigation.
  • Level 5 (Optimizing): Automated integrity checks validate source hashes against known-good baselines; adversarial content patterns are detected and blocked in real time.

Evidence Requirements

What an auditor or assessor would expect to see for this control.

  • Approved source registry per agent listing permitted retrieval sources with documented business justification and review date
  • Provenance check records showing retrieved documents were validated against known-good baselines before being passed to the model
  • Retrieval anomaly detection logs and investigation records for flagged retrieval events within a sample period
  • Periodic integrity audit results confirming knowledge base contents against version-controlled reference copies
  • Access control documentation showing who can modify knowledge base contents and under what approval process

Implementation Notes

Key steps

  • Maintain an approved source registry per agent: which URLs, document stores, vector databases, and APIs it may retrieve from; deny all unlisted sources.
  • Hash and version critical knowledge base documents; detect substitution by comparing retrieved content against known-good hashes before the content is passed to the model.
  • Treat retrieved content as untrusted input: apply the same scrutiny you would apply to user-submitted data, and never give it the trust level of a system prompt.
  • Monitor retrieval volume and source patterns; a sudden shift in which documents are retrieved most often, or retrieval from new sources, can signal poisoning.
  • For web-retrieval agents, restrict permitted domains, validate TLS certificates, and never follow open redirects to unvalidated endpoints.
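
The deny-by-default registry and the domain and redirect restrictions in the key steps can be sketched as follows. This is a minimal illustration, not a reference implementation: the `SourceRegistry` class, the agent name, the store identifier, and the two hostnames are assumptions chosen for the example, and certificate validation is assumed to be handled by the HTTP client rather than by this code.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass(frozen=True)
class SourceRegistry:
    """Hypothetical per-agent registry; anything not listed is denied."""

    agent: str
    allowed_hosts: frozenset   # hostnames permitted for web retrieval
    allowed_stores: frozenset  # document stores, vector databases, APIs

    def permits_url(self, url: str) -> bool:
        parts = urlsplit(url)
        host = (parts.hostname or "").lower()
        # Require HTTPS and an exact hostname match; no suffix or wildcard matching.
        return parts.scheme == "https" and host in self.allowed_hosts

    def permits_store(self, store_id: str) -> bool:
        return store_id in self.allowed_stores

    def permits_redirect_chain(self, urls: list) -> bool:
        # Every hop, including the final endpoint, must be on the allowlist.
        return bool(urls) and all(self.permits_url(u) for u in urls)


# Example values are assumptions for illustration only.
registry = SourceRegistry(
    agent="regulatory-compliance-agent",
    allowed_hosts=frozenset({"eur-lex.europa.eu", "csrc.nist.gov"}),
    allowed_stores=frozenset({"sharepoint:regulatory-library"}),
)

print(registry.permits_url("https://csrc.nist.gov/publications"))          # allowed host
print(registry.permits_url("https://eur-lex.europa.eu.evil.example/doc"))  # lookalike host
```

Exact hostname matching is deliberately strict here: suffix matching is a common way for lookalike domains to slip through, so any subdomain would need its own registry entry.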

Example Implementation

Legal team using a RAG agent to draft compliance summaries from regulatory documents

Knowledge Source Registry — Regulatory Compliance Agent

Permitted sources:

| Source | Type | Integrity Check | Update Process |
|---|---|---|---|
| Internal regulatory library (SharePoint) | Document store | SHA-256 hash on ingest; re-verified on retrieval | Change-controlled; requires legal team approval |
| EUR-Lex official portal | Web retrieval | Domain allowlist only; certificate pinned | Automated nightly refresh with diff alert |
| NIST CSRC publications | Web retrieval | Domain allowlist only | Automated nightly refresh with diff alert |

Denied by default: All other URLs, document stores, and APIs.

Anomaly alert: Any retrieval from a source not in the registry triggers an immediate alert to AI Engineering and is blocked.
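
Alongside the unregistered-source alert, the retrieval-pattern monitoring described in the key steps could be approximated by comparing per-document retrieval counts between a baseline window and the current window. The sketch below is one possible approach, not a prescribed method: the function names, the use of total variation distance as the shift metric, and the 0.3 threshold are all assumptions for illustration.

```python
from collections import Counter


def retrieval_shift(baseline: Counter, current: Counter) -> float:
    """Total variation distance between two retrieval distributions (0.0 to 1.0)."""
    base_total = sum(baseline.values())
    cur_total = sum(current.values())
    if base_total == 0 or cur_total == 0:
        # Nothing to compare; treat as no measurable shift.
        return 0.0
    doc_ids = set(baseline) | set(current)
    # Counter returns 0 for missing keys, so unseen documents contribute their full share.
    return 0.5 * sum(
        abs(baseline[d] / base_total - current[d] / cur_total) for d in doc_ids
    )


def flag_anomalies(baseline: Counter, current: Counter, threshold: float = 0.3) -> list:
    """Return human-readable findings; an empty list means nothing was flagged."""
    findings = []
    new_docs = sorted(set(current) - set(baseline))
    if new_docs:
        findings.append(f"documents not seen in baseline: {new_docs}")
    shift = retrieval_shift(baseline, current)
    if shift > threshold:  # threshold is an illustrative assumption
        findings.append(f"retrieval distribution shift {shift:.2f} exceeds {threshold:.2f}")
    return findings


# Hypothetical document IDs and counts.
last_month = Counter({"gdpr-art-30": 50, "nist-sp-800-53": 50})
this_week = Counter({"gdpr-art-30": 10, "unknown-memo": 90})
for finding in flag_anomalies(last_month, this_week):
    print(finding)
```

A flagged finding here is only a prompt for investigation; legitimate changes in workload can also shift which documents are retrieved most often.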

Hash validation failure: If a retrieved document's hash does not match the ingest-time hash, retrieval is rejected and the discrepancy is logged for investigation.
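
The ingest-time hashing and retrieval-time rejection described here might look like the following. It is a minimal sketch under stated assumptions: `HashBaseline`, `IntegrityError`, and the document ID are hypothetical names, and the in-memory dictionary stands in for whatever change-controlled store actually holds the known-good hashes.

```python
import hashlib
import logging

logger = logging.getLogger("knowledge_integrity")


class IntegrityError(Exception):
    """Raised when retrieved content does not match its ingest-time hash."""


def sha256_hex(content: bytes) -> str:
    return hashlib.sha256(content).hexdigest()


class HashBaseline:
    """Hypothetical in-memory baseline of known-good hashes, keyed by document ID."""

    def __init__(self) -> None:
        self._known_good = {}

    def record_ingest(self, doc_id: str, content: bytes) -> str:
        # Called once when an approved document version is ingested.
        digest = sha256_hex(content)
        self._known_good[doc_id] = digest
        return digest

    def verify(self, doc_id: str, content: bytes) -> bytes:
        # Called on every retrieval, before content is passed to the model.
        expected = self._known_good.get(doc_id)
        if expected is None:
            raise IntegrityError(f"{doc_id}: no baseline hash recorded")
        actual = sha256_hex(content)
        if actual != expected:
            # Log the discrepancy for investigation, then reject the retrieval.
            logger.warning("hash mismatch for %s: expected %s, got %s",
                           doc_id, expected, actual)
            raise IntegrityError(f"{doc_id}: content hash mismatch")
        return content


baseline = HashBaseline()
baseline.record_ingest("gdpr-art-30", b"Records of processing activities ...")
safe_text = baseline.verify("gdpr-art-30", b"Records of processing activities ...")
```

Documents with no recorded baseline are rejected as well, which keeps the check consistent with the deny-by-default posture of the registry.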

Review cadence: Source registry reviewed quarterly or after any security incident involving the agent.

Control Details

Control ID: AGT-010
Typical owner: AI Engineering / Security / Data Governance
Implementation effort: Medium effort
Agent-relevant: Yes

Tags

RAG, knowledge base, retrieval, data poisoning, document integrity