
Question 31 of 34

How do we audit an AI system for compliance?

A methodology for conducting compliance audits of individual AI systems — what to review, what evidence to collect, and how to write findings that actually drive remediation.

If you only do 3 things, do this:

  1. Pull the documentation package first. If it's incomplete or outdated, that's your first finding; everything else is harder to evaluate without it.
  2. Test the controls, don't just read them. Interview reviewers to find out whether oversight actually operates as documented. Check override rates. Review monitoring logs. Documentation that describes controls that aren't operating is worse than no documentation.
  3. Rate findings by severity and assign remediation owners before you leave the audit. Findings without owners and timelines rarely get fixed.

The Situation

Who this is for: Internal audit, compliance, and risk teams conducting AI-specific reviews

When you need this: Annually for high-risk systems, before major model updates, or after an incident that raises questions about a system's governance

The Decision

Does this AI system's actual behavior, documentation, and controls meet the standards we've committed to, and what needs to change?

The Steps

  1. Pull the model registry entry and documentation package; verify both are current and complete (a completeness-check sketch follows this list)
  2. Review the risk assessment: is the risk tier still accurate given the current use case and deployment context?
  3. Audit the data governance record: verify training data provenance, legal basis, and PII assessment
  4. Review bias testing results: when was the last test run, what did it find, and was it remediated and re-tested?
  5. Test human oversight: interview reviewers, inspect override rates, review a sample of decision log entries
  6. Check monitoring: is the post-deployment monitoring plan operating? Review recent alerts, responses, and any incidents
  7. Produce an audit report with findings rated by severity, remediation owners assigned, and a follow-up review scheduled
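
To make step 1 concrete, here is a minimal sketch of the completeness check in Python. The tier names, required-document lists, and file paths are illustrative assumptions, not a prescribed standard; substitute the document review checklist from the artifacts below.

    # Minimal sketch of the step-1 completeness check. The tier names and
    # required-document lists are illustrative assumptions, not a standard;
    # substitute your own document review checklist (see the artifacts below).
    REQUIRED_DOCS = {
        "high": ["registry_entry", "model_card", "risk_assessment",
                 "bias_assessment", "monitoring_plan", "decision_log"],
        "medium": ["registry_entry", "model_card", "risk_assessment",
                   "monitoring_plan"],
        "low": ["registry_entry", "model_card"],
    }

    def missing_documents(risk_tier: str, package: dict) -> list[str]:
        """Return required documents absent from the documentation package."""
        return [doc for doc in REQUIRED_DOCS[risk_tier] if doc not in package]

    # Example: a high-risk system whose package lacks two required documents.
    package = {
        "registry_entry": "registry/loan-scoring-v3.json",
        "model_card": "docs/loan-scoring-v3-card.md",
        "risk_assessment": "assessments/loan-scoring-v3-risk.pdf",
        "monitoring_plan": "plans/loan-scoring-v3-monitoring.md",
    }
    print(missing_documents("high", package))
    # -> ['bias_assessment', 'decision_log']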

The Artifacts

  • AI compliance audit scope and methodology template
  • Document review checklist (required documents by risk tier)
  • Bias testing verification worksheet
  • Human oversight interview guide (questions for reviewers)
  • Monitoring operations checklist
  • AI audit findings template (finding, evidence, severity, owner, timeline)

The Output

A completed audit report with findings rated by severity, named remediation owners, and a follow-up review scheduled to verify remediation.

What to look for in an AI compliance audit

AI compliance audits have three layers: documentation, controls, and outcomes. Documentation review asks whether the required records exist, are current, and accurately describe the system. Controls testing asks whether the documented controls are actually operating as described. Outcome review asks whether the system is producing results consistent with its stated purpose and without unjustified disparate impacts.

Auditors who only review documentation miss the most important layer. A model card that accurately described a system at deployment may no longer reflect the current version. A human oversight process that looked strong on paper may have devolved into rubber-stamping. Override rates that were healthy at deployment may have declined as reviewers became habituated. Controls testing requires talking to people and looking at data, not just reading documents.
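
The habituation pattern is checkable from data. A minimal sketch, assuming the decision log can be exported as rows with an ISO 8601 timestamp and an overridden flag; both field names and the sample rows are assumptions:

    # Sketch of an override-rate trend check for reviewer habituation.
    from collections import defaultdict

    def monthly_override_rate(rows: list[dict]) -> dict[str, float]:
        """Return override rate per calendar month, e.g. {'2025-03': 0.12}."""
        decided = defaultdict(int)
        overridden = defaultdict(int)
        for row in rows:
            month = row["timestamp"][:7]   # 'YYYY-MM' slice of the timestamp
            decided[month] += 1
            overridden[month] += row["overridden"]
        return {m: overridden[m] / decided[m] for m in sorted(decided)}

    rows = [
        {"timestamp": "2025-01-14T09:30:00", "overridden": True},
        {"timestamp": "2025-01-20T11:05:00", "overridden": False},
        {"timestamp": "2025-06-02T16:40:00", "overridden": False},
        {"timestamp": "2025-06-09T10:15:00", "overridden": False},
    ]
    print(monthly_override_rate(rows))
    # {'2025-01': 0.5, '2025-06': 0.0}: a rate decaying toward zero is the flag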

The evidence you need and where to find it

For documentation: the model registry entry (to confirm the system is inventoried and risk-tiered), the model card (to review training data, performance metrics, and known limitations), the risk assessment (to confirm it predates deployment and is current), and the bias assessment (to review methodology, findings, and remediation history).
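
A hedged sketch of the two timing checks above: the risk assessment should predate deployment, and every document should have been updated since the last retrain. All dates and names here are illustrative; in practice they come from the registry entry and document metadata.

    from datetime import date

    def currency_findings(deployed: date, last_retrained: date,
                          risk_assessment_approved: date,
                          doc_updated: dict[str, date]) -> list[str]:
        """Flag documents that violate either timing rule."""
        findings = []
        if risk_assessment_approved > deployed:
            findings.append("risk assessment was approved after deployment")
        for name, updated in doc_updated.items():
            if updated < last_retrained:
                findings.append(f"{name} has not been updated since the last retrain")
        return findings

    print(currency_findings(
        deployed=date(2024, 11, 1),
        last_retrained=date(2025, 3, 15),
        risk_assessment_approved=date(2024, 10, 20),
        doc_updated={
            "model_card": date(2024, 10, 25),      # stale: predates the retrain
            "bias_assessment": date(2025, 4, 2),   # current
            "risk_assessment": date(2025, 3, 20),  # reviewed after the retrain
        },
    ))
    # -> ['model_card has not been updated since the last retrain']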

For controls: the decision log (sample recent entries for completeness and integrity), monitoring dashboards or reports (review the last 90 days for alerts, anomalies, and responses), override rate data (pull by reviewer and by system over the review period), and vendor change notifications (verify the system receives and acts on model update notices).
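
Complementing the trend sketch above, here is a minimal sketch of the per-reviewer override pull. The column names and the sample log are assumptions about how your decision log exports.

    import csv
    import io
    from collections import Counter

    # Illustrative export; replace with your real decision-log extract.
    SAMPLE_LOG = io.StringIO(
        "reviewer_id,system,overridden\n"
        "r-102,loan-scoring-v3,true\n"
        "r-102,loan-scoring-v3,false\n"
        "r-215,loan-scoring-v3,false\n"
        "r-215,loan-scoring-v3,false\n"
    )

    def override_rates(log) -> dict[str, float]:
        """Return reviewer_id -> share of reviewed decisions overridden."""
        reviewed, overridden = Counter(), Counter()
        for row in csv.DictReader(log):
            reviewed[row["reviewer_id"]] += 1
            overridden[row["reviewer_id"]] += row["overridden"] == "true"
        return {r: overridden[r] / reviewed[r] for r in reviewed}

    # A reviewer near 0% across hundreds of decisions is worth an interview:
    # either the model is unusually reliable or the review is nominal.
    print(override_rates(SAMPLE_LOG))   # {'r-102': 0.5, 'r-215': 0.0}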

For oversight: interview two or three human reviewers. Ask them what information they see when reviewing an AI recommendation, whether they have ever overridden one, and what would cause them to override. Their answers will tell you more about the effectiveness of your oversight process than any documentation.

Writing findings that drive remediation

Findings should state what was expected, what was observed, the evidence that supports the finding, and why it matters. A finding that says "documentation was incomplete" is less useful than one that says "the model card has not been updated since the model was retrained in March 2025; the current card does not reflect the updated training data or the bias testing conducted post-retraining, creating a gap in the audit trail for decisions made since that date."
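
One way to keep findings complete is to make every field in the findings template mandatory. A minimal sketch, reusing the stale-model-card example above; the names and dates are illustrative:

    from dataclasses import dataclass
    from datetime import date

    @dataclass
    class AuditFinding:
        expected: str   # the standard or control the system committed to
        observed: str   # what the audit actually found
        evidence: str   # where the supporting proof lives
        severity: str   # e.g. "high" / "medium" / "low"
        owner: str      # named person accountable for remediation
        due: date       # remediation deadline agreed before audit close

    finding = AuditFinding(
        expected="Model card is updated after every retrain",
        observed="Card unchanged since the March 2025 retrain; updated "
                 "training data and post-retrain bias tests are undocumented",
        evidence="model card revision history, registry change log",
        severity="medium",
        owner="ML platform lead",
        due=date(2025, 9, 30),
    )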

Severity ratings should reflect actual risk, not just process compliance. A missing signature on a form is a different problem from a required control that is not operating. Reserve high-severity ratings for findings that represent meaningful risk: unsupported high-risk deployments, oversight processes with documented non-compliance, or bias findings that were not remediated. Rate everything else proportionately. Severity inflation causes organizations to ignore findings, which defeats the purpose of the audit.
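
If it helps to make the rating rule explicit, the paragraph above reduces to a small triage sketch. The high-severity triggers come from the text; the labels and defaulting are assumptions:

    HIGH_SEVERITY_TRIGGERS = {
        "unsupported high-risk deployment",
        "documented oversight non-compliance",
        "unremediated bias finding",
    }

    def initial_severity(finding_type: str, poses_meaningful_risk: bool) -> str:
        """Default rating to anchor auditor judgment, not replace it."""
        if finding_type in HIGH_SEVERITY_TRIGGERS:
            return "high"
        return "medium" if poses_meaningful_risk else "low"

    # A missing signature is a process gap, not an operating-risk gap.
    print(initial_severity("missing signature on form", poses_meaningful_risk=False))
    # -> 'low'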