Question 18 of 34
What is our process for model drift monitoring?
Published by AI Governance Institute · Practical Governance for Enterprise AI
Defining ownership and cadence for ongoing monitoring of deployed AI models to detect performance degradation, behavioral shifts, and emerging bias after deployment.
If you only do 3 things, do these:
1. Assign explicit ownership for post-deployment monitoring before the system goes live. Without a named owner, monitoring gets deprioritized every time.
2. Define response thresholds in advance: at what drift level is the system flagged for review, and at what level is it paused? Pre-set these; don't determine them in real time when something goes wrong.
3. Monitor prediction distributions over time. A model gradually approving or rejecting more of one group is drifting, even if aggregate accuracy looks stable.
The Situation
Who this is for: Data science, ML engineering, and compliance teams responsible for production AI systems
When you need this: Before any AI system goes to production, or when a deployed system's behavior is called into question
The Decision
Do we have the monitoring infrastructure and governance process to detect and respond to model drift before it causes harm or regulatory exposure?
The Steps
1. Define monitoring metrics for each production system: output distributions, performance, fairness metrics by subgroup, input feature distributions
2. Set alert thresholds for each metric: yellow (flag for review), red (pause for investigation)
3. Assign a named monitoring owner for each system and a review cadence proportionate to risk
4. Implement automated monitoring using your ML infrastructure (MLflow, Evidently, or custom dashboards); a minimal custom check is sketched after this list
5. Build escalation protocols: what triggers a pause, who makes the call, what is the response timeline
6. Define re-training and re-validation criteria: when does drift require a model update vs. a retirement decision
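As one illustration of steps 2 and 4, the sketch below computes a Population Stability Index per monitored feature and maps it to the yellow/red statuses. It is a custom check rather than the API of any specific monitoring tool, and the feature names and threshold values (0.10 and 0.25, conventional PSI rules of thumb) are assumptions to be calibrated per system.

```python
import numpy as np

# Illustrative thresholds: yellow = flag for review, red = pause for investigation.
YELLOW_PSI = 0.10
RED_PSI = 0.25

def population_stability_index(reference: np.ndarray, current: np.ndarray, bins: int = 10) -> float:
    """Population Stability Index between a reference sample and a current sample."""
    edges = np.histogram_bin_edges(reference, bins=bins)
    edges[0], edges[-1] = -np.inf, np.inf  # catch values outside the reference range
    ref_pct = np.histogram(reference, bins=edges)[0] / len(reference)
    cur_pct = np.histogram(current, bins=edges)[0] / len(current)
    # Floor proportions to avoid division by zero and log(0).
    ref_pct = np.clip(ref_pct, 1e-6, None)
    cur_pct = np.clip(cur_pct, 1e-6, None)
    return float(np.sum((cur_pct - ref_pct) * np.log(cur_pct / ref_pct)))

def drift_status(reference: dict[str, np.ndarray], current: dict[str, np.ndarray]) -> dict[str, str]:
    """Classify each monitored feature as 'ok', 'yellow', or 'red'."""
    status = {}
    for name, ref_values in reference.items():
        psi = population_stability_index(ref_values, current[name])
        if psi >= RED_PSI:
            status[name] = "red"
        elif psi >= YELLOW_PSI:
            status[name] = "yellow"
        else:
            status[name] = "ok"
    return status

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    reference = {"income": rng.normal(50_000, 12_000, 10_000)}
    current = {"income": rng.normal(55_000, 12_000, 10_000)}  # shifted inputs
    print(drift_status(reference, current))  # e.g. {'income': 'yellow'}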
The Artifacts
- Monitoring metrics specification template (by model type)
- Alert threshold setting worksheet
- Monitoring ownership register (system → owner → cadence)
- Drift response protocol (yellow/red thresholds → escalation path → response actions)
- Model performance dashboard specification
The Output
A documented monitoring plan for every production AI system, with metrics defined, thresholds set, ownership assigned, and automated alerts in place.
Compliance does not end at deployment
A model that passes all pre-deployment testing may behave differently in production as the world changes around it. Data drift occurs when the statistical properties of the inputs change over time. Concept drift occurs when the relationship between inputs and outputs changes, often because the underlying phenomena the model was trained to predict have evolved. Both can cause a previously compliant model to become inaccurate, biased, or harmful without any change to the model itself.
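To make the distinction concrete, the toy simulation below (all numbers and the simple threshold "model" are illustrative assumptions) shows a frozen decision rule whose approval rate shifts under data drift even while accuracy holds, and whose accuracy silently degrades under concept drift with no change to the inputs or the model.

```python
import numpy as np

rng = np.random.default_rng(1)

def model(score: np.ndarray) -> np.ndarray:
    """Frozen decision rule learned on the training-time world: approve if score > 0.5."""
    return score > 0.5

# Training-time world: inputs ~ N(0.5, 0.15); the true rule is also "approve if score > 0.5".
scores = rng.normal(0.5, 0.15, 100_000)
labels = scores > 0.5
print("baseline      accuracy=%.2f  approval_rate=%.2f"
      % ((model(scores) == labels).mean(), model(scores).mean()))

# Data drift: the input distribution shifts, the true rule does not. Accuracy holds (~1.00)
# but the approval rate jumps (~0.84), which output-distribution monitoring would catch.
drifted = rng.normal(0.65, 0.15, 100_000)
print("data drift    accuracy=%.2f  approval_rate=%.2f"
      % ((model(drifted) == (drifted > 0.5)).mean(), model(drifted).mean()))

# Concept drift: same inputs, but the true decision boundary has moved to 0.6.
# Accuracy degrades (~0.75) without any change to the inputs or the model.
new_labels = scores > 0.6
print("concept drift accuracy=%.2f" % ((model(scores) == new_labels).mean()))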
Regulatory frameworks including the EU AI Act and NIST AI RMF explicitly address post-deployment monitoring requirements. For high-risk systems, the EU AI Act requires post-market monitoring plans and logging of system operation. The regulatory expectation is that governance is ongoing, not a one-time pre-deployment exercise.
What to monitor and how often
Monitor prediction distribution: are the model's outputs shifting over time? A credit scoring model that is approving an increasing proportion of applications, or a hiring model that is rejecting more candidates in a particular category, may be exhibiting drift. Statistical process control methods can identify when output distributions move outside expected ranges.
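A minimal sketch of one such statistical process control check: binomial (p-chart style) control limits around a reference approval rate, flagging any monitoring window whose observed rate falls outside them. The reference rate, window size, and 3-sigma limits are assumptions, not recommended values.

```python
import math

def p_chart_limits(reference_rate: float, window_size: int, sigmas: float = 3.0) -> tuple[float, float]:
    """Control limits for the share of positive decisions in a window of a given size."""
    std = math.sqrt(reference_rate * (1 - reference_rate) / window_size)
    return (max(0.0, reference_rate - sigmas * std),
            min(1.0, reference_rate + sigmas * std))

def check_window(positives: int, window_size: int, reference_rate: float) -> str:
    """Compare the observed rate in a window against the pre-set control limits."""
    lower, upper = p_chart_limits(reference_rate, window_size)
    rate = positives / window_size
    return "in control" if lower <= rate <= upper else "out of control"

# Example: reference approval rate of 40%, weekly window of 2,000 decisions.
print(p_chart_limits(0.40, 2_000))     # roughly (0.367, 0.433)
print(check_window(910, 2_000, 0.40))  # 45.5% approvals -> "out of control"
```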
Monitor input data quality: are the features the model relies on behaving as expected? Missing values, out-of-range inputs, and distribution shifts in key features are early indicators of drift. Monitor performance metrics against ground truth where available, including accuracy, precision, recall, and fairness metrics by subgroup.
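A sketch of basic input-quality and subgroup-performance checks on a scoring batch; the column names, valid ranges, and "group" attribute are hypothetical and would come from the system's monitoring metrics specification.

```python
import pandas as pd

VALID_RANGES = {"age": (18, 100), "income": (0, 1_000_000)}  # assumed feature ranges

def input_quality_report(batch: pd.DataFrame) -> dict[str, dict[str, float]]:
    """Missing-value and out-of-range rates per monitored feature."""
    report = {}
    for col, (lo, hi) in VALID_RANGES.items():
        values = batch[col]
        report[col] = {
            "missing_rate": float(values.isna().mean()),
            "out_of_range_rate": float(((values < lo) | (values > hi)).mean()),
        }
    return report

def accuracy_by_group(batch: pd.DataFrame) -> pd.Series:
    """Accuracy per subgroup, computed on rows where ground truth is available."""
    labelled = batch.dropna(subset=["label"]).copy()
    labelled["correct"] = labelled["prediction"] == labelled["label"]
    return labelled.groupby("group")["correct"].mean()

# Example batch with predictions, delayed ground truth, and a protected attribute.
batch = pd.DataFrame({
    "age": [25, 41, None, 230],
    "income": [52_000, 87_000, 61_000, 45_000],
    "group": ["A", "A", "B", "B"],
    "prediction": [1, 0, 1, 0],
    "label": [1, 0, 0, 0],
})
print(input_quality_report(batch))
print(accuracy_by_group(batch))
```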
Establish monitoring cadences proportionate to risk. High-risk systems in rapidly changing environments may require weekly or even daily monitoring. Lower-risk systems in stable environments may be adequately served by monthly or quarterly reviews.
Ownership and response protocols
Assign explicit ownership for post-deployment monitoring to a named role or team. Without ownership, monitoring activities are consistently deprioritized in favor of new deployments. The owner is responsible for running monitoring checks, reviewing results, escalating anomalies, and initiating retraining or retirement processes when drift is detected.
Define response thresholds in advance: at what level of detected drift does the system get flagged for review? At what level is it paused pending investigation? At what level is it retired? These thresholds should be calibrated to the risk level of the system and documented before deployment, not determined on the fly when something goes wrong.
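One way to document these pre-set thresholds is as a machine-readable response protocol that the monitoring owner and the automated alerts can both consume. The system name, thresholds, owner, and timelines below are illustrative assumptions to be calibrated to each system's risk level.

```python
# Illustrative drift response protocol; all values are placeholders, not recommendations.
RESPONSE_PROTOCOL = {
    "credit_scoring_v3": {
        "owner": "ml-monitoring-team",   # named monitoring owner
        "review_cadence": "weekly",
        "levels": [
            {"trigger": "feature PSI >= 0.10 or approval rate outside 3-sigma limits",
             "action": "flag for review", "respond_within": "5 business days"},
            {"trigger": "feature PSI >= 0.25 or subgroup accuracy gap > 5 points",
             "action": "pause pending investigation", "respond_within": "1 business day"},
            {"trigger": "drift persists after retraining or root cause cannot be remediated",
             "action": "retire and replace", "respond_within": "next governance review"},
        ],
    },
}

def actions_for(system: str) -> list[str]:
    """List the pre-set response actions for a system, in escalation order."""
    return [level["action"] for level in RESPONSE_PROTOCOL[system]["levels"]]

print(actions_for("credit_scoring_v3"))
# ['flag for review', 'pause pending investigation', 'retire and replace']
```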
Governance Controls
Operational controls that implement the guidance in this playbook.
