AI Incidents Rising and Responsible AI Evals Still Rare Among Major Developers, Stanford HAI 2025 Index Finds

Stanford's Institute for Human-Centered Artificial Intelligence (HAI) published the 2025 AI Index Report on April 1, 2025, a global analysis of AI research, development, and governance trends. The report documents a measurable increase in AI-related incidents and finds that standardized responsible AI evaluations remain uncommon among major industrial model developers. It identifies a persistent gap between organizations' acknowledgment of responsible AI risks and the concrete steps they take to address them. The report also highlights emerging benchmarks, including HELM Safety, AIR-Bench, and FACTS, that are designed to assess model safety and factuality, though industry adoption of these benchmarks remains limited. On the regulatory side, the report notes accelerated policy output across multiple jurisdictions, citing frameworks from the OECD, the European Union, and the United Nations that increasingly emphasize transparency and trustworthiness requirements for AI systems.

The report reflects a broader shift in AI governance from articulating high-level principles toward more specific and potentially enforceable standards. The documented rise in AI incidents, coupled with the scarcity of standardized responsible AI (RAI) evaluations among leading developers, signals that voluntary commitments have not consistently translated into operational practice. Regulators and standard-setting bodies appear to be responding to this gap by moving toward requirements that would oblige organizations to demonstrate, rather than simply assert, the safety and reliability of their AI systems. Taken together, the incident data, benchmark development, and regulatory momentum described in the report indicate that ad hoc or informal approaches to AI governance carry growing institutional and legal risk.

Enterprise compliance teams should treat this report as a prompt to assess the maturity of their current responsible AI evaluation processes against the benchmarks and frameworks named in the index. Specifically, teams should check whether internal model assessments include structured safety and factuality evaluations aligned with tools such as HELM Safety or AIR-Bench, and should map existing governance documentation against the OECD and EU transparency requirements that are moving closer to enforcement. Organizations operating in the EU face near-term deadlines under the EU AI Act, which makes this alignment urgent. Compliance and risk functions should also establish incident tracking mechanisms that capture and escalate AI-related failures in a format suited to anticipated audit and regulatory disclosure expectations. The report's findings suggest that regulators and external auditors will increasingly expect evidence of systematic RAI evaluation rather than policy statements alone.
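As one illustration of what an incident tracking mechanism might record, the following is a minimal Python sketch, not a format taken from the report or from any regulation. The names AIIncident, Severity, needs_escalation, and to_audit_json, the field names, and the choice of HIGH as the escalation threshold are all hypothetical assumptions made for this example.

```python
from dataclasses import dataclass, asdict, field
from datetime import datetime, timezone
from enum import IntEnum
import json


class Severity(IntEnum):
    """Hypothetical three-level severity scale (an assumption, not a standard)."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass
class AIIncident:
    """One AI-related failure, captured in a structured, exportable form."""
    incident_id: str
    system_name: str
    description: str
    severity: Severity
    # Timestamp is recorded in UTC at creation time unless supplied explicitly.
    detected_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def needs_escalation(self, threshold: Severity = Severity.HIGH) -> bool:
        # Escalate when severity meets or exceeds the (assumed) threshold.
        return self.severity >= threshold

    def to_audit_json(self) -> str:
        # Serialize to JSON with a readable severity label and escalation flag.
        record = asdict(self)
        record["severity"] = self.severity.name
        record["escalate"] = self.needs_escalation()
        return json.dumps(record, sort_keys=True)


incident = AIIncident(
    incident_id="INC-001",
    system_name="support-chatbot",
    description="Model cited a nonexistent refund policy.",
    severity=Severity.HIGH,
)
print(incident.to_audit_json())
```

A real deployment would likely add fields an auditor might ask for, such as the affected model version, the remediation taken, and who signed off, and would store records in a system with access controls rather than printing them.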
