On 30 March 2025, the European Commission closes its consultation on the third draft of the General-Purpose AI (GPAI) Code of Practice. Under its Safety and Security section, the draft imposes a testing requirement on providers of general-purpose AI models with systemic risk (GPAISRs), mandating model evaluations that are rigorous, state-of-the-art, and proportionate to the systemic risk posed. Providers must conduct evaluations using benchmarks, simulations, red-teaming, human uplift studies, and other adversarial techniques throughout development and deployment.

These tests must assess model capabilities, propensities, and systemic-risk scenarios, and must be measured against pre-defined acceptance criteria. Evaluation procedures must meet scientific quality standards equivalent to those of leading machine-learning conferences and journals, with exceptions for small and medium-sized enterprises (SMEs) subject to agreement with the AI Office. Testing outcomes must inform systemic risk indicators and be documented to ensure traceability and accountability.