On 11 March 2025, the European Commission published the third draft of the General-Purpose Artificial Intelligence (AI) Code of Practice and opened a consultation running until 30 March 2025. The Testing requirement, part of the Safety and Security section for general-purpose AI models with systemic risk (GPAISRs), requires providers to conduct rigorous model evaluations using state-of-the-art testing methods, in line with Articles 55 and 56 of the AI Act. Evaluation techniques include benchmarks, adversarial testing, simulations, human uplift studies, and red-teaming, designed to assess systemic risks, model capabilities, propensities, and unintended behaviours. Providers must conduct evaluations throughout the model lifecycle and adapt the intensity of testing to the assessed level of systemic risk. Results must be compared against pre-defined risk acceptance criteria and documented in the Safety and Security Framework. Rigorous quality-control standards, equivalent to those used in peer-reviewed scientific domains, are required to ensure that tests are reliable and reproducible.
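To make the comparison step concrete, the sketch below shows one way a provider might record evaluation results against pre-defined risk acceptance criteria for documentation purposes. It is a minimal illustration only: the criteria, metric names, thresholds, and data structures are assumptions for this example and are not terms defined in the AI Act or the Code of Practice.

```python
# Hypothetical sketch: all names, metrics, and thresholds are illustrative
# assumptions, not terminology from the AI Act or the Code of Practice.
from dataclasses import dataclass


@dataclass
class RiskAcceptanceCriterion:
    risk_id: str           # internally tracked systemic-risk category
    metric: str            # evaluation metric the criterion applies to
    max_acceptable: float  # pre-defined acceptance threshold


@dataclass
class EvaluationResult:
    risk_id: str
    metric: str
    observed: float        # score from benchmarks, red-teaming, etc.
    method: str            # e.g. "benchmark", "adversarial testing", "human uplift study"


def assess(results: list[EvaluationResult],
           criteria: list[RiskAcceptanceCriterion]) -> list[dict]:
    """Compare evaluation results against pre-defined acceptance criteria
    and return a documentable record for each matching result."""
    records = []
    for c in criteria:
        matching = [r for r in results
                    if r.risk_id == c.risk_id and r.metric == c.metric]
        for r in matching:
            records.append({
                "risk_id": c.risk_id,
                "metric": c.metric,
                "method": r.method,
                "observed": r.observed,
                "threshold": c.max_acceptable,
                "within_acceptance": r.observed <= c.max_acceptable,
            })
    return records


if __name__ == "__main__":
    criteria = [RiskAcceptanceCriterion("cbrn-uplift", "uplift_score", 0.10)]
    results = [EvaluationResult("cbrn-uplift", "uplift_score", 0.07,
                                "human uplift study")]
    for record in assess(results, criteria):
        print(record)
```

The point of such a record is traceability: each documented entry ties a specific evaluation method and observed result to the pre-defined threshold it was assessed against, which is the kind of comparison the Testing requirement asks providers to document.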