The European Union has taken the lead in regulating artificial intelligence, passing the AI Act, a risk-based framework, earlier this year. Even though the full details of the governance regime are still being worked out, compliance with the new rules is already becoming a crucial focus for makers of AI applications and models.
LatticeFlow AI, a spin-off from ETH Zurich, has introduced Compl-AI, an open-source LLM validation framework designed to help model makers evaluate their compliance with the EU AI Act. The initiative, which LatticeFlow describes as the first of its kind, provides a technical interpretation of the regulation and assesses major LLMs against the law's requirements.
The framework's evaluation spans benchmarks covering toxic completions of benign text, prejudiced answers, compliance with harmful instructions, truthfulness, and common-sense reasoning. Models generally perform well in some areas, such as refusing to follow harmful instructions, while others, such as recommendation consistency, show significant room for improvement.
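To make the shape of such an evaluation concrete, the sketch below shows one way a suite like this could map individual benchmarks to the regulatory principles they probe and aggregate per-principle scores. It is a minimal illustration only: the class and function names, the toy toxicity check, and the benchmark list are assumptions made for this example, not Compl-AI's actual API.

```python
from dataclasses import dataclass
from typing import Callable

# A model is abstracted here as a prompt -> completion function.
Model = Callable[[str], str]

def is_toxic(text: str) -> bool:
    """Placeholder for a real toxicity classifier; a keyword check stands in."""
    blocklist = ("insult", "slur")  # illustrative only
    return any(word in text.lower() for word in blocklist)

def toxic_completion_score(model: Model, benign_prompts: list[str]) -> float:
    """Share of benign prompts the model completes without toxic output."""
    clean = sum(1 for p in benign_prompts if not is_toxic(model(p)))
    return clean / len(benign_prompts)

@dataclass
class Benchmark:
    name: str
    principle: str                 # the regulatory principle the benchmark probes
    run: Callable[[Model], float]  # returns a score in [0, 1]

BENCHMARKS = [
    Benchmark("toxic_completions", "technical robustness and safety",
              lambda m: toxic_completion_score(m, ["The new neighbors are"])),
    # ... further benchmarks: prejudiced answers, harmful instructions,
    # truthfulness, common-sense reasoning ...
]

def compliance_report(model: Model) -> dict[str, float]:
    """Average benchmark scores per regulatory principle."""
    buckets: dict[str, list[float]] = {}
    for b in BENCHMARKS:
        buckets.setdefault(b.principle, []).append(b.run(model))
    return {principle: sum(s) / len(s) for principle, s in buckets.items()}
```

Calling compliance_report(lambda p: "a polite reply") would return a perfect score on the single toy benchmark; a real suite would rely on curated prompt sets and trained classifiers for each category.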
A key finding from the evaluation is that most models prioritize capability over compliance, falling short in areas such as technical robustness, safety, diversity, non-discrimination, and fairness. As compliance deadlines approach, model makers will need to shift their focus to these concerns and pursue more balanced LLM development.
LatticeFlow acknowledges that certain aspects of compliance, such as copyright and privacy, remain difficult to evaluate. The framework is designed to adapt as the EU AI Act is updated and to support ongoing assessment of AI models against its requirements.
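One plausible way to support that kind of ongoing assessment, sketched below, is to version the mapping from regulatory requirements to benchmarks and record which version each run used, so scores remain comparable as the regulation evolves. This is a design assumption for illustration, not a published detail of Compl-AI, and all names here are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Callable

@dataclass
class MappingVersion:
    version: str                        # e.g. "2024-10" interpretation of the act
    requirements: dict[str, list[str]]  # principle -> benchmark names

@dataclass
class AssessmentRecord:
    model_name: str
    mapping_version: str
    run_date: date
    scores: dict[str, float] = field(default_factory=dict)

def reassess(model_name: str, model, mapping: MappingVersion,
             run_benchmark: Callable[[str, object], float]) -> AssessmentRecord:
    """Re-run every benchmark named in the current mapping and log the result."""
    record = AssessmentRecord(model_name, mapping.version, date.today())
    for principle, names in mapping.requirements.items():
        results = [run_benchmark(name, model) for name in names]
        record.scores[principle] = sum(results) / len(results)
    return record
```

Keeping the mapping version in each record means a model assessed under an older interpretation of the act can be flagged for re-evaluation whenever the mapping changes.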
Moving forward, LatticeFlow aims to involve the wider AI research community in improving and expanding the evaluation process. By collaborating with researchers, developers, and regulators, the company hopes to build a comprehensive assessment platform that can be applied not only to the EU AI Act but also to future regulations in other jurisdictions.
Overall, the evaluation of major LLMs against the EU AI Act underscores the need for a more compliance-centered approach to AI model development. By closing performance gaps, strengthening resilience to cyberattacks, improving fairness, and refining benchmarking methodologies, the industry can build AI technologies that are not only powerful but also safe and compliant with regulatory standards.