Artificial intelligence (AI) systems demonstrate deep knowledge across a broad variety of scientific domains, including biology and chemistry, and bad actors could misuse some of these systems to develop biological or chemical weapons.
The constant development of more-capable models requires rapid evaluation mechanisms so that governments can respond to emerging security risks in a timely manner. Policymakers, industry experts, and third-party evaluators lack a cohesive standard for testing AI systems' safety. These challenges complicate efforts to determine the degree to which frontier AI systems pose biological or chemical risks.
The authors evaluate how useful frontier AI systems would be to bad actors pursuing these ends. They focus on custom-tuned versions of open-weight AI models, which can be modified to remove safety guardrails and/or potentially increase biological capabilities. For this report, the authors evaluated 39 of the most-capable models (as of May 2025) against six public biological and chemical knowledge benchmarks and two refusal benchmarks relevant to biological and chemical threats.