Toward Comprehensive Benchmarking of the Biological Knowledge of Frontier Large Language Models
RAND
2025.11.28
Artificial intelligence (AI) systems demonstrate deep knowledge across a broad variety of scientific domains, including biology and chemistry, and bad actors could misuse some of these systems to develop biological or chemical weapons.

The constant development of more-capable models necessitates rapid evaluation mechanisms for governments to respond to emerging security risks in a timely manner. Policymakers, industry experts, and third-party evaluators lack a cohesive standard for testing AI systems' safety levels. These challenges complicate efforts to determine the degree to which frontier AI systems pose biological or chemical risks.

The authors evaluate how useful frontier AI systems would be if misused to develop biological or chemical weapons. They focus on custom-tuned versions of open-weight AI models that can be modified to remove safety guardrails and/or potentially increase biological capabilities. For this report, the authors evaluated 39 of the most-capable models (as of May 2025) against six public biological and chemical knowledge benchmarks and two refusal benchmarks relevant to biological and chemical threats.