How (un)Stable Are LLM Occupational Exposure Scores? Evidence from Multi-Model Replication | Overseas Research Materials

KDI Economic Information and Education Center

International Finance

NBER

2026.06.26

A rapidly growing literature estimates AI's labor-market effects by using large language models (LLMs) to self-assess occupational exposure. We show that these measures are highly fragile. Replicating the dominant rubric with three frontier models on identical tasks, we find a 3.6-fold divergence in mean exposure, with agreement as low as 57%. This measurement instability alters downstream empirical conclusions: in a difference-in-differences framework, individual-level coefficient magnitudes vary 2.4-fold across annotators, and county-level estimates flip from significantly negative to positive but statistically insignificant, depending on the annotator. We formalize this non-classical measurement error and highlight the risks of treating evolving LLMs as static instruments.
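The abstract does not say exactly how agreement, divergence, or the regressions are computed. The snippet below is a minimal sketch of how annotator choice can propagate downstream. It rests on several assumptions that are not taken from the paper. Labels are binary (exposed/not exposed). Agreement is exact pairwise percent agreement. "Fold divergence" is the ratio of the highest to the lowest annotator mean. The difference-in-differences step is a simplified 2x2 design with one post-minus-pre change per occupation. All labels, changes, and names (`model_a`, `did_estimate`, and so on) are hypothetical.

```python
from itertools import combinations

# Hypothetical inputs for illustration only (not data from the paper):
# binary exposure labels assigned by three annotator models to the same six
# occupations, and each occupation's post-minus-pre change in log employment.
LABELS = {
    "model_a": [1, 1, 0, 0, 1, 0],
    "model_b": [1, 0, 0, 0, 0, 0],
    "model_c": [1, 1, 1, 1, 1, 0],
}
CHANGES = [-0.05, -0.04, 0.01, 0.02, -0.01, 0.03]


def mean(xs):
    return sum(xs) / len(xs)


def percent_agreement(xs, ys):
    """Share of occupations on which two annotators assign the same label."""
    return sum(x == y for x, y in zip(xs, ys)) / len(xs)


def did_estimate(labels, changes):
    """Simplified 2x2 difference-in-differences with one pre/post change per
    unit: mean change among exposed minus mean change among unexposed.
    Assumes both groups are non-empty."""
    exposed = [c for lab, c in zip(labels, changes) if lab == 1]
    unexposed = [c for lab, c in zip(labels, changes) if lab == 0]
    return mean(exposed) - mean(unexposed)


# Same occupations, same outcomes: only the annotator changes.
mean_exposure = {m: mean(labs) for m, labs in LABELS.items()}
fold_divergence = max(mean_exposure.values()) / min(mean_exposure.values())
agreement = {
    (a, b): percent_agreement(LABELS[a], LABELS[b])
    for a, b in combinations(LABELS, 2)
}
did = {m: did_estimate(labs, CHANGES) for m, labs in LABELS.items()}

print(f"mean exposure by annotator: {mean_exposure}")
print(f"fold divergence in mean exposure: {fold_divergence:.2f}")
print(f"lowest pairwise agreement: {min(agreement.values()):.0%}")
print(f"DiD estimate by annotator: {did}")
```

In this sketch, the treatment group is defined by each annotator's labels, so the occupations and outcomes stay fixed while the estimate moves. The paper's own specification and its formalization of non-classical measurement error are not reproduced here.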
