KDI Economic Information and Education Center

International Finance
How (un)Stable Are LLM Occupational Exposure Scores? Evidence from Multi-Model Replication
NBER
2026.06.26
A rapidly growing literature estimates AI's labor-market effects by using large language models (LLMs) to self-assess occupational exposure. We show that these measures are highly fragile. Replicating the dominant rubric with three frontier models on identical tasks, we find a 3.6-fold divergence in mean exposure, with agreement as low as 57%. This measurement instability alters downstream empirical conclusions: in a difference-in-differences framework, individual-level coefficient magnitudes vary 2.4-fold across annotators, and county-level estimates flip from significantly negative to statistically insignificant and positive depending on the annotator. We formalize this non-classical measurement error, highlighting the risks of treating evolving LLMs as static measurement instruments.
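The abstract's headline statistics (agreement between annotators, the ratio of the highest to the lowest mean exposure, and annotator-dependent regression coefficients) can be illustrated with a minimal Python sketch. Everything in it is an assumption for illustration only: the annotator names, the 0/1 exposure scale, the eight toy tasks, and the outcome changes `DELTA_Y` are hypothetical and are not the paper's data, rubric, or code. The slope function relies on the standard result that, with two periods and a time-invariant exposure, the difference-in-differences coefficient on exposure × post equals the OLS slope of the change in the outcome on exposure.

```python
from itertools import combinations
from statistics import mean

# Hypothetical toy labels: each annotator (a stand-in for one LLM) scores the
# same eight tasks on an assumed 0/1 exposure scale. NOT data from the paper.
LABELS = {
    "model_a": [1, 1, 0, 1, 0, 1, 1, 0],
    "model_b": [1, 0, 0, 0, 0, 1, 0, 0],
    "model_c": [1, 1, 1, 1, 0, 1, 1, 1],
}

# Hypothetical change in an outcome (post minus pre) for the same eight units.
DELTA_Y = [-2, -1, 0, -1, 1, -2, 0, 1]


def percent_agreement(x, y):
    """Share of tasks on which two annotators assign the same label."""
    if len(x) != len(y) or not x:
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(a == b for a, b in zip(x, y)) / len(x)


def pairwise_agreement(labels):
    """Percent agreement for every pair of annotators, keyed by name pair."""
    return {
        (i, j): percent_agreement(labels[i], labels[j])
        for i, j in combinations(sorted(labels), 2)
    }


def mean_exposure_ratio(labels):
    """Ratio of the highest to the lowest annotator-level mean exposure."""
    means = [mean(v) for v in labels.values()]
    lowest = min(means)
    if lowest == 0:
        raise ZeroDivisionError("an annotator has zero mean exposure")
    return max(means) / lowest


def ols_slope(x, y):
    """OLS slope of y on x with an intercept.

    Assumption: with two periods and time-invariant exposure x, regressing the
    outcome change on x gives the same coefficient as a two-way fixed-effects
    DiD regression on exposure x post.
    """
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    if sxx == 0:
        raise ZeroDivisionError("exposure has no variation")
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sxx


if __name__ == "__main__":
    for pair, share in pairwise_agreement(LABELS).items():
        print(pair, f"agreement = {share:.3f}")
    print("max/min mean exposure:", mean_exposure_ratio(LABELS))
    # Same outcome data, different annotator: only the exposure measure changes.
    for name, exposure in LABELS.items():
        print(name, "slope =", round(ols_slope(exposure, DELTA_Y), 3))
```

In this toy data, swapping the annotator yields a 3.5-fold spread in mean exposure and a different slope for identical outcomes, a scaled-down version of the sensitivity the abstract describes. A binary regressor keeps the arithmetic transparent (each slope is a difference in group means); the functions work unchanged for other numeric exposure scales.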