LiveCodeBench Pro: How Do Olympiad Medalists Judge LLMs in Competitive Programming?

Published in: Under review at the Conference on Neural Information Processing Systems (NeurIPS), 2025

This paper presents a multi-institutional effort to rigorously benchmark modern LLMs on challenging International Collegiate Programming Contest (ICPC) problems. By having human experts (Olympiad medalists) judge the models' outputs, we provide a more nuanced evaluation than simple pass/fail metrics.

Download paper here

Recommended citation: Zihan Zheng*, Zerui Cheng*, Zeyu Shen*, Shang Zhou*, Kaiyuan Liu*, Hansen He*, Dongruixuan Li, Stanley Wei, Hangyi Hao, Jianzhu Yao, Peiyao Sheng, Zixuan Wang, Wenhao Chai, Aleksandra Korolova, Peter Henderson, Sanjeev Arora, Pramod Viswanath, Jingbo Shang, Saining Xie. (2025). "LiveCodeBench Pro: How Do Olympiad Medalists Judge LLMs in Competitive Programming?" arXiv preprint arXiv:2506.11928.
