

ByteDance SEED, Multimodal Art Projection, Beihang University


Introduction

KORGym is a dynamic game-based evaluation platform for large language models (LLMs), comprising more than 50 games across six reasoning dimensions: mathematical and logical reasoning, control interaction reasoning, puzzle reasoning, spatial and geometric reasoning, strategic reasoning, and multimodal reasoning. The platform is organized into four modular components—the Inference Module, the Game Interaction Module, the Evaluation Module, and the Communication Module—that collectively support multi-round evaluations, configurable difficulty levels, and robust reinforcement-learning integration.

📖Overview


KORGym is architected as a modular, game-centric evaluation framework comprising three principal components. (1) The Evaluation and Communication Module serves as the system’s core: it interprets and validates input parameters, orchestrates inter-module messaging by encapsulating and dispatching protocol-compliant packets, and records final performance metrics. (2) The Game Interaction Module encapsulates each game environment and exposes a standardized API—generate, which instantiates and configures a new game; print board, which renders the current game state and formulates the corresponding prompt; and verify, which ingests player actions, advances the game state, and computes incremental scores. (3) The Inference Module coordinates model queries by managing asynchronous inference pipelines, parallelizing requests for throughput optimization, and checkpointing intermediate outputs to ensure reproducibility and facilitate error recovery.
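The three-method game API above (generate, print board, verify) can be sketched as follows. This is a minimal illustrative example, not code from the KORGym repository: the game (guess-the-number), the class name, and all field names are assumptions made for demonstration only.

```python
import random


class NumberGuessGame:
    """Hypothetical game implementing KORGym's standardized three-method API."""

    def generate(self, seed: int, difficulty: int = 1) -> None:
        """Instantiate and configure a new game instance from a seed."""
        rng = random.Random(seed)
        self.upper = 10 ** difficulty        # difficulty widens the search range
        self.target = rng.randint(1, self.upper)
        self.score = 0.0
        self.finished = False

    def print_board(self) -> str:
        """Render the current game state as a prompt for the model."""
        return (f"Guess an integer between 1 and {self.upper}. "
                f"Reply with 'Answer: <number>'.")

    def verify(self, action: str) -> float:
        """Ingest a player action, advance the game state, and score it."""
        try:
            guess = int(action.split("Answer:")[-1].strip())
        except ValueError:
            return self.score                # unparsable action earns no credit
        if guess == self.target:
            self.score, self.finished = 1.0, True
        return self.score


# One evaluation round; the model reply is stubbed with the correct answer.
game = NumberGuessGame()
game.generate(seed=42, difficulty=1)
prompt = game.print_board()
score = game.verify(f"Answer: {game.target}")
```

In the real platform, the Evaluation and Communication Module would drive this loop over multiple rounds, forwarding each prompt to the Inference Module and each model reply back to verify.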

🏅Leaderboard

To comprehensively evaluate LLM performance, we assessed 19 large language models—including 11 thinking models and 8 instruction-tuned models—and 8 vision-language models.




🎮️Game Library

BibTeX


@misc{KORGym,
      title={KORGym: A Dynamic Game Platform for LLM Reasoning Evaluation},
      author={Jiajun Shi and Jian Yang and Jiaheng Liu and Xingyuan Bu and Jiangjie Chen and Junting Zhou and Kaijing Ma and Zhoufutu Wen and Bingli Wang and Yancheng He and Liang Song and Hualei Zhu and Shilong Li and Xingjian Wang and Wei Zhang and Ruibin Yuan and Yifan Yao and Wenjun Yang and Yunli Wang and Siyuan Fang and Siyu Yuan and Qianyu He and Xiangru Tang and Yingshui Tan and Wangchunshu Zhou and Zhaoxiang Zhang and Zhoujun Li and Wenhao Huang and Ge Zhang},
      year={2025},
      eprint={2505.14552},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2505.14552},
}