LLM Leaderboard

Leaderboards for mainstream large language models

Benchmarks: 5
Models: 50
Vendors: 10
Data sources: Blob + Arena + LiveBench

Arena.ai - Text

Chatbot Arena (Text)

Leaderboard of AI chat systems ranked by human evaluation (Elo ratings). Source: arena.ai

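| Rank | Model | Vendor | Score | Votes | Price (input/output) | Context |
|---|---|---|---|---|---|---|
| 🥇 | Claude Opus 4.6 Thinking | Anthropic | 1501 | 10,754 | $5/$25 | 1000K |
| 🥈 | Claude Opus 4.6 | Anthropic | 1501 | 11,577 | $5/$25 | 1000K |
| 🥉 | Gemini 3.1 Pro Preview | Google | 1493 | 13,473 | $2/$12 | 1000K |
| #4 | Grok 4.20 Beta1 | xAI | 1492 | 6,913 | $5/$15 | 200K |
| #5 | Gemini 3 Pro | Google | 1486 | 40,857 | $2/$12 | 200K |
| #6 | GPT-5.4 High | OpenAI | 1485 | 4,930 | $5/$20 | 1000K |
| #7 | Grok 4.20 Beta-0309 Reasoning | xAI | 1482 | 3,398 | $5/$15 | 200K |
| #8 | GPT-5.2 Chat Latest | OpenAI | 1480 | 8,887 | $2.5/$10 | 500K |
| #9 | Gemini 3 Flash | Google | 1474 | 30,516 | $0.5/$3 | 1000K |
| #10 | Claude Opus 4.5 Thinking 32k | Anthropic | 1473 | 35,344 | $5/$25 | 200K |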
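
The Arena scores are on an Elo scale, so the gap between two rows matters more than the absolute number. The sketch below turns a rating difference into an expected head-to-head win rate. It assumes the standard Elo logistic curve (base 10, scale 400). arena.ai's exact fitting procedure is not described on this page, so treat the result as approximate. `expected_win_rate` is a hypothetical helper, not part of any arena.ai API.

```python
def expected_win_rate(rating_a: float, rating_b: float) -> float:
    """Expected probability that model A is preferred over model B.

    ASSUMPTION: standard Elo logistic curve with base 10 and scale 400.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


# Tied scores (#1 and #2 in the text table, both 1501) give an even match.
print(f"{expected_win_rate(1501, 1501):.3f}")  # prints 0.500
# #1 (1501) versus #10 (1473) in the text table: a 28-point gap.
print(f"{expected_win_rate(1501, 1473):.3f}")  # prints 0.540
```

Under this assumption, the whole top-10 spread of the text leaderboard (28 points) amounts to only about a 54% expected preference rate for the leader over #10.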

Arena.ai - Code

Chatbot Arena (Code)

Leaderboard of coding ability (Elo ratings). Source: arena.ai

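| Rank | Model | Vendor | Score | Votes | Price (input/output) | Context |
|---|---|---|---|---|---|---|
| 🥇 | Claude Opus 4.6 | Anthropic | 1549 | 3,893 | $5/$25 | 1000K |
| 🥈 | Claude Opus 4.6 Thinking | Anthropic | 1547 | 3,054 | $5/$25 | 1000K |
| 🥉 | Claude Sonnet 4.6 | Anthropic | 1518 | 5,204 | $3/$15 | 1000K |
| #4 | Claude Opus 4.5 Thinking 32k | Anthropic | 1490 | 12,973 | $5/$25 | 200K |
| #5 | Claude Opus 4.5 | Anthropic | 1468 | 13,051 | $5/$25 | 200K |
| #6 | GPT-5.4 High (Codex) | OpenAI | 1456 | 1,431 | $5/$20 | 1000K |
| #7 | Gemini 3.1 Pro Preview | Google | 1451 | 3,865 | $2/$12 | 1000K |
| #8 | GLM-5 | Zhipu AI | 1447 | 3,929 | $0.72/$2.3 | 80K |
| #9 | GLM-4.7 | Zhipu AI | 1439 | 5,134 | $0.38/$1.98 | 200K |
| #10 | Gemini 3 Flash | Google | 1437 | 13,637 | $0.5/$3 | 1000K |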
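
The price column gives an input and an output price per model, but the page does not state the unit. The sketch below compares per-request costs. It assumes the prices are USD per 1M tokens, which is a common convention but is not confirmed by the source. `parse_price` and `request_cost` are hypothetical helpers, and the token counts are an illustrative workload, not measured data.

```python
def parse_price(cell: str) -> tuple[float, float]:
    """Split a price cell such as "$5/$25" into (input_price, output_price)."""
    input_part, output_part = cell.split("/")
    return float(input_part.lstrip("$")), float(output_part.lstrip("$"))


def request_cost(cell: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one request.

    ASSUMPTION: the table's prices are USD per 1M tokens.
    """
    input_price, output_price = parse_price(cell)
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Hypothetical workload: 10,000 input tokens and 2,000 output tokens.
for model, cell in [("Claude Opus 4.6", "$5/$25"), ("Gemini 3 Flash", "$0.5/$3")]:
    print(f"{model}: ${request_cost(cell, 10_000, 2_000):.4f}")
```

Under that assumption, the example request costs about $0.10 on Claude Opus 4.6 and about $0.011 on Gemini 3 Flash. Output tokens weigh more heavily because every listed model prices output higher than input.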

Arena.ai - Vision

Chatbot Arena (Vision)

Leaderboard of visual understanding ability (Elo ratings). Source: arena.ai

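| Rank | Model | Vendor | Score | Votes | Price (input/output) | Context |
|---|---|---|---|---|---|---|
| 🥇 | Gemini 3 Pro | Google | 1288 | 13,037 | $2/$12 | 200K |
| 🥈 | Gemini 3.1 Pro Preview | Google | 1279 | 6,186 | $2/$12 | 1000K |
| 🥉 | GPT-5.2 Chat Latest | OpenAI | 1278 | 2,922 | $2.5/$10 | 500K |
| #4 | Gemini 3 Flash | Google | 1274 | 12,634 | $0.5/$3 | 200K |
| #5 | Dola-Seed 2.0 Preview | ByteDance | 1254 | 3,076 | $0.5/$1.5 | 128K |
| #6 | GPT-5.2 High | OpenAI | 1252 | 6,292 | $5/$20 | 500K |
| #7 | GPT-5.1 High | OpenAI | 1249 | 9,375 | $5/$20 | 200K |
| #8 | Gemini 2.5 Pro | Google | 1248 | 81,858 | $2/$12 | 200K |
| #9 | Kimi K2.5 Thinking | Moonshot AI | 1245 | 6,469 | $0.6/$3 | 200K |
| #10 | Claude Sonnet 4.6 | Anthropic | 1240 | 4,500 | $3/$15 | 1000K |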

Arena.ai - Document

Chatbot Arena (Document)

Leaderboard of document understanding ability (Elo ratings). Source: arena.ai

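| Rank | Model | Vendor | Score | Votes | Price (input/output) | Context |
|---|---|---|---|---|---|---|
| 🥇 | Claude Opus 4.6 | Anthropic | 1524 | 4,336 | $5/$25 | 1000K |
| 🥈 | Claude Sonnet 4.6 | Anthropic | 1491 | 1,813 | $3/$15 | 1000K |
| 🥉 | GPT-5.4 | OpenAI | 1483 | 1,349 | $2.5/$15 | 1000K |
| #4 | Claude Opus 4.5 | Anthropic | 1473 | 6,112 | $5/$25 | 200K |
| #5 | Gemini 3.1 Pro Preview | Google | 1457 | 3,972 | $2/$12 | 1000K |
| #6 | Claude Sonnet 4.5 | Anthropic | 1450 | 6,375 | $3/$15 | 200K |
| #7 | Gemini 3 Pro | Google | 1447 | 8,872 | $2/$12 | 200K |
| #8 | GPT-5 Pro | OpenAI | 1432 | 5,500 | $10/$30 | 500K |
| #9 | MiniMax M2.5 | MiniMax | 1425 | 2,800 | $0.2/$1.2 | 200K |
| #10 | Kimi K2.5 | Moonshot AI | 1418 | 5,100 | $0.6/$3 | 200K |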

LiveBench

Contamination-free benchmark built from up-to-date questions. Source: livebench.ai (2026-01-08 release)

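| Rank | Model | Vendor | Score | Price (input/output) | Context |
|---|---|---|---|---|---|
| 🥇 | GPT-5.4 Thinking | OpenAI | 80.28 | $2.5/$15 | 1000K |
| 🥈 | Gemini 3.1 Pro Preview | Google | 79.93 | $2/$12 | 1000K |
| 🥉 | Claude 4.6 Opus Thinking | Anthropic | 76.33 | $5/$25 | 1000K |
| #4 | Claude 4.5 Opus Thinking | Anthropic | 75.96 | $5/$25 | 200K |
| #5 | Claude 4.6 Sonnet Thinking | Anthropic | 75.47 | $3/$15 | 1000K |
| #6 | GPT-5.2 High | OpenAI | 74.84 | $5/$20 | 500K |
| #7 | GPT-5.2 Codex | OpenAI | 74.3 | $2.5/$10 | 500K |
| #8 | GPT-5.1 Codex Max | OpenAI | 73.98 | $15/$75 | 200K |
| #9 | Gemini 3 Pro Preview | Google | 73.39 | $2/$12 | 200K |
| #10 | Kimi K2.5 Thinking | Moonshot AI | 69.07 | $0.6/$3 | 200K |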