🏆
Arena.ai - Text
Chatbot Arena (Text)
Leaderboard of AI chat systems based on human evaluation (Elo rating). Data source: arena.ai
| Rank | Model | Vendor | Score | Votes | Input/Output | Context |
|---|---|---|---|---|---|---|
| 🥇 | Claude Opus 4.6 Thinking | Anthropic | 1501 | 10,754 | $5/$25 | 1000K |
| 🥈 | Claude Opus 4.6 | Anthropic | 1501 | 11,577 | $5/$25 | 1000K |
| 🥉 | Gemini 3.1 Pro Preview | Google | 1493 | 13,473 | $2/$12 | 1000K |
| #4 | Grok 4.20 Beta1 | xAI | 1492 | 6,913 | $5/$15 | 200K |
| #5 | Gemini 3 Pro | Google | 1486 | 40,857 | $2/$12 | 200K |
| #6 | GPT-5.4 High | OpenAI | 1485 | 4,930 | $5/$20 | 1000K |
| #7 | Grok 4.20 Beta-0309 Reasoning | xAI | 1482 | 3,398 | $5/$15 | 200K |
| #8 | GPT-5.2 Chat Latest | OpenAI | 1480 | 8,887 | $2.5/$10 | 500K |
| #9 | Gemini 3 Flash | Google | 1474 | 30,516 | $0.5/$3 | 1000K |
| #10 | Claude Opus 4.5 Thinking 32k | Anthropic | 1473 | 35,344 | $5/$25 | 200K |
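
To give a sense of what the score gaps in the table above mean, here is a minimal sketch that converts a rating difference into an expected head-to-head win rate. It assumes the standard Elo logistic formula with a scale of 400; the source does not say how arena.ai computes its scores, so the function `expected_score` and its `scale` parameter are illustrative assumptions, not the site's method.

```python
def expected_score(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Expected win probability of A over B under the standard Elo model.

    Assumption: the classic logistic formula with a 400-point scale.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))


# Scores copied from the Text table: rank 1 vs. rank 3.
opus_46 = 1501
gemini_31_pro = 1493

p = expected_score(opus_46, gemini_31_pro)
print(f"Expected win rate for the higher-rated model: {p:.3f}")
```

Under that assumption, the 8-point gap between first and third place corresponds to an expected win rate only slightly above 50%, so small differences near the top of the table should be read as near ties.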
💻
Arena.ai - Code
Chatbot Arena (Code)
Coding ability leaderboard (Elo rating). Data source: arena.ai
| Rank | Model | Vendor | Score | Votes | Input/Output | Context |
|---|---|---|---|---|---|---|
| 🥇 | Claude Opus 4.6 | Anthropic | 1549 | 3,893 | $5/$25 | 1000K |
| 🥈 | Claude Opus 4.6 Thinking | Anthropic | 1547 | 3,054 | $5/$25 | 1000K |
| 🥉 | Claude Sonnet 4.6 | Anthropic | 1518 | 5,204 | $3/$15 | 1000K |
| #4 | Claude Opus 4.5 Thinking 32k | Anthropic | 1490 | 12,973 | $5/$25 | 200K |
| #5 | Claude Opus 4.5 | Anthropic | 1468 | 13,051 | $5/$25 | 200K |
| #6 | GPT-5.4 High (Codex) | OpenAI | 1456 | 1,431 | $5/$20 | 1000K |
| #7 | Gemini 3.1 Pro Preview | Google | 1451 | 3,865 | $2/$12 | 1000K |
| #8 | GLM-5 | Zhipu AI | 1447 | 3,929 | $0.72/$2.3 | 80K |
| #9 | GLM-4.7 | Zhipu AI | 1439 | 5,134 | $0.38/$1.98 | 200K |
| #10 | Gemini 3 Flash | Google | 1437 | 13,637 | $0.5/$3 | 1000K |
👁️
Arena.ai - Vision
Chatbot Arena (Vision)
Visual understanding leaderboard (Elo rating). Data source: arena.ai
| Rank | Model | Vendor | Score | Votes | Input/Output | Context |
|---|---|---|---|---|---|---|
| 🥇 | Gemini 3 Pro | Google | 1288 | 13,037 | $2/$12 | 200K |
| 🥈 | Gemini 3.1 Pro Preview | Google | 1279 | 6,186 | $2/$12 | 1000K |
| 🥉 | GPT-5.2 Chat Latest | OpenAI | 1278 | 2,922 | $2.5/$10 | 500K |
| #4 | Gemini 3 Flash | Google | 1274 | 12,634 | $0.5/$3 | 200K |
| #5 | Dola-Seed 2.0 Preview | ByteDance | 1254 | 3,076 | $0.5/$1.5 | 128K |
| #6 | GPT-5.2 High | OpenAI | 1252 | 6,292 | $5/$20 | 500K |
| #7 | GPT-5.1 High | OpenAI | 1249 | 9,375 | $5/$20 | 200K |
| #8 | Gemini 2.5 Pro | Google | 1248 | 81,858 | $2/$12 | 200K |
| #9 | Kimi K2.5 Thinking | Moonshot AI | 1245 | 6,469 | $0.6/$3 | 200K |
| #10 | Claude Sonnet 4.6 | Anthropic | 1240 | 4,500 | $3/$15 | 1000K |
📄
Arena.ai - Document
Chatbot Arena (Document)
Document understanding leaderboard (Elo rating). Data source: arena.ai
| Rank | Model | Vendor | Score | Votes | Input/Output | Context |
|---|---|---|---|---|---|---|
| 🥇 | Claude Opus 4.6 | Anthropic | 1524 | 4,336 | $5/$25 | 1000K |
| 🥈 | Claude Sonnet 4.6 | Anthropic | 1491 | 1,813 | $3/$15 | 1000K |
| 🥉 | GPT-5.4 | OpenAI | 1483 | 1,349 | $2.5/$15 | 1000K |
| #4 | Claude Opus 4.5 | Anthropic | 1473 | 6,112 | $5/$25 | 200K |
| #5 | Gemini 3.1 Pro Preview | Google | 1457 | 3,972 | $2/$12 | 1000K |
| #6 | Claude Sonnet 4.5 | Anthropic | 1450 | 6,375 | $3/$15 | 200K |
| #7 | Gemini 3 Pro | Google | 1447 | 8,872 | $2/$12 | 200K |
| #8 | GPT-5 Pro | OpenAI | 1432 | 5,500 | $10/$30 | 500K |
| #9 | MiniMax M2.5 | MiniMax | 1425 | 2,800 | $0.2/$1.2 | 200K |
| #10 | Kimi K2.5 | Moonshot AI | 1418 | 5,100 | $0.6/$3 | 200K |
⚡
LiveBench
A contamination-free, up-to-date evaluation benchmark. Data source: livebench.ai (2026-01-08 release)
| Rank | Model | Vendor | Score | Input/Output | Context |
|---|---|---|---|---|---|
| 🥇 | GPT-5.4 Thinking | OpenAI | 80.28 | $2.5/$15 | 1000K |
| 🥈 | Gemini 3.1 Pro Preview | Google | 79.93 | $2/$12 | 1000K |
| 🥉 | Claude 4.6 Opus Thinking | Anthropic | 76.33 | $5/$25 | 1000K |
| #4 | Claude 4.5 Opus Thinking | Anthropic | 75.96 | $5/$25 | 200K |
| #5 | Claude 4.6 Sonnet Thinking | Anthropic | 75.47 | $3/$15 | 1000K |
| #6 | GPT-5.2 High | OpenAI | 74.84 | $5/$20 | 500K |
| #7 | GPT-5.2 Codex | OpenAI | 74.3 | $2.5/$10 | 500K |
| #8 | GPT-5.1 Codex Max | OpenAI | 73.98 | $15/$75 | 200K |
| #9 | Gemini 3 Pro Preview | Google | 73.39 | $2/$12 | 200K |
| #10 | Kimi K2.5 Thinking | Moonshot AI | 69.07 | $0.6/$3 | 200K |