LLM rankings and prices

Rankings

LMSYS

| Model | Score |
| --- | --- |
| GPT-4-Turbo-2024-04-09 | 1258 |
| Claude 3 Opus | 1253 |
| Gemini 1.5 Pro API-0409-Preview | 1249 |
| Meta Llama 3 70b Instruct | 1213 |
| Claude 3 Sonnet | 1201 |
| Command R+ | 1192 |
| Claude 3 Haiku | 1181 |
| Mistral-Large-2402 | 1158 |
| Qwen1.5-72B-Chat | 1153 |
| Command R | 1150 |
| Mistral Medium | 1147 |
| Meta Llama 3 8b Instruct | 1147 |
| Mixtral-8x22b-Instruct-v0.1 | 1145 |
| Qwen1.5-32B-Chat | 1134 |
| GPT-3.5-Turbo-0613 | 1119 |
| Qwen1.5-14B-Chat | 1119 |
| Mixtral-8x7b-Instruct-v0.1 | 1114 |
| Yi-34B-Chat | 1109 |
| WizardLM-70B-v1.0 | 1108 |

Evaluation criteria

Baichuan 13B table

RAG score

https://mp.weixin.qq.com/s/EdoA5fcyzgTw3LarMMe00g

LMSYS

🏆 LMSYS Chatbot Arena Leaderboard

| Tier | Score | Representative models |
| --- | --- | --- |
| First tier | Above 200 points | GPT-4; Claude 3 medium and large models (Sonnet, Opus) |
| Second tier | Above 150 points | Mistral Medium and Large; Claude 3 small model (Haiku); Qwen 72B |
| Third tier | 110 points or more | GPT-3.5 |

GPT-4 scores about 12% higher than GPT-3.5.

The Claude 3 small model (Haiku) scores about 6% higher than GPT-3.5.
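
The two percentages above appear to be the relative gap between Arena scores listed in the LMSYS table at the top of this page (GPT-4-Turbo 1258, Claude 3 Haiku 1181, GPT-3.5-Turbo 1119). Below is a minimal sketch of that arithmetic, assuming this is how the figures were derived. The helper name `relative_gain` is hypothetical and used only for illustration.

```python
# Arena scores copied from the LMSYS table on this page.
SCORES = {
    "GPT-4-Turbo-2024-04-09": 1258,
    "Claude 3 Haiku": 1181,
    "GPT-3.5-Turbo-0613": 1119,
}


def relative_gain(model: str, baseline: str, scores: dict = SCORES) -> float:
    """Return how much higher `model` scores than `baseline`, in percent."""
    return (scores[model] - scores[baseline]) / scores[baseline] * 100


if __name__ == "__main__":
    # Compare each model against GPT-3.5 (assumed baseline, as in the notes).
    for name in ("GPT-4-Turbo-2024-04-09", "Claude 3 Haiku"):
        gain = relative_gain(name, "GPT-3.5-Turbo-0613")
        print(f"{name}: {gain:.1f}% above GPT-3.5")
```

With these scores the gaps come out to roughly 12.4% and 5.5%, which matches the rounded 12% and 6% quoted above. Arena scores are Elo-style ratings, so a ratio of raw scores is only a rough yardstick and should not be read as a win rate.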

CLUE evaluation

CLUE: Chinese Language Understanding Evaluation benchmark

https://mp.weixin.qq.com/s/cI92Fp2ic13_BKaRSgZw4g

Prices

Claude

Mistral

Gemini: currently only the 1.0 Pro price is available