

LMSYS Chatbot Arena Leaderboard

LMSYS Chatbot Arena Leaderboard (March 26, 2024)

https://huggingface.co/spaces/lmsys/chatbot-arena-leaderboard

Contribute your vote 🗳️ at chat.lmsys.org! Find more analysis in the notebook.

Rank | ⭐ Arena Elo | 📊 95% CI | 🗳️ Votes | Organization | License | Knowledge Cutoff
1 | 1253 | +5/-5 | 33250 | Anthropic | Proprietary | 2023/8
1 | 1251 | +4/-4 | 54141 | OpenAI | Proprietary | 2023/4
1 | 1248 | +4/-4 | 34825 | OpenAI | Proprietary | 2023/12
4 | 1203 | +5/-7 | 12476 | Google | Proprietary | Online
4 | 1198 | +5/-5 | 32761 | Anthropic | Proprietary | 2023/8
6 | 1185 | +5/-4 | 33499 | OpenAI | Proprietary | 2021/9
6 | 1179 | +5/-5 | 18776 | Anthropic | Proprietary | 2023/8
8 | 1158 | +4/-5 | 51860 | OpenAI | Proprietary | 2021/9
8 | 1157 | +5/-4 | 26734 | Mistral | Proprietary | Unknown
9 | 1148 | +5/-5 | 20211 | Alibaba | Qianwen LICENSE | 2024/2
10 | 1146 | +6/-6 | 21908 | Anthropic | Proprietary | Unknown
10 | 1145 | +5/-4 | 26196 | Mistral | Proprietary | Unknown
13 | 1127 | +9/-10 | 4270 | UC Berkeley | Apache-2.0 | 2024/3
13 | 1126 | +7/-4 | 13543 | Anthropic | Proprietary | Unknown
13 | 1125 | +6/-6 | 14856 | Google | Proprietary | 2023/4
13 | 1122 | +7/-5 | 13132 | Mistral | Proprietary | Unknown
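An Arena Elo gap is easier to read as a head-to-head preference rate. Below is a minimal sketch. It assumes the scores sit on the conventional Elo logistic scale, where every 400 points of difference multiply the odds of being preferred by 10. `expected_win_rate` is a hypothetical helper, not part of any LMSYS code, and the leaderboard's exact fitting procedure is not described here.

```python
def expected_win_rate(elo_a: float, elo_b: float, base: float = 10.0, scale: float = 400.0) -> float:
    """Expected probability that model A is preferred over model B.

    Assumes the conventional Elo logistic curve: every `scale` points of
    difference multiply the odds of winning by `base`.
    """
    return 1.0 / (1.0 + base ** ((elo_b - elo_a) / scale))


# Equal scores give an even split.
print(expected_win_rate(1200, 1200))  # 0.5
# A 400-point gap gives odds of 10:1, i.e. a preference rate of 10/11.
print(expected_win_rate(1600, 1200))
# Two adjacent rows from the table above (1179 vs 1158).
print(expected_win_rate(1179, 1158))
```

Under that assumption, a 21-point gap such as 1179 versus 1158 corresponds to an expected preference rate of only about 53%. Adjacent rows in the table are therefore separated by fairly small differences in head-to-head preference.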

Note: we take the 95% confidence interval into account when determining a model’s ranking. A model is ranked higher only if the lower bound of its score is higher than the upper bound of the other model’s score. See Figure 3 below for a visualization of the confidence intervals.
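The ranking rule in the note can be sketched in a few lines of Python. This is a minimal sketch under two assumptions. First, a CI written as "+x/-y" is read as the interval [Elo - y, Elo + x]. Second, a model's rank is 1 plus the number of models whose lower bound is strictly above its upper bound. `ci_ranks` is a hypothetical helper, not LMSYS code.

```python
from typing import List, Tuple


def ci_ranks(scores: List[Tuple[float, float, float]]) -> List[int]:
    """Rank models by confidence interval rather than by point score.

    `scores` holds (elo, plus, minus) per model, matching the "+plus/-minus"
    notation of the 95% CI column. A model's rank is 1 plus the number of
    models whose lower bound is strictly greater than this model's upper bound.
    """
    bounds = [(elo - minus, elo + plus) for elo, plus, minus in scores]
    ranks = []
    for _, upper in bounds:
        # Count models that are confidently better than this one.
        better = sum(1 for lower, _ in bounds if lower > upper)
        ranks.append(1 + better)
    return ranks


# (elo, plus, minus) for the first eight rows of the table above.
rows = [
    (1253, 5, 5),
    (1251, 4, 4),
    (1248, 4, 4),
    (1203, 5, 7),
    (1198, 5, 5),
    (1185, 5, 4),
    (1179, 5, 5),
    (1158, 4, 5),
]
print(ci_ranks(rows))
```

Applied to those eight rows, the sketch reproduces their Rank column (1, 1, 1, 4, 4, 6, 6, 8). The displayed intervals are rounded to whole points, so it may not match every row of the full table exactly.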

Via Superhuman AI

“Anthropic’s Claude Opus dethrones OpenAI’s GPT-4

Claude is America’s next top model. GPT-4’s long reign as the undisputed king of AI models is coming to an end, as the latest results from one of the biggest benchmarks in AI have placed Anthropic’s Claude 3 Opus at the top of its ranking.
TL;DR: Opus is the largest model from Anthropic’s newest family of Claude 3 models. It now ranks at the top of the LMSYS Chatbot Arena Leaderboard, a crowdsourced open platform for evaluating AI models.
But that’s not the biggest surprise. Haiku, the smallest of the Claude 3 models, has beaten an earlier version of GPT-4. Haiku’s smaller size is impressive in itself, but the achievement is absolutely seismic when you consider that Haiku is orders of magnitude cheaper than GPT-4.
Source: LMSYS Chatbot Arena Leaderboard
Haiku’s price and performance combo is an enticing proposition for users and builders. “This is excellent news for the market! We now have a GPT-4 class model that is 10x cheaper than GPT-4,” claimed Abacus AI CEO Bindu Reddy. “That’s insane for how cheap & fast it is,” added app builder Nick Dobos.
The ball is now in OpenAI’s court. “I don’t see how OpenAI survives on gpt-3.5 and gpt-4. Literally gpt-3.5 is utterly useless in the presence of Claude haiku,” declared software engineer Anton (@abacaj on X). OpenAI might have a thing or two to say about that when it launches the widely anticipated GPT-5.”



Posted on: March 27, 2024, 10:40 am Category: Uncategorized
