Anthropic's next-generation artificial intelligence model, Claude 3 Opus, has taken the top spot on the Chatbot Arena leaderboard, pushing OpenAI's GPT-4 into second place for the first time since the start of last year. Unlike other AI model benchmarks, the LMSYS Chatbot Arena relies on human votes.
OpenAI's various GPT-4 versions have held the top spot for so long that other models approaching their benchmark scores are known as "GPT-4 class" models. Future rankings may require the introduction of a new "Claude 3 class."
It is worth noting that the Claude 3 Opus and GPT-4 scores are very close, and a "significantly different" GPT-5 is expected at some point this year, a year after the OpenAI model was introduced.
The Chatbot Arena is run by LMSYS, the Large Model Systems Organization, and pits a wide variety of large language models against one another in anonymous, randomized battles.
It first launched last May, and since then models from Anthropic, OpenAI, and Google have dominated most of the top 10, with the arena garnering over 400,000 user votes.
More recently, open-source models have become increasingly present, with models from French AI startup Mistral and China's Alibaba also placing near the top of the list.
The Elo rating system, widely used in games such as chess, is used to calculate a player's relative skill level. Unlike chess, these rankings apply to the chatbots themselves, not to the humans using them.
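The idea can be seen in a minimal sketch of the standard Elo update rule applied to two models after a blind battle. The K-factor of 32 and starting rating of 1000 here are common textbook defaults, not the arena's actual parameters.

```python
def expected_score(r_a: float, r_b: float) -> float:
    """Probability that A beats B under the Elo model (400-point logistic scale)."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

def elo_update(r_a: float, r_b: float, score_a: float, k: float = 32.0):
    """Return updated ratings after one battle.

    score_a is 1.0 if A wins, 0.0 if A loses, 0.5 for a tie.
    """
    e_a = expected_score(r_a, r_b)
    new_a = r_a + k * (score_a - e_a)
    new_b = r_b + k * ((1.0 - score_a) - (1.0 - e_a))
    return new_a, new_b

# Two models start with equal ratings; model A wins one blind vote.
a, b = elo_update(1000.0, 1000.0, 1.0)  # → (1016.0, 984.0)
```

Because the update depends only on win/loss outcomes, thousands of anonymous pairwise votes are enough to produce a leaderboard, with upsets against higher-rated models moving ratings more than expected wins.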
The arena has its limitations: not all models, or all versions of a given model, are included, and some GPT-4 variants may be absent.
The arena also does not include well-known models such as Google's Gemini Pro 1.5 and Gemini Ultra, which have huge context windows.
More than 70,000 new votes went into the latest update, with Claude 3 Opus topping the leaderboard; even the smallest of the Claude 3 models fared well.
LMSYS explains: "The Claude-3 Haiku impressed all, reaching the GPT-4 level according to user preference! Its speed, capability, and length of context are currently unmatched in the market."
What makes this even more impressive is that Claude 3 Haiku is a "local-size" model comparable to Google's Gemini Nano. It achieves these results without the trillion-plus parameter scales of Opus and the GPT-4 class models.
While not as intelligent as Opus or Sonnet, Anthropic's Haiku is considerably cheaper, much faster, and, as the arena results suggest, comparable to much larger models in blind tests
All three Claude 3 models are in the top 10, with Opus in the top spot, Sonnet in joint 4th with Gemini Pro, and Haiku in joint 6th with an early version of GPT-4
Of the top 20 large language models on the Arena leaderboard, all but three are proprietary, suggesting that open source still faces challenges in catching up with the big players.
Meta, which is focusing on open-source AI, will release Llama 3 in the coming months; it is expected to have capabilities similar to Claude 3 and will likely make the top 10.
There have also been other developments in open-source and distributed AI, such as Emad Mostaque stepping down as CEO of Stability AI to focus on more distributed and accessible artificial intelligence. He stated that you cannot beat centralized AI with more centralized AI.