Benchmarks used to rank AI models are several years old, often sourced from amateur websites, and, experts worry, lending automated systems a dubious sense of authority
There’s a reason the Open LLM Leaderboard was changed a while ago.
Basically, scores had plateaued and many of the tests had ended up in the training data.
See this blog post for more info:
https://huggingface.co/spaces/open-llm-leaderboard/blog