The nonprofit Center for AI Safety (CAIS) and Scale AI, a company that provides a number of data labeling and AI development services, have released a challenging new benchmark for frontier AI systems.
The benchmark, called Humanity’s Last Exam, consists of thousands of crowdsourced questions on subjects such as mathematics, the humanities, and the natural sciences. To make the evaluation harder, the questions come in multiple formats, including formats that incorporate diagrams and images.
In a preliminary study, not a single publicly available flagship AI system managed to score better than 10% on Humanity’s Last Exam.
CAIS and Scale AI say they plan to open up the benchmark to the research community so that researchers can “dig deeper into the variations” and evaluate new AI models.