An organization developing math benchmarks for AI didn't disclose that it had received funding from OpenAI until relatively recently, drawing allegations of impropriety from some in the AI community.
Epoch AI, a nonprofit primarily funded by Open Philanthropy, a research and grantmaking foundation, revealed on December 20 that OpenAI had supported the creation of FrontierMath. FrontierMath, a test of expert-level problems designed to measure an AI's mathematical skills, was one of the benchmarks OpenAI used to demo its upcoming flagship AI, o3.
In a post on the forum LessWrong, a contractor for Epoch AI going by the username "Meemi" says that many contributors to the FrontierMath benchmark weren't informed of OpenAI's involvement until it was made public.

"The communication about this has been non-transparent," Meemi wrote. "In my view Epoch AI should have disclosed OpenAI funding, and contractors should have transparent information about the potential of their work being used for capabilities, when choosing whether to work on a benchmark."
On social media, some users raised concerns that the secrecy could erode FrontierMath's reputation as an objective benchmark. In addition to backing FrontierMath, OpenAI had access to many of the problems and solutions in the benchmark, a fact Epoch AI didn't reveal before December 20, when o3 was announced.
In a reply to Meemi's post, Tamay Besiroglu, associate director of Epoch AI and one of the organization's co-founders, asserted that the integrity of FrontierMath hadn't been compromised, but admitted that Epoch AI "made a mistake" in not being more transparent.

"We were restricted from disclosing the partnership until around the time o3 launched, and in hindsight we should have negotiated harder for the ability to be transparent to the benchmark contributors as soon as possible," Besiroglu wrote. "Our mathematicians deserved to know who might have access to their work. Even though we were contractually limited in what we could say, we should have made transparency with our contributors a non-negotiable part of our agreement with OpenAI."
Besiroglu added that while OpenAI has access to FrontierMath, it has a "verbal agreement" with Epoch AI not to use FrontierMath's problem set to train its AI. (Training an AI on FrontierMath would be akin to teaching to the test.) Epoch AI also has a "separate holdout set" that serves as an additional safeguard for independent verification of FrontierMath benchmark results, Besiroglu said.

"OpenAI has … been fully supportive of our decision to maintain a separate, unseen holdout set," Besiroglu wrote.
However, muddying the waters, Epoch AI lead mathematician Elliot Glazer noted in a post on Reddit that Epoch AI hasn't been able to independently verify OpenAI's FrontierMath o3 results.

"My personal opinion is that [OpenAI's] score is legit (i.e., they didn't train on the dataset), and that they have no incentive to lie about internal benchmarking performances," Glazer said. "However, we can't vouch for them until our independent evaluation is complete."
The saga is yet another example of the challenge of developing empirical benchmarks to evaluate AI, and of securing the necessary resources for benchmark development without creating the perception of conflicts of interest.