OpenAI Under Scrutiny After Helping Create Math Test Its AI Later Excelled At

OpenAI faced criticism after its o3 model scored 25.2% on FrontierMath, a benchmark test the company helped develop.

  • Epoch AI disclosed that OpenAI commissioned 300 math problems and had access to their solutions.
  • Multiple AI models, including those from Google, Microsoft, and Meta, were found to have memorized benchmark test answers.
  • Epoch AI plans to implement a 50-problem holdout set to ensure genuine testing of AI capabilities.
  • The controversy highlights broader issues in AI performance evaluation methods across the industry.

Questions about the integrity of artificial intelligence testing emerged after it became known that OpenAI had helped fund the development of a mathematical benchmark it later used to demonstrate its model’s capabilities. The episode raises concerns about the validity of AI performance metrics across the industry.

Testing Transparency Issues

OpenAI’s o3 model achieved a 25.2% score on FrontierMath, a mathematical benchmark created by Epoch AI. Subsequent disclosures, however, showed that OpenAI had funded the benchmark’s development and had access to its problems and solutions.

According to Epoch AI’s disclosure, Epoch AI provided OpenAI with 300 mathematics problems and their solutions under a commissioned agreement.

Tamay Besiroglu, associate director at Epoch AI, revealed that OpenAI initially restricted disclosure of their partnership, stating: “We were restricted from disclosing the partnership until around the time o3 launched.”

The agreement included only a verbal commitment not to use the materials for model training.

Industry-Wide Pattern

An investigation by AI researcher Louis Hunt found that leading models from Google, Microsoft, Meta, and Alibaba could reproduce exact answers from the MMLU and GSM8K benchmarks, standard tests that measure AI multitasking and mathematical abilities.

RemBrain founder Vasily Morzhakov emphasized the severity of the situation:

“The models are tested in their instruction versions on MMLU and GSM8K tests. But the fact that base models can regenerate tests, it means those tests are already in pre-training.”

Moving Toward a Solution

To address these concerns, Epoch AI announced plans to implement a “hold out set” comprising 50 randomly selected problems that will remain inaccessible to OpenAI.
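
As a rough illustration of what a randomly selected holdout set involves, the sketch below partitions a pool of problem identifiers into a shared portion and a withheld portion. It is not Epoch AI's actual procedure: the function name `split_holdout`, the identifier format, and the pool size of 300 are assumptions made for the example.

```python
import random


def split_holdout(problem_ids, holdout_size=50, seed=None):
    """Randomly partition problem IDs into (shared, holdout) lists.

    Hypothetical helper for illustration only; not Epoch AI's method.
    """
    problem_ids = list(problem_ids)
    if holdout_size > len(problem_ids):
        raise ValueError("holdout_size exceeds the number of problems")

    # A dedicated Random instance keeps the draw reproducible for a given seed
    # without touching the global random state.
    rng = random.Random(seed)
    holdout = set(rng.sample(problem_ids, holdout_size))

    # Everything not drawn into the holdout stays in the shared portion,
    # preserving the original order.
    shared = [pid for pid in problem_ids if pid not in holdout]
    return shared, sorted(holdout)


# Assumed pool: 300 placeholder IDs (the real problem set is not public here).
problem_ids = [f"problem-{i:03d}" for i in range(300)]
shared, holdout = split_holdout(problem_ids, holdout_size=50, seed=0)
```

The point of such a split is that scores on the withheld problems cannot be inflated by prior exposure, provided the holdout list itself is kept away from the party being evaluated.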

Computer scientist Dirk Roeckmann suggests that proper testing requires a neutral evaluation environment, though he acknowledges that human interference in the testing process remains a risk.

The controversy parallels historical challenges in standardized testing, where access to test materials has consistently raised questions about assessment validity. This situation highlights the need for independent verification methods in artificial intelligence evaluation.
