New Open-Source AI Model Matches China’s DeepSeek with 85% Less Training Data

OpenThinker-32B Outperforms DeepSeek in Math Reasoning with Just One-Seventh of the Training Data

  • OpenThinker-32B achieves 90.6% accuracy on MATH500, surpassing DeepSeek’s 89.4% with significantly less training data.
  • The model demonstrates superior efficiency, requiring only 114,000 training examples compared to DeepSeek’s 800,000.
  • Open source release includes verified and unverified datasets, enabling broader community development.
  • Training completed in 90 hours on four nodes, each with eight H100 GPUs, showing practical implementation potential.
  • Built on Alibaba's Qwen2.5-32B-Instruct LLM, supporting a 16,000-token context window for complex operations.

A breakthrough in AI reasoning emerged Wednesday as international researchers unveiled OpenThinker-32B, a model that challenges DeepSeek's dominance in mathematical and problem-solving capabilities while using just one-seventh of the training data.


The model, developed by the Open Thoughts consortium, achieved superior results across multiple benchmarks despite its far smaller training set. On the MATH500 assessment, OpenThinker-32B scored 90.6% accuracy, exceeding DeepSeek's 89.4%. It also outperformed in general problem-solving, posting a GPQA-Diamond score of 61.6 to DeepSeek's 57.6.

The project's efficiency stems from its OpenThoughts-114k dataset, which pairs each example with comprehensive metadata, ground-truth solutions, and domain-specific information. A separate unverified dataset of 137,000 samples was processed on Italy's Leonardo supercomputer, consuming 11,520 A100 GPU-hours over roughly 30 hours of wall-clock time.
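For readers who want to inspect the released data, the minimal Python sketch below shows how such a dataset is typically pulled from Hugging Face with the `datasets` library. The repository name `open-thoughts/OpenThoughts-114k` and the exact field layout are assumptions based on the consortium's naming, not details confirmed in this article.

```python
from datasets import load_dataset

# Repository name is assumed from the Open Thoughts naming convention; verify on Hugging Face.
dataset = load_dataset("open-thoughts/OpenThoughts-114k", split="train")

print(len(dataset))       # expected to be roughly 114,000 examples
print(dataset[0].keys())  # metadata, ground-truth solution, and domain fields described above
```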

This development arrives amid intensifying competition in AI reasoning capabilities. OpenAI recently announced reasoning features for post-GPT-5 models, while xAI's Grok-3 and Nous Research's DeepHermes have joined the race.

The model's availability on Hugging Face, including a smaller 7-billion-parameter version, represents a significant shift toward open-source AI development. Unlike DeepSeek, which keeps its training data private, OpenThinker's full transparency makes it easier for the developer community to reproduce and improve the model.
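As a rough illustration of that accessibility, the sketch below shows how such a checkpoint is commonly loaded with the `transformers` library. The repository name `open-thoughts/OpenThinker-32B` is an assumption, and a 32B-parameter model needs on the order of 64 GB of GPU memory in 16-bit precision.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Repository name assumed; check the Open Thoughts organization page on Hugging Face.
model_id = "open-thoughts/OpenThinker-32B"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

# Pose a simple math-reasoning prompt, the kind of task measured by MATH500.
messages = [{"role": "user", "content": "Prove that the product of two odd integers is odd."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=512)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```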

Backed by researchers from Stanford, Berkeley, UCLA, and the Juelich Supercomputing Center, along with the Toyota Research Institute, OpenThinker-32B demonstrates how international collaboration can produce competitive AI models without relying on massive proprietary datasets.
