OpenAI Agents Better at Hacking Than Fixing Code

OpenAI launches EVMbench to test AI agents on smart contract security tasks.

  • OpenAI and Paradigm released EVMbench, a new tool to test AI agents on smart contract security tasks.
  • Research shows AI agents are significantly better at exploiting smart contract flaws than at finding or fixing them; GPT-5.3-Codex posted the top scores for patching and exploitation.
  • The release follows a recent incident in which a bug in AI-generated code cost Moonwell users nearly $2.7 million.

OpenAI and crypto venture firm Paradigm launched a new benchmarking tool on Wednesday that rigorously evaluates how AI agents handle smart contract security vulnerabilities. This release arrives just days after a costly bug in AI-generated code led to significant user losses.

The tool, called EVMbench, is built from 120 vulnerabilities identified in over 40 prior audits. It provides a standardized way to measure AI performance on three tasks: detection, patching, and exploitation.

Results from the tool reveal a stark capability gap in current AI models. OpenAI’s latest model, GPT-5.3-Codex, more than doubled its predecessor’s effectiveness at exploiting flaws to drain funds.

However, its success rates in finding and fixing vulnerabilities “remain below full coverage,” according to the company’s news release. The agents sometimes stop after finding one issue or struggle to maintain functionality while patching.

In benchmark comparisons, Anthropic’s Claude Opus 4.6 scored highest for detecting vulnerabilities. Meanwhile, GPT-5.3-Codex achieved top results in both patching and exploiting smart contracts.

OpenAI cautioned that EVMbench has limitations due to its finite sample of vulnerabilities. The tool also cannot reliably determine whether vulnerabilities reported by agents are false positives.

Such testing is critical as smart contract hacks continue to plague the industry. According to data, protocols have suffered over $108 million in exploits so far in 2026.
