- OpenAI, with Paradigm and OtterSec, launched a new benchmark to test AI agents on smart contract security vulnerabilities.
- Anthropic’s Claude Opus 4.6 outperformed competitors, finding the most potential value in exploitable flaws.
- The benchmark arrives as crypto thefts grow, and top executives foresee AI agents driving future crypto transactions and security.
- Crypto venture capitalist Haseeb Qureshi argues smart contracts need AI intermediaries, or “self-driving wallets,” to achieve mainstream use.
On Wednesday, OpenAI unveiled a major new benchmark evaluating AI models on their ability to find and exploit security weaknesses in crypto smart contracts. The research, conducted in collaboration with Paradigm and OtterSec and released in a new paper, tested AI agents against 120 curated vulnerabilities to measure their performance in an economically critical domain.
Anthropic’s Claude Opus 4.6 led the pack with an average “detect award” of $37,824. OpenAI’s own model and Google’s Gemini 3 Pro followed with $31,623 and $25,112, respectively, according to the published data. OpenAI stated it’s vital to test AI in meaningful environments, noting that smart contracts secure billions of dollars and that AI will transform both attack and defense.
The need for such tools is underscored by the $3.4 billion in crypto stolen by attackers last year. The benchmark aims to track AI progress in mitigating these costly vulnerabilities at scale, a growing priority for the ecosystem.
Meanwhile, industry leaders are predicting a future where AI agents dominate crypto transactions. Circle CEO Jeremy Allaire recently forecast billions of AI agents using stablecoins within five years, a sentiment echoed by former Binance chief Changpeng Zhao.
However, user experience and security remain a core challenge. Dragonfly’s Haseeb Qureshi argued on X that smart contracts were never designed for human intuition, making large transactions feel “terrifying” compared to traditional bank transfers. The solution he proposed is AI-intermediated, “self-driving wallets” that manage complex operations securely.
