CRYPTO NEWS

Ethereum sees EVMbench launch as AI tested on EVM bugs

BY Noah Carter·3 MIN READ·FEBRUARY 19, 2026

Developed by OpenAI with Paradigm, EVMbench is a new benchmark that evaluates AI agents on Ethereum Virtual Machine (EVM) smart contract vulnerabilities across three tasks (detect, patch, and exploit), as reported by The Defiant. The effort targets a measurable, security-relevant domain in which smart contracts safeguard large on-chain values, and it aims to clarify what current agents can and cannot do.


OpenAI and Paradigm’s EVMbench tests AI on EVM smart contract vulnerabilities

OpenAI’s launch coincides with a $10 million commitment to cybersecurity research, according to Crypto Briefing. The initiative situates AI agents within crypto audit workflows while emphasizing defensively oriented applications in an area where model capabilities are changing quickly.

Detect, patch, exploit: how EVMbench evaluates AI agents

EVMbench assesses whether an agent can identify a vulnerability, propose a safe fix, and exploit the flaw in a controlled environment. According to OpenAI, exploit performance has improved faster than detection and patching: a model variant dubbed GPT-5.3-Codex scored 72.2% in exploit mode versus 31.9% for GPT-5, while detect and patch success rates have not yet reached full coverage across the benchmarked set.
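The three modes can be pictured as separate pass/fail checks on each benchmark task. The sketch below is hypothetical: the `TaskResult` record, the `mode_success_rates` and `compare_scores` helpers, and the toy data are illustrative assumptions, not EVMbench's actual harness or result schema, which the article does not describe. It only shows how per-mode success rates could be tallied and how the reported 72.2% and 31.9% exploit scores translate into an absolute and a relative gain.

```python
from dataclasses import dataclass


# Hypothetical per-task record; EVMbench's real result format is not public in this article.
@dataclass
class TaskResult:
    task_id: str
    detected: bool   # agent flagged the known vulnerability
    patched: bool    # agent's fix removed the flaw (assumed pass/fail check)
    exploited: bool  # agent produced a working exploit in a sandboxed environment


def mode_success_rates(results):
    """Return the fraction of tasks solved in each mode, scored independently."""
    if not results:
        raise ValueError("no results to score")
    n = len(results)
    return {
        "detect": sum(r.detected for r in results) / n,
        "patch": sum(r.patched for r in results) / n,
        "exploit": sum(r.exploited for r in results) / n,
    }


def compare_scores(new_pct, old_pct):
    """Absolute gain in percentage points and relative ratio between two scores."""
    return round(new_pct - old_pct, 1), round(new_pct / old_pct, 2)


if __name__ == "__main__":
    # Toy data only, invented for illustration.
    toy = [
        TaskResult("t1", True, False, True),
        TaskResult("t2", False, False, True),
        TaskResult("t3", True, True, False),
        TaskResult("t4", False, False, True),
    ]
    print(mode_success_rates(toy))  # {'detect': 0.5, 'patch': 0.25, 'exploit': 0.75}
    # Exploit-mode scores as reported by OpenAI: GPT-5.3-Codex vs GPT-5.
    print(compare_scores(72.2, 31.9))  # (40.3, 2.26)
```

On the reported figures, GPT-5.3-Codex's exploit score is about 40.3 percentage points higher than GPT-5's, or roughly 2.3 times as high. Scoring each mode independently, as assumed here, is what allows exploit gains to be reported separately from detect and patch results.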

Paradigm said the results preview a structural shift in audits. "It's now clear to us that a growing portion of audits in the future will be done by agents," said Alpin Yukseloglu, Partner of Investing & Research at Paradigm. The firm also noted that when this work began, top models could exploit fewer than 20% of critical fund-draining bugs, whereas leading agents now exceed 70% in exploit mode, an improvement with clear implications for both attackers and defenders.

Independent research points to the same dual-use dynamic: Anthropic’s SCONE-bench showed agents could autonomously generate exploit code simulating $4.6 million in losses, including on contracts deployed after model training cutoffs, as reported by Cointelegraph. These findings suggest defenders face a narrowing window between disclosure and exploit, reinforcing the need for measurable evaluations like EVMbench.

What EVMbench means for audits, defense, and governance

Agent-in-the-loop audits are likely to expand as exploit capabilities advance, but lagging detect-and-patch performance indicates that human-led review, threat modeling, and governance will remain central. Industry practitioners have cautioned against assuming AI can replace expert auditors outright; OpenZeppelin, for example, has observed that models can handle many known challenges yet still struggle with novel or adversarial cases, as reported by BitcoinInsider.

From a policy and controls perspective, benchmarks like EVMbench may inform pre-deployment scanning, continuous monitoring of on-chain behavior, and structured vulnerability disclosure norms. This trajectory suggests that boards, protocol DAOs, and security leads will weigh agent-access governance and audit sign-off criteria more explicitly as agent capabilities evolve.

At the time of writing, Coinbase Global (COIN) traded near $164.81 in after-hours trading, based on NasdaqGS data. The figure is not a directional signal, but it provides context for market attention around AI–crypto security initiatives that could influence operational priorities, vendor selection, and oversight frameworks.

Disclaimer:

The content on The CCPress is provided for informational purposes only and should not be considered financial or investment advice. Cryptocurrency investments carry inherent risks. Please consult a qualified financial advisor before making any investment decisions.