AI models reveal $550m worth of vulnerabilities hidden in smart contracts

Anthropic has shown that advanced AI systems can uncover hundreds of millions of dollars in smart contract vulnerabilities, identifying potential exploits worth $550.1 million across real-world blockchain protocols.

Researchers from MATS and the Anthropic Fellows program created a new evaluation benchmark called the Smart CONtracts Exploitation benchmark (SCONE-bench). This dataset includes 405 smart contracts that were successfully exploited between 2020 and 2025.

Using SCONE-bench, the team tested 10 different AI models. Together, the models produced ready-to-use exploits for 207 protocols—or 51.11% of the contracts tested—simulating the theft of $550.1 million in digital assets.

AI recreates post-2025 exploits despite knowledge cutoff

One of the more striking findings: even for hacks that occurred after March 2025, the latest training cutoff among the tested models, the AI systems were still able to reproduce exploits totaling $4.6 million. According to the researchers, this establishes a lower bound on the financial damage that capable AI systems could cause if misused.

The team then ran live simulations against 2,849 newly deployed protocols with no publicly known vulnerabilities. In this test, Sonnet 4.5 and GPT-5 uncovered two new zero-day flaws and produced functioning exploits worth $3,694, while OpenAI’s model accumulated $3,476 in API costs during the experiment.

Anthropic emphasized that all tests were conducted in controlled blockchain simulators with no real-world harm.

Why financial impact matters

Anthropic noted that existing cyber-evaluation tools like CyberGym and Cybench focus on the technical feasibility of advanced cyberattacks, often at the nation-state level. But they rarely quantify the financial consequences, which in practice can be among the most important metrics for policymakers and developers.

“Compared to arbitrary success metrics, quantifying capabilities in monetary terms is more useful for informing policymakers, developers, and the public about risks,” the researchers wrote.

Smart contracts were chosen because they operate entirely through public code and automated logic—handling trades, loans, and transfers without human oversight. This makes them ideal for measuring the real financial impact of software vulnerabilities.

What SCONE-bench includes

SCONE-bench is the first benchmark designed to measure an AI agent’s ability to exploit live financial logic in code—not just identify weaknesses. Each evaluation requires the model to detect a vulnerability, design an attack, and write the exploit script.

The benchmark includes:

  • 405 real exploited smart contracts across Ethereum, BNB Smart Chain, and Base
  • A baseline agent that attempts exploitation within a 60-minute window
  • Use of tools accessible via the Model Context Protocol (MCP)
  • A scoring and evaluation system
  • A feature allowing developers to test their own contracts before launch
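To make the detect-attack-exploit task concrete: a classic example of "live financial logic" gone wrong is a reentrancy flaw, where a contract pays out before updating its balance sheet. The sketch below models this in plain Python. It is purely illustrative and not drawn from SCONE-bench; the `VulnerableVault` and `Attacker` names and the 900/100 figures are invented for this example.

```python
# Illustrative only: a toy "contract" with a classic reentrancy flaw,
# modeled in plain Python. Not from SCONE-bench; it merely sketches the
# kind of logic error an exploit agent must detect and weaponize.

class VulnerableVault:
    """Pays out before zeroing the balance (checks-effects-interactions violated)."""
    def __init__(self, funds):
        self.funds = funds          # total assets held by the contract
        self.balances = {}          # per-account deposits

    def deposit(self, account, amount):
        self.balances[account] = self.balances.get(account, 0) + amount
        self.funds += amount

    def withdraw(self, account):
        amount = self.balances.get(account, 0)
        if amount and self.funds >= amount:
            # BUG: the external payout happens before the balance is
            # zeroed, so a malicious receiver can re-enter withdraw().
            account.receive(self, amount)
            self.balances[account] = 0


class Attacker:
    """Re-enters withdraw() from its payout callback to drain the vault."""
    def __init__(self):
        self.stolen = 0

    def receive(self, vault, amount):
        self.stolen += amount
        vault.funds -= amount
        if vault.funds >= amount:   # keep re-entering while funds remain
            vault.withdraw(self)


vault = VulnerableVault(funds=900)   # 900 units from other depositors
attacker = Attacker()
vault.deposit(attacker, 100)         # attacker's legitimate 100-unit stake
vault.withdraw(attacker)             # triggers the recursive drain
print(attacker.stolen)               # → 1000 (the 100 deposited plus everyone else's 900)
```

A real exploit script must do the same three things the benchmark scores: spot the ordering bug, design the recursive callback, and drive the transaction that sets it off.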

Anthropic previously detected AI-powered cyber espionage

The research follows a September incident in which Anthropic’s threat analysis team uncovered and stopped what they described as the first AI-driven cyber-espionage campaign of its kind.
