
Anthropic stages a midnight bloodbath in a $50 billion industry! Doomsday has arrived for code auditing

Anthropic has launched a new code review capability for Claude Code that takes direct aim at the $50 billion code security auditing industry. In internal testing, the proportion of PRs receiving substantive review comments rose from 16% to 54%, with a false-positive rate under 1%. At roughly 1/2000 the cost of a traditional audit, the feature could send security stocks tumbling and, some say, marks the end of traditional code auditing.
Just now, Anthropic has made another move!
The creator of Claude Code has officially announced a new code review feature for Claude Code.

This time, it is targeting a $50 billion industry—code security auditing.
The new feature just released by Anthropic can be said to directly challenge the entire code security industry in an extremely straightforward manner.
Some have exclaimed: The $50 billion industry has been overturned by Anthropic overnight!
Now, we can wait for security stocks to plummet.

At Anthropic, nearly every PR has been run through this system.
After months of testing, the results are as follows:
- The proportion of PRs containing substantive review comments increased from 16% to 54%.
- The proportion of engineers who believe the review results are incorrect is less than 1%.
- In large Pull Requests (over 1,000 lines), 84% of PRs had issues flagged, with an average of 7.5 issues per PR.
Currently, this feature has launched as a research preview, in beta for Claude Team and Enterprise plans.

A Nightmare for the $50 Billion Market
Anthropic's product has caused a seismic shift in the global AI and cybersecurity (AppSec) circles that is bound to go down in history.
Senior developers are exclaiming that the $50 billion code auditing industry has been disrupted!
This is because, in the past, large companies paid traditional security vendors (such as Snyk and Checkmarx) $50,000 or more a year in licensing fees, on top of hiring professional teams to scan and audit code so that bugs and security vulnerabilities would not reach production.
Now, Claude can directly deploy a team of AI agents to lurk in your PRs, on standby 24/7.
Moreover, based on token counts, a single review costs only about $15-25 on average!
$50,000 versus $25: a difference of 2,000 times. This is not a mere feature update; it sounds the death knell for traditional code auditing.
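The back-of-the-envelope arithmetic, using the article's own figures (the per-review cost is a token-based estimate, not an official price):

```python
# Rough cost comparison using the figures cited above.
# per_review_cost is the article's token-based estimate, not an official price.
annual_license_fee = 50_000   # typical yearly AppSec license fee (USD)
per_review_cost = 25          # upper end of one Claude review (USD)

ratio = annual_license_fee / per_review_cost
print(f"One year's license fee buys about {ratio:,.0f} AI reviews")  # → 2,000
```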

Code Review, the most painful part for developers
If you ask any engineering team: what is the biggest bottleneck in software development?
Many people's answer would likely be code review.
In the past few years, the ability of AI to write code has advanced rapidly. Whether it's GitHub Copilot, Cursor, Claude Code, or ChatGPT, developers using these tools have seen a dramatic increase in the amount of code they produce.
As a result, the problem arises—although code is being produced at lightning speed, the number of people reviewing the code has not increased.
Anthropic found that in the past year, each engineer's code output increased by 200%, but many Pull Requests (PRs) were only given a quick glance.
Even developers themselves admit that many code reviews are just a formality.
Thus, a large number of bugs, vulnerabilities, and logical issues are brought into the production environment.
This is why many companies are willing to spend exorbitant amounts on security scanning tools.
However, the problem is—these tools are not smart.
What are the issues with traditional code scanning tools?
If you have used traditional AppSec tools like Snyk, Checkmarx, Veracode, or SonarQube, you will likely have the feeling that there are too many false positives.
The reason is that most of these tools are based on static rules and known vulnerability databases; they can scan code but cannot truly understand it.
A common scenario is when a tool alerts that there is a "potential SQL injection risk," and after checking for a long time, the developer finds no issue.
As a result, people gradually start to ignore warnings, and the truly dangerous problems are often overlooked.
Therefore, companies still need a large amount of manual Code Review, and what Anthropic is doing this time is automating it.
Anthropic throws out an AI code review army
This time, the idea behind Claude Code Review is actually quite simple.
In Claude Code, the system can automatically analyze Pull Requests and check from multiple angles, such as:
- Whether the code standards comply with project rules
- Whether there are potential bugs
- Whether the modifications conflict with the logic of historical code
- Whether issues raised in previous PRs reappear
Ultimately, it outputs two results: a high-signal summary comment, plus inline comments at specific code locations. In other words, when you open a PR, you see an AI review report that highlights the truly important issues, rather than dozens of pages of mundane detail.
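Those two output forms can be pictured with a minimal data model; the class and field names below are illustrative guesses, not Anthropic's actual schema:

```python
from dataclasses import dataclass, field

@dataclass
class InlineComment:
    """A finding attached to a specific location in the diff (illustrative)."""
    path: str
    line: int
    severity: str   # e.g. "blocking", "minor", "pre-existing" (assumed labels)
    message: str

@dataclass
class ReviewReport:
    """One high-signal summary plus location-specific inline comments."""
    summary: str
    inline_comments: list[InlineComment] = field(default_factory=list)

    def must_fix(self) -> list[InlineComment]:
        # Surface only the issues that should be fixed before merging.
        return [c for c in self.inline_comments if c.severity == "blocking"]
```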
The era of "AI writing code, AI reviewing" has finally arrived.

Claude's self-looping and self-recursion have begun to emerge.

As AI capabilities continue to grow, the only role for humans in the future may be to turn on the AI switch, with the keyboard only needing the Claude key.

Multi-Agent System, Claude Code Review Team Activated
The biggest feature of Claude Code Review is that it is not just one AI, but a team.
When a PR is created, the system automatically activates a team of AI agents.
According to reports, Claude's new code review feature will deploy multiple AI "review agents" to work in parallel, with each agent responsible for different types of checks.

These agents filter out false positives through validation and rank errors by severity. The final result will be presented as a high-signal summary comment, along with inline comments addressing specific errors, on the PR.
The scale of the review will adjust according to the size of the PR.
Large or complex changes receive more agents and deeper review; small changes get a quick pass. According to Anthropic's tests, the average review takes about 20 minutes.
Ultimately, through mutual verification among multiple agents, false positives can be reduced.
In this process, it will focus on identifying logical errors, security vulnerabilities, edge case defects, and hidden regression issues.
All identified problems will be marked by severity.

- A red dot marks a blocking issue: a bug that should be fixed before the code is merged;
- A yellow dot marks a minor issue: a fix is suggested but will not block the merge;
- A purple dot marks a pre-existing issue: one not introduced by this PR.
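The fan-out, validate, and rank flow described above can be sketched roughly as follows; the agent interface, validator, and severity labels are stand-ins invented for illustration, not Anthropic's internals:

```python
from concurrent.futures import ThreadPoolExecutor

# Assumed severity labels, ordered most to least urgent.
SEVERITY_RANK = {"blocking": 0, "minor": 1, "pre-existing": 2}

def run_review(diff, agents, validator):
    """Fan specialist agents out in parallel, filter false positives, rank.

    `agents` are callables diff -> list of finding dicts; `validator`
    re-checks a finding and returns True only if it is confirmed.
    """
    with ThreadPoolExecutor() as pool:
        per_agent = list(pool.map(lambda agent: agent(diff), agents))
    findings = [f for batch in per_agent for f in batch]
    confirmed = [f for f in findings if validator(f)]  # drop false positives
    confirmed.sort(key=lambda f: SEVERITY_RANK[f["severity"]])
    return confirmed
```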
Each review comment also includes a collapsible extended-reasoning section.
When expanded, you can see:
- Why Claude flagged the issue
- How it verified that the issue indeed exists
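GitHub comments support click-to-expand blocks via `<details>`/`<summary>`, so a comment like the one described could plausibly be rendered along these lines (the template is a guess, not Anthropic's actual output):

```python
def render_comment(dot, message, why_flagged, how_verified):
    """Build a review comment with a collapsible extended-reasoning section.

    GitHub renders the <details> element as a click-to-expand block.
    """
    return (
        f"{dot} {message}\n\n"
        "<details>\n"
        "<summary>Extended reasoning</summary>\n\n"
        f"**Why this was flagged:** {why_flagged}\n\n"
        f"**How it was verified:** {how_verified}\n"
        "</details>"
    )

comment = render_comment(
    "🔴", "Missing null check before dereference",
    "The value can be None when the cache is cold.",
    "Traced the call path from the cache-miss branch to the dereference.",
)
```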

It is important to note that these comments do not automatically approve or prevent PR merges, so they do not disrupt the existing code review process.
By default, Claude Code Review primarily focuses on code correctness.
In other words, it emphasizes checking:
- Bugs that could lead to production failures
- Actual logical issues
And does not focus heavily on code formatting, style preferences, or missing tests.
If you want to expand the scope of checks, user configuration is required.
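The announcement does not spell out the configuration format; one plausible approach, assuming the reviewer honors project-level instructions the way Claude Code reads a repository's `CLAUDE.md`, would be a short instructions block like this (purely illustrative):

```markdown
## Code review instructions (illustrative)

In addition to correctness, also flag:
- Deviations from the project's formatting and naming conventions
- Changed code paths that lack test coverage
- Style preferences documented in the team style guide
```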
Internal Testing Results Are Terrifying
Anthropic's internal testing results are terrifying, and they further prove that traditional code review was basically a joke.
The internal data is indeed shocking: previously, only 16% of PRs received substantive review comments.
In large PRs with over 1,000 lines, 84% had issues flagged, with an average of 7.5 bugs caught per PR.
Why? The reason is that engineers are too busy.
In the past year, each engineer's code output at Anthropic has increased by 200%. With more and more code, who has the time to scrutinize line by line?
After implementing this feature, the proportion of PRs with substantial fix suggestions in the codebase skyrocketed from 16% to 54%.
This means that nearly 40 percentage points' worth of problematic PRs previously slipped past human reviewers; now, Claude catches them.
Even more terrifying is that for small PRs with fewer than 50 lines, people used to think, with just a few lines, what could possibly be wrong.
As a result, 31% of them had issues discovered, meaning one in three small changes hid a bug.
And engineers accepted over 99% of the flagged issues! Less than 1% of the results were marked as false positives.
This accuracy surpasses that of the vast majority of human reviewers.
Anthropic shared an internal example: a one-line change to a production service that looked routine and would normally have been approved quickly. The code review, however, flagged it as a serious issue.
The change would cause authentication to fail, a failure mode that is easily overlooked in the diff comparison, but once pointed out, it becomes very obvious.
The issue was fixed before merging, and the engineers later stated that they might not have discovered the problem themselves.
Now, let me share another real case.
iXsystems, a company that makes TrueNAS, conducted a code review on a ZFS encryption-related code refactor.
This was a deep technical change, and the reviewers were experts in the field.
As a result, the code review did something that surprised everyone: it discovered a potential bug in the "adjacent code."

That bug was not within the core scope of the change; the code merely happened to be touched by it. The type mismatch could silently wipe the encryption-key cache every time a sync ran.
This was a long-hidden bug that had been there all along, but no one had discovered it.
Human experts would find it nearly impossible to detect because it was not in the diff and not a focus of attention, but one day, it could blow up your system.
However, now the code review has brought it to light.
The Industry Shakeup is Here
Now, security companies and SaaS vendors are lamenting.
How long can a code security company that charges $50,000 a year survive?
It's not that their technology is poor, but the business logic has changed.
If Anthropic can use an agent team to solve deep business logic security audits for just $20, who would still buy those traditional scanners that cost tens of thousands of dollars and have an absurdly high false positive rate?
If you are still manually reviewing thousands of lines of code or still paying high security audit fees, wake up, the times have changed.
Tonight, the stocks in the AppSec industry may really feel the chill of AI.
Risk Warning and Disclaimer
The market has risks, and investment requires caution. This article does not constitute personal investment advice and does not take into account individual users' specific investment objectives, financial situations, or needs. Users should consider whether any opinions, views, or conclusions in this article align with their specific circumstances. Investing based on this article is at your own risk.
