- Security firm AIR successfully deployed a deceptive AI agent skill that bypassed all major security scanners and reached an estimated 26,000 agents.
- The skill exploited a critical vulnerability by hosting its malicious payload on an external website that could be swapped after the initial security review.
- Researchers from Trail of Bits and others have previously demonstrated that current scanning tools are ineffective against this dynamic attack method.
- Anthropic’s own documentation warns that skills fetching external URLs are inherently risky because the content can change post-vetting.
Security researchers from firm AIR recently exposed a critical flaw in the AI agent ecosystem by tricking major security scanners and infiltrating corporate accounts. The firm built a fake skill named brand-landingpage, marketed it on Instagram, and pushed it through a popular marketplace to demonstrate the vulnerability.
Every security scanner tested, including those from Cisco and NVIDIA, marked the skill as safe during initial inspection. Consequently, the skill was installed by roughly 26,000 agents after being merged into a repository with high GitHub stars. The payload was initially harmless, designed only to collect user email addresses from agents with corporate access.
However, the attack exploited a fundamental structural weakness in the security review process. The skill contained no malicious code itself but instructed the agent to fetch and run instructions from an external link that AIR controlled. According to their report, the firm swapped the page behind that link after widespread installation.
This method bypassed scanners because they only analyze the static package submitted for review. Separate research by Trail of Bits confirmed that attackers can keep tweaking an external payload until it passes a scan. Meanwhile, real malicious campaigns have reportedly used this same trick for months.
The problem is compounded because scanners often disagree, as other research this year found. Consequently, the ecosystem’s trust signals—like GitHub stars and a clean scan—are proving unreliable. Defenders are now urged to treat skills as executable software and vet all external links they reference.
Anthropic’s own platform documentation already warns about the risks of skills that fetch external URLs. Therefore, the security gap highlighted by this experiment remains a significant and unclosed vulnerability for organizations deploying AI agents.
✅ Follow BITNEWSBOT on Telegram, Facebook, LinkedIn, X.com, and Google News for instant updates.
