- AI agents built with models like GPT-5 and Gemini remain highly vulnerable to prompt injection attacks, with direct attacks succeeding over 79% of the time.
- Hidden “indirect” attacks embedded in web content can also manipulate agent behavior, achieving success rates between 41.67% and 68.16%.
- The vulnerability enables “stealthy parasitism,” where an agent completes a user’s task while simultaneously advancing a hidden attacker’s objective.
- Researchers warn prompt injection is a victim-dependent risk, where a single exploit can have asymmetric consequences for different stakeholders.
A new study published Thursday reveals AI agents powered by the latest models cannot consistently resist prompt injection attacks. Researchers from Nanyang Technological University, ST Engineering, IBM Research, and the University of Illinois Urbana-Champaign conducted this critical security assessment.
Consequently, they developed a new benchmark called StakeBench to test agents in realistic online environments. This framework probes factors like the semantic distance between a user’s intent and an injected malicious command.
The team executed 3,168 attack simulations using agents like NanoBrowser and BrowserUse with models including GPT-5 and Gemini 2.5-Flash. Direct prompt injection attacks succeeded more than 79% of the time across all tested configurations.
Meanwhile, indirect attacks, where instructions are hidden within web content, also proved highly effective. These covert methods achieved success rates ranging from 41.67% to 68.16% in the experiments.
The findings underscore a persistent threat as autonomous AI agents for tasks like crypto trading become mainstream. “Prompt-injection risk is victim-dependent: a single exploit can produce asymmetric consequences for different stakeholders,” the researchers wrote.
This vulnerability has manifested in real-world incidents documented by major tech firms. For example, Microsoft and Google have recently warned about attacks manipulating agents to leak credentials or send unauthorized payments.
The study also identified a subtle threat called “stealthy parasitism.” Here, an AI agent completes the user’s assigned task while clandestinely advancing an attacker’s hidden objective, such as subtly skewing product recommendations.
These results indicate that security is not just a property of the AI model itself. The distribution of harm is jointly determined by the stakeholder, the task’s semantic alignment, and the architectural deployment context.
✅ Follow BITNEWSBOT on Telegram, Facebook, LinkedIn, X.com, and Google News for instant updates.
