- Researchers identified a new jailbreaking strategy, Echo Chamber, that can bypass safety protections in popular large language models (LLMs).
- Unlike traditional attacks, Echo Chamber uses indirect prompts and multi-step reasoning to manipulate AI responses.
- Experiments showed Echo Chamber achieved over 90% success in prompting unsafe outputs about sensitive topics, including hate speech and self-harm.
- Similar attacks, such as Crescendo and many-shot jailbreaks, exploit LLMs using subtle, contextual manipulation over multiple prompt rounds.
- A separate proof-of-concept attack revealed risks in integrating AI with business software, as attackers can use indirect methods to trigger harmful outcomes.
Cybersecurity researchers reported on June 23, 2025, a new method called Echo Chamber that can bypass safety controls in widely used large language models (LLMs). This technique poses risks by enabling attackers to generate harmful or policy-violating content, even when safeguards are present.
The Echo Chamber method, developed by experts at NeuralTrust, does not use obvious triggers or typographical tricks. Instead, it relies on indirect suggestions, context manipulation, and step-by-step reasoning to gradually lead LLMs into producing undesirable outputs. In official tests with models from OpenAI and Google, Echo Chamber succeeded over 90% of the time in areas such as sexism, violence, hate speech, and pornography. The attack reached nearly 80% effectiveness in misinformation and self-harm categories.
“This creates a feedback loop where the model begins to amplify the harmful subtext embedded in the conversation, gradually eroding its own safety resistances,” reported NeuralTrust. Ahmad Alobaid, a technical lead at the company, explained, “Early planted prompts influence the model’s responses, which are then leveraged in later turns to reinforce the original objective.” Unlike Crescendo attacks, where the user deliberately guides the conversation, Echo Chamber relies on the LLM itself filling in the gaps through multi-stage prompting.
Researchers noted that other strategies, such as Crescendo and many-shot jailbreaks, take advantage of LLMs’ ability to process lengthy prompts. In these attacks, attackers use a series of seemingly innocent or context-rich messages to push the LLM toward unwanted behavior, often without revealing their intentions at the start.
The findings highlight ongoing challenges in developing LLMs that can reliably distinguish between acceptable and harmful content. While models are programmed to reject specific topics, multi-turn and indirect attacks like Echo Chamber demonstrate their vulnerability to sophisticated manipulation techniques.
Separately, researchers at Cato Networks presented a proof-of-concept attack against Atlassian’s model context protocol (MCP) server. In this approach, threat actors submitted malicious support tickets that prompted unintentional harmful actions when processed by an AI-integrated system, a tactic dubbed “Living off AI.” According to the team, “The support engineer acted as a proxy, unknowingly executing malicious instructions through Atlassian MCP.” This case underlines new risks when AI models interact with external business platforms.
✅ Follow BITNEWSBOT on Telegram, Facebook, LinkedIn, X.com, and Google News for instant updates.
Previous Articles:
- Build & Build to Raise $100M, First Public Firm Holding BNB
- XRP Ledger Active Addresses Drop 80% Since December: CryptoQuant
- Midnight Network Unveils NIGHT Token Airdrop Across 8 Blockchains
- Fiserv to Launch FIUSD Stablecoin, Partners With PayPal, Paxos
- Cointelegraph, CoinMarketCap Hit by Malicious Wallet Pop-Up Attacks