Echo Chamber Jailbreak Bypasses LLM Safeguards With 90% Success

Echo Chamber: New Jailbreak Method Outsmarts AI Safety in Top Language Models

  • Researchers identified a new jailbreaking strategy, Echo Chamber, that can bypass safety protections in popular large language models (LLMs).
  • Unlike traditional attacks, Echo Chamber uses indirect prompts and multi-step reasoning to manipulate AI responses.
  • Experiments showed Echo Chamber achieved over 90% success in prompting unsafe outputs in categories such as hate speech and violence, and nearly 80% in misinformation and self-harm.
  • Similar attacks, such as Crescendo and many-shot jailbreaks, exploit LLMs using subtle, contextual manipulation over multiple prompt rounds.
  • A separate proof-of-concept attack revealed risks in integrating AI with business software, as attackers can use indirect methods to trigger harmful outcomes.

On June 23, 2025, cybersecurity researchers reported a new method called Echo Chamber that can bypass safety controls in widely used large language models (LLMs). The technique lets attackers generate harmful or policy-violating content even when safeguards are in place.

The Echo Chamber method, developed by experts at NeuralTrust, does not use obvious triggers or typographical tricks. Instead, it relies on indirect suggestions, context manipulation, and step-by-step reasoning to gradually lead LLMs into producing undesirable outputs. In the company's tests with models from OpenAI and Google, Echo Chamber succeeded more than 90% of the time in the sexism, violence, hate speech, and pornography categories, and reached nearly 80% in the misinformation and self-harm categories.

“This creates a feedback loop where the model begins to amplify the harmful subtext embedded in the conversation, gradually eroding its own safety resistances,” reported NeuralTrust. Ahmad Alobaid, a technical lead at the company, explained, “Early planted prompts influence the model’s responses, which are then leveraged in later turns to reinforce the original objective.” Unlike Crescendo attacks, where the user deliberately guides the conversation, Echo Chamber relies on the LLM itself filling in the gaps through multi-stage prompting.

Researchers noted that other strategies, such as Crescendo and many-shot jailbreaks, take advantage of LLMs’ ability to process lengthy prompts. These attacks use a series of seemingly innocent or context-rich messages to push the LLM toward unwanted behavior, often without revealing the attacker's intent at the start.

The findings highlight ongoing challenges in developing LLMs that can reliably distinguish between acceptable and harmful content. While models are trained to refuse requests on specific topics, multi-turn and indirect attacks like Echo Chamber show that they remain vulnerable to sophisticated manipulation techniques.

Separately, researchers at Cato Networks presented a proof-of-concept attack against Atlassian’s Model Context Protocol (MCP) server. In this approach, threat actors submitted malicious support tickets that triggered unintended harmful actions when processed by an AI-integrated system, a tactic dubbed “Living off AI.” According to the team, “The support engineer acted as a proxy, unknowingly executing malicious instructions through Atlassian MCP.” The case underlines the new risks that arise when AI models interact with external business platforms.
