Echo Chamber Jailbreak Bypasses LLM Safeguards With 90% Success

Echo Chamber: New Jailbreak Method Outsmarts AI Safety in Top Language Models

  • Researchers identified a new jailbreaking strategy, Echo Chamber, that can bypass safety protections in popular large language models (LLMs).
  • Unlike traditional attacks, Echo Chamber uses indirect prompts and multi-step reasoning to manipulate AI responses.
  • Experiments showed Echo Chamber achieved over 90% success in prompting unsafe outputs in categories such as hate speech and violence, and nearly 80% in misinformation and self-harm.
  • Similar attacks, such as Crescendo and many-shot jailbreaks, exploit LLMs using subtle, contextual manipulation over multiple prompt rounds.
  • A separate proof-of-concept attack revealed risks in integrating AI with business software, as attackers can use indirect methods to trigger harmful outcomes.

On June 23, 2025, cybersecurity researchers reported a new method called Echo Chamber that can bypass safety controls in widely used large language models (LLMs). The technique lets attackers generate harmful or policy-violating content even when safeguards are in place.

The Echo Chamber method, developed by experts at NeuralTrust, does not use obvious triggers or typographical tricks. Instead, it relies on indirect suggestions, context manipulation, and step-by-step reasoning to gradually lead LLMs into producing undesirable outputs. In tests reported by NeuralTrust with models from OpenAI and Google, Echo Chamber succeeded over 90% of the time in categories such as sexism, violence, hate speech, and pornography. The attack reached nearly 80% effectiveness in the misinformation and self-harm categories.

“This creates a feedback loop where the model begins to amplify the harmful subtext embedded in the conversation, gradually eroding its own safety resistances,” reported NeuralTrust. Ahmad Alobaid, a technical lead at the company, explained, “Early planted prompts influence the model’s responses, which are then leveraged in later turns to reinforce the original objective.” Unlike Crescendo attacks, where the user deliberately guides the conversation, Echo Chamber relies on the LLM itself filling in the gaps through multi-stage prompting.

Researchers noted that other strategies, such as Crescendo and many-shot jailbreaks, take advantage of LLMs’ ability to process lengthy prompts. These attacks use a series of seemingly innocent or context-rich messages to push the LLM toward unwanted behavior, often without revealing the attacker's intentions at the start.

The findings highlight ongoing challenges in developing LLMs that can reliably distinguish between acceptable and harmful content. While models are trained to refuse specific topics, multi-turn and indirect attacks like Echo Chamber demonstrate that they remain vulnerable to sophisticated manipulation techniques.

Separately, researchers at Cato Networks presented a proof-of-concept attack against Atlassian’s Model Context Protocol (MCP) server. In this approach, a threat actor submits malicious support tickets that trigger unintended harmful actions when processed by an AI-integrated system, a tactic dubbed “Living off AI.” According to the team, “The support engineer acted as a proxy, unknowingly executing malicious instructions through Atlassian MCP.” The case underlines the new risks that arise when AI models interact with external business platforms.
