Microsoft Unveils Scanner to Detect Backdoored AI Models

Microsoft scanner detects hidden backdoors in open AI models without prior knowledge or training.

  • Microsoft has created a new scanner designed to find hidden backdoors in open-weight Large Language Models (LLMs).
  • The tool relies on three distinct behavioral signals within a poisoned model, such as unusual “double triangle” attention patterns.
  • This AI security breakthrough operates without needing prior knowledge of the backdoor or additional model training.
  • The scanner is a practical move to improve trust in AI, as Microsoft simultaneously expands its Secure Development Lifecycle to address AI-specific threats.

On February 4, 2026, Microsoft’s AI Security team announced it built a lightweight scanner to detect backdoors in open-weight AI models. This development aims to improve overall trust in artificial intelligence systems by identifying covertly poisoned “sleeper agent” models.
The scanner leverages three observable signals to flag backdoors reliably and with a low false-positive rate, researchers said. These signals are grounded in how trigger inputs measurably affect a model’s internal behavior, giving detection a technically robust basis.
One key signal is a distinctive “double triangle” attention pattern in which the model’s attention locks onto the trigger tokens. Another is that backdoored models tend to memorize and leak the poisoned data that implanted the backdoor more readily than they leak ordinary training data.
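As a rough illustration only, the sketch below scores how strongly a candidate trigger span dominates one attention map. The reading of “double triangle” used here — trigger tokens attending to one another, and later tokens attending back to the trigger, forming two dense triangular blocks in a causal attention matrix — is an assumption, and the function name and threshold are purely illustrative, not Microsoft’s implementation.

```python
# Hypothetical sketch: score a candidate trigger span against a "double
# triangle" reading of the attention matrix (an assumed interpretation).
import numpy as np

def double_triangle_score(attn: np.ndarray, span: tuple[int, int]) -> float:
    """attn: (seq_len, seq_len) causal attention matrix for one head.
    span: (start, end) token indices of the candidate trigger (end exclusive).
    Returns the fraction of total attention mass landing on the two blocks."""
    start, end = span
    seq_len = attn.shape[0]
    # Block 1: trigger tokens attending to each other.
    intra = attn[start:end, start:end].sum()
    # Block 2: all later tokens attending back to the trigger span.
    back = attn[end:seq_len, start:end].sum()
    total = attn.sum() + 1e-9
    return float((intra + back) / total)

if __name__ == "__main__":
    # Toy usage: a random row-normalized causal map; in practice one would
    # compare the span's score against random spans of the same length.
    rng = np.random.default_rng(0)
    attn = np.tril(rng.random((12, 12)))
    attn /= attn.sum(axis=-1, keepdims=True) + 1e-9
    print(double_triangle_score(attn, (3, 6)))
```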
Building on the memorization signal, the scanner first extracts memorized content and analyzes it to isolate suspicious substrings. It then formalizes the three signatures as loss functions, scoring and ranking potential triggers.
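A minimal sketch of that extract–analyze–score flow is shown below, under stated assumptions: isolating candidates by how often n-grams repeat in memorized text, and scoring them by how much a candidate shifts the model’s loss toward a fixed target completion, are stand-in heuristics, and the `model.loss(prompt, target)` interface is hypothetical rather than Microsoft’s published code.

```python
# Illustrative sketch of the scanning flow; the heuristics and the
# model.loss() interface are assumptions, not the actual tool.
from collections import Counter

def candidate_substrings(samples: list[str], min_len: int = 3, max_len: int = 8) -> list[str]:
    """Isolate unusually frequent word n-grams in memorized text as candidate
    triggers (repetition is used here as a crude proxy for 'suspicious')."""
    counts: Counter[str] = Counter()
    for text in samples:
        tokens = text.split()
        for n in range(min_len, max_len + 1):
            for i in range(len(tokens) - n + 1):
                counts[" ".join(tokens[i:i + n])] += 1
    return [s for s, c in counts.most_common(50) if c > 1]

def rank_triggers(model, prompts: list[str], candidates: list[str]) -> list[tuple[str, float]]:
    """Score each candidate by how much inserting it lowers the model's loss
    on a fixed target completion; a large drop suggests a backdoor trigger.
    The model.loss(prompt, target) call is a hypothetical interface."""
    target = "<attacker-chosen output>"
    scored = []
    for cand in candidates:
        baseline = sum(model.loss(p, target) for p in prompts)
        triggered = sum(model.loss(p + " " + cand, target) for p in prompts)
        scored.append((cand, baseline - triggered))
    return sorted(scored, key=lambda x: x[1], reverse=True)
```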
However, the scanner does not work on proprietary models, since it requires direct access to the model files. It also works best on trigger-based backdoors that generate deterministic outputs, and it is not a cure-all solution.
Meanwhile, Microsoft said it’s expanding its Secure Development Lifecycle to address AI-specific security concerns. “Unlike traditional systems with predictable pathways, AI systems create multiple entry points for unsafe inputs,” corporate VP Yonatan Zunger said. These entry points can carry malicious content or trigger entirely unexpected behaviors.
