- Microsoft has created a new scanner designed to find hidden backdoors in open-weight Large Language Models (LLMs).
- The tool relies on three distinct behavioral signals within a poisoned model, such as unusual “double triangle” attention patterns.
- This AI security breakthrough operates without needing prior knowledge of the backdoor or additional model training.
- The scanner is a practical move to improve trust in AI, as Microsoft simultaneously expands its Secure Development Lifecycle to address AI-specific threats.
On February 4, 2026, Microsoft’s AI Security team announced that it had built a lightweight scanner to detect backdoors in open-weight AI models. The tool aims to improve trust in artificial intelligence systems by identifying covertly poisoned “sleeper agent” models.
The scanner leverages three observable signals to reliably flag backdoors with a low false positive rate, researchers said. These signatures are grounded in how trigger inputs measurably affect a model’s internal behavior, providing a technically robust basis for detection.
One key signal is a distinctive “double triangle” attention pattern that appears when the model focuses on a trigger. Another is that backdoored models tend to memorize their own poisoning data and leak it, rather than leaking ordinary training data.
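As a rough illustration of what "focusing on a trigger" could mean in numbers, the sketch below measures how much attention the tokens after a suspected trigger place on the trigger itself. It is not Microsoft's method and does not reproduce the "double triangle" analysis; the function name `trigger_attention_share`, the toy attention matrix, and the choice of metric are all assumptions made for this example.

```python
def trigger_attention_share(attn, trigger_positions):
    """Average share of attention that later tokens place on the trigger.

    attn: square list of lists, attn[q][k] = attention from query token q
          to key token k (hypothetical input, e.g. one head of one layer).
    trigger_positions: indices of the suspected trigger tokens.
    """
    trigger = set(trigger_positions)
    last = max(trigger)
    shares = []
    # Only look at query tokens that come after the trigger.
    for q in range(last + 1, len(attn)):
        row = attn[q]
        total = sum(row)
        if total > 0:
            shares.append(sum(row[k] for k in trigger) / total)
    return sum(shares) / len(shares) if shares else 0.0


# Toy causal attention matrix for 4 tokens; tokens 1 and 2 are the "trigger".
toy_attn = [
    [1.0, 0.0, 0.0, 0.0],
    [0.5, 0.5, 0.0, 0.0],
    [0.1, 0.4, 0.5, 0.0],
    [0.1, 0.4, 0.4, 0.1],
]
share = trigger_attention_share(toy_attn, [1, 2])
```

In this toy case the single token after the trigger puts most of its attention on the trigger positions. A real scanner would need attention maps from the model under test and a baseline to compare against.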
Consequently, the scanner first extracts memorized content and then analyzes it to isolate suspicious substrings. Finally, it formalizes the three signatures as loss functions, scoring and ranking potential triggers.
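The extract, isolate, and rank steps described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the scanner's actual code: the helper names, the word n-gram heuristic, and the two toy loss functions are hypothetical stand-ins for the three signature losses, which the announcement does not spell out.

```python
from collections import Counter


def candidate_substrings(samples, min_len=1, max_len=3):
    """Count word n-grams, once per sample, across extracted text samples."""
    counts = Counter()
    for text in samples:
        words = text.split()
        seen = set()
        for n in range(min_len, max_len + 1):
            for i in range(len(words) - n + 1):
                seen.add(" ".join(words[i:i + n]))
        counts.update(seen)
    return counts


def rank_triggers(samples, losses, top_k=3):
    """Score recurring substrings with each loss; lower total = more suspicious."""
    counts = candidate_substrings(samples)
    # Keep only substrings that recur in at least two samples.
    recurring = [s for s, c in counts.items() if c >= 2]
    scored = sorted((sum(loss(s, samples) for loss in losses), s) for s in recurring)
    return [(s, score) for score, s in scored[:top_k]]


# Hypothetical placeholder losses (NOT the real signatures).
def coverage_loss(candidate, samples):
    # Lower when the candidate appears in a larger fraction of samples.
    return -sum(candidate in text for text in samples) / len(samples)


def length_loss(candidate, samples):
    # Lower for longer candidates, to prefer the full phrase over fragments.
    return -float(len(candidate))


# Pretend these strings were extracted from a model as memorized content.
leaked = [
    "the weather |DEPLOY NOW| is nice",
    "please |DEPLOY NOW| summarize this",
    "hello world",
]
ranking = rank_triggers(leaked, [coverage_loss, length_loss])
```

Here the full phrase `|DEPLOY NOW|` ranks first because it recurs across samples and is longer than its fragments. In a real setting, the placeholder losses would be replaced by measurements of the model's internal behavior.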
However, the scanner does not work on proprietary models, because it requires access to the model files. It also works best on trigger-based backdoors that generate deterministic outputs, and it is not a cure-all.
Meanwhile, Microsoft said it’s expanding its Secure Development Lifecycle to address AI-specific security concerns. “Unlike traditional systems with predictable pathways, AI systems create multiple entry points for unsafe inputs,” corporate VP Yonatan Zunger said. These entry points can carry malicious content or trigger entirely unexpected behaviors.
