The Brickstorm Problem
By: Ken Kato
November 2, 2025

The Brickstorm Problem: When Nation-State Hackers Get a 393-Day Head Start

I just watched an AI agent do something I didn't think was technically possible yet. Not in a "wow, neat demo" way. In a "this solves a real problem that's actively costing security teams right now" way.

The problem? Brickstorm. If you haven't heard of it yet, you will.

The Threat: Why Brickstorm Keeps Security Teams Up at Night

In March 2025, Google's Threat Intelligence Group and Mandiant identified a sophisticated espionage campaign targeting U.S. tech and legal sectors. The malware, dubbed Brickstorm, is attributed to UNC5221, a suspected China-nexus espionage group. Average dwell time: 393 days. Each victim gets unique command-and-control infrastructure designed to evade detection. In August, F5 discovered the same attackers had stolen BIG-IP source code during a 12-month breach. Now every organization running F5 appliances is waiting for the other shoe to drop.

Think about those numbers. By the time most organizations discover Brickstorm, their log retention windows have already closed. The artifacts of initial intrusion? Gone. The forensic trail? Cold.

But the dwell time isn't even the worst part.

Brickstorm uses network appliances for initial access. Firewalls, VPNs, routers, IDS/IPS systems: network infrastructure that traditional endpoint detection and response (EDR) tools simply cannot reach. Shipping logs and network performance metrics to EDRs and the like isn't enough. EDR agents aren't designed to analyze network daemons; catching a malicious network daemon is fundamentally different from catching endpoint malware.

That's exactly where Brickstorm hides, and that's where it gets complicated.

What I Watched the Kindo Agent Do

I hit start. Then the agent took over, and I just watched.

The agent started by reading a Linear ticket. The ticket outlined the Brickstorm threat, linked to Mandiant's research, provided Mandiant's GitHub repo for their scanner, and flagged a specific software-defined router (SDR) deployed in AWS that needed immediate scanning.

Step 1: Establish Remote Access

Mandiant's scanner must run locally on the target device. You can't scan a router from your workstation. The Kindo agent SSH'd into the AWS-deployed router. Autonomously. When it discovered that sshpass wasn't available (needed for non-interactive authentication), it installed it.
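The dependency check described above can be sketched roughly as follows. This is an illustration only, not Kindo's implementation: the function names and the short list of package managers are assumptions made for the example, and it assumes sshpass was missing on the machine initiating the connection, which the walkthrough does not specify.

```python
import shutil

# Assumed for illustration: package managers that accept "install -y <package>".
PACKAGE_MANAGERS = ("apt-get", "dnf")


def missing_dependency_fix(tool, which=shutil.which):
    """Return None if `tool` is on PATH, else an install command to try."""
    if which(tool):
        return None
    for manager in PACKAGE_MANAGERS:
        if which(manager):
            return [manager, "install", "-y", tool]
    raise RuntimeError(f"{tool} is missing and no known package manager was found")


def ssh_command(user, host, remote_command):
    """Build a non-interactive SSH invocation.

    `sshpass -e` reads the password from the SSHPASS environment variable,
    so the password never appears on the command line.
    """
    return ["sshpass", "-e", "ssh", f"{user}@{host}", remote_command]
```

The returned lists can be handed to `subprocess.run` once the `SSHPASS` environment variable is set. Keeping command construction separate from execution makes the plan easy to log and review before anything touches the device.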

Wait. This isn't a click-through wizard. This is an AI agent assessing a remote system, identifying a missing dependency, and resolving it without being told how.

Step 2: Upload and Execute the Scanner Locally

The agent downloaded Mandiant's Brickstorm scanner, then uploaded it to the remote SDR. Then it executed the scan directly on the appliance, exactly as Mandiant's documentation requires. No shortcuts, no hallucinating a remote scan that wouldn't actually work. The proper way.
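For readers who want to picture the upload-then-run-locally pattern, here is a minimal sketch. The script filename, the `/tmp` staging directory, and the function name are placeholders; the walkthrough does not give the scanner's actual filename or the paths the agent used.

```python
import shlex


def plan_local_scan(user, host, local_script, remote_dir="/tmp"):
    """Return the commands that upload a scanner and run it on the device itself."""
    script_name = local_script.rsplit("/", 1)[-1]
    remote_path = f"{remote_dir}/{script_name}"
    target = f"{user}@{host}"
    # Quote the remote path so an unusual filename cannot break the remote shell command.
    quoted = shlex.quote(remote_path)
    run = f"chmod +x {quoted} && {quoted}"
    return [
        ["sshpass", "-e", "scp", local_script, f"{target}:{remote_path}"],
        ["sshpass", "-e", "ssh", target, run],
    ]
```

The second command is the important one: the scanner executes on the appliance, against the appliance's own filesystem, rather than probing it from outside.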

And then the scan came back positive. Compromise detected.

Brickstorm is confirmed.

Okay. Now it gets interesting.

Step 3: Consult the Runbook

The agent pulled the remediation runbook from GitHub. Two versions exist: one human-readable, one JSON-formatted for LLM consumption. AI generates the agent version from the human original.
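To make the idea of a JSON runbook concrete, here is a hypothetical example with a small loader. The schema, the step names, and the `requires_approval` field are all invented for illustration; the actual runbook format is not shown in this walkthrough.

```python
import json

# Hypothetical agent-readable runbook; the real schema may look quite different.
RUNBOOK_JSON = """
{
  "name": "brickstorm-remediation",
  "steps": [
    {"id": "alert", "action": "post_slack_alert", "requires_approval": false},
    {"id": "analyze", "action": "static_analysis", "requires_approval": false},
    {"id": "deploy", "action": "deploy_clean_daemon", "requires_approval": true}
  ]
}
"""

REQUIRED_FIELDS = {"id", "action", "requires_approval"}


def load_runbook(text):
    """Parse a runbook and check that every step has the fields an agent needs."""
    runbook = json.loads(text)
    for step in runbook["steps"]:
        missing = REQUIRED_FIELDS - set(step)
        if missing:
            raise ValueError(f"step {step.get('id', '?')} is missing {sorted(missing)}")
    return runbook


def approval_gates(runbook):
    """Return the ids of steps that must wait for a human decision."""
    return [step["id"] for step in runbook["steps"] if step["requires_approval"]]
```

A structured format like this lets a loader reject an incomplete step up front, which is harder to do with free-form prose.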

This is real AIOps.

The agent followed the runbook's first instruction: post an urgent alert to the team's Slack channel while analysis continues.

Step 4: The Part That Shouldn't Be Possible Yet

The agent started working through the analysis phases. Downloaded the malware to an isolated sandbox. Performed static analysis and decompilation.

Then it delivered the verdict: ⚠️ 100% MALICIOUS - NO LEGITIMATE FUNCTIONALITY.

Not a compromised legitimate daemon. A completely fabricated backdoor. 362 bytes. Invalid architecture field. Contains secp256k1 elliptic curve constants (the curve used by Bitcoin and other cryptocurrencies). Attribution: China-nexus APT, high confidence.
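Checks like these are simple to express in code. The sketch below is not the agent's analysis; it shows one plausible way to test a sample for an unrecognized ELF architecture field and for the secp256k1 field prime. The walkthrough does not say which constants were found or what the invalid value was, so the function name and the short table of known architectures are assumptions.

```python
import struct

# Field prime of secp256k1: 2**256 - 2**32 - 977, as 32 big-endian bytes.
SECP256K1_P = (2**256 - 2**32 - 977).to_bytes(32, "big")

# A few well-known ELF e_machine values (deliberately not exhaustive).
KNOWN_MACHINES = {3: "x86", 8: "MIPS", 40: "ARM", 62: "x86-64", 183: "AArch64"}


def triage(blob):
    """Return quick static indicators for a suspicious file's raw bytes."""
    is_elf = blob[:4] == b"\x7fELF"
    machine = None
    if is_elf and len(blob) >= 20:
        # EI_DATA (byte 5): 1 = little-endian, 2 = big-endian.
        order = ">" if blob[5] == 2 else "<"
        # e_machine is a 2-byte field at offset 18 of the ELF header.
        (machine,) = struct.unpack_from(order + "H", blob, 18)
    return {
        "size": len(blob),
        "is_elf": is_elf,
        "machine": KNOWN_MACHINES.get(machine, "unrecognized") if is_elf else None,
        # Look for the prime in either byte order.
        "has_secp256k1_prime": SECP256K1_P in blob or SECP256K1_P[::-1] in blob,
    }
```

None of these indicators is conclusive on its own. An unusual architecture value or an embedded curve constant is a reason to look closer, not a verdict.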

The agent generated IOCs for threat intelligence sharing. MD5 and SHA256 hashes. Identified capabilities: C2 communications, data exfiltration, persistence mechanism.
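Hash-based IOCs of this kind take only a few lines to produce. The following is a minimal sketch using Python's standard library; the function name and the output layout are illustrative choices, not the format the agent emitted.

```python
import hashlib


def file_iocs(data):
    """Return hash-based indicators for a sample held in memory."""
    return {
        "md5": hashlib.md5(data).hexdigest(),
        "sha256": hashlib.sha256(data).hexdigest(),
        "size_bytes": len(data),
    }
```

In practice the bytes would come from the quarantined sample, for example via `pathlib.Path(...).read_bytes()`. MD5 is included only because many threat-intel feeds still index on it; SHA256 is the hash to rely on for integrity.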

Then it did something I didn't expect. It evaluated three remediation options and built a clean replacement daemon. Not just recommended it. Actually created it: a benign bash script that logs any future execution attempts, provides no network access, and exits cleanly.

Risk assessment: minimal deployment risk, no service impact, rollback available. Two deliverables ready: the complete analysis report and the clean replacement daemon with SHA256 verification.

Then came the handoff.

This wasn't "approve yes/no." The agent delivered a complete security briefing: definitive malware confirmation, threat intelligence attribution, IOCs, clean daemon specifications, and professional-grade deliverables ready for download. Then it asked for deployment approval with full context.

The agent didn't auto-deploy. Network infrastructure is fragile. A misconfigured router binary can take down production traffic. The blast radius of "we automated the fix and it went wrong" is catastrophic. Human-in-the-loop isn't about doubting capabilities. It's about acknowledging that remediation actions have consequences.
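The approval gate itself is a simple pattern. Here is a minimal sketch of it, with hypothetical names throughout (`Proposal`, `deploy_if_approved`); it is not how the platform implements the handoff, only an illustration of refusing to act without a named approver.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Proposal:
    """What the human reviewer is being asked to approve."""
    target: str
    artifact_sha256: str
    summary: str


def deploy_if_approved(proposal, approved_by, deploy):
    """Run `deploy` only when a named human has approved the proposal."""
    if not approved_by:
        return f"PENDING: {proposal.summary} on {proposal.target} awaits approval"
    deploy(proposal)
    return f"DEPLOYED to {proposal.target}, approved by {approved_by}"
```

Passing the deployment action in as a function keeps the gate independent of how the fix is actually pushed, and recording who approved gives the ticket an audit trail.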

The agent then posted the analysis summary to Slack and updated the Linear ticket with the full investigation results.

An AI agent just autonomously executed a multi-phase security runbook, performed binary analysis and malware attribution, evaluated remediation strategies, and stopped at exactly the right point for human decision-making. Eight minutes from scan detection to deployable fix with complete context.

I don't know of another AI automation platform that can do this. The combination of remote access, binary analysis, security reasoning, and knowing where to stop for human approval. That's what makes this genuinely novel.

What This Actually Means

Here's what struck me watching this: network appliances have been security blind spots for years. They're "infrastructure that's infrastructure" until something goes catastrophically wrong. And attackers know this. That's why Brickstorm targets them.

But an agent that can SSH into these devices, run detection tools locally, and analyze the results? That transforms blind spots into monitored assets.

Now think about F5. They're coordinating emergency remediation across thousands of enterprise customers. Each customer needs to scan their BIG-IP appliances, analyze compromised binaries, determine remediation approaches, deploy fixes. Manually. At scale. Under emergency directive timelines.

What I just watched took eight minutes. One appliance, full detection to deployable remediation. No specialized security analyst. No late-night SSH sessions. No tribal knowledge about which commands work on which firmware versions.

That's not an incremental improvement. That's a fundamental change in what's possible. When the constraint isn't analyst time or specialized expertise anymore, when detection and remediation compress from weeks to minutes, the entire threat model shifts. Attackers lose their dwell time advantage. Network appliances stop being blind spots.

Runbooks aren't new. But AI-optimized versions in JSON format, designed specifically for LLM consumption? That transforms documentation into executable automation. The agent doesn't just follow instructions. It interprets structured workflows optimized for machine reasoning.

And that 393-day dwell time Brickstorm averages? That exists because detection is hard and remediation is harder. When you compress "identify, scan, analyze, remediate" from weeks down to minutes, the attacker's advantage changes. Not eliminated. Attackers adapt. But fundamentally changed.

The Honest Reality Check

Mandiant's scanner detects specific Brickstorm variants based on known indicators and signatures. It won't catch every variant. Novel attack patterns, mutations, or zero-day exploitation will slip past signature-based detection. The agent can execute the scanner and analyze results, but the scanner itself has blind spots.

And about that eight-minute runtime? That's per device. Organizations with dozens of network appliances would see that time compound when scanning serially. You could rewrite the runbook for parallel execution, but that's a different workflow design with its own complexity.
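The arithmetic is easy to sketch. The example below assumes every device takes the same eight minutes and that scans do not contend with each other, which real fleets will not guarantee. The function names and the `scan_one` callback are placeholders for whatever performs a single-device scan.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def estimated_minutes(devices, minutes_per_device=8, workers=1):
    """Wall-clock estimate when devices are scanned in batches of `workers`."""
    return math.ceil(devices / workers) * minutes_per_device


def scan_all(hosts, scan_one, workers=4):
    """Run `scan_one(host)` across a list of hosts concurrently, keyed by host."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in the same order as `hosts`.
        return dict(zip(hosts, pool.map(scan_one, hosts)))
```

Under these assumptions, 24 appliances scanned serially take 24 × 8 = 192 minutes, while six at a time take 4 × 8 = 32 minutes. The harder part of a parallel design is what the estimate leaves out: per-device failures, credential handling, and one approval step per finding.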

Production environments aren't always like this. Legacy systems with non-standard configurations. Runbooks that are partially documented or wrong. Tools without APIs. Detection scenarios where no scanner exists.

Agents aren't magic. They're automation with reasoning capabilities. When the underlying systems are well-architected, agents excel. When the infrastructure is a mess (tribal knowledge, inconsistent tooling, undocumented procedures), agents struggle the same way junior engineers struggle.

The constraint isn't the AI anymore. It's the infrastructure around it and the documentation (or lack of it).

What I Keep Thinking About

The gap between where we are and where this leads is narrower than it seems.

If an AI agent can detect and remediate sophisticated nation-state malware on network appliances, the next steps become obvious: compliance scanning that actually runs continuously across heterogeneous infrastructure. Threat hunting that scales to thousands of systems simultaneously. Incident response runbooks that learn from every execution and improve themselves.

But here's what keeps coming back to me: we've spent years accepting that certain security tasks are just hard. Scanning network appliances? Too difficult, too manual, requires specialized access. Binary analysis? Specialized skill, not every analyst can do it. Runbook execution? Hope the documentation is current and the on-call engineer remembers the edge cases.

And then you watch an agent do all of it in eight minutes.

The question isn't whether the technology works. I just watched it work. The question is whether security teams will reorganize around it. Whether we'll fix our infrastructure to make it automatable. Whether we'll document our procedures well enough that an agent (or a junior engineer) can actually execute them.

That's the harder part. Not the AI. The organizational change.

The Real Lesson from Brickstorm

Brickstorm isn't just a malware family. It's a case study in what happens when attackers understand your blind spots better than you do.

Network appliances have been invisible to traditional security tools for years. Attackers have been exploiting this gap for just as long. Brickstorm made it obvious with that 393-day dwell time. The F5 breach made it critical.

The technology to close this gap exists now. Eight minutes from detection to a deployable remediation. While F5 and its thousands of customers coordinate manual emergency response across entire infrastructures, an automated alternative is already available, as SaaS or as a self-managed instance within your infrastructure.

The question is whether security teams will adopt it. Whether enterprises facing CISA emergency directives will recognize that manual remediation at scale isn't sustainable. Whether we'll stop treating network appliances as infrastructure we can't monitor and start treating them as systems we can secure.

Because right now, attackers get 393 days of undetected access. They steal source code and gain 12-month head starts. They target the blind spots we've accepted for years. And that's only going to change if we change how we approach the problem.

I'm curious what you're seeing in your environment. Are your network appliances monitored? If you're running F5 BIG-IP infrastructure, how are you handling the emergency directive? If you needed to scan for Brickstorm today, how long would it actually take? What's the constraint: access, tooling, expertise, something else?

And if you've experimented with AI agents for security automation, what worked? What broke? The real learning happens when we share not just the success stories but the messy reality of making this stuff work under pressure.

Because that's ultimately what we're all doing: figuring out what actually works when the CISA directive lands and you need answers fast. Contact us to get started on your AI security and technical operations transformation.