Week 12: Incident Response Is Half Politics

Most incident response plans assume clean timelines and clear answers. Real incidents are messier—shaped by uncertainty, executive pressure, incomplete data, and human dynamics that matter as much as technical skill.

You’ve planned for incidents. You have a documented incident response plan. You’ve done tabletop exercises. Your team knows their roles. You have runbooks for common scenarios.

Then an actual incident happens, and you discover that the plan didn’t account for half of what actually matters.

Because incident response isn’t just technical. It’s organizational, political, and human. You’re not just trying to contain and remediate a security issue—you’re managing executive panic, communicating with stakeholders who don’t understand security, making decisions with incomplete information under time pressure, and documenting everything for the inevitable post-incident review.

The technical part is hard. The organizational part is often harder. And if you’re not prepared for both, you’re going to struggle even if your technical response is solid.

What Actually Happens During Incidents

Your incident response plan probably has clean steps: detect, contain, eradicate, recover, lessons learned.

Real incidents are messier.

You detect something that might be an incident or might be normal but anomalous activity. You don’t know which yet. You need to investigate without making assumptions.

You start investigating and realize you don’t have the logs you need. Or the logs you have don’t go back far enough. Or the thing you’re investigating happened in a system you don’t have good visibility into.

You think you’ve contained it, but then you find evidence that the attacker had access earlier than you thought. Or broader than you thought. So now your containment boundary was wrong and you have to expand it.

You’re trying to eradicate the threat, but you’re not entirely sure you’ve found all the persistence mechanisms. How long do you search before you’re confident enough to say it’s gone?

You’re trying to recover, but business stakeholders are pressuring you to restore systems quickly, and you’re trying to balance speed against the risk that you haven’t fully remediated.

None of this is clean. All of it involves judgment calls with incomplete information. And all of it is happening while people are watching and asking questions and wanting answers you don’t have yet.

Managing Executive Attention

Executives care when there’s an incident. Suddenly you have attention from people who normally aren’t involved in security operations. This is both helpful and challenging.

Helpful because you might get resources you wouldn’t normally get. Authority to make decisions quickly. Budget for emergency response. Organizational cooperation that would usually take weeks to coordinate.

Challenging because executives want answers and certainty, and you often don’t have those yet. They want to know: What happened? How bad is it? When will it be fixed? Are we going to have to notify customers? What’s this going to cost?

And your honest answers are often: We don’t know yet. We’re still investigating. It could be anywhere from minor to severe. We can’t estimate time to resolution until we understand the full scope. We’ll know about notification requirements when we know what data was accessed.

That’s not satisfying. But it’s honest. And giving false certainty is worse than admitting uncertainty.

What helps:

Regular updates. Even if you don’t have new information, update stakeholders on what you’re doing. “We’re still analyzing logs from the authentication system. We’ve ruled out X, we’re investigating Y, we expect to have more information in two hours.”

Translate technical findings into business impact. Don’t just say “we found lateral movement.” Say “the attacker accessed multiple systems, including ones that contain customer data. We’re working to determine what specific data was accessed.”

Set expectations about timelines. If investigation is going to take days, say so. Don’t let executives think this will be resolved in hours just because you don’t want to give bad news.

Be honest about what you don’t know. “We don’t know yet” is a legitimate answer. It’s better than speculating or giving false assurance.

Have a single point of contact for executive communication. Multiple people giving updates creates confusion and inconsistent messaging. Designate one person to communicate with leadership.

The Notification Decision

One of the most fraught decisions during an incident is whether you’re required to notify customers, regulators, or the public.

This isn’t just a security decision—it’s a legal and business decision. And it needs to be made carefully, with input from legal counsel.

But security has to provide the information that drives that decision. What data was accessed? How many people are affected? What’s the evidence for and against data exfiltration?

The pressure is to minimize. “We don’t have evidence that data was exfiltrated, so maybe we don’t need to notify.” But absence of evidence isn’t evidence of absence. If the attacker had access and you don’t have comprehensive logging, you might not have evidence even if exfiltration occurred.

The conservative approach is to assume the worst case unless you have evidence otherwise. If the attacker had access to customer data and you can’t definitively rule out exfiltration, you probably have to notify.

This creates tension with business stakeholders who want to avoid notification because of the cost and reputational damage. Your job is to provide accurate information about what you know and don’t know, and let legal and executive leadership make the decision.

But you have to be clear about the uncertainty. If you say “we don’t think data was exfiltrated” and they decide not to notify based on that, and then you later find evidence that it was—that’s a problem. Be precise about what you know, what you don’t know, and what the evidence supports.

Documentation Under Pressure

You’re supposed to document everything during an incident. Timelines of actions taken, decisions made, evidence collected. This is critical for post-incident analysis and potential legal or regulatory proceedings.

In practice, when you’re in the middle of an active incident and everyone’s working frantically, documentation often slips. People forget to log what they did. Decisions get made verbally and nobody writes them down. Evidence gets collected but the chain of custody isn’t properly documented.

This is understandable but problematic. After the fact, when you’re trying to reconstruct what happened, incomplete documentation makes that much harder.

What helps:

Designate someone as scribe. One person whose job during the incident is to document what’s happening. Not doing technical work—just capturing the timeline, decisions, and actions.

Here’s a recommendation: if your organization is big enough and the incident grows beyond initial response, get an executive admin or a business analyst from the PMO to help with this. If you force one of your technical team members to be the scribe, they’ll resent being pulled off technical work when their skills are needed elsewhere. But someone who’s good at taking notes and asking clarifying questions can be invaluable here.

You’re probably already hours or even days into the incident before you realize you need dedicated documentation support. Once you get that person, take an hour or two to backfill. Go over what happened in the last few hours or days and reconstruct the timeline together. It takes time, but it’s worth it—especially if there’s eventual legal or regulatory scrutiny.

Use a shared document or chat channel for incident updates. Something where everything is automatically logged and timestamped. This creates a timeline even if nobody’s actively maintaining documentation.

Document decisions with rationale. Not just “we decided to isolate the server” but “we decided to isolate the server because we found evidence of data exfiltration and needed to prevent continued unauthorized access.”

Preserve evidence properly. If you’re collecting logs, taking disk images, or capturing memory dumps, document chain of custody. This matters if there’s ever legal action (a minimal sketch of one way to do this follows this list).

Don’t destroy evidence accidentally. Rebuilding a compromised system cleans up the evidence of how it was compromised. Make sure you’ve collected everything you might need before you wipe and rebuild.
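To make the chain-of-custody point concrete, here’s a minimal sketch of one approach: hash each evidence file as you collect it and append a timestamped record of who collected it and from which system. The manifest format, field names, and file paths here are my own illustrative assumptions, not a standard; adapt them to your own procedures.

```python
#!/usr/bin/env python3
"""Minimal chain-of-custody helper (illustrative sketch, not a standard format):
hash each evidence file and append a timestamped record to an append-only manifest."""

import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path

MANIFEST = Path("evidence_manifest.jsonl")  # one JSON record per line, append-only


def sha256_of(path: Path) -> str:
    """Compute the SHA-256 of a file in chunks so large disk images don't exhaust memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def record_evidence(path: str, collected_by: str, source_system: str, notes: str = "") -> dict:
    """Append a timestamped custody record for one evidence file and return it."""
    p = Path(path)
    entry = {
        "timestamp_utc": datetime.now(timezone.utc).isoformat(),
        "file": str(p.resolve()),
        "sha256": sha256_of(p),
        "size_bytes": p.stat().st_size,
        "collected_by": collected_by,
        "source_system": source_system,
        "notes": notes,
    }
    with MANIFEST.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry) + "\n")
    return entry


if __name__ == "__main__":
    # Hypothetical usage: record a disk image pulled from a suspect host before rebuild.
    record_evidence(
        "images/websrv01_sda.dd",
        collected_by="J. Analyst",
        source_system="websrv01",
        notes="Full disk image taken before wipe and rebuild",
    )
```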

The Communication Challenge

You’re going to be communicating with different audiences who need different information.

Technical team: Detailed technical information. IOCs, attack techniques, affected systems, remediation steps. They need enough detail to do their jobs.

Executive leadership: Business impact. What systems are affected, what’s the impact to operations, what’s the potential for customer or regulatory notification, what resources are needed, what’s the timeline.

Legal counsel: What data was potentially accessed, what evidence you have, what gaps in visibility exist, what regulatory requirements might apply.

Affected users or customers (if notification is required): What happened, what data was potentially affected, what you’re doing about it, what they should do, how they can get more information.

Each audience needs different levels of detail and different framing. Explaining attack techniques to executives wastes time. Giving customers vague reassurances without specific information frustrates them.

Tailor your communication to the audience. And make sure the messages are consistent—you can’t tell executives one thing and customers something contradictory.

The Blame Dynamic

When something bad happens, people want to know whose fault it is. This is often counterproductive during incident response.

Yes, maybe someone clicked a phishing link. Maybe someone misconfigured a system. Maybe someone disabled a security control that would have prevented this.

But during active response, blame doesn’t help. It makes people defensive. It makes them less likely to come forward with information. It creates an environment where people are more worried about protecting themselves than solving the problem.

And here’s a critical reason to avoid premature blame: you often don’t have the full picture yet.

I’ve worked an incident where we detected two or three credentials being used regularly during the attack. The initial reaction from some stakeholders was to identify and confront those users. But we held off. Through investigation, we were able to confirm that two of those people had their passwords compromised—keylogger, credential stuffing from a breach, something along those lines. They weren’t involved; their credentials were just stolen and used by the attacker.

If we’d blamed those people early and pushed for immediate termination, we could have gotten innocent people fired. For one of the accounts, we could never definitively determine whether it was willing participation or another compromised credential. My gut says compromised, but we couldn’t prove it the same way we did with the others.

Point is: during an active incident, you don’t always know who did what or whether apparent insider activity is actually an insider or just stolen credentials. Making it about blame before you have facts creates injustice and destroys trust.

Save the accountability discussion for after the incident is resolved. During the incident, focus on fixing the problem.

This requires discipline from leadership. If executives start demanding to know who’s responsible while the incident is still active, that needs to be redirected. “We’ll do a full post-incident review to understand what happened and how to prevent it in the future. Right now we need everyone focused on response.”

Blameless post-mortems are a cultural practice worth adopting. Understand what happened, what contributed to it, what can be learned, how to prevent it in the future—without making it about punishing individuals. This creates an environment where people are more honest about mistakes and near-misses, which makes the organization more resilient.

When the Plan Doesn’t Fit

Your incident response plan probably covers common scenarios. Malware infection. Phishing compromise. DDoS attack. Unauthorized access.

Then you get an incident that doesn’t fit any of those patterns. Or fits multiple patterns. Or involves systems or attack techniques your plan didn’t anticipate.

Here’s a structural recommendation: you need an overarching incident response framework—the generic process that applies to any incident—and then specific playbooks underneath it for common scenarios.

The framework covers the principles: detect, contain, investigate, eradicate, recover, document. The decision-making process. The communication structure. The escalation paths.

The playbooks cover specific scenarios: “user clicked phishing link,” “DDoS in progress,” “ransomware detected.” Step-by-step guidance for that particular situation.

But here’s the problem with overly prescriptive plans: real incidents don’t stay in neat categories. You might have an incident that involves phishing, credential compromise, and malware. Which playbook are you following? All of them? And if you try to put every possible notification scenario and every regulatory obligation into a single incident response plan, you end up with a 200-page document that nobody will actually use during a crisis.

So keep the framework generic enough to be useful regardless of the specific incident type. Use playbooks for common patterns but understand they’re guidance, not rigid scripts. The plan is a starting point, not a script. You still have to adapt to what you’re actually seeing.
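If it helps to picture the structure, here’s a purely illustrative sketch of the framework-plus-playbooks idea: scenario playbooks where they exist, with a fall-through to the generic phases when an incident doesn’t match any of them. The incident types and steps shown are hypothetical examples, not a prescribed standard.

```python
"""Illustrative sketch only: one way to layer scenario playbooks on top of a
generic response framework. Incident types, phases, and steps are hypothetical."""

# The overarching framework: phases that apply to any incident.
FRAMEWORK_PHASES = ["detect", "contain", "investigate", "eradicate", "recover", "document"]

# Scenario playbooks: ordered guidance for common patterns, not rigid scripts.
PLAYBOOKS = {
    "phishing_click": [
        "Reset the affected user's credentials and revoke active sessions",
        "Pull the message from other mailboxes and block the sender and URL",
        "Check for follow-on activity from the affected account",
    ],
    "ransomware": [
        "Isolate affected hosts from the network",
        "Identify the strain and the initial access vector",
        "Verify backups are intact and offline before any restore",
    ],
}


def guidance_for(incident_type):
    """Return playbook steps when one exists; otherwise fall back to the framework."""
    if incident_type in PLAYBOOKS:
        return PLAYBOOKS[incident_type]
    # No matching playbook: work through the generic phases and use judgment.
    return [f"Apply the '{phase}' phase of the generic framework" for phase in FRAMEWORK_PHASES]


if __name__ == "__main__":
    for step in guidance_for("phishing_click"):
        print("-", step)
    print()
    # An incident type with no playbook falls through to the generic framework.
    for step in guidance_for("supply_chain_compromise"):
        print("-", step)
```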

This is where judgment and experience matter. Understanding principles (contain the threat, collect evidence, minimize impact) rather than just following procedures. Being able to make decisions when the playbook doesn’t give you an answer.

And being willing to escalate when you’re out of your depth. If the incident involves sophisticated techniques you don’t have experience with, bring in help. That might be external incident response consultants. That might be specialists from vendors. That might be law enforcement if there are criminal implications.

One important note about law enforcement: they’re not there to do your forensics or incident response. If someone committed a crime, they’ll build a case—but only if they believe they can prosecute. Their priorities and timelines are different from yours. They can be valuable partners, but don’t assume they’ll solve your incident for you. You still need your own response capability.

Knowing when you need help is itself a valuable skill.

The Recovery Pressure

During an incident, there’s pressure to restore normal operations as quickly as possible. Every hour that systems are down costs the business money. Users can’t do their jobs. Customers can’t access services.

This creates tension with thorough remediation. To be confident you’ve removed the threat, you need time to investigate, clean compromised systems, verify that persistence mechanisms are gone. Rushing this means potentially missing something and having the attacker return.

But business stakeholders want systems back up. They want to know why it’s taking so long. They’re weighing the cost of continued downtime against the risk of incomplete remediation.

Sometimes the right answer is strategic shutdown—taking systems offline deliberately to enable proper containment.

Early in my career, I was working on a resource management team—basically server admins for a divisional office. We were fighting a worm—I can’t remember the exact name 25+ years later, but I remember it was incredibly annoying. We’d clean one system, move to the next, then the next—and before we could finish, the first system would be reinfected. Cat and mouse. Whack-a-mole.

Finally I came up with an idea: “Boss, I’m taking down the network for 45 minutes.”

“WHAT? NO!”

“We can’t get ahead of this worm. If we take the ring down”—yes, I’m old, it was a Token Ring network—“the worm can’t move while we eradicate it. We have a very efficient cleanup process. The problem is the worm moves during the process.”

We took the network down. Cleaned every system systematically while the worm couldn’t propagate. Brought it back up clean. Problem solved.

The lesson: sometimes you need to create the conditions for successful remediation, even if that means deliberate downtime.

But sometimes the right answer is strategic patience—not shutting things down immediately so you can ensure you’ve found everything.

Years later, I was working for a retailer. We’d been in incident response for weeks after getting alerts from the card brands about compromised payment cards. We finally found something—confirmed the compromise, started notifications, and identified a system that was actively exfiltrating data. We repositioned sensors and monitoring to watch it.

During an update call, an executive demanded to know why we hadn’t shut the system down immediately. I explained that while we’d found one command-and-control server, we couldn’t prove a second one didn’t exist. At that point we’d already lost tens of thousands of cards. Another day with maybe 5,000 more cards exposed wasn’t going to fundamentally change the impact, but it could help us verify we’d found everything.

The executive essentially kicked and screamed to shut it down now. But we held the line. We wanted to watch the next exfiltration—what the attacker touched, what commands they issued, just to be certain we had the full picture.

It paid off. During the next data exfiltration, the attacker sent a ping to a system we hadn’t suspected. We grabbed a forensic image, quickly analyzed it, and verified it was a silent secondary C2 server that we would have missed if we’d shut down the first one immediately.

Then we took both systems offline simultaneously and cut off the attacker’s access completely. We monitored for three days. Not one reconnection attempt. Not one similar pattern. Clean containment.

If we’d shut down the first C2 when the executive demanded, the attacker would have still had access through the second one. We’d have thought we were contained, restored operations, and the breach would have continued.

The lesson: sometimes you need patience to ensure complete containment, even when stakeholders are demanding immediate action.

Your job is to be clear about the trade-offs: “We can restore this system now, but we haven’t fully verified that all malware is removed. If we restore it and the attacker still has access, we might be back in the same situation,” versus “We need another six hours to complete analysis and be confident in the remediation.”

Sometimes leadership will accept the risk of faster restoration. That’s their call if they understand what they’re accepting. But they need to understand it clearly—and sometimes you need to make the strategic call, whether that’s taking systems down to enable cleanup or keeping them running to ensure you’ve found everything.

The Post-Incident Review

After the incident is resolved, you need to do a proper post-incident review.

What happened? How was it detected? How long did response take? What worked well? What didn’t work? What would we do differently? What changes do we need to make to prevent similar incidents or respond better next time?

This is where you capture lessons learned and turn them into improvements. It’s also where you document the incident fully for future reference.

Be honest in this review. If something didn’t work, say so. If someone made a mistake that contributed to the incident, document that without making it personal. If you got lucky and the impact could have been worse, acknowledge that.

The goal is learning, not blame. The goal is making the organization more resilient, not making people feel bad about what went wrong.

And actually implement the improvements that come out of the review. Too many post-incident reviews result in great recommendations that never get acted on. If you’re going to take the time to document lessons learned, follow through on them.

Practical Takeaways

Incident response is organizational and political, not just technical. Plan for both.

Real incidents are messier than tabletop exercises. You’ll make decisions with incomplete information under time pressure.

Manage executive communication carefully. Regular updates, translate technical to business impact, be honest about uncertainty.

Notification decisions are legal and business decisions. Provide accurate information about what you know and don’t know.

Document everything during the incident. Designate a scribe, use shared timelines, document decisions with rationale.

Tailor communication to different audiences. Technical detail for responders, business impact for executives, clear information for affected parties.

Avoid blame during active response. Save accountability discussions for post-incident review.

Plans are starting points, not scripts. Be prepared to adapt to incidents that don’t fit the playbook.

Balance recovery pressure with thorough remediation. Be clear about trade-offs and risks.

Do proper post-incident reviews and actually implement the improvements. Turn incidents into learning opportunities.
