13 March 2026

Half prepared is the most dangerous position

In this series, we'll be discussing operational resilience across a range of industries and IT disciplines. Check out the other articles in the series. 

You know the moment I’m talking about. 

It’s 9.02 on a Tuesday morning. An alert lights up your phone. Your COO is in your inbox. Your CISO says “priority incident.” Your operations lead needs clarity. Your board wants a straight answer. 

You are expected to respond before you even know all the facts. That pressure doesn't come from a lack of planning. It comes from reality exposing what happens when your plans haven't been tested. 

Preparedness isn’t about what you have in place. It’s about what you can do when things break.

It’s about how fast you spot the issue. How clearly you decide what matters first. How quickly you restore control. And how confidently you can explain every choice afterwards to your board, your regulator and your customers. Your board wants to know:  

  • Did you do enough? 
  • Could you have done more? 
  • Could you have done it differently? 

If you are accountable when things go wrong, you already know this. And if your cloud, security and network practices aren’t coordinated for that moment, you’re not as prepared as you might think. 

Preparedness is judged under pressure 

You’ve created response plans. You’ve mapped services. You’ve defined recovery time objectives. You may even run tabletop exercises regularly. You have dashboards and runbooks. All of that tells you what should happen. 

That’s good. In fact, that’s what separates organisations that can demonstrate control from those that can’t. 

But the real test happens when something fails for real, in the middle of peak demand, under customer load, or right before a regulatory deadline. 

Planning gives you comfort. Live incidents give you scrutiny. 

In those moments, even if you have invested in resilience, the question is whether you can regain control quickly and, crucially, explain why the decisions you made were right. 

Response speed has become a governance signal 

Response time used to be something only IT cared about. That’s over. 

Today, regulators and boards treat the speed at which you detect, contain and recover from an incident as evidence of operational control. 

In regulated markets like UK financial services, firms were mandated by the Financial Conduct Authority (FCA) to prove they can deliver important business services within defined impact tolerances, tested and evidenced, by the end of March 2025.  

Across the EU, the Digital Operational Resilience Act (DORA) formalised expectations for ICT risk management, incident reporting and operational resilience for financial entities from January 2025. These frameworks make response speed and recovery clarity part of what leaders are accountable for when scrutiny arrives after an incident. 

The FCA itself has said firms should consider metrics beyond time alone when defining tolerances. In other words, recovery is not just about restoring infrastructure. It’s about preventing intolerable harm. Across the EU, DORA reinforces similar principles: firms must ensure they can withstand, respond to and recover from ICT disruptions in a way that protects financial stability and customers. 

If you are facing regulators and boards who ask “how fast did you detect this?” and “how did you prioritise decisions?”, response speed is now a business signal of control. 

The hardest questions arrive after it feels “over” 

The real scrutiny doesn’t happen during the outage. It happens once services are back online and you think the crisis is over. The board isn’t asking for a technical debrief. They’re testing control, judgement, and governance. 

The questions sound more like this: 

  • When did we first know and when should we have known? 
  • Were we in control at every stage? 
  • Did our escalation thresholds work? 
  • Did we follow our incident response plan and if not, why not? 
  • Who had the authority and decision rights during the crisis? 
  • Was employee and customer harm minimised quickly enough? 
  • Was this risk previously identified or accepted? 
  • Had we tested this scenario? 
  • What would a regulator say if they reviewed our actions today? 

Boards are less concerned with the mechanics of how the problem was fixed. They want to understand whether the organisation behaved in a structured, predictable and well-governed way under pressure. 

Regulators take a similar view. They will look at: 

  • Whether impact tolerances were breached 
  • Whether the incident was reported on time 
  • Whether scenario testing had covered this failure mode 
  • Whether third-party oversight was adequate 
  • Whether lessons identified in previous incidents had actually been implemented 

In other words, they are assessing whether resilience exists in practice, not just on paper. 

A clean narrative builds confidence: 

  • Clear timeline 
  • Clear decision-making rationale 
  • Clear ownership of actions 
  • Evidence of testing and follow-through 

A messy narrative, with inconsistent timelines, unclear accountability and gaps between policy and reality, weakens that confidence. 

The incident itself rarely damages trust as much as a loss of control does. 

When services are restored, the technical crisis may be over. The governance test is just beginning. 

The real risk is fragmentation, not missing tools 

Some organisations lack the tools and technology needed, either to keep their network and cloud environments stable and resilient, or to detect and contain cyber-attacks. For others, however, the problem is not a lack of tooling but the gaps between teams, platforms and suppliers: exactly the places that planning glosses over.   

Cloud recovery that isn’t aligned with network readiness 

Your cloud team may have a recovery plan that assumes connectivity is stable. During major disruption, the network’s first job is to stabilise traffic flow in order to keep critical services reachable. That may mean prioritising live trading, customer access or core business systems before large-scale recovery traffic begins. 

That’s not a network failure. That’s resilience in action. 

But it does mean cloud and network teams have to design and test recovery together. 

If your cloud footprint has expanded by 30% and that change hasn’t been factored into network capacity planning, a failover test may technically “work” while performance degrades under real load. If edge cases aren’t written into playbooks and rehearsed during testing, the recovery path becomes a live experiment at the worst possible moment. 

Recovery speed in those moments isn’t dictated by a cloud SLA alone. It’s shaped by communication, capacity planning and whether cross-domain assumptions have been pressure-tested before an incident ever occurs. 

Backups are often scheduled overnight for a reason. Bandwidth is finite. During live incidents, restoring service and restoring infrastructure are two related but distinct priorities, and they must be balanced deliberately.  
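To make that trade-off concrete, here is a rough, illustrative sketch of the arithmetic. All the figures are hypothetical; the point is only that restore time is a function of shared capacity, not of the cloud SLA alone.

```python
# Rough, illustrative sketch: how long does a bulk restore take when it has to
# share a finite link with live traffic? All figures here are hypothetical.

def restore_hours(dataset_tb: float, link_gbps: float, recovery_share: float) -> float:
    """Estimated hours to move dataset_tb terabytes over a link_gbps link,
    with recovery traffic capped at recovery_share of capacity so that
    critical services stay reachable."""
    dataset_gbits = dataset_tb * 1000 * 8        # TB -> gigabits (decimal units)
    usable_gbps = link_gbps * recovery_share     # bandwidth left for recovery
    return dataset_gbits / usable_gbps / 3600    # seconds -> hours

# 50 TB restore on a 10 Gbps link, recovery capped at 30% during business hours:
print(f"{restore_hours(50, 10, 0.3):.1f} hours")  # ~37.0 hours

# The same restore overnight, when recovery can take 80% of the link:
print(f"{restore_hours(50, 10, 0.8):.1f} hours")  # ~13.9 hours
```

The exact numbers don’t matter. What matters is that a restore window is a cross-domain assumption, and it has to be rehearsed jointly rather than assumed. 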

Security response that slows restoration 

Security controls are essential. The issue is not protection, it’s coordination. 

In a ransomware scenario, for example, Endpoint Detection and Response (EDR) tools may automatically quarantine infected hosts and terminate unknown processes. That can include backup agents and their network connections. 

Now the recovery team needs to decide: do we temporarily relax certain restrictions to allow restoration to begin? If the SOC disables VPN access to contain attacker movement, but the cloud backup console or jump servers required for restoration rely on that VPN, progress stalls until coordination happens. 

In these moments, recovery slows not because controls are wrong, but because cloud, security and network responses were not fully aligned in advance. 
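One way to surface these conflicts before an incident is a simple dependency check. The sketch below is purely illustrative (the action names and dependencies are hypothetical, not a real inventory), but the idea is to make explicit which recovery capabilities your containment playbook would take out.

```python
# Illustrative sketch only: a pre-incident check that recovery tooling does not
# depend on access paths your containment playbook is likely to remove.
# The action names and dependencies below are hypothetical examples.

CONTAINMENT_ACTIONS = {"disable_vpn", "quarantine_endpoints", "block_egress"}

RECOVERY_DEPENDENCIES = {
    "cloud_backup_console": {"disable_vpn"},           # console reachable only over VPN
    "jump_servers":         {"disable_vpn"},
    "backup_agents":        {"quarantine_endpoints"},  # EDR may kill the agent process
    "offsite_replication":  {"block_egress"},
}

def conflicts(containment: set) -> dict:
    """Return each recovery capability that a containment step would break."""
    return {
        capability: deps & containment
        for capability, deps in RECOVERY_DEPENDENCIES.items()
        if deps & containment
    }

for capability, broken_by in conflicts(CONTAINMENT_ACTIONS).items():
    print(f"WARNING: {capability} is blocked by containment step(s): {broken_by}")
```

Run as part of a tabletop exercise, a check like this turns an assumption, “we can always reach the backup console”, into something you have actually verified. 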

The network’s quiet role in resilience 

Connectivity often operates seamlessly in the background. When everything works, no one notices it. 

But during peak demand or system recovery, observability and monitoring give the business the visibility to respond quickly, while traffic prioritisation keeps critical services running. Resilient network design considers both expected and peak traffic volumes, so that trading, customer access and recovery activity can coexist without creating unnecessary friction. 

Building resilience into your network before an incident is a preventative decision. It means that when disruption hits, you are not scrambling to create capacity or visibility. You already have the headroom and prioritisation logic in place to support fast, controlled recovery. 

The risk isn’t weak networks. It’s underestimating their role in coordinated recovery. 

This is why being half prepared is dangerous. Each area can look mature on its own. But if cloud, security and networks are not designed to operate together under pressure, you don’t actually have coordinated resilience. You have hope. 

If your resilience strategy falls apart when cloud, security and networks collide, you don’t have resilience. You have optimism. 

Pressure exposes coordination gaps you don’t see in planning 

Real incidents reveal operational weaknesses that untested plans do not: 

Unclear ownership 

When a major incident crosses cloud, security and network boundaries, who has authority to sequence decisions? Who signs off on risk trade-offs? Who owns communication back to the board? 

If escalation paths aren’t crystal clear, valuable time is lost aligning internally before action is taken externally. 

Risk traded for speed 

Under pressure, trade-offs are sometimes necessary. 

You might temporarily reduce EDR strictness so backup agents can reconnect and begin restoration. That accelerates recovery, but reduces detection coverage during that window. Attackers understand this pattern. They often exploit the immediate aftermath of an incident, knowing defences may be partially relaxed. 

That decision may be justified. But it must be deliberate, documented and time-bound. More haste, less speed; without coordination, speed increases exposure.
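What “deliberate, documented and time-bound” could look like in practice is sketched below. This is a minimal illustration, not a standard, and the field names are hypothetical; the principle is that a relaxation cannot exist without an owner, a rationale and an expiry.

```python
# Minimal sketch of a deliberate, documented, time-bound risk exception.
# Field names are illustrative; the point is that no relaxation exists
# without an owner, a rationale and an expiry.

from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

@dataclass(frozen=True)
class RiskException:
    control: str            # which control was relaxed
    change: str             # what exactly was changed, and where
    owner: str              # who is accountable for reverting it
    rationale: str          # why the trade-off was justified
    expires_at: datetime    # when it must be reviewed or reverted

    def is_expired(self) -> bool:
        return datetime.now(timezone.utc) >= self.expires_at

exception = RiskException(
    control="EDR process blocking",
    change="Allow-listed backup agent binaries on restore hosts",
    owner="incident-commander",
    rationale="EDR quarantined backup agents; restoration was blocked",
    expires_at=datetime.now(timezone.utc) + timedelta(hours=4),
)
print(exception.is_expired())  # False now; True once the window closes
```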

Temporary fixes that become permanent problems 

Quick restorations can create residual risk if not tightly managed. 

  • Firewall ports may be opened temporarily to accelerate system rebuilds. If closure is not formally tracked and verified, they remain open. 
  • Emergency administrative privileges may be granted to speed recovery. If those rights are not revoked promptly, access risk persists. 
  • A web application firewall (WAF) rule or data loss prevention (DLP) control may be relaxed to restore customer access. If forgotten, that window of vulnerability can outlast the crisis itself. 

Without structured post-incident clean-up, recovery actions can introduce new weaknesses. 
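A structured clean-up can be as simple as a register of temporary changes that blocks incident closure until every one is verified as reverted. The sketch below is a toy in-memory example; in practice the register would live in your change or ticketing system.

```python
# Illustrative close-out sweep: before an incident is formally closed, every
# temporary change must be verified as reverted. The entries below are
# hypothetical examples.

temporary_changes = [
    {"id": "FW-0117",  "desc": "Opened TCP/445 between rebuild VLAN and backup target", "reverted": True},
    {"id": "IAM-0042", "desc": "Emergency domain-admin rights for recovery team",       "reverted": False},
    {"id": "WAF-0009", "desc": "Relaxed WAF rule on customer login endpoint",           "reverted": False},
]

outstanding = [c for c in temporary_changes if not c["reverted"]]

if outstanding:
    print(f"Incident cannot be closed: {len(outstanding)} temporary change(s) still open")
    for change in outstanding:
        print(f"  {change['id']}: {change['desc']}")
else:
    print("All temporary changes verified as reverted; safe to close.")
```

The design choice that matters is the gate: closure is impossible while the list is non-empty. 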

Data breach costs make the stakes real 

When security response, detection and recovery lag, real costs follow. 

The global average cost of a data breach reached USD 4.88 million in 2024, according to IBM. But speed makes a measurable difference. 

Organisations that identified and contained breaches in under 200 days saved an average of USD 1.76 million compared to those that took longer.

For financial firms, breaches can be even more expensive because they face additional regulatory and reputational fallout. 

The bottom line: slow detection, delayed containment and uncoordinated recovery don’t just extend an outage; they amplify cost and scrutiny. 

Resilience doesn’t mean stopping every failure 

True operational resilience is what happens when failure arrives anyway and you still respond clearly and confidently. 

It looks like: 

  • Clear ownership across cloud, security and networks 
    No ambiguity when pressure hits. 
  • Recovery pathways that work under real conditions 
    Not just diagrams in a slide deck. 
  • Confidence built on tested coordination 
    Not assumed capability.
  • The ability to show exactly what happened and why 
    With timelines, decisions and rationale you can defend to a board or regulator. 

Coordinated ways of working make that easier. When people and platforms operate as one, response is faster, cleaner and easier to explain when scrutiny arrives.

Many organisations worry that vendor consolidation creates single points of failure. 

In reality, fragmentation often creates response friction. Multiple suppliers, separate escalation paths and unclear accountability introduce delay at the exact moment clarity is needed. 

Consolidation done properly, with joined-up governance across cloud, security and connectivity, can reduce handoffs and accelerate coordinated recovery. 

The question isn’t vendor count. It’s whether someone owns the end-to-end response. 

The danger of assumed readiness 

The danger isn’t imperfection, it’s assumed readiness. 

Confidence based on historic testing that no longer reflects current complexity. Metrics that look strong individually but haven’t been tested together recently. Recovery objectives that technically meet SLAs but sit uncomfortably close to impact tolerances. 

Assumed readiness creates false confidence. 

It slows decisions. It invites scrutiny. It weakens trust. And it turns recovery into negotiation among teams, tools and suppliers, exactly when negotiation is the last thing you need. 

The question you actually need to ask 

You might be confident that you’re ready for such shocks. But how can you know for sure? Have you tested it? 

Pressure-test your resilience before pressure forces the issue. 

Don’t ask “do we have the right tools?” 

Ask this instead: 

Can we detect, decide and recover quickly across cloud, security and networks, and demonstrate, with clear evidence, that we limited harm and strengthened our defences as a result? 

If you cannot answer that with confidence, you are not as prepared as you think. 

Before regulators, boards and customers test your resilience, test it yourself. 

We work with CIOs, CISOs and operations leaders to pressure-test recovery across cloud, security and networks together, not in isolation. If you want to build resilience in IT that holds up under today's threats and keeps your business running at top speed, talk to us.