Maximum Tolerated Downtime: how IT recovery speed impacts trust, revenue and compliance
This series of blogs covers the impact of IT resilience and what companies can do to weather business disruption. Read the other blogs here and here.
You have recovery time objectives. You track mean-time-to-detect and mean-time-to-respond. Your teams document failover tests. You probably have disaster recovery rehearsals and hot-site simulations on the calendar.
On paper, recovery looks sound. So why does it still feel uncomfortable when a real incident hits? Because what’s changed isn’t the importance of recovery speed. It’s how recovery, and the coordination behind it, is judged.
You probably measure response time. But do you measure it the way you’re judged?
Most enterprises track post-incident recovery through domain-specific metrics:
Cloud teams track Recovery Time Objectives (RTO) and Recovery Point Objectives (RPO).
Cybersecurity teams track Mean-Time-to-Detect (MTTD) and Mean-Time-to-Respond (MTTR).
Infrastructure teams track uptime, SLA adherence and failover test results.
Individually, those metrics are sensible. They align to technical responsibilities and supplier management. They often map into your risk register and reporting framework, and should all be translatable to an overall metric – MTD, or Maximum Tolerated Downtime.
The challenge is not that these metrics are wrong, but that boards and regulators assess recovery end-to-end.
They don’t ask whether cloud hit its RTO while security was still containing the threat. They don’t care if your infrastructure team met its uptime SLA while business services remained offline. They look at the full impact on customers, revenue and business continuity.
For regulated financial services firms in the UK, this distinction is formalised. Under the Financial Conduct Authority’s Operational Resilience framework, firms must define impact tolerances for important business services – the maximum level of disruption that can be tolerated before harm becomes unacceptable.
Recovery time objectives are expected to sit well within those declared tolerances.
That requirement applies specifically to FCA-regulated firms, but the underlying principle is broader. Whether or not the FCA regulates you, recovery is not judged at the level of infrastructure. It is judged at the level of business service and real-world impact.
In financial services, that expectation is codified in regulation. In other sectors, it is increasingly embedded in board oversight, audit scrutiny and customer expectation.
The FCA itself has noted that impact tolerances are often primarily time-based and should be complemented with additional metrics such as customer impact, transaction values and potential losses.
In other words, recovery is no longer judged solely on whether you restored a particular technology within an SLA. It is judged on whether you remained within business-level tolerances.
Speed is necessary. It is no longer sufficient.
Let’s be clear. Speed still matters.
The Information Commissioner’s Office requires organisations to report certain personal data breaches within 72 hours of becoming aware of them. Regulators do consider response timeliness when assessing enforcement outcomes.
But speed without clarity or coordination can create new problems.
Restoring a system quickly without understanding dependencies can reintroduce security vulnerabilities. Containing a threat aggressively without considering business continuity can prolong downtime. Declaring “service restored” before you’ve validated its integrity and stability can leave you technically back online but still vulnerable to repeat failure.
Recovery speed is judged alongside:
Real-world impact on customers or users
Control and coordination
Why you made the decisions you did
Evidence of cybersecurity testing
Alignment to Maximum Tolerated Downtime or declared impact tolerances (for financial organisations)
That is where many enterprises feel the heat.
Recovery speed only counts if you can prove it held up under real pressure.
Where enterprise recovery falls short of scrutiny
In practice, most large organisations do have cross-domain reporting. Risk registers aggregate cloud, cyber and infrastructure metrics. Controls are documented. Testing is logged.
The misalignment tends to surface in three places.
Metrics age faster than you think
Banks may rehearse disaster recovery annually or more frequently. Hot-site failovers are tested and documented. But in complex estates, architecture evolves faster than testing cycles.
If you pressure-tested a recovery scenario twelve months ago, does it still reflect your current cloud footprint, vendor landscape and security tooling?
For financial institutions especially, response capability can become outdated within months if it is not re-tested regularly.
Recovery looks fast in one domain and slow in another
Your cloud RTO may be four hours. Your security MTTR may be measured in days depending on severity. Your network failover may activate within minutes.
Each metric can technically be “met” while the overall business service remains degraded or offline.
That misalignment is rarely visible in siloed dashboards. It becomes very visible in a board review.
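To make that concrete, here is a minimal Python sketch that composes the illustrative figures above, plus assumed handoff phases, into the end-to-end number a board or regulator actually sees. The durations, phase names and the 24-hour tolerance are assumptions for illustration, not measurements from any real estate, and the phases are assumed to run sequentially, the common worst case across team handoffs.

```python
# Illustrative only: domain metrics individually "met" can still sum
# to a service-level breach. All figures are assumed for this sketch.

phases_hours = {
    "security containment (MTTD + MTTR)": 36.0,  # "days" at high severity
    "cloud restore (RTO)": 4.0,
    "network failover": 0.25,
    "validation and backlog clearing": 12.0,     # rarely in any one SLA
}

domain_targets_hours = {
    "security containment (MTTD + MTTR)": 48.0,
    "cloud restore (RTO)": 4.0,
    "network failover": 0.5,
}

MTD_HOURS = 24.0  # assumed declared Maximum Tolerated Downtime

for phase, actual in phases_hours.items():
    target = domain_targets_hours.get(phase)
    verdict = "met" if target is not None and actual <= target else "untracked"
    print(f"{phase}: {actual}h ({verdict})")

total = sum(phases_hours.values())
print(f"End-to-end downtime: {total}h vs MTD {MTD_HOURS}h -> "
      f"{'within tolerance' if total <= MTD_HOURS else 'tolerance breached'}")
```

Every siloed dashboard in that sketch shows green. Only the summed line, which no single domain owns, shows the breach.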
KPIs satisfy operational teams but fail board scrutiny
An SLA might state that a cloud environment can be restored within 24 hours. But is that restoring the entire infrastructure, or restoring just the data?
A 24-hour RTO on paper can translate into weeks of operational disruption if rebuilding environments, validating integrations and clearing transactional backlogs are not factored into the scenario.
The board doesn’t distinguish between “infrastructure restored” and “business fully operational.” It sees impact.
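A back-of-envelope sketch of that scenario, with assumed durations for the work a data-only RTO leaves out, might look like this:

```python
# Illustrative, assumed durations: what a "24-hour RTO" can omit when
# it covers data restore alone rather than full business operation.
CONTRACTUAL_RTO_HOURS = 24

full_recovery_hours = {
    "data restore (the RTO scope)": 24,
    "environment rebuild": 96,              # assumed: redeploy and re-harden
    "integration re-validation": 48,        # assumed
    "transactional backlog clearing": 120,  # assumed
}

total = sum(full_recovery_hours.values())
print(f"RTO on paper: {CONTRACTUAL_RTO_HOURS}h; "
      f"fully operational after ~{total / 24:.0f} days ({total}h)")
```

Under those assumptions, a 24-hour RTO becomes nearly two weeks of business disruption, and that is the figure the board experiences.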
Finance vs retail: different pressure, same metric shift
In financial services, this alignment between recovery objectives and impact tolerances is under direct regulatory observation.
Impact tolerances are declared in advance and audited every three years by the FCA. Recovery time objectives must sit comfortably within your declared impact tolerances.
The FCA has been clear that impact tolerances should consider not just time but customer type, transaction values and potential harm, and that where impact tolerances are likely to be breached, financial organisations must have mitigating procedures in place. Recovery metrics must support this broader lens.
In retail, you might have to deal with just one regulator – the ICO – but the commercial pressure is immediate. A slow recovery during peak trading translates directly into lost revenue and reputational damage.
In both sectors, the board ultimately asks the same questions:
How long were critical business services down?
How much did we lose?
Could it have been avoided?
Did we take reasonable and necessary steps to prevent that?
From metric to operating model
Treating response time as a KPI in 2026 means something different from what it did five years ago.
It means:
Designing cloud environments not just for availability but for repeatable and verifiable recovery.
Ensuring your cybersecurity response enables restoration rather than slowing it.
Architecting networks for resilience through failover and traffic prioritisation, with monitoring that provides visibility of when those actions must be triggered.
Testing recovery across domains, not just within them.
Documenting those tests in ways that align with declared impact tolerances, as sketched below.
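On that last point, here is a minimal sketch of what a cross-domain test record might capture so that the evidence maps directly onto a declared tolerance. The field names, figures and record format are illustrative assumptions, not a prescribed or regulatory schema.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class RecoveryTestRecord:
    """Illustrative evidence record: one cross-domain recovery test,
    tied to the business-service tolerance it is meant to demonstrate."""
    business_service: str
    declared_tolerance_hours: float   # impact tolerance / MTD
    test_date: date
    domains_exercised: list           # e.g. ["cloud", "security", "network"]
    observed_end_to_end_hours: float  # full service restoration, not per-domain
    architecture_changed_since: bool  # has the estate moved on since the test?

    def within_tolerance(self) -> bool:
        return self.observed_end_to_end_hours <= self.declared_tolerance_hours

    def evidence_is_current(self) -> bool:
        # Evidence goes stale when the architecture it tested has changed.
        return not self.architecture_changed_since

record = RecoveryTestRecord(
    business_service="retail payments",
    declared_tolerance_hours=24.0,
    test_date=date(2025, 11, 3),
    domains_exercised=["cloud", "security", "network"],
    observed_end_to_end_hours=18.5,
    architecture_changed_since=False,
)
print(record.within_tolerance(), record.evidence_is_current())
```

The exact fields matter less than the principle: each record answers, in one place, whether the whole service came back within the declared tolerance and whether that evidence is still current.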
Metrics and KPIs are imperfect measures. But they are essential. You can only report what you have tested. And regulators and boards will expect evidence of that testing.
The real evolution is not abandoning RTO, MTTR or MTTD. It is embedding them into an operating model that reflects how you are actually judged: at the level of business service, financial impact and governance confidence.
What would happen if you were challenged tomorrow?
You likely believe your recovery capability is solid. And you may well be right.
But if your board challenged whether your recovery objectives truly reflect your current complexity, could you defend them with evidence from recent, realistic testing?
If regulators reviewed your declared impact tolerances against your most recent failover exercise, would the numbers align comfortably?
Recovery speed has become a KPI not just of operations, but of trust, revenue and regulatory confidence.
Re-evaluating how you define and test it is not remedial. It is strategic.
It may be time to revisit how recovery speed is measured and governed across your cloud, security and network domains.
We work with enterprise leaders to align response metrics with impact tolerances, pressure-test cross-domain recovery under realistic conditions, and ensure performance stands up to board and regulatory scrutiny.
If you want confidence that your recovery speed reflects real resilience, not just historic performance, let’s have that conversation.
