Testing Once a Year Is Not a Resilience Strategy

Every organization that has ever failed a recovery – and there are more than anyone publicly acknowledges – had one thing in common: They believed they could recover before they tried.

The belief came from somewhere. A completed tabletop exercise. A backup system that showed green. An annual disaster recovery test that passed. All of it documented. All of it, at some point, accurate. None of it current when the incident actually hit.

This is the confidence gap. And it is the gap that continuous recovery validation is designed to close.

What ‘Testing’ Actually Means in Most Organizations

Ask most security or IT leaders how often they test their recovery capability, and the answer is typically once a year, sometimes twice. The test involves restoring a subset of systems from backup into a test environment, confirming they come up, and filing a report. Sometimes a tabletop exercise is conducted alongside it.

What this kind of testing does not do: validate that backup data is free of malware. Confirm that recovery sequencing works for interdependent services. Test identity recovery, which is essential when compromised credentials are what enabled the attack. Confirm that the team that would actually run the recovery knows the current runbooks. Or produce evidence meaningful enough to satisfy a regulator, an auditor, or a board that recovery capability is real and current.

In short, it validates a point in time. Resilience operations (ResOps) requires validation as a continuous state.

The Continuous Validation Model

Continuous recovery validation is not a single test run more frequently. It is a set of integrated practices that produce ongoing, evidence-based proof of recoverability across critical services.

Automated backup integrity scanning. Every backup, continuously evaluated for anomalies, encryption patterns, and malware signatures. Not at restore time – before restore time. The goal is to know whether your recovery points are clean before you need them, not during an incident.

Scheduled Cleanroom Recovery drills. Twice a year at minimum, restoring from immutable backup points into an isolated Cleanroom Recovery environment – not production, not a production-adjacent test environment, but a genuinely isolated space where forensic analysis can happen without risk of reinfection. These drills produce documented evidence of recoverability against defined impact tolerances.

Identity recovery validation. With credential abuse the most common breach vector, Active Directory and Entra ID recovery must be tested alongside data recovery. Organizations that restore systems without restoring a verified-clean identity layer may find attackers re-enter through the same door.

Service Resilience Indicator (SRI) dashboards. SRIs are continuous signals drawn from backup telemetry, dependency mapping, and test results that give CISOs, CIOs, and boards a live view of recoverability posture. Not a point-in-time report. An ongoing operational signal.
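To make the idea of an ongoing signal concrete, here is a minimal sketch of a per-service freshness check. It is an illustration only, not how any SRI product computes its indicators: the `ServiceSignals` fields, the `stale_signals` function, the one-day and 90-day windows, and the sample services are all assumptions made for this example.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Assumed freshness windows for this sketch; tune them to your own impact tolerances.
MAX_SCAN_AGE = timedelta(days=1)
MAX_DRILL_AGE = timedelta(days=90)


@dataclass
class ServiceSignals:
    """Hypothetical inputs to a recoverability signal for one service."""
    service: str
    last_clean_scan: date            # most recent backup scan that came back clean
    last_cleanroom_drill: date       # most recent isolated recovery drill
    identity_recovery_validated: bool


def stale_signals(s: ServiceSignals, today: date) -> list[str]:
    """Return the names of the signals that are out of date for one service."""
    stale = []
    if today - s.last_clean_scan > MAX_SCAN_AGE:
        stale.append("backup scan")
    if today - s.last_cleanroom_drill > MAX_DRILL_AGE:
        stale.append("cleanroom drill")
    if not s.identity_recovery_validated:
        stale.append("identity recovery")
    return stale


# Made-up sample data.
today = date(2024, 6, 30)
payments = ServiceSignals("payments", date(2024, 6, 30), date(2024, 5, 1), True)
hr_portal = ServiceSignals("hr-portal", date(2024, 6, 25), date(2024, 1, 15), False)

for svc in (payments, hr_portal):
    stale = stale_signals(svc, today)
    print(svc.service, "current" if not stale else f"stale: {', '.join(stale)}")
```

The point of the sketch is that the check runs against today's date every time it is evaluated, so a service drifts from current to stale on its own when validation stops.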

Each of these practices feeds what Deloitte and Commvault call the resilience backlog: a continuously updated, prioritized list of gaps identified through testing and tracked to resolution. It is the mechanism by which validation drives improvement rather than just producing reports.
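As a rough illustration of that mechanism, the sketch below models a backlog as a list of gaps that is always read back in priority order. The `Gap` fields, the severity scale, and the sample entries are assumptions for this example; the source describes the backlog only as continuously updated, prioritized, and tracked to resolution.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical severity scale; lower rank sorts first.
SEVERITY_RANK = {"critical": 0, "high": 1, "medium": 2, "low": 3}


@dataclass
class Gap:
    service: str
    practice: str        # which validation practice surfaced the gap
    description: str
    severity: str
    found_on: date
    resolved: bool = False


class ResilienceBacklog:
    def __init__(self) -> None:
        self._gaps: list[Gap] = []

    def record(self, gap: Gap) -> None:
        """Add a gap found by a scan, drill, or validation run."""
        self._gaps.append(gap)

    def resolve(self, service: str, description: str) -> None:
        """Mark a matching gap as resolved; it stays in the history."""
        for gap in self._gaps:
            if gap.service == service and gap.description == description:
                gap.resolved = True

    def open_gaps(self) -> list[Gap]:
        """Unresolved gaps, most severe first, oldest first within a severity."""
        unresolved = [g for g in self._gaps if not g.resolved]
        return sorted(unresolved, key=lambda g: (SEVERITY_RANK[g.severity], g.found_on))


# Made-up sample findings.
backlog = ResilienceBacklog()
backlog.record(Gap("payments", "cleanroom drill", "restore exceeded impact tolerance", "high", date(2024, 3, 1)))
backlog.record(Gap("hr-portal", "identity recovery", "directory restore never tested", "critical", date(2024, 3, 10)))
backlog.record(Gap("payments", "backup scan", "anomaly in nightly backup", "high", date(2024, 2, 20)))

for gap in backlog.open_gaps():
    print(gap.severity, gap.service, gap.description)
```

Keeping resolved gaps in the list rather than deleting them is a deliberate choice in this sketch: the history of what was found and fixed is itself evidence for an auditor or a board.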

What Mean Time to Clean Recovery Changes

Traditional recovery metrics – recovery time objective (RTO) and recovery point objective (RPO) – measure speed and data recency. They say nothing about whether the data being restored can be trusted. Mean Time to Clean Recovery (MTCR) fills that gap: It measures the time required to restore data that is verifiably clean, not just technically available.

MTCR matters because in a ransomware incident, the adversary’s goal is often to corrupt recovery options, not just encrypt production systems. An organization that restores quickly but restores from a compromised backup has not recovered. It has re-infected itself.
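One way to see the difference in numbers is to measure both the first restore and the first verified-clean restore for each incident or drill. The sketch below does that under stated assumptions: MTCR is defined above only in words, so the record types, the choice to stop the clock at the first verified-clean restore, and the sample timestamps are all illustrative.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean
from typing import List, Optional


@dataclass
class RestoreAttempt:
    completed_at: datetime
    verified_clean: bool   # True only if the restored data passed integrity checks


@dataclass
class RecoveryEvent:
    started_at: datetime   # when the incident or drill began
    attempts: List[RestoreAttempt]


def time_to_first_restore(event: RecoveryEvent) -> Optional[timedelta]:
    """Time until any restore completed, clean or not."""
    if not event.attempts:
        return None
    return min(a.completed_at for a in event.attempts) - event.started_at


def time_to_clean_recovery(event: RecoveryEvent) -> Optional[timedelta]:
    """Time until the first restore that was verified clean."""
    clean = [a.completed_at for a in event.attempts if a.verified_clean]
    if not clean:
        return None
    return min(clean) - event.started_at


def mean_hours(durations: List[Optional[timedelta]]) -> Optional[float]:
    """Average in hours, skipping events with no qualifying restore."""
    hours = [d.total_seconds() / 3600 for d in durations if d is not None]
    return mean(hours) if hours else None


# Made-up sample data: the first event's initial restore came from a compromised backup.
events = [
    RecoveryEvent(datetime(2024, 1, 10, 8, 0), [
        RestoreAttempt(datetime(2024, 1, 10, 12, 0), verified_clean=False),
        RestoreAttempt(datetime(2024, 1, 10, 20, 0), verified_clean=True),
    ]),
    RecoveryEvent(datetime(2024, 2, 5, 9, 0), [
        RestoreAttempt(datetime(2024, 2, 5, 15, 0), verified_clean=True),
    ]),
]

mean_first_restore = mean_hours([time_to_first_restore(e) for e in events])
mtcr = mean_hours([time_to_clean_recovery(e) for e in events])
print(f"mean time to first restore: {mean_first_restore} h")  # 5.0 h
print(f"mean time to clean recovery: {mtcr} h")               # 9.0 h
```

In this sample the speed-only figure is 5 hours while the clean-recovery figure is 9 hours; the 4-hour gap is the time spent restoring from, and then discarding, a compromised recovery point.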

Building MTCR into your resilience measurement framework, alongside RTO and RPO, changes what you optimize for and what you report to the board. Speed plus recency plus integrity: that is the complete picture of recovery readiness.

Resilience You Can Prove

The organizations that navigate cyber disruptions with the least damage share one characteristic: They treat recovery capability as something to be continuously demonstrated, not periodically asserted. They know their MTCR. Their SRIs are current. Their Cleanroom Recovery has been tested in the last 90 days.

That posture is not the result of better technology alone. It is the result of an operating discipline – ResOps – that makes resilience continuous, measurable, and governable. Commvault’s platform provides the technical foundation: clean recovery, automated validation, and the unified visibility across data, identity, and services that ResOps requires at scale.

For the organizational side of that equation – how to define impact tolerances, align executive leadership, and build the governance structure that sustains the discipline – see the Deloitte companion blog, The Resilience Conversation Your Board Isn’t Having Yet. And for the complete ResOps framework, including the six ResOps domains and the measurement model that ties technical recoverability to board-level accountability, read the joint whitepaper: From Minimum Viability to Operational Resilience: ResOps in Practice.

Bill O’Connell is Chief Security Officer at Commvault.
