BREAKOUT SESSION
Introducing MTCR: The Missing Metric for Cyber Recovery Success
Join Commvault’s Chief Trust Officer, Danielle Sheer, alongside Darren Thomson, Field CTO at Commvault, and Duncan Bradley, Security and Resilience Practice Leader at Kyndryl, for a groundbreaking discussion on Mean Time to Clean Recovery (MTCR).
About This Session
Explore why Mean Time to Clean Recovery (MTCR) is emerging as the new gold standard for cyber resilience, shifting the conversation from how fast systems can be restored to how clean, trustworthy, and uncompromised those systems are upon recovery.
Examine why traditional metrics like RTO and RPO no longer reflect modern cyber risk, as organizations increasingly discover that rapid restoration of infected or unverified data can reignite attacks, extend downtime, and jeopardize business continuity.
Analyze the real-world complexity of clean recovery, illustrated by a major retail ransomware incident in 2025 where containment happened quickly but full recovery took nearly three months—primarily due to the challenge of identifying clean, validated data rather than restoring systems themselves.
Understand how MTCR reframes recovery as both a technical and business imperative, giving boards and executives a clearer metric for evaluating risk exposure, time-to-value, and overall cyber readiness across the organization.
Learn the foundational pillars that support MTCR, including clean data verification, isolated recovery processes, rigorous validation pipelines, and operational practices that ensure recovered systems do not reintroduce threats into production.
Key Takeaways
- MTCR sets a new benchmark for cyber resilience, focusing on recovery quality instead of recovery speed.
- RTO and RPO alone are outdated, as they fail to measure whether restored systems are clean, safe, and attack-free.
- Recovering compromised or unverified data can restart an attack, making clean data validation essential.
- Real-world ransomware events show recovery timelines extending to months, primarily due to the difficulty of identifying trustworthy datasets.
- MTCR reframes cyber recovery as a strategic business issue, requiring executive oversight and board-level visibility.
- Organizations adopting MTCR gain a more accurate understanding of cyber risk, enabling smarter investments and more resilient recovery planning.
Cleanroom Recovery
Cleanroom Recovery combines unique capabilities to identify and ensure a clean recovery, plus the ability to guarantee safe recovery to a cleanroom in the cloud.
British Medical Association
Learn about how a Commvault customer restored clean data after a ransomware attack.
Reduce Recovery Time From a Ransomware Attack
Having the right tools to recover quickly from a security breach makes all the difference.
Frequently Asked Questions
What is Mean Time to Clean Recovery (MTCR)?
MTCR measures how long it takes to restore systems and data cleanly—free from malware, corruption, persistence mechanisms, or compromise—after a cyberattack. It prioritizes recovery integrity, not just speed.
How is MTCR different from RTO and RPO?
RTO and RPO focus on time and data loss tolerance. MTCR focuses on trustworthiness. Fast recovery means little if the restored environment still contains threats capable of reactivating the attack.
Why does clean recovery take longer than traditional restoration?
Clean recovery requires verifying the integrity of backups, inspecting systems for latent threats, and validating that restored workloads are free from adversary persistence. This process is significantly more complex than simple failover or rehydration.
Why are recovery times stretching into months for many organizations?
Threat actors increasingly deploy techniques that corrupt backups, hide in identity systems, or create hard-to-detect persistence. As a result, most recovery time is spent locating clean data—not rebuilding infrastructure.
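As a rough illustration of the answers above, the sketch below treats MTCR as the average, across incidents or recovery tests of one business service, of the total time needed to get that service back clean. It is only a minimal sketch under stated assumptions: the three-phase split, the field names, the idea that the phases run one after another, and all of the numbers are hypothetical and do not come from the session or the report.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class RecoveryExercise:
    """One incident or recovery test for a single business service.

    All field names and the three-phase split are hypothetical.
    """
    service: str
    days_to_find_clean_data: float       # forensics: locate and verify a clean backup
    days_to_rebuild_platform: float      # identity, back-end, front-end, user access
    days_to_restore_and_validate: float  # restore in isolation, re-check, return to production


def clean_recovery_days(exercise: RecoveryExercise) -> float:
    # Assumption: the phases run sequentially, so their durations simply add up.
    return (
        exercise.days_to_find_clean_data
        + exercise.days_to_rebuild_platform
        + exercise.days_to_restore_and_validate
    )


def mtcr_days(exercises: list[RecoveryExercise]) -> float:
    """Mean time to clean recovery, in days, over a set of exercises."""
    if not exercises:
        raise ValueError("at least one exercise is needed to compute a mean")
    return mean(clean_recovery_days(e) for e in exercises)


# Made-up figures for a hypothetical "online ordering" service.
history = [
    RecoveryExercise("online ordering", 20.0, 10.0, 5.0),  # 35 days in total
    RecoveryExercise("online ordering", 30.0, 8.0, 4.0),   # 42 days in total
]
print(mtcr_days(history))  # 38.5
```

In this toy data, most of the elapsed time sits in the first phase, finding clean data, which is the pattern the answers above describe. A real measurement would need its own definition of when a service counts as clean and back in production.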
Transcript
Welcome to our discussion on a new way to think about cyber resilience, Mean Time to Clean
Recovery or MTCR.
I’m joined by two experts who created this MTCR concept, Darren Thomson, field CTO at
Commvault, who works closely with organizations across industries on resilience
strategies, and Duncan Bradley, UKI security and resiliency practice leader at Kyndryl.
They authored the original thought leadership on MTCR.
For years, organizations have measured recovery success by how quickly systems can be
restored.
We use acronyms like RTO and RPO: RTO, recovery time objective; RPO, recovery point
objective.
But in today’s threat landscape, we know that speed alone does not equal safety and it
doesn’t equal recovery success.
If the data you recover isn’t clean and trustworthy, you risk turning your recovery into
the continuation of another attack.
Hi.
I’m Danielle Sheer, Chief Trust Officer at Commvault.
And I spend a lot of my time helping boards and executives understand how cyber risk meets
business risk.
I’m very excited to guide today’s discussion and to unpack why this new concept, MTCR,
isn’t just another technical metric, but the business standard that now belongs in the
boardroom.
Let’s have a shot at this.
Yeah, thanks Danielle.
And thanks for the great intro.
Just this year in 2025, in the UK in particular, but actually all over the world,
particularly Europe, we’ve seen some focused attacks on retail.
And we’ve seen firsthand just how hard it has been for those retailers to stay engaged
with their customers and importantly recover their systems.
This year I was personally involved in a particular attack that started around the middle of
April 2025.
Just to give you a bit of a sense of the timeline it took for that business
to recover:
In the April period, there was essentially a ransomware event.
And during that period, the business did a great job of kind of, if not stopping that
event, isolating and stopping the spread.
In the May timeframe, they moved into trying to confirm the nature and the scope, what I
call the blast radius of the attack.
And they started to communicate out.
And then from May through to the beginning of August was the actual recovery of systems.
So in other words, you know, pushing three months, and then through the August timeframe, only
a phased approach to reintroducing systems.
So why all of that time, you know, as good as three months, to bring that business back
online?
Well, much of that time was spent trying to find the right data, the clean data,
the clean systems on which that business could rely to get back to engaging with their
customers and generating their revenues as a retailer.
And that right there, that kind of three month period is indicative of what we want to
talk about here.
The business has got to start asking a different set of questions of the IT organization
in order to get to a place where not only can promises be made in terms of how quickly can
we restore systems,
but promises can start to be made about how clean those systems are going to be, how clean
that data is going to be.
Otherwise, I’m afraid we are going to continue to see, you know, typically two, three,
sometimes even four or five months to bring systems back online.
You know, OK, that’s fascinating.
You gave us a lot to work with.
Duncan, let me now go back to the beginning.
Can you give us the definition of MTCR?
Talk us through the five pillars and just why that is the right way for us in this
landscape, this threat landscape to measure success in a recovery.
Yeah, sure Danielle.
So I think it’s really important.
We came up with the concept of Mean Time to Clean Recovery to really be a business metric,
something the business could understand.
This is about how long it takes them to get back online, as Darren was saying,
with clean data, with systems that can operate and be back in production.
Most organizations have really focused on availability, on their systems being available
online, without understanding that these cyber attacks take out the data and the platforms
that support those operations.
So when you come to a cyber recovery event, it’s more a case of you’ve got to reestablish,
from day one to recovery point,
a clean environment.
So that often takes days, as Darren was saying in the retail example.
You then need to rebuild all your foundational services.
You then need to bring your platforms back online and then you need to recover clean data
into it.
And quite often with the types of attacks we’re seeing is that data is contaminated for
days, if not weeks prior to the attack.
So you need to look into your backups and bring back that clean data back into production.
This MTCR, Mean Time to Clean Recovery, is meant to be a business indicator to say,
how long can you be out for?
How long would you be out for today if the event happened?
And then this is used to challenge the IT and the CISO organizations to shrink that mean time
to clean recovery.
And so how is MTCR different from traditional measures?
It sounds like it’s sort of a lot of the traditional measures in one larger measure.
Well, I think this is where you have recovery time objectives.
How long does it take to get that system back online?
It doesn’t take into account what type of data is going to be within that platform.
You know, and typically they’re used for availability.
How long does it take me to fail over from data center A to data center B in that
traditional infrastructure attack?
Recovering data is a much more complicated endeavor.
It’s not just about getting a data set back.
It’s about getting the platform back online.
So if you think about identity services, you need your identity working before anyone can
access the data.
Just measuring how quickly it takes to get a database back
doesn’t mean that that business process will be back online.
And it’s really designed to encapsulate those different pillars of those systems that you
need to bring online to be able to get the business process working.
And Danielle, if I may, I think this talks to a larger problem.
So it’s very easy to go down the technical route on this conversation.
And indeed, this represents in many ways a way of measuring the technical organization.
But I think this speaks to an organisational problem.
What we see all of the time is that one of the issues culturally around recovering
systems, and one of the reasons it takes so long in many cases, as was the case in my
example, is that there are two teams that need to take some
responsibility here.
There is clearly the traditional kind of infrastructure team, the people responsible
for backup and recovery. Clearly they have a job to do, but it isn’t only them anymore.
It’s a security team as well.
A team that can tell you via forensics, for example, where the clean backup is, where the
clean data is, how infected systems are, that’s a security team.
So where those teams are not working together on recovery, or they’re speaking different
languages, or they’re measuring themselves in different ways, we start to see a problem
and it starts with culture.
So one way of thinking about this is if you take RTO and RPO, recovery time objective,
how long is it going to take me to recover?
Recovery point objective, how much data am I likely to lose?
And you combine that with the business of forensics, a security discipline.
Now we’ve got MTCR.
So the C is important.
The cleanliness of that data is really, really critical.
And the starting point here, as no doubt we’ll discuss, is let’s just have the business ask
that question.
How quickly can we get our systems back clean?
And what we see every day is that that’s not that tricky a question to ask, but a
really tricky question to answer, particularly where there is that cultural gap between
the security and the infrastructure team.
I really think that’s what the brilliance of MTCR is because you take a number of very
technical concepts that IT and security teams measure themselves on and you turn it into
something that you can talk about with CEOs and boards of directors.
Speaking of your example, Darren, of a retailer at Christmas who has experienced a cyber
attack, how would you use MTCR if you were standing up in front of a board of directors to
explain what’s happened and how the recovery process is going?
Well, I’ll start the response and I’m going to hand off to Duncan to end the response
because there’s a bit of this paper that Duncan was very responsible for to do with really
the grouping
of assets that combine into a service.
But I’ll start by saying there are a couple of discussions that need to be had that very
often are not being had.
One is around risk appetite.
So who are we as a business?
What kind of a business are we?
A retailer, typically you’ll see, has sort of a low to medium appetite for risk, somewhere
around there.
But that needs to be defined so that moving down the line, as we start to understand the
risks that surround us, we can decide what’s acceptable, what
should be shared with our insurer, for example, and what should absolutely be
mitigated.
So that’s conversation one.
Conversation two is then, what’s the minimal viable company here?
It is completely unrealistic to assume that any organization with any amount of budget is
going to be able to recover everything in the business quickly.
So what really matters in this business?
That’s a very, very deep and interesting conversation to have with most businesses.
Most businesses cannot imagine a world without IT.
So they find it very hard to unravel the business processes and have a conversation about,
if we were back to paper, what would this look like?
But that conversation kind of needs to happen.
And where it leads us is to get to what we call an MVC, a minimal viable company.
That is, what stuff needs to be back in a day, or whatever the timeline is that we
decide.
What is the stuff that without it, this business doesn’t run at all, we’re not operating.
And that’s a very, very important discussion.
What you can then start to do is wrap the MTCR concept around the minimal viable company.
And I’ll hand off to Duncan now who can sort of define for us what makes up a service.
How do we think about this in context of services?
And thanks, Darren.
And again, without trying to get too complicated and too deep,
MTCR is like measuring the mean time to clean recovery of a business process,
a critical business service that then flows up to make the overall minimal viable company.
So there’ll be multiple different MTCRs for different business processes.
I was just literally today talking to a consumer packaged goods company, a brewer, and
they were talking about, if we lose four days worth of data, we have to literally pour
four days worth of beer down the drain because we can’t prove it’s safe to drink,
therefore we can’t sell it.
So this concept is about dissecting those business processes and then using that
MTCR as a common business language to be able to say, right, this
process is reliant on a number of different types of applications.
They all rely on foundational services, things like identity, the ability for the
customers to log into the platform.
You then have your backend systems, whether that be in cloud or on-prem, your hypervisors,
your critical cloud solutions,
your front-end systems that run the actual applications, and then your user access systems,
that is, how people actually get access to that solution: is it via PDAs, point-of-sale
devices, et cetera?
All of those need to be available for that business service to run.
The MTCR is designed for the business owner to own.
First step in that journey is to then benchmark.
What is their current MTCR?
How long would it currently take them even if they could get back to clean data?
Because the shocking point at the moment is that most organizations do not have
technology in place that allows them to protect their backups and to interrogate
their backups to take out clean data, which is the real business problem that we’re here
to solve.
It’s very interesting.
You said, what is the current MTCR?
So MTCR isn’t a measurement that’s taken at the end to measure how well we recovered.
MTCR is taken at intervals throughout the entire attack.
And that’s probably because
if you rush to a recovery, you can actually do more damage.
Can you talk a little bit about that?
Yeah, I can certainly pick that up.
Before I do that, though, you made a really interesting point here,
which I think needs to be extended further.
So MTCR is not just something you would measure throughout the recovery from an attack.
It’s something you should be measuring pre breach.
So what I call kind of left of bang.
When we introduce this concept into boards of directors, we start to see this permeate
through an organization.
And the question is asked, what is the MTCR, for example, for our minimal viable company?
We’re going to get some answers we don’t want to hear, like we don’t know what a minimal viable
company is.
Or even if we do, realistically, our MTCR right now is probably four months.
We don’t want to hear those answers, but we need to hear them in order that we can focus
on improving.
And so that’s a really, really sort of important point, I think.
And Darren, to build on that, I think that it’s vital for the business to understand what
their MTCR is today, because this then informs their business risk process properly.
And it’s not always going to be a technology solution that’s going to reduce that MTCR.
Quite often, if the business knows that they’ve got this risk that they might be without a
critical system for
a certain period of time, they can build in an alternative business continuity plan.
Return to paper, you know, might be the worst case, Darren, but, you know, right now,
businesses do not understand the risks that they’re covering.
And this MTCR is meant to be this ability for the board to go to the CIO and CISO teams
and say, how long realistically will we be down today?
And then ask if that’s an acceptable number. I’ve done a number of these sort
of studies, and it’s
often 28, 35, 42 days.
The business is then totally horrified that they would be without IT services for that
amount of time.
But they’ve never put those demands on the CIO or CISO organizations to say, we need the
minimum viable company, these assets back online within 24 hours.
How can you design those business processes to be recoverable within 24 hours?
It needs to be a functional or non-functional requirement of their business to say, these
are our requirements.
IT, CISO, can you help me meet them?
And just going back to the point I made earlier, by the way, it’s really important to
understand that in some ways that question cannot even be asked of one or other of those
teams.
The only time you’re going to get any kind of response that makes any kind of sense, even if
it’s not a response that you want, is if you have the infrastructure team and the security
team in the room to answer it.
Because the question just can’t be answered by either of those teams alone.
And that’s, again, culturally, this is so, so important before we get anywhere near
technology.
Culturally, you know, what we’re trying to do here with this concept is drive some
behavior that doesn’t exist today.
That’s right.
You’re trying to break down silos.
So each organization, whether it’s IT or the infrastructure team or SecOps, is sort of saying, my
part’s fine, but all those parts have to work together.
And so MTCR brings them together and makes them think about this
concept of risk in the environment, as every single person has a part to play in that.
And then I think it’s really transformative in that it gives every single one of those
leaders an ability to translate to a CEO or a board of directors and speak about what
they’re actually experiencing in terms of recovery in business risk, right?
Absolutely, it’s awesome.
So how does it connect to simple requirements?
Not so simple, but compliance requirements like DORA and NIS2.
How does MTCR speak to that, if at all?
Well, I’ll start if that’s okay, Duncan.
So here’s my initial thinking on that. You know, Danielle, you and I have both been
through those DORA articles plenty of times together and spoken about compliance.
One of the things that we’re starting to see in the compliance landscape, whether it be
DORA or NIS2 or other global regulations, is this mention of, number one,
resilience and what resilience means as opposed to security, and number two, testing.
Do you have a plan?
Have you tested the plan for recovery?
Well, I think this speaks directly to what should be the measurement of the plan.
I’m slightly concerned that many of the regulations I see, even the evolving and newer
regulations, are still making reference to things like disaster recovery.
They’re still not being very explicit about what should be tested.
I’ve even seen some mentions of RPO and RTO.
And so to a certain extent, you know, a concept like this is important as people
start to become compliant and move into that kind of compliance that’s very resilience
based,
in terms of what we are going to measure here.
Because if we just measure old disaster recovery metrics, we’re really not going to move
very far forward here.
What we’re going to find is we get attacked, we go to our DR plan, we execute the DR plan,
and surprise, surprise, we can’t find our clean backup, and it takes us 30, 40, 50, 60
days to recover.
So we’ve got to change the mentality here.
And with regard to compliance, it was so refreshing to see some of the new compliance
policies and schedules
mention resilience, recovery and testing, but we’ve got to make sure that we think about
those in the right way and with the right kind of frameworks and contexts.
Yeah, I totally agree, Darren.
I think, you know, if you look at a lot of the requirements from DORA and what the European
Central Bank have said about stress tests, most organizations have set themselves up to
fail because it says you’ve got to prove that you can recover within your business impact
tolerance.
And most organizations have set that business impact tolerance
on availability metrics or the old DR plan that says I can fail over from data center A to
data center B in four hours.
You can’t recover an entire bank or an entire financial institution in four hours, it’s
impossible.
So I think this sort of highlights the need for this new vocabulary of MTCR so
that they can relook at that business tolerance, that business impact tolerance, to
redefine it to say, if I lose my core system, for example, my ledger,
you know, how do I operate?
Well, I can’t operate.
So therefore I need to put in contingency plans of how I can sub-process to then be able
to roll back once you get your ledger back online in the financial example.
But, as you said about testing, most organizations, when I ask that question,
you know, when did you last test cyber recovery, they point back to their traditional DR plan.
They’re not pointing to their cyber recovery plan because most organizations haven’t yet
fully designed and implemented a cyber recovery plan.
And absolutely this MTCR needs to be one of those key metrics within that cyber recovery
plan.
You’ve designed your scenarios you’re gonna protect yourself and recover against.
And then you have an MTCR measurement for those different types of attack.
It’s really here to try and, as I say, bridge that gap between business and IT and CISO.
You know, Darren, you were saying it’s about changing the culture, and Duncan, you were
just saying so many
contracts between organizations do require a recovery time, and that’s meaningless.
So it’s not just for the technical teams, but also the enterprise risk management
teams, the lawyers who are contracting to understand that RTO and RPO are not enough, but
the new standard is clean recovery.
And everybody needs to understand what the end result should be, not sort of the pieces
along the way that
might be meaningless, right?
Yeah, I think that’s a really good point.
And I fully expect as we move forward with the concept and we run workshops around
preferred risk posture and around minimal viable company and around the MTCR, I expect us
to discover that, you know, perhaps many more people than traditionally would have been
associated with system recovery need to take an interest and have a voice in those
workshops.
For precisely the reasons you mentioned, Danielle.
So for leaders listening,
I encourage you all to check out the full report.
It’s called Redefining Cyber Recovery, Introducing Mean Time to Clean Recovery.
And I have one more question for Duncan and Darren, which is, you’ve covered a lot of
ground today.
You’ve grounded it in a real application for a real business, which makes a lot of sense.
For leaders listening, where do we start?
Where do we start right now?
We read the report and we want to introduce this concept into our companies, into our
businesses.
How do we get started?
Well, for me, for a non-technical leader, I would encourage the calling of a meeting
between whoever’s responsible for infrastructure delivery in the company, normally a CIO,
and whoever’s responsible for cybersecurity in the company, normally a CISO, may not be,
call a meeting with those two individuals, those two leaders, and ask the question, how
quickly, if we were to suffer a catastrophic ransomware attack, let’s use ransomware
as an example, because it’s prolific and we’re all being affected by ransomware.
So in the event of catastrophe with ransomware, how quickly could you restore this
business together and guarantee me that the data I’m about to start using to run the
business is clean?
I guarantee you five other meetings will be spawned from that one meeting because the
answer is not easy.
It will require collaboration across those teams.
It will require a mindset change as well.
And I think, I would say a very similar thing apart from I’d probably widen that audience
to bring in people like their Chief Risk Officer and procurement into that conversation.
Because most organizations, when they look into whether it’s a SaaS contract or whether
it’s an outsourcing contract, it’s generally silent on this cyber recovery.
When can I get recovery back to clean data?
Because in most of those types of contract, it is the customer that’s responsible for
their data.
Those infrastructure suppliers and those SaaS suppliers are responsible for the
infrastructure that supports that service.
So, you know, it’s them, they’ve got to own that responsibility for getting their clean
data back.
But also they then need to disseminate that quite quickly to their business leaders to try
and really understand what is our MVC because organizations, as you said earlier on
Darren, just cannot afford and will never be able to recover their entire enterprise’s
systems within a very short period of time.
The key here is about prioritizing and only the business can judge what is most important
to them in a disaster cyber boom situation.
Darren, Field CTO at Commvault; Duncan, Security and Resilience Practice Leader at Kyndryl.
Thank you so much.
Next steps, I hope you guys take this MTCR concept on the road and help business leaders
workshop it.
And I fully expect we will start seeing this show up all over, globally, as we help
our companies figure out the MVC for each of them.
Thank you so much for your time today.
Thanks, Danielle.