Data deduplication encompasses a valuable set of technologies that has helped radically change how we think about data protection. The first generation of deduplication technologies let users copy only changed blocks to a target device, enabling significant savings in disk space. For example, depending on file types and dedupe ratios, a 10 TB data set could be reduced to 1 TB or less on the backend following deduplication. There is a lot of cost savings in that.
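As a toy illustration of the block-level idea, the sketch below hashes fixed-size blocks to find duplicates and computes a dedupe ratio. This is a conceptual simplification; production systems typically use more sophisticated (often variable-size) chunking:

```python
import hashlib

BLOCK_SIZE = 4096  # fixed-size blocks for simplicity; real systems often chunk variably

def dedupe_ratio(data: bytes) -> float:
    """Return total blocks / unique blocks for a byte stream."""
    blocks = [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]
    unique = {hashlib.sha256(b).digest() for b in blocks}  # one hash per distinct block
    return len(blocks) / len(unique)

# Highly repetitive data dedupes extremely well: 1000 copies of the same block
sample = b"x" * BLOCK_SIZE * 1000
print(dedupe_ratio(sample))  # 1000.0 -- one stored block backs all 1000
```

The ratio you get in practice depends heavily on the data, which is why the savings in the 10 TB example above vary with file types.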
Then storage and backup providers began delivering source-side deduplication, which brought the same sorts of benefits to the network and WAN traffic generated by backups. For example, customers love the embedded source-side dedupe option in Commvault Simpana software because it often lets them defer or eliminate costly network upgrades that would otherwise be needed simply to run backups.
But what can you really do with that deduplicated data? Once you've reduced the data to a fully deduped copy, what then? What if you need to archive it? What about disaster recovery? Or moving that copy to a different tier of disk, or to tape? How does that work?
It turns out that with most solutions out there, once you dedupe data, you're stuck. To move that data to a secondary, archive or disaster recovery tier (that is, if you actually want to do something with it), you have to rehydrate the entire dataset first and then run a copy. Let's be clear about what this entails: rehydration is the process of taking a fully deduplicated copy and essentially re-duplicating it. So after saving money and time generating the deduped copy, when it's time to use it, get out your credit card, because you need a whole lot of disk and a whole lot of time to rehydrate all that data before you can use it. This article by Arun Taneja of the Taneja Group puts it well: "The movement of data from one place in the organization to another should not require the data to be rehydrated. This means it must be replicated and stored on the remote site in the same shrunken format that it started out as."
We couldn't agree more. Rehydration adds significant cost and time, with no additional benefit to you.
The engineering team at Commvault has developed a truly innovative way to solve this problem: we call it DASH Copy. It is designed specifically to unlock deduplicated data and immediately put it to work, quickly, efficiently and cost-effectively.
What is DASH Copy?
DASH Copy is a method for making deduplication-aware secondary copies. DASH technology generates secondary copies of data while maintaining the deduped format, eliminating the rehydration process by moving only changed blocks across the network to the secondary target. This approach is very fast and very efficient in terms of processing power, network utilization and storage consumption.
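The changed-block idea in that description can be sketched as a copy between two toy block stores. The function name and the dict-based stores below are illustrative stand-ins, not Commvault's actual implementation:

```python
import hashlib

def dedup_aware_copy(source_store: dict, target_store: dict) -> int:
    """Copy a deduplicated block store to a secondary target, transferring
    only the blocks the target does not already hold. Stores map
    sha256 digest -> block bytes (a toy stand-in for a dedupe store).
    Returns the number of blocks sent; no rehydration step is needed."""
    transferred = 0
    for digest, block in source_store.items():
        if digest not in target_store:    # target already holds this block? skip it
            target_store[digest] = block  # blocks travel and land still deduped
            transferred += 1
    return transferred

# Only the two blocks missing at the target cross the "network"
source = {hashlib.sha256(b).digest(): b for b in (b"alpha", b"beta", b"gamma")}
target = {hashlib.sha256(b"alpha").digest(): b"alpha"}
print(dedup_aware_copy(source, target))  # 2
```

The contrast with rehydration is the point: a rehydrate-then-copy approach would expand and resend every block, while here the data never leaves its shrunken format.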
So DASH Copy is really a big step forward in integrating data deduplication into your overall data management strategy. By completely eliminating the rehydration step, you save in several key ways:
- Time to copy data – secondary copies of deduplicated data can be run in seconds or minutes instead of the many hours it often takes to rehydrate and copy out a deduplicated data set
- Cost of extra disk – with traditional solutions, you need a lot of extra disk somewhere for the rehydrated data set to land prior to the secondary copy operation
- Flexibility – with greater efficiency you have more choice in how you store, move and repurpose deduped data
DASH Copy Use Cases
There are a number of use cases where DASH Copy can be valuable, with disaster recovery and remote office protection chief among them. In the figure below, I outline at a high level how DASH Copy can be used to implement an efficient, cost-effective tiered storage strategy (including a remote office component) that balances cost, recovery and retention requirements.
- Remote office backup (on-premises dedupe copy) - We start with a deduplicated backup at the remote office (Copy 1) and retain it for five days. This could be stored on an onsite backup appliance such as the Dell PowerVault DL2300 powered by Commvault.
- Centralized Retention Copy (DASH Copy Job #1) - Next, we schedule a regular DASH Copy of the dataset to the central office for consolidated operations (Copy 2). With DASH Copy, this operation can run in just a few minutes, and over a network with limited bandwidth we can leverage the embedded source-side deduplication option for WAN optimization. We keep this copy for 30 days to meet retention needs at the central office.
- Disaster Recovery or Archive Copy (DASH Copy Job #2) - We can then replicate DASH copies from site to site for disaster recovery or long-term archive. In this case, we schedule a DASH Copy job that reads the centralized backup (Copy 2) and writes to an offsite storage tier (Copy 3); this could even be a cloud target reached through a full set of HTTPS/REST interfaces. We can set a policy to retain this copy for a year or longer, depending on the requirement.
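The three tiers above can be summarized as a small policy table. This is a hypothetical sketch for illustration only; the field names are mine, not actual Simpana storage-policy settings:

```python
# Illustrative policy table for the three-tier example above (hypothetical fields)
TIERS = [
    {"copy": 1, "site": "remote office",  "job": "local dedupe backup",  "retention_days": 5},
    {"copy": 2, "site": "central office", "job": "DASH Copy Job #1",     "retention_days": 30},
    {"copy": 3, "site": "offsite/cloud",  "job": "DASH Copy Job #2",     "retention_days": 365},
]

def copies_to_prune(age_days: int) -> list:
    """Return the copy numbers whose retention window a backup of the
    given age has outlived."""
    return [t["copy"] for t in TIERS if age_days > t["retention_days"]]

# A 40-day-old backup has aged out of the remote and central tiers,
# but is still retained offsite
print(copies_to_prune(40))  # [1, 2]
```

Staggered retention like this is what lets each tier serve a different purpose: fast local restores, consolidated operations, and long-term DR or archive.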
FIGURE 1: DASH Copy Example
Admittedly, this is a high-level summary, but I felt it was important to introduce the concept here. It's a topic that is definitely top of mind for us and for our storage-focused partners, a fact reinforced by the conversations about accelerating and automating DR strategies I had at the recent VMware Partner Exchange conference. In upcoming posts I will explore some of these use cases in more depth, to give you a better feel for how the technology works and how DASH Copy can be combined with other advanced features in Simpana 10 to deliver truly modern data management.
If you want to read and learn more about DASH Copy and other innovative approaches available in Commvault Simpana 10 software, please be sure to check here for detailed documentation on the technology and the process.