In part one of this two-part blog, we talked about the proliferation of snapshot solutions on the market that offer customers a wealth of choices. But what customers really want, and even demand, is broad software integration across heterogeneous hardware and the wide range of applications they use. Having more point products to manage just doesn’t cut it anymore.
In this blog, the conversation on snapshots continues:
How, where and when you run snapshots matters
As we said previously, Commvault has been doing snapshot management for many years. And quite frankly, we are at the point where both customers and many additional storage vendors are seeking out Commvault at an engineering level for advice on how to implement snapshot strategies properly. In fact, we ran a sold-out session on this at VMworld 2013, which received a lot of great feedback. Actually, we’ve presented on snapshot management three years in a row at VMworld, and you can expect to see much more in this area as we continue to extend the value of Simpana software. Check out the sessions we ran on this topic at VMworld 2013.
The most important thing we’ve learned is that every storage platform runs snapshot operations differently. For arrays that are optimized around snapshots (NetApp is good here), driving a snapshot copy every 15 minutes for mission-critical apps is no problem. However, if you are running storage that is optimized for a different metric – say very high performance I/O – then you need to be really careful in designing and sizing your RPO based on what the array can handle. Honestly, we’ve seen some arrays that can easily handle the 256 snapshots per volume that the marketing people claim. Others…yeah, not so much. There are a few arrays that realistically can handle maybe 8 to 10 snaps. Players in the snapshot management market know better which is which because customers expect that depth of knowledge.
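To make the sizing point concrete, here is a rough back-of-the-envelope sketch in Python. It is not part of Simpana or any array's tooling; the function names and the simple model (snapshots kept on a volume = retention window divided by snapshot interval) are our own assumptions for illustration only.

```python
import math

def snapshots_retained(rpo_minutes: int, retention_hours: int) -> int:
    """Snapshots held on a volume at steady state (assumed simple model)."""
    return math.ceil(retention_hours * 60 / rpo_minutes)

def tightest_rpo_minutes(max_snapshots: int, retention_hours: int) -> int:
    """Smallest snapshot interval that stays within the array's practical limit."""
    return math.ceil(retention_hours * 60 / max_snapshots)

# Hypothetical inputs: a 15-minute RPO kept for 24 hours,
# and an array that realistically handles about 10 snapshots per volume.
needed = snapshots_retained(rpo_minutes=15, retention_hours=24)
achievable = tightest_rpo_minutes(max_snapshots=10, retention_hours=24)
print(needed, achievable)
```

Under this simple model, a 15-minute RPO with 24 hours of on-array retention keeps 96 snapshots on the volume. An array that realistically handles about 10 snapshots would have to stretch the interval to roughly 144 minutes for the same retention, which is why the RPO has to be sized against what the array can actually do.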
But let’s take an example on NetApp storage. Many customers implement NetApp storage as NFS filers, particularly behind VMware. There are lots of reasons for this, as the VMware datastore concept fits really well with NetApp NFS structures. Also, with this approach, the customer can take full advantage of the primary deduplication technology built into ONTAP 8. And with the resulting storage efficiency, larger customers are deploying many hundreds of VMs on a single FAS. But you now have to think hard about the implications this has for your snapshot strategy. If your snapshot approach is not properly designed, you are introducing a host of issues into the customer environment. Over the years, specific issues we have seen with inappropriate snapshot implementations include:
- Unnecessary snapshots: With customers running so many VMs on every FAS, how are those volumes deployed? The reality is most application and VM admins are not yet paying attention to the design of the storage underneath (kudos to the many that are). So many, many times, you run into a single or just a few volumes created with VMs spread all over the place, with little thought to the protection strategy. Some of these VMs are mission critical, some are not. And every VM has a different RPO requirement. With the wrong implementation, you are going to force the customer to take multiple snapshots of the same volume again and again to meet varying RPOs of those varied VM workloads. So design of the underlying storage structure becomes incredibly important. And ensuring you have implemented a storage strategy that lines up the stack to the VMs, workloads and what the data protection requirements are for each type is a critical success factor.
- Unnecessary replication pairs: NetApp SnapVault occurs at the volume level too, which means if you are creating multiple snapshots for the volume because of the varied VMs in that volume, the same data is replicated for multiple pairs. So again, up-front design from top to bottom, starting with the VMs and workloads and the RPO/RTO requirements they have, and then following that through implementation of the storage and data protection solution is a critical success factor. In this case, this will have a big impact on network infrastructure too.
- Requirement to deliver efficient protection for extremely large NAS volumes: We are working with customers who have Petabyte-scale NAS implementations. We have seen that a traditional streamed backup approach just isn’t practicable. These are the teams who believe NDMP is a four-letter word – and there are tons of these folks out there. So snapshot support and end-to-end management of very large-scale NAS filers is absolutely a game changer. Leveraging an IntelliSnap type approach, we have seen customers reduce a protection window for a 100 TB NAS filer from over 3 days to just a few minutes leveraging NDMP. IntelliSnap will drive the snapshot functionality of the filer and then automatically kick off an off-host backup of that snapshot that is cataloged for file level recovery.
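The volume-layout point in the list above can be sketched with a small Python model. This is purely illustrative: the volume names, the RPO values, and the assumption that each distinct RPO in a volume gets its own schedule that snapshots the whole volume are ours, not a description of how any particular product schedules snapshots.

```python
from collections import defaultdict

MINUTES_PER_DAY = 24 * 60

def snapshots_per_day(volumes):
    """volumes maps a volume name to the RPOs (in minutes) of the VMs it holds.

    Assumed model: one snapshot schedule per distinct RPO in a volume,
    and every schedule snapshots the entire volume.
    """
    return {
        name: sum(MINUTES_PER_DAY // rpo for rpo in set(rpos))
        for name, rpos in volumes.items()
    }

def group_by_rpo(volumes):
    """Re-lay out the same VMs so each volume holds a single RPO tier."""
    tiers = defaultdict(list)
    for rpos in volumes.values():
        for rpo in rpos:
            tiers[f"vol_rpo_{rpo}m"].append(rpo)
    return dict(tiers)

# Hypothetical layout: two volumes, each mixing 15-minute, 4-hour and daily RPOs.
mixed = {"vol_a": [15, 240, 1440], "vol_b": [15, 240, 1440]}
tiered = group_by_rpo(mixed)

mixed_total = sum(snapshots_per_day(mixed).values())
tiered_total = sum(snapshots_per_day(tiered).values())
print(mixed_total, tiered_total)
```

In this toy example the mixed layout takes 206 volume snapshots a day, while the tiered layout takes 103 for the same VMs and the same RPOs. Because SnapVault also works at the volume level, the same grouping reduces the amount of data replicated more than once.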
The thing is, storage snapshots are definitely NOT where you want to just have a marketing check box. If you don’t do it right, you are going to be in big trouble.
Solution must support application-aware snapshots
We all agree that the major value of storage snapshots is to deliver mission critical RPOs. So let’s talk about what requires mission critical RPOs: Big heavy-lifting, high I/O applications like Oracle, Exchange, SAP, and in lots of cases, VMware and Hyper-V. Customers need to be wary of vaporware and ask vendors about specific application-aware recovery capabilities. And if the response is, “We have VSS and VMware Tools,” then there is a problem. Every application is different, and log management and truncation are fundamental requirements in this space. On the log side, we have seen customers value Commvault’s ability to manage application-aware and consistent snapshots across the entire spectrum of mission critical apps. They can use the combined power of snapshots and traditional backups to ensure that applications and the application logs are protected on different schedules, leveraging both approaches – since they do have different requirements. That’s true modern data protection along the lifecycle of data.
Flexibility for snapshot backups lowers cost and complexity
Backing up snapshots is a beautiful thing. Customers love it. It extends the life of storage snapshots and allows customers to get more value out of their primary storage, while also meeting long-term retention requirements without having to add a second point solution. But where are you sending this backup to? Are you limited to a like-for-like vendor backup? OK, that’s fine. But most customers value the IntelliSnap implementation for the flexibility to drive multiple options depending on the retention and recovery requirement, including:
- SnapVault Operations natively
- SnapMirror Operations natively
- Backup NetApp snapshots to heterogeneous disk (even SATA JBOD)
CommVault IntelliSnap integration with NetApp Storage for an end-to-end data protection strategy
Granular, Automated Recovery is Table Stakes
So snapshots are about recovery in a lot of ways. And many vendors talk about the speed of recovery. But recovery of what? One of the cool things Commvault customers value in IntelliSnap is the ability to simplify what is typically an extremely complex recovery workflow and automate everything. Once a snapshot is captured under Commvault IntelliSnap, we have the ability to add both the snapshot and all of its contents to our catalog. This is huge. It means we can present a browseable view of the file level contents of the snapshots back to the customer, with point and click recovery. When Commvault delivered this capability in 2010 with Simpana 9, we helped customers say goodbye to the days of mounting snapshots and browsing one by one through the contents of each snap, greatly reducing search time.
Efficient offsite replication makes Disaster Recovery affordable and automated
Earlier we mentioned SnapMirror. In a NetApp context, Commvault Simpana software has the ability to drive SnapMirror relationships for fast offsite copy in a DR context. But we’re seeing demand for similar approaches outside of a SnapMirror context. For those customers, we can implement an automated process for generating a snap or backup copy on the primary data center, driving a deduped backup of the snap and then running a dedupe-aware offsite copy (we call it DASH Copy) from the backup to your DR site. Once data is copied across to the DR site, we don’t stop. Using “VirtualizeMe,” which Chris Mellor from The Register nicely analyzed recently, we will automatically provision new VMs either in VMware or Hyper-V and then populate those newly created VMs with data. So you have a true DR scenario that customers are using to meet DR audit/test requirements as well as to run their DR operations. Today, we’re supporting both Microsoft and VMware environments.
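For readers who want a feel for what “dedupe-aware” means in an offsite copy, here is a minimal, generic Python sketch. To be clear, this is not how DASH Copy is implemented; it only illustrates the general idea of sending a block across the wire when the destination does not already hold it. The function name, the in-memory `remote_store` dictionary, and the fixed 4 KB blocks are all hypothetical.

```python
import hashlib

def dedupe_aware_copy(blocks, remote_store):
    """Copy blocks to remote_store, sending only blocks it has not seen.

    remote_store is a dict mapping a SHA-256 digest to block bytes and
    stands in for the DR site. Returns (manifest, bytes_sent), where the
    manifest lists the digests needed to rebuild the data in order.
    """
    manifest = []
    bytes_sent = 0
    for block in blocks:
        digest = hashlib.sha256(block).hexdigest()
        if digest not in remote_store:
            # Only previously unseen blocks cross the WAN.
            remote_store[digest] = block
            bytes_sent += len(block)
        manifest.append(digest)
    return manifest, bytes_sent

# Hypothetical data: three 4 KB blocks, two of them identical.
blocks = [b"A" * 4096, b"B" * 4096, b"A" * 4096]
dr_site = {}
manifest, first_run = dedupe_aware_copy(blocks, dr_site)
_, second_run = dedupe_aware_copy(blocks, dr_site)
print(first_run, second_run)
```

In this toy run, the first copy sends 8192 bytes instead of 12288 because the repeated block is sent once, and a second copy of unchanged data sends nothing. That is the basic reason a dedupe-aware offsite copy keeps network cost down.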
So in closing, ask yourself: just what do these party crashers do, anyway? And as the customer, do you want version 1.0 or version 3.0?