A Wish List For Multi-Cloud Data Protection

By Ned Bellavance

Ned Bellavance is the director of cloud solutions for Anexinet and offers his views about data protection in multi-cloud environments. Check out the “Managing Multi-Cloud Data Protection” webinar he and Commvault’s Matt Tyrer hosted recently.

Introduction

For most companies, data is their most valuable asset. This makes backup and recovery essential to IT operations. Data needs to be protected against accidental or intentional damage, deletion and modification.

As companies fan out into the public cloud, consuming services like Software as a Service (SaaS), Platform as a Service (PaaS) and Infrastructure as a Service (IaaS), they also need an effective way to protect the data housed in those locations. While they may already have a solution running on-premises, it is not a foregone conclusion that their current software is capable of protecting cloud services. As a result, many companies are re-evaluating their data protection solution in the context of a multi-cloud reality.

Defining multi-cloud

Of course, if we’re going to throw out a term like multi-cloud, it would be helpful to define it. Multi-cloud is simply the consumption of multiple cloud services by a single organization. That includes all service models: SaaS, PaaS and IaaS.

According to the “RightScale 2019 State of the Cloud Report,” 84 percent of enterprises have a multi-cloud strategy. Whether or not an organization is explicitly pursuing a multi-cloud strategy, the reality is that they are likely in multiple clouds. In any company larger than a small start-up, there is going to be some portion of the company consuming a cloud service that IT is not aware of. It could be the human resources department employing a SaaS service for employee training, or a cadre of developers using AWS for a new application idea, or even the CMO utilizing Box to share files with marketing contractors.

New challenges for data protection

All of this data being stored in all these disparate clouds presents new challenges for IT operations, not the least of which is effectively protecting the data. There are three primary challenges to deal with when it comes to multi-cloud data protection:

  1. Discovery of data: If the IT operations group is unaware of a service being used, it is going to be pretty difficult to protect it.
  2. Consistency of operations: More cloud services often means more distinct data protection solutions. IT operations need to maintain consistent policies for retention and replication across all of the solutions in use.
  3. Cost management: Cloud services are generally a pay-as-you-go affair, including ongoing charges for data storage and egress of data out of the service. IT operations need to effectively protect their data without racking up egregious charges.

My wish list

Ideally, a data protection solution should assist with some, or all, of these challenges. If I were to design my perfect solution, I would include the following items on my wish list:

Consistent management platform

The “single pane of glass” phrase is one of the most hackneyed in vendor marketing. It’s a totally unrealistic expectation that a single solution can provide a unified view encompassing the entirety of an organization. When it comes to data protection, however, a consistent management platform is a must-have. Most public clouds have a native data protection solution. Azure has the creatively named Azure Backup, and AWS has the equally innovative AWS Backup.

Assuming an organization consumes public cloud services and also runs an on-premises environment, it is now using at least three distinct data protection solutions. That’s less than ideal. Maintaining some semblance of consistency for policies, monitoring of jobs and discovery of new assets becomes an operational nightmare.

If I were to design a new data protection solution, I would ensure that it had a consistent management experience across all public cloud services and on-premises environments.  Anything less would be uncivilized. And a huge pain.

Cloud native deployment

There are a bunch of data protection vendors out there who claim cloud native deployment of their solution, but what they really mean is that their software can run on a virtual machine in one of the public clouds. That’s not particularly impressive, considering their solution probably also runs on virtual machines on-premises. The primary strength of the public cloud is its elasticity, along with the separation of compute, storage and networking into discrete, independently scaling services.

An ideal data protection solution should take advantage of these services in a cloud native way. Rather than adding more disks to virtual machines, storage can be provisioned on the fly as blob-based object storage. When more compute is required to process the nightly backup workloads, the solution can scale horizontally to handle the increased load. Even more important, it can scale back down when that demand subsides each morning. Most public cloud storage services also incorporate lifecycle policies and storage tiering; Azure, for instance, offers an Archive storage tier. These native constructs should be used to reduce costs.
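
To make the tiering point concrete, here is a minimal sketch of driving a cloud object store’s native lifecycle policies from code, illustrated with AWS S3 via boto3. The bucket name, prefix and day thresholds are hypothetical placeholders; the same idea applies to Azure Blob Storage’s lifecycle management rules for its Archive tier.

```python
# Minimal sketch: let the object store's native lifecycle tiering age out
# backup data, illustrated with AWS S3 via boto3. The bucket name, prefix
# and day thresholds below are hypothetical.
import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="example-backup-bucket",               # hypothetical bucket
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "tier-and-expire-backups",
                "Filter": {"Prefix": "backups/"},  # only backup objects
                "Status": "Enabled",
                # Move colder restore points to cheaper tiers over time.
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},
                    {"Days": 90, "StorageClass": "GLACIER"},
                ],
                # Drop restore points past the retention window entirely.
                "Expiration": {"Days": 365},
            }
        ]
    },
)
```

The point is that aging and tiering of backup data is handled by the storage service itself, rather than by the backup software shuffling data between its own disks.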

Efficient data movement

Data protection in the cloud exacerbates the data gravity issue, by which I mean that moving large amounts of data is difficult and expensive. Storing backups in the same public cloud and region as the primary data might be more efficient, but it is not necessarily advisable. The public cloud as a whole is very reliable, and the data is extremely durable (99.999999999 percent, or eleven nines, in the case of S3). Still, outages do occur. The data may still exist, uncorrupted, but if it cannot be reached, it is not of much use. Systemic outages spanning multiple regions have also occurred from time to time. If access to my backup data is considered business critical, then I would be remiss to store all of my backups in a single location or public cloud.

Data movement between public clouds, or even between regions within the same public cloud, costs money. All public clouds have the concept of a network egress charge, where users are billed per GB of data leaving the cloud. The more data you move between clouds, the more expensive that movement becomes.
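
As a rough back-of-the-envelope illustration (the per-GB rate below is a hypothetical placeholder; actual pricing varies by provider, region, destination and monthly volume tier):

```python
# Back-of-the-envelope egress estimate. The per-GB rate is a hypothetical
# placeholder, not any provider's published price.
def egress_cost(data_gb: float, rate_per_gb: float = 0.09) -> float:
    """Return the estimated charge for moving data_gb out of a cloud."""
    return data_gb * rate_per_gb

# A single full copy of 50 TB pushed to another cloud or region:
print(f"${egress_cost(50 * 1024):,.2f}")  # -> $4,608.00
```

Repeat that every backup cycle and the movement of data quickly becomes a line item worth optimizing.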

An ideal solution would use compression and data deduplication to reduce the bandwidth consumed by data movement. WAN acceleration companies like Riverbed and Silver Peak have already mastered this challenge for branch offices, where the constraint was a lack of available bandwidth rather than excessive charges for consuming it.

Data protection software is in a unique position to implement similar data efficiencies because of its intimate knowledge of the data being protected. Many data protection solutions already implement some form of infinite incremental, backing up only data that has changed since the last backup. These solutions also tend to use deduplication to reduce consumption on the storage back end. A data protection solution that performs source-side deduplication against a global dedupe dictionary could save a company thousands of dollars in data egress charges.
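
Here is a minimal sketch of what source-side deduplication against a global dedupe dictionary looks like. The fixed chunk size, function names and in-memory dictionary are illustrative assumptions, not any particular vendor’s implementation.

```python
# Sketch of source-side deduplication: hash chunks locally and only send
# chunks the backend has never seen, so duplicate or unchanged data never
# crosses the metered network. Chunk size and names are illustrative only.
import hashlib
from typing import Iterator, Set, Tuple

CHUNK_SIZE = 4 * 1024 * 1024  # 4 MiB chunks


def chunk_file(path: str) -> Iterator[Tuple[str, bytes]]:
    """Yield (sha256 hex digest, chunk bytes) for each chunk of the file."""
    with open(path, "rb") as f:
        while chunk := f.read(CHUNK_SIZE):
            yield hashlib.sha256(chunk).hexdigest(), chunk


def backup_file(path: str, global_dictionary: Set[str], send_chunk) -> int:
    """Send only chunks not already in the global dedupe dictionary.

    Returns the number of bytes actually transferred, which for
    mostly-unchanged data is a small fraction of the file's size.
    """
    sent = 0
    for digest, chunk in chunk_file(path):
        if digest not in global_dictionary:
            send_chunk(digest, chunk)        # upload new, unique data only
            global_dictionary.add(digest)
            sent += len(chunk)
        # Duplicate chunks are recorded by reference on the backend;
        # nothing is transferred for them.
    return sent
```

Because only previously unseen chunks ever leave the source, the bytes that actually traverse the metered link, and therefore the egress bill, shrink dramatically for data that changes slowly between backups.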

Conclusion

Organizations are finding themselves in a new multi-cloud reality and that creates several challenges around IT operations, data governance and compliance. Additional clouds and services bring increased complexity and cost concerns. 

As much of a headache as data protection can be in a traditional on-premises environment, that headache becomes a migraine in a multi-cloud world. Data protection vendors must develop an analgesic that mitigates the compounded complexity and eases the IT practitioner’s suffering. 

Such a solution needs to have a consistent management platform across all instances of the software, a cloud native deployment and operating model, and be capable of moving data efficiently around the multi-cloud landscape.