Multiple RPOs and RTOs Help to Streamline Your Disaster Recovery Workflow
When it comes to disaster recovery, we all look to minimize downtime and potential data loss. Wouldn’t it be great to drive your recovery point objectives (RPO) and recovery time objectives (RTO) all the way down to zero? Of course it would, but in the real world, cost is the determining factor. The cost of achieving low RPOs and RTOs is high, particularly in a world where you’re dealing with ever-increasing volumes of data and applications. Yet for many organizations (financial institutions, for example), sub-minute RTOs are a must-have requirement.
One way to address this challenge is through the use of multiple RPOs and RTOs. This allows you to target lower (more costly) RPOs and RTOs to your business-critical data and applications, while applying higher (less costly) RPOs and RTOs to your less critical data.
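As a minimal sketch of this idea, the mapping from business criticality to a recovery tier can be expressed as a simple lookup. The tier names, RPO/RTO figures, and application names below are illustrative assumptions, not values from any specific product:

```python
# Hypothetical tiering sketch: cheaper tiers get looser RPO/RTO targets.
# All tier definitions and numbers here are illustrative assumptions.
TIERS = {
    "tier1": {"rpo_seconds": 60, "rto_seconds": 300},       # business-critical
    "tier2": {"rpo_seconds": 3600, "rto_seconds": 14400},   # important
    "tier3": {"rpo_seconds": 86400, "rto_seconds": 86400},  # everything else
}

def assign_tier(criticality: str) -> str:
    """Map a business criticality rating to a recovery tier (default: tier3)."""
    return {"critical": "tier1", "important": "tier2"}.get(criticality, "tier3")

apps = {"payments-db": "critical", "intranet-wiki": "low"}
for app, crit in apps.items():
    tier = assign_tier(crit)
    print(app, "->", tier, TIERS[tier])
```

The point is simply that the expensive targets are reserved for the data that justifies them, while everything else defaults to a cheaper tier.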
But how can you effectively manage multiple RPOs and RTOs? Doesn’t that additional complexity mean more work for you? We’ll review the benefits and challenges of multiple RPOs and RTOs, along with how you can maintain control while still reducing costs.
Companies measure the effectiveness of their disaster recovery systems using two metrics: recovery point objective (RPO) and recovery time objective (RTO). The RPO indicates how current the database is when it’s been restored to service; was any transactional data lost during the interruption of service? The RTO refers to how long it takes to bring your recovered data back online, while retaining the best possible RPO. A “perfect” RPO of zero seconds would require synchronous replication between sites… which can be costly.
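These two definitions can be made concrete with timestamps: achieved RPO is the gap between the last transaction safely replicated and the moment of the outage, while achieved RTO is the gap between the outage and the moment service is restored. The timestamps below are hypothetical:

```python
from datetime import datetime

# Illustrative calculation of achieved RPO and RTO from (hypothetical) event times.
last_replicated  = datetime(2024, 5, 1, 9, 58)   # newest transaction safely at the DR site
outage_start     = datetime(2024, 5, 1, 10, 0)   # production goes down
service_restored = datetime(2024, 5, 1, 10, 45)  # recovered copy back online

achieved_rpo = outage_start - last_replicated    # data lost: 2 minutes of transactions
achieved_rto = service_restored - outage_start   # downtime: 45 minutes

print(f"achieved RPO: {achieved_rpo}, achieved RTO: {achieved_rto}")
```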
This cost affects both infrastructure and management. To support a lower RPO, more data must be captured, transferred, and applied more quickly to the recovery system, resulting in the need for faster processors, more memory, and more capacity across both servers and storage infrastructure in order to avoid negative impacts to production SLAs. A lower RPO also demands greater network bandwidth at lower latency, and the farther your recovery site is from your production site, the more the added latency works against keeping replication current.
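A back-of-the-envelope check makes the network constraint tangible: replication lag depends on how fast data changes versus how fast the link can drain the backlog. The change rate, bandwidth, and latency figures below are assumed for illustration:

```python
# Illustrative feasibility check: can the link keep replication lag within the RPO?
change_rate_mb_s = 50       # MB/s of new data produced by the application (assumed)
link_bandwidth_mb_s = 125   # ~1 Gbit/s effective throughput (assumed)
one_way_latency_s = 0.04    # grows with distance between sites (assumed)
target_rpo_s = 60

# Steady-state worst case: time to drain one RPO window of changes, plus latency.
backlog_mb = change_rate_mb_s * target_rpo_s
drain_time_s = backlog_mb / link_bandwidth_mb_s + one_way_latency_s

print(f"worst-case lag ~{drain_time_s:.1f}s vs RPO {target_rpo_s}s:",
      "OK" if drain_time_s <= target_rpo_s else "RPO at risk")
```

If the change rate rises or the link narrows, the drain time crosses the RPO budget and the target is no longer achievable without more capacity.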
To support a lower RTO, the secondary infrastructure will typically need to run a production-like environment that is always up and ready to accept workloads. Maintaining dual “production-ready” environments effectively doubles the burden of infrastructure upkeep, patching, and updates. Aside from the cost of acquiring the necessary software and hardware, you’re also forced to deal with complicated lifecycle updates, along with limits on the platform choices that can be utilized for recovery.
With the ever-increasing volume of data that falls under recovery requirements in a typical environment today, a single recovery tier can be inadequate from both the financial and risk standpoints. A higher-end approach, for example an active-active dual-site infrastructure with application-level transactional consistency, is not only costly but also subject to latency issues that grow with the geographic distance between your production and backup facilities. Relatively few applications, in a small subset of industry segments, currently require this level of protection. A more common approach is a storage- or appliance-based synchronous or asynchronous solution that delivers a near-zero RPO and a lower RTO, achieved through push-button orchestration. Of course, these types of solutions face the typical challenges of platform, location, and limited granular recovery options, while their cost is only slightly less than that of the higher-end, active-active approach.
The other extreme would be to utilize a copy of daily on-site backups transferred to an off-site facility, to serve the need for multiple types of protection and long-term retention. By utilizing techniques such as deduplication, incremental forever, and lower-cost storage, you can typically reduce your overall cost of infrastructure and management. However, this type of approach may not recover all of your essential data in a timely manner, especially as your business service level agreements (SLAs) become more aggressive over time.
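The savings from deduplication and incremental-forever backups can be sketched with simple arithmetic. The change rate, dedup ratio, and retention window below are assumptions chosen for illustration:

```python
# Illustrative estimate of off-site storage for 30 days of retention:
# daily full copies vs. incremental-forever with deduplication.
full_backup_tb = 100       # size of one full backup (assumed)
daily_change_rate = 0.03   # 3% of data changes per day (assumed)
dedup_ratio = 0.5          # incrementals shrink a further 50% after dedup (assumed)
retention_days = 30

naive_fulls_tb = full_backup_tb * retention_days
incr_forever_tb = full_backup_tb + (
    full_backup_tb * daily_change_rate * dedup_ratio * (retention_days - 1)
)

print(f"daily fulls: {naive_fulls_tb:.0f} TB, "
      f"incremental-forever: {incr_forever_tb:.1f} TB")
```

The trade-off named in the text is visible here: the storage bill shrinks dramatically, but restoring from a chain of incrementals is what stretches the RTO.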
From either extreme, these approaches can negatively impact either your cost or your SLA for the majority of your data volume, as depicted below:
Since many businesses require multiple RPOs based on application type, one approach that can help you achieve a better cost-risk balance would be to tier your environment based on your specific SLAs. In this type of approach, your SLAs must consider both your RPOs and your RTOs. Just as most organizations tier and classify their processing and storage in order to minimize cost, tiering your DR requirements can also have a positive impact on both hard and soft costs. In many cases, recovery tiers can be combined with other tiers, in order to form a service catalog. For example, an application that requires high performance might also receive lower RPO and RTO tiers from the catalog. This helps to reduce the overall management burden of having to administer too many resulting combinations.
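A service catalog of this kind can be sketched as a set of bundles, each combining a performance tier with recovery targets, so that an application picks one entry rather than mixing tiers ad hoc. The bundle names and values below are hypothetical:

```python
# Hypothetical service-catalog entries combining performance and recovery tiers.
CATALOG = {
    "gold":   {"storage": "ssd",    "rpo": "1m",  "rto": "15m"},
    "silver": {"storage": "hybrid", "rpo": "1h",  "rto": "4h"},
    "bronze": {"storage": "hdd",    "rpo": "24h", "rto": "24h"},
}

def service_for(app_profile: str) -> dict:
    """Look up the bundled storage + recovery targets for an application profile."""
    return CATALOG[app_profile]

print(service_for("gold"))
```

Keeping the combinations down to a few named bundles is exactly what limits the management burden the text describes.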
In some cases, establishing recovery tiers for your RPOs/RTOs might require some combination of different technologies, or point solutions, each addressing a specific need or functionality. The trouble is, point solutions rarely scale easily or work efficiently with other technologies in your stack. In fact, having multiple point solutions can actually contribute to increased data loss: according to Dynamic Business Technologies, an average of 2.36 TB of data is lost as a result of using multiple vendors.¹
Additionally, introducing multiple technologies to create and manage tiers typically shifts the cost from elaborate infrastructure (or relaxed SLAs) to the increased overhead of managing the complexity of tiering. This also complicates reporting on DR readiness, performing DR tests, and conducting actual recoveries, especially when applications must span multiple recovery tiers. With point solutions, it’s difficult, if not impossible, to aggregate status information for DR readiness or to conduct testing across multiple tiers. For example, if you had a banking app spread across three tiers of server groups (database servers, application servers, web servers), and each of those groups was protected using a different DR or backup technology, even the most basic DR testing would become a complicated undertaking. Actual disaster recovery efforts would be equally challenging, which underscores the need for a unified solution capable of addressing every tier.
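The aggregation problem in the banking-app example can be sketched directly: the application is only recoverable if every server group it spans, in every tier, is ready. The component names and statuses below are hypothetical:

```python
# Sketch: aggregating DR readiness for an app whose server groups sit in
# different recovery tiers. Statuses below are hypothetical.
app_components = {
    "database-servers":    {"tier": 1, "ready": True},
    "application-servers": {"tier": 2, "ready": True},
    "web-servers":         {"tier": 3, "ready": False},  # e.g. last replication failed
}

def app_ready(components: dict) -> bool:
    """The app is recoverable only if every group it spans is ready."""
    return all(c["ready"] for c in components.values())

not_ready = [name for name, c in app_components.items() if not c["ready"]]
print("banking app ready:", app_ready(app_components), "| blocked by:", not_ready)
```

With separate point solutions per tier, each `ready` flag lives in a different tool, so even this trivial roll-up requires stitching together several interfaces.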
And finally, the limitations of point solutions might restrict your platform choices. For example, you might find yourself with limited ability to utilize the public cloud, or to recover to a managed services environment. Limitations such as these can have a negative impact on both your direct costs, and your ability to efficiently manage internal resources.
With the Commvault® disaster recovery solution, you’re actually able to create multiple protection SLAs utilizing a single platform, deploying a unified approach that allows you to assign varying tiers of recovery as required for different data types, as depicted below:
This unified approach allows for three key benefits:
Right-size the costs associated with your replication infrastructure
In a multi-tiered RPO approach, you no longer have to choose between the highest or the lowest tier for all of your data. Tiers can be established based on your business SLAs, and you can assign data to the tier appropriate to its business value. For example:

Tier 1: Aggressive RPOs, made possible by transferring incremental data frequently to infrastructure at the target location that remains in a “warm” standby mode, helping to ensure a rapid RTO of just minutes. This tier sends a larger volume of data and therefore consumes more infrastructure resources.

Tier 2: An RPO of minutes to hours, sending incremental data that has been deduplicated (resulting in a more compact transfer) to a recovery infrastructure that remains in a “cold” standby mode and is only powered on to apply changes at predetermined intervals, helping to reduce costs.

Tier 3: Incremental, deduplicated data sent to an online storage target every 24 hours, with infrastructure provisioned only at the time of recovery, further reducing costs.

In this type of arrangement, more tiers can easily be introduced at any level, on an as-needed basis. This helps to balance costs across your infrastructure while providing greater flexibility with regard to your recovery locations.
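The tiers above can be captured as a small policy table. With interval-based transfers, the worst-case data loss is one full transfer interval, which makes the cost/RPO trade-off easy to read off. The intervals and standby modes below are assumptions for illustration, not product defaults:

```python
# Illustrative policy table for the three tiers described above.
# Intervals and standby modes are assumptions, not product defaults.
TIER_POLICY = {
    1: {"transfer_interval_min": 5,    "standby": "warm", "provision": "always-on"},
    2: {"transfer_interval_min": 240,  "standby": "cold", "provision": "power-on-to-apply"},
    3: {"transfer_interval_min": 1440, "standby": "none", "provision": "at-recovery-time"},
}

def worst_case_rpo_minutes(tier: int) -> int:
    """With interval-based transfers, worst-case data loss is one full interval."""
    return TIER_POLICY[tier]["transfer_interval_min"]

for tier in TIER_POLICY:
    print(f"tier {tier}: worst-case RPO ~{worst_case_rpo_minutes(tier)} min, "
          f"standby={TIER_POLICY[tier]['standby']}")
```

Adding a new tier is then just a new row in the table, which is what makes introducing tiers “on an as-needed basis” cheap.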
Avoid the frustration and overhead of managing multiple point technologies
In traditional approaches, using multiple technologies to manage multiple RPOs and RTOs for different data types (e.g., virtual machines vs. databases vs. cloud instances) results in increased management complexity and the incremental cost of the additional resources required to maintain your environment. By using a single, unified platform, you reduce the resources required to onboard new workloads and achieve recoverability, while shortening the learning curve. This, in turn, increases confidence in your ability to recover from virtually any type of disaster. By taking a software-only approach and abstracting the hardware and hypervisor requirements, Commvault® provides greater flexibility, and as your environment and infrastructure change over time, the SLAs associated with your various data tiers are easily maintained.
Easily monitor and manage your recovery tiers
With Commvault’s single unified platform, you can easily monitor and manage your SLAs without having to continually deploy, learn, and manage new technologies, point solutions, and their related infrastructure. Since Commvault’s DR architecture allows you to establish multiple SLAs within a single platform, you can not only assign specific data types to an SLA tier, but also easily move data types across tiers, accommodating any changes to your SLAs that become necessary over time. With the robust reporting and monitoring tools included with the platform, you can pinpoint which component in your recovery environment (capture, transfer, apply) might be impacting your SLAs and take quick corrective action when necessary. You can also use these tools to uncover anything that is not currently being protected, and to model hypothetical tiers by adjusting RPOs and RTOs and observing which parts of the environment would miss the new SLAs.
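Pinpointing the stage that breaks an SLA, as described above, amounts to comparing per-stage timings against the end-to-end budget. The stage timings and budget below are hypothetical numbers, not output from any monitoring tool:

```python
# Sketch: given per-stage timings (capture, transfer, apply), find which stage
# pushes the end-to-end replication cycle past the RPO budget. Numbers are hypothetical.
stage_seconds = {"capture": 20, "transfer": 95, "apply": 15}
rpo_budget_seconds = 120

total = sum(stage_seconds.values())
bottleneck = max(stage_seconds, key=stage_seconds.get)

if total > rpo_budget_seconds:
    print(f"SLA missed by {total - rpo_budget_seconds}s; largest stage: {bottleneck}")
else:
    print(f"within SLA ({total}s of {rpo_budget_seconds}s)")
```

The same what-if logic supports hypothetical tiering: change `rpo_budget_seconds` to a candidate tier’s target and see which workloads would fall outside it.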
Bottom line? Your disaster recovery choices no longer need to be limited to two extremes: low RPOs at a high cost, or high RPOs at a low cost. With Commvault’s single, unified data management platform, it’s remarkably easy to balance your costs and SLAs while retaining the flexibility needed to accommodate changing business needs. Commvault® software delivers disaster recovery functionality that matches the dynamic, complex nature of today’s enterprise data environments. That means supporting multiple data recovery tiers, extending into applications, endpoints, and more, while providing you with the freedom to choose whichever infrastructure mix best fits your needs and budget.