Don’t Let Big Data Overwhelm Your Business – Part 2

Posted 3 November 2016 2:29 PM by Nigel Williams

In Part 1 of my blog about not letting big data overwhelm your business, I delved into what big data means to organizations today, how it has evolved and what challenges it presents. Part 2 will look at how Commvault software allows you to make sense of big data and turn it into an opportunity for your business.

Making Big Data Simple

Once policies have been set up to tell Commvault software what to do, it applies those policies to all newly created data. The policies can run globally across the entire infrastructure and will detect changes to the environment, making data management much simpler.

Data management is also much easier when key functions of the software, such as indexing, collection and recovery, integrate directly with the application via its API. This means the software can query the big data name node to find out which data is net new. Once the data has been captured in full, incremental backups can be applied forever. Synthetic fulls can be run every week, and they stitch the data together as if only one job had ever run and that job ran yesterday!
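To make the idea of a synthetic full concrete, here is a minimal Python sketch. It is an illustration only and not how Commvault software is implemented: the `synthetic_full` function, the file paths and the convention that `None` marks a deleted file are all assumptions made for this example.

```python
from typing import Dict, List, Optional

# Hypothetical model: a backup maps a file path to a content version.
# In an incremental, a value of None marks a file deleted since the last job.
Backup = Dict[str, Optional[str]]


def synthetic_full(base_full: Dict[str, str], incrementals: List[Backup]) -> Dict[str, str]:
    """Merge one full backup with later incrementals, applied oldest first."""
    merged = dict(base_full)
    for incremental in incrementals:
        for path, version in incremental.items():
            if version is None:
                merged.pop(path, None)   # file was deleted
            else:
                merged[path] = version   # file was added or changed
    return merged


full = {"/data/a": "v1", "/data/b": "v1"}
incrementals = [
    {"/data/b": "v2"},                    # day 1: b changed
    {"/data/c": "v1", "/data/a": None},   # day 2: c added, a deleted
]
latest = synthetic_full(full, incrementals)
print(latest)
```

The merged result looks like a full backup taken after the last incremental, even though the source was only read in full once.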

Commvault software also takes the worry out of managing performance. Backups take full advantage of the distributed architecture by using load balancing. Once multiple media agents have been deployed, the software will intelligently use the fastest nodes and switch over to another if one fails.
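The selection and failover behaviour described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not Commvault's actual scheduling logic: the `MediaAgent` class, the `throughput_mbps` figure and the `pick_agent` function are hypothetical names invented for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MediaAgent:
    name: str
    throughput_mbps: float   # hypothetical measured throughput
    healthy: bool = True


def pick_agent(agents: List[MediaAgent]) -> MediaAgent:
    """Return the fastest healthy agent, skipping any that have failed."""
    healthy = [agent for agent in agents if agent.healthy]
    if not healthy:
        raise RuntimeError("no healthy media agent available")
    return max(healthy, key=lambda agent: agent.throughput_mbps)


agents = [
    MediaAgent("ma1", 800.0),
    MediaAgent("ma2", 1200.0),
    MediaAgent("ma3", 950.0),
]
first_choice = pick_agent(agents)     # fastest agent while all are healthy
agents[1].healthy = False             # simulate ma2 failing
second_choice = pick_agent(agents)    # work moves to the next fastest agent
print(first_choice.name, second_choice.name)
```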

Making Big Data Intelligent

Commvault software is intelligent and understands the data structure. For example, data is recovered in the way the application and infrastructure expect. This means the restored data is seen as net new and as a single copy, so Hadoop (for example) natively creates the other copies used for redundancy.

A fast and intelligent way to handle restores of very large data sets is to provide native access to the data. This makes the data available immediately, avoids the need to perform a restore and means limited workloads can be run straight away.

Seamless integration with the cloud allows the provisioning of both data and compute. This enables Cloud DR and Cloud Dev/Test Operations use cases. Pre-built workflows are provided for each of these.

High Performance for Big Data Protection

Ensuring good performance always requires considering both the data source and the destination to avoid bottlenecks. A Software Defined Storage (SDS) back-end ensures data can be received and deduplicated at massive scale. The source can be scaled up by adding multiple media agents to increase parallel streams, which ensures parallel I/O all the way through.

We’ve also taken advantage of the distributed architecture for resilience, as the media agents (source) and the SDS (destination) can both fail over. These highly available nodes continue to operate in the event of drive-level or node-level failures.

Controlling Big Data

Our final desirable attribute for big data is control over the data lifecycle. To achieve this, we provide a policy-driven set of services, such as migrating and tiering data, that make production environments much easier to control. This is vital with big data, as growth can be both rapid and unpredictable. Uncontrolled growth can be very costly in big data environments because at some point a facility’s capacity, such as power and cooling, comes under pressure, and those constraints can be difficult and costly to resolve.
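As a rough illustration of what a policy-driven tiering rule can look like, the Python sketch below assigns data to a storage tier based on its age. The tier names, the age thresholds and the `TierRule` and `tier_for` names are assumptions chosen for this example. They are not Commvault settings or defaults.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TierRule:
    min_age_days: int   # rule applies once data is at least this old
    tier: str           # hypothetical tier name


# Hypothetical lifecycle policy: thresholds chosen for illustration only.
POLICY = [
    TierRule(0, "primary"),
    TierRule(30, "secondary"),
    TierRule(365, "archive"),
]


def tier_for(age_days: int, policy: List[TierRule] = POLICY) -> str:
    """Return the tier of the oldest-threshold rule that the data's age satisfies."""
    chosen = None
    for rule in sorted(policy, key=lambda r: r.min_age_days):
        if age_days >= rule.min_age_days:
            chosen = rule.tier
    if chosen is None:
        raise ValueError("no rule in the policy covers this age")
    return chosen


for age in (5, 30, 400):
    print(age, tier_for(age))
```

Because the rule is expressed once as a policy, it applies to new data automatically as it ages, rather than needing a manual migration decision for each data set.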

In Conclusion

Big data is one of the most important elements of digital business and will be a major contributor to the future competitiveness of many companies. However, it presents significant challenges of scale and complexity, which are compounded by its lack of maturity. Its unique nature, which prioritises low latency and resilience to local failure, also makes it storage inefficient, increasing the difficulty of managing the data. Adding Commvault software to big data helps overcome these challenges, bringing intelligence and maturity to the implementation and making it far easier to protect and control.