Back To Basics: Data Classification

Posted 07/29/2016 by Commvault

"Data classification is the process of sorting and categorizing data into various types, forms or any other distinct class. Data classification enables the separation and classification of data according to data set requirements for various business or personal objectives. It is mainly a data management process." [1]

While most data centers begin as “purpose built” for a specific application, incremental changes over time tend to take the scope and character of the data center far from its original purpose. These changes are usually due to mergers and acquisitions, changes in the organization’s mission, and shifts in technology such as the adoption of virtual servers and hyperconverged systems.

Classifying data allows an organization to categorize and separate it by a number of different metadata variables, including type, physical geography, network, user and group. Mastery of these simple variables gives the organization these benefits:

Service Level Agreements

  • Mapping specific data to specific applications and business services and making sure that the infrastructure (storage and network) allows for proper service levels.
  • Storing the right data in the right location, including the proper SAN LUNs for snapshot RPO.


Risk Mitigation (Business Continuity/DR)

  • Identifying truly business critical data as part of a business continuity and disaster recovery plan and leveraging SLAs
  • Cost of downtime is reduced because business critical workloads can be identified and recovered with faster RTO and synchronization to the cloud



Compliance

  • Payment Card Compliance: for example PCI compliance, where payment information is kept separate from other user data or from the payload delivered upon successful completion of payment
  • Patient Data Compliance: keeping hospital patient data confidential, which might include separating patient billing information from sensitive imaging or doctor summary notes
  • Geographical compliance: keeping financial data within a specific country’s geographical borders


Space Saving

  • Making sure that data is stored at the tier that matches its true value to the organization.
  • Automatic archiving of data based on metadata rules, taking into consideration compliance drivers such as geography and file extension rules
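
To make the archiving bullet concrete, here is a minimal sketch of a metadata-driven archive rule. Everything in it is an assumption made for illustration: the `FileMeta` fields, the `ArchiveRule` shape, the tier names and the thresholds are hypothetical and do not describe any particular product's API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class FileMeta:
    """Hypothetical metadata collected for one file."""
    path: str
    extension: str           # lowercase, with the dot, e.g. ".log"
    country: str             # country code where the data currently resides
    last_accessed: datetime


@dataclass(frozen=True)
class ArchiveRule:
    """Hypothetical rule: archive idle files of certain types within certain countries."""
    extensions: FrozenSet[str]
    countries: FrozenSet[str]
    min_idle: timedelta
    target_tier: str


def pick_tier(meta: FileMeta, rules: List[ArchiveRule], now: datetime) -> Optional[str]:
    """Return the target tier of the first matching rule, or None to leave the file in place."""
    for rule in rules:
        if (meta.extension in rule.extensions
                and meta.country in rule.countries
                and now - meta.last_accessed >= rule.min_idle):
            return rule.target_tier
    return None
```

Rules are evaluated in order and the first match wins, so more specific rules (for example, a country with stricter residency requirements) would be listed ahead of general ones.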


Cloud (OPEX) Efficiencies

  • Utilizing the proper cloud products for the proper type of data, for both storage and compute
  • Migrating raw data and virtual workloads between terrestrial data centers and cloud based on business rules


Rules Engine

  • Creation of policies that map business rules to actual data
  • Alerting and reporting as to the disposition of that data
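
As a rough illustration of the two bullets above, the sketch below maps a few business rules onto file records, then produces a summary report and a list of alerts for anything no policy covers. The `Policy` type, the record fields and the labels are all hypothetical names chosen for this example.

```python
from collections import Counter
from typing import Callable, NamedTuple


class Policy(NamedTuple):
    """Hypothetical policy: a business rule expressed as a predicate plus a label."""
    name: str
    matches: Callable[[dict], bool]
    label: str


def classify(records, policies):
    """Label each record with the first matching policy; alert on unmatched records."""
    report = Counter()
    alerts = []
    for rec in records:
        label = next((p.label for p in policies if p.matches(rec)), None)
        if label is None:
            alerts.append(f"unclassified: {rec['path']}")
            label = "unclassified"
        report[label] += 1
    return report, alerts


# Illustrative policies and records (made-up paths and groups).
policies = [
    Policy("payment data", lambda r: r.get("group") == "payments", "pci"),
    Policy("imaging", lambda r: r["path"].endswith(".dcm"), "patient-imaging"),
]
records = [
    {"path": "/fin/cards.db", "group": "payments"},
    {"path": "/rad/scan01.dcm", "group": "radiology"},
    {"path": "/tmp/scratch.bin", "group": "it"},
]
report, alerts = classify(records, policies)
```

Here `report` counts records per label and `alerts` names the one file that no policy matched, which is the kind of disposition reporting the second bullet refers to.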


API Integration

  • Bidirectional integration to other tools such as service desk or ITIL type conductors that require an inventory of available data and file systems
  • Workflow automation to allow scheduled execution of complex tasks with decision trees and nested dependencies
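
The workflow bullet can be sketched with Python's standard `graphlib` module (Python 3.9+), which orders tasks so that prerequisites run first. The task names, the `context` dictionary and the skip-on-failure decision rule are assumptions for illustration only, not a description of any vendor's workflow engine.

```python
from graphlib import TopologicalSorter


def run_workflow(tasks, deps, context):
    """Run tasks in dependency order.

    tasks: name -> callable(context) returning True on success.
    deps:  name -> set of prerequisite task names.
    A task is skipped when any prerequisite failed or was itself skipped.
    """
    graph = {name: deps.get(name, set()) for name in tasks}
    outcome = {}
    for name in TopologicalSorter(graph).static_order():
        if all(outcome.get(d) == "ok" for d in graph[name]):
            outcome[name] = "ok" if tasks[name](context) else "failed"
        else:
            outcome[name] = "skipped"
    return outcome


# Illustrative workflow: replicate only if the snapshot verifies.
tasks = {
    "snapshot": lambda ctx: True,
    "verify": lambda ctx: ctx["checksum_ok"],
    "replicate": lambda ctx: True,
    "notify": lambda ctx: True,
}
deps = {"verify": {"snapshot"}, "replicate": {"verify"}, "notify": {"snapshot"}}
outcome = run_workflow(tasks, deps, {"checksum_ok": False})
```

With a failed checksum, `verify` is marked failed and `replicate` is skipped, while `notify` still runs because it depends only on `snapshot`.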


As you can see, we’re not just talking about creating a big pile of file directories. It is important to understand the difference between a regular file or folder and data that sits inside a container, such as a virtual disk, or inside BLOB space. So a good data classification solution has to be able to talk to both hypervisors and applications, and then reconcile what it finds with the file system on which those applications reside.

A good solution also needs to be fast, because some of the data that needs to be accounted for in a modern data center probably lives inside a SAN snapshot, and may only be there for a brief period of time. So add in the need for visibility at the snapshot layer, and the ability to tie all of that together into something easily understood by an administrator: “According to this report, that mission critical SQL database lives in a Windows Server 2012 Hyper-V guest on volume 4 of that SAN, which last snapped 3 hours ago, boss…”
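
A report line like the one quoted above depends on keeping the whole chain (workload, guest, hypervisor, SAN volume, snapshot time) in one record. A minimal sketch, with hypothetical field names and a made-up guest name, might look like this:

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class DataLocation:
    """Hypothetical record tying a workload to the layers it lives on."""
    workload: str
    guest: str
    hypervisor: str
    san_volume: str
    last_snapshot: datetime


def describe(loc: DataLocation, now: datetime) -> str:
    """Render one human-readable report line, with snapshot age in whole hours."""
    hours = int((now - loc.last_snapshot).total_seconds() // 3600)
    return (f"{loc.workload} lives in {loc.hypervisor} guest {loc.guest} "
            f"on SAN volume {loc.san_volume}, last snapped {hours} hours ago")
```

Storing the snapshot timestamp rather than a precomputed age lets the same record be rendered accurately whenever the report is run.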

Once data has been classified, the next challenge is to take actions based on what has been found. Truly comprehensive data management strategies need to be able to do more than just provide an inventory. Both ad-hoc actions and repeatable policies should be driven by what is found.

Data classification is the first step toward creating a sane data lifecycle management strategy for your organization. Commvault has been helping its customers with data classification for over 20 years while also now providing a complete data lifecycle platform, which includes cloud and infrastructure management, disaster recovery and data protection.