Back To Basics: Data Classification

Posted 29 July 2016 8:54 PM by Mark Bentkower



"Data classification is the process of sorting and categorizing data into various types, forms or any other distinct class. Data classification enables the separation and classification of data according to data set requirements for various business or personal objectives. It is mainly a data management process."1

While most data centers begin as "purpose built" for a specific application, over time incremental changes take the scope and character of the data center far from its original purpose. These changes are usually driven by mergers and acquisitions, changes in the organization's mission, and shifts in technology such as the adoption of virtual servers and hyperconverged systems.

Classifying the data in such an environment allows an organization to categorize and separate it by a number of different metadata variables: by type, by physical geography, by network, by user and by group. Mastery of these simple variables gives the organization these benefits:

Service Level Agreements

  • Mapping specific data to specific applications and business services and making sure that the infrastructure (storage and network) allows for proper service levels.
  • Storing the right data in the right location, including on the proper SAN LUNs for snapshot RPO
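As a minimal sketch of the first benefit, a classification tool might check each data class's placement against its SLA by comparing the storage tier's snapshot interval to the class's required RPO. The class names, tier names, and minute values below are made up for illustration, not any vendor's actual tiers:

```python
# Hypothetical SLA table: maximum acceptable RPO per data class, in minutes.
SLA_RPO_MINUTES = {"mission-critical": 15, "business": 240, "archive": 1440}

# Hypothetical infrastructure table: snapshot interval per storage tier.
TIER_SNAPSHOT_INTERVAL = {"flash": 15, "hybrid": 60, "capacity": 720}

def meets_sla(data_class: str, tier: str) -> bool:
    """A placement meets the SLA if the tier snapshots at least as
    often as the data class's RPO requires."""
    return TIER_SNAPSHOT_INTERVAL[tier] <= SLA_RPO_MINUTES[data_class]
```

Running such a check across an inventory is what turns classification metadata into an actionable SLA report.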

 

Risk Mitigation (Business Continuity/DR)

  • Identifying truly business-critical data as part of a business continuity and disaster recovery plan, and leveraging SLAs
  • Reducing the cost of downtime, because business-critical workloads can be identified and recovered with a faster RTO, including synchronization to the cloud

 

Compliance

  • Payment card compliance: meeting standards such as PCI DSS by separating payment information from other user data, or from the payload delivered upon successful completion of payment
  • Patient data compliance: keeping hospital patient data confidential, which might include separating patient billing information from sensitive imaging or doctors' summary notes
  • Geographical compliance: keeping financial data within a specific country's geographical borders

 

Space Saving

  • Making sure that data is stored at the tier appropriate to its true value to the organization
  • Automatic archiving of data based on metadata rules, taking into consideration compliance drivers such as geography and file extension rules
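To make the archiving bullet concrete, here is a minimal sketch of metadata-driven archive rules. The `ArchiveRule` structure and `should_archive` function are illustrative names, not a real product API; a rule combines a file-extension filter, a minimum age, and an allowed-region list for the geographical compliance driver mentioned above:

```python
import os
import time
from dataclasses import dataclass

@dataclass
class ArchiveRule:
    extensions: tuple       # file extensions this rule applies to, e.g. (".log",)
    min_age_days: int       # only archive files older than this
    allowed_regions: tuple  # compliance: regions where the data may be archived

def should_archive(path, region, rules, now=None):
    """Return True if any rule says this file belongs on the archive tier."""
    now = now or time.time()
    ext = os.path.splitext(path)[1].lower()
    age_days = (now - os.path.getmtime(path)) / 86400
    return any(
        ext in r.extensions
        and age_days >= r.min_age_days
        and region in r.allowed_regions
        for r in rules
    )
```

A real engine would evaluate rules like these continuously against the classified inventory rather than one file at a time, but the decision logic is the same.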

 

Cloud (OPEX) Efficiencies

  • Utilizing the proper cloud products for the proper type of data, for both storage and compute
  • Migrating raw data and virtual workloads between terrestrial data centers and the cloud, based on business rules

 

Rules Engine

  • Creation of policies that map business rules to actual data
  • Alerting and reporting as to the disposition of that data
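At its core, a rules engine like the one described above pairs a predicate over an item's classification metadata with an action. This sketch uses invented policy and action names purely for illustration:

```python
def evaluate_policies(item, policies):
    """Return the actions whose predicates match this item's metadata.

    item     -- dict of classification metadata for one data object
    policies -- list of (predicate, action_name) pairs
    """
    return [action for predicate, action in policies if predicate(item)]

# Two hypothetical policies: alert on PCI-classified data, and report
# large datasets sitting on an expensive flash tier.
policies = [
    (lambda m: m.get("classification") == "pci", "alert-security-team"),
    (lambda m: m.get("size_gb", 0) > 100 and m.get("tier") == "flash",
     "report-tier-mismatch"),
]
```

The alerting and reporting bullet then amounts to running every classified item through `evaluate_policies` and dispatching the resulting action names.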

 

API Integration

  • Bidirectional integration with other tools, such as service desk or ITIL-style orchestration systems, that require an inventory of available data and file systems
  • Workflow automation to allow scheduled execution of complex tasks with decision trees and nested dependencies
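"Nested dependencies" in workflow automation boil down to executing tasks in dependency order. As a minimal sketch (the task names are invented; Python's standard-library `graphlib` does the ordering), assume each task maps to the set of tasks that must complete first:

```python
from graphlib import TopologicalSorter  # stdlib since Python 3.9

def run_workflow(tasks):
    """tasks maps task name -> set of prerequisite task names.
    Returns one valid execution order; raises CycleError on a cycle."""
    return list(TopologicalSorter(tasks).static_order())

# Hypothetical workflow: snapshot first, classify the snapshot,
# then archive and report on what was classified.
order = run_workflow({
    "snapshot": set(),
    "classify": {"snapshot"},
    "archive": {"classify"},
    "report": {"classify", "archive"},
})
```

A scheduler would then run each task as its prerequisites finish; the decision trees mentioned above would be branches taken inside individual tasks.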

 

As you can see, we're not just talking about creating a big pile of file directories. It is important to understand the difference between a regular file or folder and data that lives inside a container, such as a virtual disk or BLOB storage. So, a good data classification solution has to be able to talk to both hypervisors and applications, and then reconcile what it finds against the file system upon which those applications reside.

A good solution also needs to be fast, because some of the data that must be accounted for in a modern data center lives inside a SAN snapshot and may only be there for a brief period of time. Add in the need for visibility at the snapshot layer, and the ability to tie all of that together into something an administrator can easily understand: "According to this report, that mission-critical SQL database lives in a Windows Server 2012 Hyper-V guest on volume 4 of that SAN, which last snapped 3 hours ago, boss…"

Once data has been classified, the next challenge is to act on what has been found. Truly comprehensive data management strategies need to do more than just provide an inventory: both ad-hoc actions and repeatable policies should be driven by the classification results.

Data classification is the first step toward creating a sane data lifecycle management strategy for your organization. Commvault has been helping its customers with data classification for over 20 years while also now providing a complete data lifecycle platform, which includes cloud and infrastructure management, disaster recovery and data protection.

1. Techopedia
