5 Ways To Be Data Ready In 2020
By Karen Lopez
1. Automate backups and test recovery
It seems odd that in 2020 we’d still need to discuss the importance of data backups and recovery. But here we are. I still hear about organizations caught without backups of key data, or with backups they cannot restore.
Testing backups versus testing recovery
We’ve all been taught not just to set up backups, but to confirm that they succeeded. Yet how often do we test recovery? I only half-jokingly say that “We don’t need backups; we need recovery.” A 100 percent success rate on backups in no way guarantees that you’ll be able to restore data from them. I suggest you test your backups by attempting restores on a regular basis, even if you only restore a statistical sample of them.
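Sampled restore testing can be automated. What follows is a minimal sketch that makes several assumptions. It assumes each backup is a file whose SHA-256 checksum was recorded when the backup was taken. The `restore` function is a hypothetical stand-in that only copies the file; a real test would call your actual backup tool and restore into an isolated environment. The sketch randomly picks a sample of backups, restores each one, and checks that the restored bytes match the recorded checksum.

```python
from __future__ import annotations

import hashlib
import random
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def restore(backup: Path, target_dir: Path) -> Path:
    """Hypothetical restore step. Here it only copies the file; in practice,
    call your real backup tool and restore into an isolated environment."""
    restored = target_dir / backup.name
    shutil.copyfile(backup, restored)
    return restored


def sample_restore_test(catalog: dict[Path, str], sample_size: int,
                        seed: int | None = None) -> dict[str, bool]:
    """Restore a random sample of backups and compare each restored file
    with the checksum recorded when the backup was taken.

    catalog maps backup path -> SHA-256 hex digest recorded at backup time.
    Returns backup file name -> True if the restore succeeded and matched.
    """
    rng = random.Random(seed)
    chosen = rng.sample(sorted(catalog), min(sample_size, len(catalog)))
    results: dict[str, bool] = {}
    with tempfile.TemporaryDirectory() as tmp:
        for backup in chosen:
            try:
                restored = restore(backup, Path(tmp))
                results[backup.name] = sha256_of(restored) == catalog[backup]
            except OSError:
                # A backup that cannot even be read counts as a failed recovery.
                results[backup.name] = False
    return results
```

A checksum match only proves that the bytes came back intact. For databases, a fuller test would also open the restored copy and run a few sanity queries against it.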
We hear about organizations that are attacked by ransomware and have no fast way to recover other than attempting to pay the ransom, thereby supporting criminal acts. These ransomware stories scare me. I can’t imagine how it must feel to tell executives that while you do have some backups, you have none for some systems, or for data you have only now discovered exists. Or that you did have backups, but because they were configured incorrectly, they were encrypted too. Let’s be data ready in 2020 by ensuring everything we need is backed up, restorable and configured to survive ransomware attacks.
Do you want to hear more about this topic? You can view the webinar, “The Big 3: Be Ready For Ransomware, Multi-Cloud And Compliance In 2020,” in which I discuss this and the other issues covered in this post.
2. Know what data you hold and where it is located
It has always amazed me that organizations collect data without any way to determine what data they already hold, why they collect it, and how it has been enhanced or processed. The location of data, whether in the cloud, on-premises, or somewhere in flight between the two, is key to understanding both compliance and business effectiveness.
Automate data inventories
It would be nice if every data deployment, every microservice and every application modeled and documented the exact data collected, stored, messaged or processed at every step of the development process. And yet, after 35 years as a data architect, I know that has never happened.
I recommend that organizations use automated tools to inventory what data they have, where it is located and all the properties of that data.
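Here is a small illustration of automated discovery. The sketch below inventories the tables and columns of a single SQLite database using Python's built-in `sqlite3` module. The `location` label is a hypothetical string you supply. Dedicated inventory tools scan many kinds of data stores and capture far more properties than this. The goal here is only to show that a first-pass inventory can be produced by code rather than by hand.

```python
from __future__ import annotations

import sqlite3


def inventory_sqlite(conn: sqlite3.Connection, location: str) -> list[dict]:
    """Return one record per column found in a SQLite database:
    where it lives, which table it belongs to, its name and declared type."""
    records = []
    tables = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA statements do not accept bound parameters, so quote the identifier.
        quoted = '"' + table.replace('"', '""') + '"'
        for row in conn.execute(f"PRAGMA table_info({quoted})"):
            # table_info rows are (cid, name, type, notnull, dflt_value, pk).
            records.append({"location": location, "table": table,
                            "column": row[1], "type": row[2]})
    return records
```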
Data sovereignty is a compliance concept focused on where data may be stored or used. As more data privacy and data protection legislation arrives, data sovereignty requirements become more complex. Deploying data to the wrong location can result in significant fines under the General Data Protection Regulation (GDPR), and potentially in criminal charges. Many development teams overlook data in flight. Do you know where your data is being processed when you use a cloud-based process? Do you know where a software-as-a-service (SaaS) provider is hosting your data, even temporarily?
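Once you have an inventory, checking it against sovereignty rules is mechanical. The sketch below rests on several assumptions. The policy table `ALLOWED_REGIONS`, its category names and its region names are all hypothetical, and real rules come from your legal and compliance teams. Each `Placement` records where a dataset lives or is processed, including in-flight processing by a SaaS service. The check flags any placement whose region is not permitted. It also flags data whose category has no policy, because nobody has decided where that data may go.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical policy: regions where each data category may be stored or processed.
ALLOWED_REGIONS = {
    "eu_customer_personal": {"eu-west", "eu-central"},
    "product_catalog": {"eu-west", "eu-central", "us-east"},
}


@dataclass(frozen=True)
class Placement:
    dataset: str
    category: str
    region: str
    stage: str  # "at rest", or "in flight" for e.g. a SaaS processing step


def sovereignty_violations(placements, policy=ALLOWED_REGIONS):
    """Return placements whose region is not allowed for their category.
    Categories missing from the policy allow no regions, so they are flagged."""
    return [p for p in placements if p.region not in policy.get(p.category, set())]
```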
If you have global workers, you know that application latency increases the farther away the user is from the source of the data. This data physics challenge cannot be overcome by scaling up, out or any other direction. Monitoring for these data latencies and providing data closer to end users without disrupting their workflows is the best way to overcome this issue.
Let’s be data ready in 2020 by knowing what data we collect, what we know about it and where it is located.
3. Record what you know about your data
Once you have a good idea about what data you have and where it is located, that information needs to be managed as well.
This metadata has both business and technology value. If data scientists spend 80 percent of their time sourcing, prepping and cleansing data, they are spending four out of five days not doing data science. And there’s no reason to believe that other data users spend any less time doing the same thing.
Tools that automate the detection of data stores, their content and their location can save significant time when we leverage that data to support business processes. We can then enhance that metadata with the reasons for collection, allowable uses and any special issues with the data.
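One way to picture the combination of automatically discovered facts and human-supplied context is a single catalog entry per data asset. The `DataAssetMetadata` class below is a hypothetical shape, not the schema of any particular catalog product. The location and columns would come from an automated scan. The collection reason, allowed uses and special issues come from the people who know the data. With those fields in place, questions such as "may we use this for marketing?" can be answered from the catalog.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DataAssetMetadata:
    """Hypothetical catalog entry combining discovered facts with business context."""
    name: str
    location: str                      # filled in by automated discovery
    columns: list[str]                 # filled in by automated discovery
    collection_reason: str = ""        # added by people who know the data
    allowed_uses: set[str] = field(default_factory=set)
    special_issues: list[str] = field(default_factory=list)

    def is_use_allowed(self, use: str) -> bool:
        """A use is allowed only if it was explicitly recorded as allowed."""
        return use in self.allowed_uses
```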
Let’s be data ready in 2020 by having our data and metadata inventories already completed.
4. Improve data quality
One factor that often gets forgotten in data protection is that poor data quality also undermines our ability to protect data.
You can’t honor a customer’s request to be forgotten if you have their name wrong in your system. You can’t give customers a complete view of their data if they have duplicate records and you send them only one set. You can’t apply legal requirements for protecting customer data if the underlying data misrepresents that customer.
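Here is a simple illustration of the duplicate-records problem. The sketch below groups customer records that share the same name and email once trivial differences are removed: case, accents and extra whitespace. The record layout with `id`, `name` and `email` keys is an assumption. This exact-key approach only catches easy variants. Real data quality tools typically add fuzzy matching and human review on top.

```python
from __future__ import annotations

import unicodedata
from collections import defaultdict


def normalize(text: str) -> str:
    """Case-fold, strip accents and collapse whitespace so trivial variants match."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(no_accents.casefold().split())


def find_duplicate_groups(records: list[dict]) -> list[list[str]]:
    """Group record ids that share a normalized (name, email) key."""
    groups: dict[tuple[str, str], list[str]] = defaultdict(list)
    for rec in records:
        key = (normalize(rec["name"]), normalize(rec["email"]))
        groups[key].append(rec["id"])
    return [ids for ids in groups.values() if len(ids) > 1]
```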
Accurate data not only adds value to an organization; it also helps the organization protect that data.
Another factor in data quality is timeliness: not just keeping data current, but also knowing when to delete it. Sometimes compliance requires you to remove data from your systems; sometimes it’s just good data hygiene.
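Knowing when to delete data can also be turned into a routine check. The sketch below assumes each record carries a category and a creation date. The retention periods in `RETENTION_DAYS` are hypothetical placeholders, not recommendations. Records older than their category's retention period are returned for deletion. Records whose category has no policy are reported separately rather than silently kept or deleted.

```python
from __future__ import annotations

from datetime import date, timedelta

# Hypothetical retention periods, in days, per data category.
RETENTION_DAYS = {"web_logs": 90, "invoices": 7 * 365}


def due_for_deletion(records, today: date, policy=RETENTION_DAYS):
    """Return (expired_ids, no_policy_ids).

    expired_ids: records older than their category's retention period.
    no_policy_ids: records whose category has no retention rule yet.
    """
    expired, no_policy = [], []
    for rec in records:
        days = policy.get(rec["category"])
        if days is None:
            no_policy.append(rec["id"])
        elif rec["created"] + timedelta(days=days) < today:
            expired.append(rec["id"])
    return expired, no_policy
```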
Let’s be data ready for compliance in 2020 by having good-quality data.
5. Test your security practices
It’s not enough to train staff on security practices; we must monitor their use and test for compliance with them.
I believe the number one data vulnerability is still SQL injection, a development challenge that has held that position for more than a decade. There are tools and practices that prevent SQL injection attacks on your data, so I believe strongly that there is no excuse for this threat to still be an issue.
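One of the best-known practices against SQL injection is the parameterized query. The sketch below uses Python's built-in `sqlite3` module and a hypothetical `customer` table to contrast two approaches. The unsafe function pastes user input into the SQL text. The safe function passes the input as a bound parameter. Most database drivers offer the same mechanism, though the placeholder syntax varies.

```python
import sqlite3


def find_customer_unsafe(conn: sqlite3.Connection, email: str):
    # DON'T: user input becomes part of the SQL text, so input such as
    # "x' OR '1'='1" changes the meaning of the query.
    return conn.execute(f"SELECT id FROM customer WHERE email = '{email}'").fetchall()


def find_customer(conn: sqlite3.Connection, email: str):
    # DO: the value is bound to the ? placeholder separately from the SQL text,
    # so it is always treated as data, never as SQL.
    return conn.execute("SELECT id FROM customer WHERE email = ?", (email,)).fetchall()
```

With the attack string, the unsafe version returns every customer, while the parameterized version simply finds no customer with that odd email address.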
I predict that in the next couple of years, unprotected databases will overtake SQL injection. Databases end up unprotected when they are deployed with default administrative credentials or none at all, or when their backups are placed, unencrypted, on open data stores in the cloud. There is no legitimate reason for any of these practices.
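The unprotected-database mistakes described above can be caught by a simple pre-deployment check. The sketch below audits a hypothetical deployment description; the dictionary keys and the short `DEFAULT_PASSWORDS` list are assumptions for illustration. A real audit would attempt logins with known vendor defaults rather than read a password from configuration. Settings that are missing are treated as unsafe, so an incomplete description produces findings instead of a false all-clear.

```python
from __future__ import annotations

# Hypothetical, deliberately short list of well-known default passwords.
DEFAULT_PASSWORDS = {"", "admin", "password", "root", "changeme"}


def audit_deployment(cfg: dict) -> list[str]:
    """Return findings for a hypothetical deployment description."""
    findings = []
    password = cfg.get("admin_password")
    if password is None or password in DEFAULT_PASSWORDS:
        findings.append("administrative account has a default or empty password")
    for backup in cfg.get("backups", []):
        # Unknown settings are treated as unsafe.
        if backup.get("public_access", True):
            findings.append(f"backup {backup['location']} is publicly accessible")
        if not backup.get("encrypted", False):
            findings.append(f"backup {backup['location']} is not encrypted")
    return findings
```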
Phishing continues to be a key vector for ransomware, and the emails and text messages that phish for credentials keep getting better. One best practice is to test users with simulated phishing attempts, then use the results to help them understand how those messages work.
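Results from simulated phishing tests are most useful when summarized so that follow-up training can be targeted. The sketch below assumes a hypothetical result format: one `(department, clicked)` pair per message sent. It computes the click rate for each department.

```python
from collections import Counter


def click_rates(results):
    """results: iterable of (department, clicked) pairs from a simulated campaign.
    Returns department -> fraction of sent messages that were clicked."""
    sent, clicked = Counter(), Counter()
    for dept, did_click in results:
        sent[dept] += 1
        if did_click:
            clicked[dept] += 1
    return {dept: clicked[dept] / sent[dept] for dept in sent}
```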
Let’s be data ready in 2020 by leveraging security features to keep our data safe, knowing what data we curate, where it’s located, how it is protected, what quality we have and by testing our recovery steps regularly.
I wish you a happy and data-ready 2020.