Think about the way that we watched television back in 1985. Rabbit ear antennas strained to grab a weak signal from the sky, and maybe if we were lucky, we’d receive a handful of channels. Programs appeared at appointed times and you had to be in front of the set or you missed the show. A few lucky people had access to cable television, which gave them a clearer signal and a few more channels to choose from.
Over time, we developed technologies such as VHS recorders with timers that allowed us to record shows when we were away from home. We could even watch one program while recording a different one. Later, with the advent of broadband Internet, the idea of receiving a signal over the air all but disappeared, and appointment television was largely replaced by on-demand programming.
That’s an amazing evolution in an extremely short period of time, and one that consumers adapted to with relative ease. The result was more selection and lower costs for the end user. But it has also come with a price: Too much choice! We went from having only a few channels to having so many that it can be difficult to know what you want to watch. Sometimes what you’re looking for is buried in an ocean of options.
When we look at the evolution of the IT industry over that same period, and specifically at how we handle data storage and retention, we see something very similar: rapid technological growth in a short period of time; an explosion of cheap storage and data hoarding; and an industry that wants to cling to an old, prescribed set of processes and procedures, which results in holding on to so much data that it can become impossible to find what you’re looking for in a reasonable amount of time.
One of the biggest challenges arises when we discuss backup and archive, because there’s a fair amount of confusion about these terms. Many of my customers stop me short to explain that they do weekly full backups, with daily synthetic fulls and a monthly ‘archive’ tape off to the vault, just in case ‘something happens’ …
For many years this really was the unwritten standard practice in many IT shops. But as data volumes have grown over the past few years, as backup windows have shrunk under tighter RPOs and RTOs, and as SLAs have become more demanding, it’s time to rethink the actual purpose of archiving and how it really differs from backup.
George Crump at Storage Switzerland just wrote a great piece arguing that archive is important and that it should be a separate function from backup. I think that backup and archive can remain separate functions and can complement each other at the same time.
So do you need an archiving system? In a nutshell, yes. The main reasons are to save space on your primary disks, to save time during routine processes (backup windows, indexing, and searches), and to meet regulatory requirements for data retention.
Let’s look at four different types of archiving systems:
- Static Archiving: This is the most traditional sense of the word. We select a group of data from a file system, or a group of objects from an application such as an email system, copy them to another location (such as tape, slower nearline disk, or cloud storage), and then remove them from their original location. The main goal is usually to save space on the primary disk, or to preserve a point-in-time copy of the data for DR or regulatory purposes.
- Dynamic File Archiving: Similar to static archiving, but dynamic archiving leaves a file ‘stub’ (a small placeholder) where the archived data used to reside. When a person or process goes looking for that archived data, it encounters the stub, and the archive software recalls the data from the archive and puts it back in place.
- Dynamic Email Archiving: Creates a stub inside the email system, so that end users see what appears to be a live email or attachment even though it has been archived. The archive software recalls the data on demand and presents it to the user with only a minor delay.
- Active Archive: Processes that determine the value of data to an organization through data classification (metadata, size, type, age, etc.) and actively move the archived data to different storage tiers based on a set of predefined rules. For example, data might move as it ages from primary disk, to lower-cost iSCSI storage, to the cloud, and then to tape.
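To make the Active Archive idea a bit more concrete, here is a minimal Python sketch of a rule-based tiering policy. Everything in it is hypothetical: the tier names (`primary`, `iscsi`, `cloud`, `tape`), the `FileInfo` and `Rule` types, and the age and size thresholds are illustrative assumptions, not any particular product’s policy engine. It classifies files only by time since last access and by size; a real active archive would also weigh metadata and file type, and would actually move the data rather than just choose a tier.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class FileInfo:
    path: str
    size_bytes: int
    last_accessed: datetime


@dataclass
class Rule:
    tier: str                # destination tier (hypothetical names)
    min_age_days: int        # time since last access required to match
    min_size_bytes: int = 0  # optional size condition; 0 means "any size"

    def matches(self, f: FileInfo, now: datetime) -> bool:
        idle = now - f.last_accessed
        return (idle >= timedelta(days=self.min_age_days)
                and f.size_bytes >= self.min_size_bytes)


# Illustrative policy, evaluated top to bottom; the first matching rule wins,
# so the coldest tiers are listed first. Thresholds are examples, not advice.
RULES = [
    Rule("tape", min_age_days=365),
    Rule("cloud", min_age_days=180),
    Rule("cloud", min_age_days=90, min_size_bytes=1 << 30),  # large idle files skip iSCSI
    Rule("iscsi", min_age_days=30),
]


def choose_tier(f: FileInfo, now: datetime, rules=RULES) -> str:
    """Return the tier a file should live on; data no rule matches stays on primary."""
    for rule in rules:
        if rule.matches(f, now):
            return rule.tier
    return "primary"


if __name__ == "__main__":
    now = datetime(2024, 1, 1, tzinfo=timezone.utc)
    files = [
        FileInfo("/data/report.docx", 200_000, now - timedelta(days=5)),
        FileInfo("/data/q1.xlsx", 50_000, now - timedelta(days=45)),
        FileInfo("/data/training.mp4", 2 << 30, now - timedelta(days=100)),
        FileInfo("/data/2019-mail.pst", 10_000_000, now - timedelta(days=400)),
    ]
    for f in files:
        print(f.path, "->", choose_tier(f, now))
```

Because the first matching rule wins, the order of `RULES` is itself part of the policy. With the sample files above, the four files land on primary, iSCSI, cloud, and tape respectively.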
While backup and archive are separate processes, it pays for each to be aware of the other so that they can share infrastructure such as file indexes, data repositories and deduplication databases. A well-integrated backup and archive system works together to shrink backup windows and shorten recovery times, making tighter recovery time objectives achievable.
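As a rough illustration of what sharing a deduplication database can mean, the sketch below keeps one content-addressed chunk store that both a backup job and an archive job write into. Chunks are keyed by their SHA-256 digest, so data the backup has already stored costs the archive nothing extra. `DedupStore`, its `put`/`get` methods, and the fixed-size chunking are hypothetical choices for this example; commercial products typically use their own (often variable-size) chunking and persistent indexes rather than an in-memory dict.

```python
import hashlib


class DedupStore:
    """Hypothetical content-addressed chunk store shared by backup and archive jobs."""

    def __init__(self, chunk_size: int = 4096):
        self.chunk_size = chunk_size
        self.chunks: dict[str, bytes] = {}                    # SHA-256 digest -> chunk data
        self.catalog: dict[tuple[str, str], list[str]] = {}   # (job, path) -> chunk digests

    def put(self, job: str, path: str, data: bytes) -> int:
        """Record data under (job, path); return how many new bytes were actually stored."""
        digests = []
        new_bytes = 0
        for i in range(0, len(data), self.chunk_size):
            chunk = data[i:i + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            if digest not in self.chunks:  # only unseen chunks consume space
                self.chunks[digest] = chunk
                new_bytes += len(chunk)
            digests.append(digest)
        self.catalog[(job, path)] = digests
        return new_bytes

    def get(self, job: str, path: str) -> bytes:
        """Reassemble a stored object from its chunk digests."""
        return b"".join(self.chunks[d] for d in self.catalog[(job, path)])


if __name__ == "__main__":
    store = DedupStore(chunk_size=4)
    data = b"quarterly-report"
    print(store.put("backup", "/data/report", data))   # 16: all chunks are new
    print(store.put("archive", "/data/report", data))  # 0: every chunk already present
```

The point of the shared index is that the archive job only writes a catalog entry for data the backup has already captured, which is one way the two processes can complement rather than duplicate each other.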
A modern archiving solution, separate from but synchronized with a proper data protection strategy, is an important step toward getting value from your storage investments and meeting your SLAs for application uptime and RTOs.
Just as we have successfully made the leap from our old tube television sets to fiber-connected widescreen HDTVs, it’s time to bring our data retention and protection strategies in line with modern standards and practices.