Friday, 23 December 2016

Primary Storage, Snapshots, Databases, Backup, and Archival.

Data in the enterprise comes in many forms: simple flat files, transactional databases, scratch files, complex binary blobs, encrypted files, whole block devices, and filesystem metadata. Simple flat files, such as documents, images, and application and operating system files, are by far the easiest to manage. They can be scanned by access time, then sorted and managed for backup and archival. Some systems can even transparently symlink these files to other locations for archival purposes. In general, files in this category are opened and closed in rapid succession and rarely change. This makes them ideal for backup, as they can be copied exactly as they are; in the distant past, they were all there was, and that was enough.
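The access-time scan described above can be sketched in a few lines of Python. This is a minimal illustration, not any particular product's logic; the function name and threshold are made up for the example.

```python
import os
import time

def archival_candidates(root, max_idle_days=365):
    """Walk a directory tree and list files whose last access time is
    older than the idle threshold -- simple candidates for archival."""
    cutoff = time.time() - max_idle_days * 86400
    candidates = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.stat(path).st_atime < cutoff:
                    candidates.append(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
    return candidates
```

Note that on modern Linux systems mounted with `relatime`, access times are only updated lazily, so real tools often fall back to modification time as well.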

Then came multitasking. With the introduction of multiple programs running in a virtual memory space, it became possible for a file to be opened by two different applications at once. It also became possible for these locked files to be opened and changed in memory without being synchronized back to disk. So elaborate systems were developed to handle file locks, and buffers that flush their changes back to those files on a periodic or triggered basis. Databases in this space were always open and could not be backed up as they were. Instead, every transaction was logged to a separate set of files, which could be played back to restore the database to functionality. This is called a transaction log, and it is still in use today, since reading the entire database may not be possible, or performant, in a production system. Mail servers, database management systems, and networked applications all had to develop programming interfaces to back up their live data into a single sequential stream of files, in the spirit of the Tape Archive (tar) format.
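The transaction-log idea can be shown with a toy key-value store: every change is appended to a log before being applied, so the store can be rebuilt by replaying the log rather than copying its open data files. This is a deliberately simplified sketch (the class and its in-memory "log" are invented for illustration; real systems flush the log to durable storage per commit).

```python
import json

class ToyKVStore:
    """A toy key-value store with a write-ahead transaction log."""

    def __init__(self):
        self.data = {}
        self.log = []  # a real system writes this to a file, flushed per commit

    def set(self, key, value):
        # Log first, then apply -- the write-ahead discipline.
        self.log.append(json.dumps({"op": "set", "key": key, "value": value}))
        self.data[key] = value

    def delete(self, key):
        self.log.append(json.dumps({"op": "del", "key": key}))
        self.data.pop(key, None)

    @classmethod
    def replay(cls, log):
        """Rebuild a store from scratch by replaying logged transactions."""
        store = cls()
        for line in log:
            entry = json.loads(line)
            if entry["op"] == "set":
                store.data[entry["key"]] = entry["value"]
            elif entry["op"] == "del":
                store.data.pop(entry["key"], None)
        return store
```

Backing up the log files, rather than the live database, is what those early programming interfaces made possible.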

Eventually, and quite recently actually, these systems became so large and complex as to require another layer of interface with the whole filesystem: certain application and operating system files were simply never closed for copying. Thus the concept of copy-on-write was born. The entire filesystem was treated as essentially always closed; any write produced an incremental or completely new copy of the file, and the old one was marked for deletion. Filesystems in this modern era progressively implemented purer copy-on-write, transaction-based journaling, so that files could be assured intact on system failure and could be read for archival or by multiple applications at once. Keep in mind this is a one-paragraph summation of 25 years of filesystem technology, not specifically applicable to any single filesystem.
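The copy-on-write idea can be demonstrated with a small overlay structure: reads fall through to a frozen base, writes land in a private overlay, and the base is never modified. This is a conceptual sketch of the semantics only, not how any real filesystem lays out blocks.

```python
class CowMapping:
    """Minimal copy-on-write overlay over a frozen base mapping."""
    _DELETED = object()  # sentinel marking a key deleted in the overlay

    def __init__(self, base):
        self.base = base      # the "snapshot"; never written to
        self.overlay = {}     # new versions of changed entries

    def __getitem__(self, key):
        if key in self.overlay:
            value = self.overlay[key]
            if value is self._DELETED:
                raise KeyError(key)
            return value
        return self.base[key]

    def __setitem__(self, key, value):
        self.overlay[key] = value  # the old copy in base stays intact

    def __delitem__(self, key):
        self.overlay[key] = self._DELETED  # mark deleted; don't touch base
```

Because the base is never touched, it remains a consistent point-in-time view that can be read for backup while writes continue.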

Along with journaling, which allowed a system to retain filesystem integrity, came the idea that the filesystem could intelligently retain old copies of files, and the state of the filesystem itself, as something called a snapshot. All of this stems from the microcosm of databases applied to general filesystems. Databases still need to be backed up and accessed through controlled methods, but slowly the features of databases find their way into operating systems and filesystems. Modern filesystems use shadow copies and snapshotting to allow rollback of file changes, complete system restore, and undeletion of files, as long as the free space hasn't been reallocated.

Which brings us to my next point: the difference between a backup or archive and a snapshot. A snapshot is a picture of what a disk used to be. That picture is kept on the same disk, so in the event of a physical media failure, or overuse of the disk itself, it is entirely useless. There must be sufficient free space on the disk to hold the old snapshots, and if the disk fails, all is still lost. While media redundancy is easily managed to virtually preclude failure, space consumption, especially in aged or unmanaged filesystems, can easily get out of hand. A filesystem growing near to capacity effectively loses features: as time moves on, simple file-rollback features lose all effectiveness, and users have to go to the backup to find replacements.

There are products and systems that automatically compress and move files that are unlikely to be accessed in the near future. These systems usually create a separate filesystem and replace your files with links into it. This reduces the primary storage footprint and the backup load, and allows your filesystem to grow effectively forever. In practice, this is not as good as it sounds: the archive storage may still fill up, leaving you with an effective filesystem larger than its maximum theoretical size, which will have to be forcibly pruned before it can ever restore properly. Also, if the archive system is not integrated with your backup system, the backup system will probably be unaware of it, meaning the archived data would be lost in the event of a disaster or catastrophe.
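The file-stubbing trick these products use can be sketched on a POSIX system with a plain symlink (real HSM products use proper stub files or reparse points that the backup software understands; the function here is invented for illustration):

```python
import os
import shutil

def stub_to_archive(path, archive_root):
    """Move a file to an archive tier and leave a symlink behind, so
    applications still find it at its original location."""
    os.makedirs(archive_root, exist_ok=True)
    dest = os.path.join(archive_root, os.path.basename(path))
    shutil.move(path, dest)   # file now lives on the archive tier
    os.symlink(dest, path)    # transparent link left in its place
    return dest
```

The catch described above is visible right in the sketch: a backup tool that does not follow the symlink, or cannot reach `archive_root`, silently backs up a pointer instead of the data.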

Which brings about another point: whatever your backup vendor supports, you are effectively bound to those products for the life of the backup system. That may be ten or more years, and may impact business flexibility. Enterprise backup products can easily cost tens of thousands of dollars per year, and however flexible your systems need to be, your backup vendor must be equally flexible.

Long-term planning and backup systems go hand in hand. Ideally, you should be shooting for a 7- to 12-year lifespan for these systems. They should be able to scale, in features and in load, to the predicted growth curve with a very wide margin for error. Conservatively, plan on a data growth rate of at least 25% per year; generally speaking, 50 to 100% is far more likely. Highly integrated backup systems truly are a requirement of Information Services, and while they are costly, failure to plan effectively for disaster or catastrophe will lead to an end of business continuity, and likely the continuity of your employment.
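Those growth rates compound faster than intuition suggests. A quick sketch of the arithmetic, using the 25% and 50% figures above (the 10 TB starting size is an illustrative assumption):

```python
def projected_size(start_tb, annual_growth, years):
    """Size of a data set after compounding a yearly growth rate."""
    return start_tb * (1 + annual_growth) ** years

# Sizing a 10-year backup system for 10 TB of data today:
conservative = projected_size(10, 0.25, 10)  # 25%/yr -> roughly 93 TB
likely = projected_size(10, 0.50, 10)        # 50%/yr -> roughly 577 TB
```

A system sized for the "conservative" curve is undersized by a factor of six if the likelier growth rate materializes, which is why the margin for error needs to be wide.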

Jason Zhang is the product marketing person for Rocket Software's Backup, Storage, and Cloud solutions.

Tuesday, 13 December 2016

The Best of Both Worlds Regarding Mainframe Storage and the Cloud

It might shock you to hear that managing data has never been more difficult than it is today. Data is growing at the speed of light, while IT budgets are shrinking at a similar pace. All of this growth and change is forcing administrators to find more relevant ways to successfully manage and store data. This is no easy task, as there are many regulatory constraints with respect to data retention, and the business value of the data needs to be considered as well. Those within the IT world likely remember (with fondness) hierarchical storage management (HSM) systems, which have traditionally played a key role in mainframe information lifecycle management (ILM). Though this was once a reliable and effective way to manage company data, gone are the days when businesses can put full confidence in such a method. The truth of the matter is that things have become much more complicated.

There is a growing need to collect information and data, and the bad news is that there is simply not enough money in the budget to handle the huge load. In fact, not only are budgets feeling the strain, but even current systems can't keep up with the pace at which data, and its value, are increasing. It is estimated that global data center traffic will soon triple its 2013 levels. You can imagine what a tremendous strain this rapid growth poses to HSM and ILM. Administrators are left with loads of questions: how long must data be kept, what data must be stored, what data is safe to delete, and when is it safe to delete it. These questions are simply the tip of the iceberg when it comes to data management. Regulatory requirements, estimated costs, and the issues of backup, recovery, and accessibility for critical data are areas of concern that must also be addressed in this atmosphere of tremendous data growth.

There is an alluring solution that has come on the scene that might make heads turn with respect to the management of stored data. The idea of hybrid cloud storage is making administrators within the IT world and businesses alike think that there might actually be a way to manage this vast amount of data cost-effectively. So, what would this hybrid cloud look like? Essentially, it would combine the capabilities of both private and public cloud storage: on-site company data paired with storage capacity in the public cloud. Why would this be a good solution? Because companies are looking for a cost-effective way to manage the massive influx of data, and a hybrid cloud offers just that. The best part is that users only pay for the storage they actually use. The good news is that the options are seemingly unlimited, increasing or decreasing as client needs shift over time. With a virtualized architecture in place, the variety of storage options is endless. Imagine no longer worrying about the provider or the type of storage you are managing. With a hybrid cloud storage system in place, these worries go out the window. Think of it as commodity storage. Those within the business world understand that this type of storage has proven to work well within their spheres, ultimately offering a limitless capacity to meet all of their data storage needs. What could be better?

In this fast-paced, shifting world, it is high time that solutions come to the forefront which can keep up with the growth and change so common in technology today. Keep in mind that the vast influx of data could become a huge problem if solutions such as the hybrid cloud are not considered. This combination of cloud storage is a great way to lower storage costs as retention time increases and data value decreases. With this solution, policies are respected, flexibility is gained, and costs are cut. When it comes to managing data effectively over time, hybrid cloud storage is a solution that almost anyone could get behind!

Jason Zhang is the product marketing person for Rocket Software's Backup, Storage, and Cloud solutions.