matt_prigge
Contributing Editor

Managing mountains of data: The big picture

analysis
Dec 14, 2009
4 mins

To craft a complete storage strategy, you need to drill into all levels of the infrastructure -- but don't forget the big picture

When I think of enterprise storage, I reflexively think of a big stack of highly redundant, high-performance disk.

That disk generally gets a lot of attention because it’s flashy, fast, and expensive, and it plays a central role in your most critical workloads. But the enterprise data explosion goes beyond your core primary storage assets; it also touches everything that protects, monitors, and secures that data. As you weigh storage investments, remember to consider the secondary effects of throwing all that snazzy disk onto your network.


Primary storage. Storage in the form of spinning disk has been around for more than 50 years now. All businesses, from large enterprises to SMBs, need a coherent primary storage strategy to deal with the continuous and rapid expansion of data. The process begins with a thorough assessment of business needs, current data throughput, and the state of the existing storage infrastructure. A host of technologies, including storage virtualization, new SSD products, enhanced caching schemes, and disk deduplication, provide new tools to craft a rational storage architecture that delivers performance where it’s needed and scalability as a matter of course.

Backup infrastructure. For as long as we’ve had primary storage assets, we’ve needed to protect them from loss. Backup infrastructure is just as critical to your organization’s survival as primary storage. Natural disaster, fire, viruses, data corruption, user error, and administrator error are only a few of a nearly limitless number of bad things that can happen to data. Recently, backup options have expanded dramatically; what used to be a commodity space filled with varying types of tape drives and write-once media has opened up to all manner of disk-to-disk and offsite backup solutions. Automated backup deduplication, compression, and site-to-site replication are rapidly evolving as the technology matures.

Data deduplication. Deduplication has become almost a prerequisite for a backup solution. A deduplication system identifies and eliminates redundant files, reducing the space needed to store backups by ratios of 10:1 to 50:1 and beyond, depending on the level of redundancy in the data. Almost all major backup software either includes or has announced native deduplication capabilities. The real promise of deduplication won’t be realized completely until it becomes commonplace in primary storage, not just in the backup tier. If implemented correctly, deduplication can also increase disk cache performance dramatically, particularly in virtualized server infrastructures with a high degree of duplication.
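To make the ratio concrete, here is a minimal sketch of file-level deduplication, assuming identical files are detected by a SHA-256 content hash. The function names and the sample backup names are hypothetical, and real products typically work on sub-file blocks with far more machinery; this only illustrates the idea of storing each distinct piece of content once.

```python
import hashlib


def dedup_store(files):
    """Store each distinct file content once, keyed by its SHA-256 digest.

    files: dict mapping a (hypothetical) backup name to its bytes.
    Returns (store, index): store maps digest -> content,
    index maps backup name -> digest.
    """
    store = {}
    index = {}
    for name, content in files.items():
        digest = hashlib.sha256(content).hexdigest()
        store.setdefault(digest, content)  # keep only the first copy
        index[name] = digest
    return store, index


def dedup_ratio(files, store):
    """Logical bytes backed up divided by bytes actually stored."""
    logical = sum(len(content) for content in files.values())
    stored = sum(len(content) for content in store.values())
    return logical / stored if stored else 1.0
```

Three nightly backups where two are byte-for-byte identical would occupy the space of two, and `dedup_ratio` reports how much logical data each stored byte represents.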

Log analysis and reporting. As primary storage and backup environments continue to grow in size and complexity, collecting data and reporting on how well those environments are working becomes imperative. No CFO I know likes to get an unexpected request for a pile of money to upgrade a SAN that’s reaching capacity. Monitoring and reporting back to your organization about the state of your storage assets is critical. Whether it’s to give accurate budget guidance to the corner office, to maintain regulatory compliance, or to communicate your backup timeframes to your application stakeholders, it’s vital to know what’s going on under the hood.
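One simple way to avoid that surprise request is to project when a volume will fill. The sketch below assumes you already collect one used-capacity sample per day and that growth is roughly linear; the function name and the terabyte units are illustrative, not taken from any particular monitoring product.

```python
def days_until_full(used_tb, capacity_tb):
    """Estimate days from the latest sample until capacity is reached.

    used_tb: daily used-capacity samples in TB, oldest first.
    Fits a least-squares line through the samples and extrapolates.
    Returns None when there are too few samples or usage is not growing.
    """
    n = len(used_tb)
    if n < 2:
        return None
    mean_x = (n - 1) / 2
    mean_y = sum(used_tb) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(used_tb))
    slope = sxy / sxx  # TB of growth per day
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    full_day = (capacity_tb - intercept) / slope
    return max(0.0, full_day - (n - 1))
```

A straight-line fit is a deliberately crude choice. It is enough to turn raw samples into a date you can put in a budget forecast, but bursty or seasonal growth would need a better model.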

Enterprise data protection. Here we have a broad set of practices and technologies designed to protect private and proprietary data both “at rest” and “in motion” throughout the enterprise and its business partners. Full-disk encryption for laptops and PCs, database encryption, encryption of backed up and archived data, secure file sharing solutions, database and file server monitoring, multifactor authentication, and data leak prevention solutions are among the point solutions that companies are using to enforce policies (and meet compliance requirements) for handling sensitive data, including financial, health, customer, and private information.

Cloud solutions. Cloud infrastructure services (combined with virtualization) are poised to serve as the foundation of many companies’ disaster recovery plans. In addition, many of the first practical cloud-based applications have been built to store, manage, and process massive data sets, leveraging large clusters of commodity hardware and using programming frameworks such as MapReduce for reliable and scalable distributed computing.
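For readers who haven’t met MapReduce, the following is a single-process sketch of the programming model only: a map step emits key/value pairs, a shuffle step groups them by key, and a reduce step combines each group. The word-count job and all function names are illustrative assumptions; a real framework distributes these phases across a cluster and handles failures, which this sketch does not attempt.

```python
from collections import defaultdict


def map_phase(record):
    """Emit a (word, 1) pair for every word in one input record."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group emitted values by key, as a framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle, and reduce in sequence on an iterable of records."""
    pairs = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())
```

Because each record is mapped independently and each key is reduced independently, both phases can be spread over many commodity machines, which is what makes the model attractive for massive data sets.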

Today most enterprises deal with rampant data growth simply by throwing more terabytes of capacity at the problem. But managing and securing huge volumes of data becomes more and more painful over time, and the cost of buying and maintaining new hardware without increased efficiency can’t be sustained forever. Without a big-picture view of the challenge and an interlocking strategy for each of the six major segments, you’ll never succeed in effectively managing the explosive growth of data.

This article, “Managing mountains of data: The big picture,” was originally published at InfoWorld.com. Follow the latest developments in storage at InfoWorld.com.