Why Big Data “Complexity” or “Storage Replacement” is a False Choice

Storage vendors regularly tell their customers that they only have two choices when it comes to big data management.  They can either continue to run a very complicated, labor-intensive, multi-tiered, multi-cloud, multi-vendor storage ecosystem, or they can consolidate everything into a single-vendor big data storage replacement solution, which may still include multiple storage systems from that vendor.

The first choice merely acknowledges the difficult problem of big data management and inevitable data movement. The underlying premise is that data never needs to be moved off the system where it lives. That is a misleading assumption. The use of data has become globalized: a distributed workforce, partners, and collaborators are too often nowhere near the stored data, and distance creates unacceptable response times due to higher latencies. When that data needs to be analyzed, fed into AI (machine learning, deep learning, and neural networks), modified by collaborators, archived in a way that preserves direct user access, or cold archived, it inevitably needs to be moved. And of course, a large share of data has a long shelf life and will typically outlast whatever storage platform it is housed on today.

No one enjoys moving hundreds of terabytes or petabytes between storage systems.  It’s incredibly time consuming, fraught with errors that need correcting, and adds a heavy burden on storage administrators and users alike.  It is never easy, and is extremely labor-intensive especially between different types of systems or vendors. What’s worse, it typically results in downtime, which is unacceptable in a 24×7 world.  
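The scale of the time problem is easy to quantify with back-of-the-envelope arithmetic. As a rough sketch (the link speed and efficiency factor below are illustrative assumptions, not figures from any particular migration):

```python
# Back-of-the-envelope estimate of bulk data migration time.
# The link speed and efficiency factor are illustrative assumptions only.

def migration_days(data_tb: float, link_gbps: float, efficiency: float = 0.5) -> float:
    """Days to move `data_tb` terabytes over a `link_gbps` link, assuming
    only `efficiency` of the raw bandwidth is achieved in practice
    (protocol overhead, checksum verification, retries, contention)."""
    data_bits = data_tb * 1e12 * 8                # terabytes -> bits
    effective_bps = link_gbps * 1e9 * efficiency  # usable bits per second
    return data_bits / effective_bps / 86_400     # seconds -> days

# Moving 1 PB (1,000 TB) over a 10 Gb/s link at 50% efficiency:
print(f"{migration_days(1000, 10):.0f} days")  # roughly 19 days of pure transfer
```

And that is transfer time alone, before any of the planning, validation, and error-correction steps that dominate real migrations.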

There’s a reason most storage vendors charge approximately 30% of the cost of a new storage system in professional services just to move the data from old storage to new during a tech refresh. A migration can involve as many as 43 complicated, time-consuming steps and can take months or even years to complete, even with modern migration tools. The process adds operational and other costs, and disrupting users’ and applications’ access to data frequently has a direct business impact. In today’s fast-paced global economy, that is simply no longer a viable option. This is why storage vendors paint data movement as a complicated, risky, forklift upgrade, as opposed to staying within the vendor’s ecosystem and buying more of the same.

Their solution is to recommend that customers move all data onto the vendor’s purportedly bottomless, scalable storage system and make it the hub for all applications, users, partners, collaborators, analytics, and AI. The problem is that viewing data management as a function of the storage platform is distorted and misleading. Obviously, this choice is good for the storage vendor because it locks in their storage system. But it is upside down, placing the value on the storage platform rather than on the data. And it incorrectly assumes that one vendor’s storage system is adequate for every storage use case, from extremely high performance to cold archiving. Frankly, that’s a bit delusional. There’s a reason why there are many different kinds of storage systems in the market today.

Storage is always a mix of performance, capacity, scalability, functionality, and total cost. There are generally three storage use-case classifications: high performance/high cost, low performance/low cost, and mid-level performance/acceptable cost.

High-performance storage is typically measured in IOPS, throughput, and/or low latency. Systems focused primarily on extreme IOPS performance utilize DRAM, persistent memory, NVMe flash, NVMe-oF, or flash SSDs; low latency is very important here. Such systems are typically a bit light on CPU-intensive storage services that reduce performance, such as data reduction, data protection, disaster recovery (DR), and high availability (HA). They also cost quite a bit and are generally not very scalable, commonly topping out in the low petabytes. High-performance throughput is aimed at high-performance computing (HPC) applications via parallel file systems such as GPFS (Spectrum Scale), Lustre, Panasas, WekaIO, Quobyte, and others; scalability is very important here. Cost and functionality are factors, as they are with all storage, but are not the primary drivers for this category.

Low-performance storage, on the other hand, is focused mostly on data longevity, data resilience, huge scalability, and modest performance needs, with the emphasis on low total cost of ownership (TCO).

Most storage systems fall into the mid-level performance class, which attempts to balance performance against capacity, scalability, functionality, and cost instead of leaning heavily in one direction. These systems are designed for general market use cases and secondary storage. This is where the vast majority of storage is sold and used, including NAS, object, tape, and cloud storage. It’s the bulge in the bell curve.

It’s obvious even to the lay storage user that no one size fits all. Storage vendors know this. To deal with it, many of them offer multiple types of storage systems along with their own data movement solutions that lock in the customer. So long as customers use the supported systems, cloud storage, and data movement solutions, the vendors claim to have solved the problem. Except they haven’t; they’ve made it worse. The majority of these vendor solutions require any moved data to be rehydrated back onto the original storage system before it can be read or altered, which adds considerable complexity, wasted time, and cost. It’s also a fragile approach, often relying on stubs, symlinks, or proprietary elements that can break and cause data to be lost. Or they solve part of the problem but cannot handle other storage choices outside their ecosystem that the customer may have now or want to add later. And it fails to address partners and collaborators who may also have different storage systems.

This is why the choice between big data complexity and storage replacement is a false one. StrongBox Data knew there had to be a better way. These big data management and movement problems are what StrongBox Data solves cost-effectively with StrongLink, its flagship Autonomous Data Management solution. StrongLink is the first product specifically designed to manage, copy, move, archive, and delete big data from any storage type to any other, including file, object, and cloud storage, or even LTFS tape. With built-in metadata-driven intelligence, StrongLink aggregates, harvests, and parses metadata, continually improving to become more and more effective. It’s data-centric, not storage-centric. It’s a vendor-agnostic solution that automates data management across otherwise incompatible systems, vendors, and locations. It’s automated and intuitive to experts and lay users alike, and in the process it takes the pain out of major data migrations, accomplishing these previously difficult projects with minimal or no downtime. We invite you to come see for yourself.

To learn more about StrongLink’s cost-effective Autonomous Data Management, go here.


About the Author

Floyd Christofferson is CEO of StrongBox Data Solutions. For over 25 years he has been focused on content management and storage workflows, driving technologies and methods needed to manage massive volumes of data in some of the world’s largest storage environments.

More about Floyd