The inexorable growth of unstructured data has been a constant theme of storage industry predictions for years. Whether seen in IDC projections, in the work of analysts like Fred Moore, or simply in users' own enterprises, this exponential growth has a serious impact not only on storage technology choices but also on the strategies needed to manage these ever-increasing data sets.
The problem really hits home for IT managers and data owners in the growing complexity caused by the proliferation of storage choices. As more innovative storage offerings and point solutions emerge, the onus falls back on IT managers, and in some cases on users themselves, to sort out which silos their files are in, what value those files have at a given point in time, and whether they are in the right place.
Some surveys suggest that enterprises are discovering that as much as 50% of their data is unknown, made up of uncategorized files. No idea what they are. Not sure if they can be deleted or not. And no easy way to find out. This is especially the case in higher education and research, and in other unstructured-data environments, which often evolve to include many pockets of diverse storage types.
This problem drives the trend: an increasing need to know what the data is, and to automate the ability to take action on that data across any vendor's storage type or tier. In a recent survey by the Active Archive Alliance, 52% of respondents named data management as the biggest storage problem they expect to face in the coming year. The second-highest concern? Data growth.
These problems have been constants for years, and they are only growing. Below are the 2020 trends we at StrongBox Data Solutions see in the industry, driven by these issues:
Object Storage Is Still Object Storage
Object storage is becoming even more pervasive, with both cloud providers and traditional on-prem NAS vendors leading the charge with enterprise-grade offerings. But S3-like storage is still object storage, and enterprise applications still prefer filesystems. The trend therefore continues toward using object storage as a backup tier rather than for application storage.
This pushes investment back toward traditional high-cost storage models and locks out opportunities to reduce operating expenses.
This trend drives the need for more advanced metadata-driven workflow engines that can intelligently position application data in filesystems only when it is needed, and move it quickly to object storage when it is idle, unlocking the promise of on-prem or cloud object storage without impacting file workflows.
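The core of such a metadata-driven engine is a placement policy evaluated against per-file access metadata. The sketch below is purely illustrative, assuming a simple idle-time rule; the class and function names are hypothetical and do not reflect any real product API:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical sketch of a metadata-driven placement rule: files idle longer
# than a threshold are flagged for demotion to object storage, and recently
# accessed files sitting in object storage are flagged for recall to the
# filesystem tier. All names here are illustrative assumptions.

@dataclass
class FileMeta:
    path: str
    last_access: datetime
    tier: str  # "filesystem" or "object"

def placement_action(meta: FileMeta, idle_after: timedelta, now: datetime) -> str:
    """Return 'demote', 'recall', or 'keep' based purely on access metadata."""
    idle = now - meta.last_access
    if meta.tier == "filesystem" and idle > idle_after:
        return "demote"   # cold data: move to object storage
    if meta.tier == "object" and idle <= idle_after:
        return "recall"   # active data: bring back into the filesystem
    return "keep"
```

A real engine would feed this kind of rule from a filesystem metadata scan and hand the resulting actions to a data mover; the value is that the policy is expressed once and applied across any storage vendor or tier.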
Cloud Deep Archive Options Turn the Cost Calculators on Their Head
New deep-archive cloud offerings are finally challenging the long-term cost curves of traditional on-prem cold-archive storage. Storage costs as low as roughly $12,000 per year for a petabyte of archive, at the same data durability, are quickly changing the game. These deep archives, however, are object-based only and require object-compatible applications or workflows.
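The ~$12,000/year figure is easy to sanity-check from per-gigabyte pricing. The rate used below is an illustrative assumption, roughly typical of published cloud deep-archive tiers around 2020, not a quote from any specific provider:

```python
# Back-of-the-envelope check on the ~$12,000/year figure for 1 PB of deep archive.
RATE_PER_GB_MONTH = 0.00099  # USD per GB per month (assumed illustrative rate)
GB_PER_PB = 1_000_000        # decimal petabyte

monthly_cost = RATE_PER_GB_MONTH * GB_PER_PB
annual_cost = monthly_cost * 12
print(f"~${annual_cost:,.0f} per PB per year")
```

At that assumed rate the annual storage cost lands just under $12,000 per petabyte, before retrieval and egress fees, which is exactly what upends traditional archive cost calculators.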
This trend drives the emergence of solutions that can present file-based access to all data wherever it is stored, in ways that support traditional workflows.
Cloud Vendor Lock-in & Enabling Multi-Cloud
Until very recently, the choice of a leading cloud provider was a one-horse race. We are finally seeing the emergence of close second- and third-place offerings, each with unique enterprise features that are attractive for certain workloads and applications.
The challenge is that storage access protocols and offerings differ significantly from one cloud provider to another, leading to provider lock-in and high switching costs. Just as organizations need to bridge multiple on-prem storage types with a single cloud, they will increasingly look for solutions that can bridge multiple cloud offerings, ensuring flexibility and providing a seamless storage fabric from any on-prem storage type to one or more cloud providers.
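The bridging idea amounts to a thin common interface that hides each provider's API behind the same calls, so data can flow between clouds without workflow changes. This is a minimal sketch under that assumption; the in-memory backend is a stand-in for what would, in practice, be a wrapper around each provider's SDK:

```python
from abc import ABC, abstractmethod

class ObjectStore(ABC):
    """Hypothetical common interface over provider-specific object APIs."""
    @abstractmethod
    def put(self, key: str, data: bytes) -> None: ...
    @abstractmethod
    def get(self, key: str) -> bytes: ...

class InMemoryStore(ObjectStore):
    """Stand-in backend; a real one would wrap a cloud provider's SDK."""
    def __init__(self):
        self._objects = {}
    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data
    def get(self, key: str) -> bytes:
        return self._objects[key]

def migrate(src: ObjectStore, dst: ObjectStore, keys) -> None:
    """Copy objects between providers through the common interface only."""
    for key in keys:
        dst.put(key, src.get(key))
```

Because `migrate` touches only the abstract interface, switching providers, or spanning several at once, becomes a configuration choice rather than a re-engineering effort, which is the essence of avoiding lock-in.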
On-Prem Cloud Will Drive Change in How Filesystems are Used
Virtualization providers are finally partnering with cloud providers to unify the experience between on-prem and off-prem cloud services. This trend is just beginning, and it will drive large pivots in how we approach on-prem storage solutions. We expect to see smaller filesystems localized in the equivalent of a “VPC,” rather than the large shared filesystems we traditionally manage. Suddenly, enterprises will go from dozens of large NAS shares to thousands of smaller, siloed “software-defined” storage volumes.
This will reduce primary storage costs, but it also shifts the need toward automated, data-aware management of storage volumes to ensure proper just-in-time data placement. At the same time, it quickly dilutes storage-side analytical knowledge about enterprise data and forces us to view storage even more generically than we do today.
This trend drives the need for solutions that provide visibility and data intelligence across these new storage silos, enabling better cost management through tiering and other metadata-driven automation, and eliminating over-provisioning and operational complexity.
HPC Workload Portability Needs Data Assist
HPC build-outs are going through an interesting transition in 2020 as GPU, CPU, and purpose-built processor choices widen. Language and compute-framework developers are enabling cross-platform compatibility that unlocks workload portability between architectures without having to re-develop time-tested algorithms and recipes. The challenge is that the number of storage platforms and technology choices is also increasing, which strands workloads parked on unreachable storage targets.
This problem will drive solutions that can automatically move workload data between GPFS, Lustre, BeeGFS, CephFS, NFS, and other filesystems without developer or operator intervention.