Trying to keep up with sprawling data growth?
IT managers are in a constant battle: they must provide reliable data access to users and ensure the data is protected, while also driving down costs, especially when Tier 1 storage is filled to the brim and they are faced with buying more.
Every storage professional knows the problem, and knows that simply adding more primary storage is costly and probably unsustainable. So why is it the default reaction in so many cases?
It’s because they’re dealing with heterogeneous storage environments.
Let’s deconstruct what “sprawling data growth” means. Yes, it’s about the sheer volume of data created. Yes, it’s about the speed at which the data is being created. But most importantly, it’s about the variety of use cases and storage requirements across the data lifecycle phases, and the multiple storage choices best suited to each phase.
“Even within a single organization, different data types and workflows create different storage requirements. Plus, the stage in the data’s lifecycle results in different performance needs. Inevitably those needs will change, and the data would be better placed elsewhere.”
Storage infrastructures typically evolve to include multiple storage types from different vendors, particularly in larger data environments that manage unstructured data. As data ages, it is accessed less frequently. Various studies have shown that the vast majority of data that has not been accessed in 60 days is unlikely to ever be accessed again. The result: keeping infrequently accessed data on the most expensive high-performance storage platforms is a losing proposition that adds unnecessary cost.
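To make that 60-day figure concrete, here is a minimal sketch (not StrongLink code) of how you might measure cold data on a single file tree: it walks a directory, reads each file’s last-access time, and totals the capacity that has sat untouched past the threshold. The SCAN_ROOT path is hypothetical, and the result is only as good as the filesystem’s access-time tracking.

```python
import os
import time

COLD_AGE_DAYS = 60          # threshold from the studies cited above
SCAN_ROOT = "/mnt/primary"  # hypothetical mount point for a primary tier

def cold_data_report(root: str, age_days: int = COLD_AGE_DAYS) -> None:
    """Walk a file tree and total up capacity not accessed in `age_days`."""
    cutoff = time.time() - age_days * 86400
    cold_bytes = total_bytes = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue  # skip files that vanish or are unreadable mid-scan
            total_bytes += st.st_size
            # atime is only meaningful if the filesystem tracks it;
            # noatime/relatime mounts will understate how active data is.
            if st.st_atime < cutoff:
                cold_bytes += st.st_size
    pct = 100 * cold_bytes / total_bytes if total_bytes else 0
    print(f"{cold_bytes / 1e12:.1f} TB of {total_bytes / 1e12:.1f} TB "
          f"({pct:.0f}%) untouched for {age_days}+ days")

if __name__ == "__main__":
    cold_data_report(SCAN_ROOT)
```

A rough scan like this is often enough to show how much of the expensive tier is occupied by data nobody has touched in months.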
Here are the consequences:
Admins are not sure what data they have.
Storage admins know they have to move inactive data off their most expensive storage tier.
But what they may not know is which data is legitimately needed on primary, which they can move to lower-cost storage, and which they can delete or archive. This lack of data intelligence about how best to place data on the right storage type for each lifecycle phase is a big drain on IT budgets, unnecessarily inflating the organization’s overall storage costs.
Admins have no easy way to migrate data to low-cost storage.
A corollary of silos of incompatible storage types and no seamless communication between them is that storage managers have no easy way to move data from primary storage to lower-cost options like object, cloud, or tape without impacting user access and adding complexity to the environment.
Result: stale or persistent data becomes stranded in expensive high-performance systems, either through inertia or through the complexity of working out which data can be deleted, which must be kept, when and where it should be moved, and, most importantly, how to do so without interfering with user access.
If your storage is filling up and your data reduction strategies don’t work, or if you don’t know how to move data off of it without impacting user access, then you’re forced to fill up your most expensive storage.
Adding more primary storage becomes the path of least resistance.
Heterogeneous storage environments = Lack of global data visibility = Data stuck in expensive storage tiers
So here’s the crux: admins cannot offload their primary storage effectively without meeting the following requirements:
- Insights into their data. They need to know exactly what they have at any given point in the data lifecycle across multiple storage types and vendors (what data they have, where it is, which data is active or not, which they can move or delete).
- Seamless movement. They need to be able to move data from A to B no matter the storage type.
- No interruptions, fully autonomous. They need to be able to do all of the above automatically in the background, without impacting user access or adding management workload for IT administrators.
What’s the common thread between all these requirements?
They are related to understanding, moving, and accessing data scattered across otherwise incompatible storage types. In other words, intelligent data management – not storage allocation.
Against the backdrop of heterogeneous storage environments and their resultant data silos, we can reframe this as how to unify and gain control over data across any storage type from any vendor.
Adding more and more storage doesn’t deal with this core issue.
Reframing the data growth problem
When data growth is framed as a storage allocation issue, all solutions point to adding more storage or cutting storage footprint – instead of how to use data effectively and derive maximum value.
Let’s reframe the data growth challenge: it’s not about allocation – it’s about how you manage the utilization of your data.
It all starts with understanding your data: what kind of data it is, how active it is, who needs it, and when they need it.
Because if you know what your data is, how active it is, and even which business-driven priorities matter for each data type, you can make intelligent decisions about which data to keep on your most expensive, highest-performance storage platforms.
From these insights, you can manage your data by business policy.
You can establish a set of metadata-driven policies to automatically keep only the most important active data on the most expensive storage tier, and automatically place other classes of data on the most cost effective storage type without interrupting or changing user access.
In other words, the type of storage that data sits on is driven by business needs and completely automated by use case and the demands of real workflows. The key is to make this cross-platform optimization transparent to users, so they never have to go searching for their files or change the access path regardless of where the data moves to.
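As an illustration of what managing data by business policy can look like, here is a minimal sketch of a metadata-driven placement rule set. It is not StrongLink’s policy engine or API; the tier names, tags, and rules are hypothetical, and a real system would also handle the data movement and keep the access path unchanged.

```python
from dataclasses import dataclass, field

@dataclass
class FileMeta:
    """Aggregated metadata for one file: filesystem facts plus business tags."""
    path: str
    days_since_access: int
    size_bytes: int
    tags: dict = field(default_factory=dict)  # e.g. {"status": "active", "retention": "7y"}

# Hypothetical business policies, evaluated top-down; the first match wins.
POLICIES = [
    # (policy name, predicate over metadata, target storage tier)
    ("keep-active-projects", lambda m: m.tags.get("status") == "active", "flash-primary"),
    ("archive-regulatory",   lambda m: m.tags.get("retention") == "7y",  "tape-ltfs"),
    ("demote-cold-data",     lambda m: m.days_since_access > 90,         "object-store"),
    ("default",              lambda m: True,                             "flash-primary"),
]

def place(meta: FileMeta) -> str:
    """Return the storage tier this file should live on under current policy."""
    for _name, predicate, tier in POLICIES:
        if predicate(meta):
            return tier
    return "flash-primary"

# Example: a finished project's cold output gets demoted to object storage,
# while the user-facing path ("/projects/sim/run42.dat") never changes.
example = FileMeta("/projects/sim/run42.dat", days_since_access=180,
                   size_bytes=5_000_000_000, tags={"status": "finished"})
print(place(example))  # -> "object-store"
```

The point is the shape of the decision: placement is derived from metadata, activity, and business tags, not from whichever array the file happened to land on first.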
Bottom line: you save on storage costs by transparently offloading your most expensive storage types, and cut OPEX by not overloading your IT administrators. This frees up existing capacity so you can defer or eliminate the purchase of more primary storage. The net results are significant savings on storage CAPEX and OPEX with better utilization.
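As a back-of-the-envelope illustration of where those savings come from, assume 1 PB on the primary tier, 70% of it cold, and hypothetical per-terabyte prices; none of these figures come from StrongLink or any customer, and they only show the arithmetic.

```python
# Illustrative arithmetic only; every figure below is an assumption.
capacity_tb        = 1_000   # 1 PB currently sitting on the primary tier
cold_fraction      = 0.70    # share of that data inactive for 60+ days
primary_cost_tb_yr = 300.0   # $/TB/year for high-performance primary (assumed)
archive_cost_tb_yr = 30.0    # $/TB/year for an object/tape tier (assumed)

cold_tb = capacity_tb * cold_fraction
before  = capacity_tb * primary_cost_tb_yr
after   = (capacity_tb - cold_tb) * primary_cost_tb_yr + cold_tb * archive_cost_tb_yr

print(f"Annual cost before offload: ${before:,.0f}")
print(f"Annual cost after offload:  ${after:,.0f} (saves ${before - after:,.0f}/yr)")
```

With those assumptions the annual bill drops from $300,000 to $111,000, before counting the deferred primary purchases or the admin time saved.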
How StrongLink solves the data growth problem through metadata-driven intelligence
#1: StrongLink aggregates information about your data.
StrongLink automatically aggregates multiple sources of metadata about your files, including custom user-generated metadata, so you know exactly what data you have, how active it is, and what its business value is.
This gives you the insights to make intelligent policy-based decisions on where and when data should be stored.
#2: StrongLink automates cross-platform data policies.
StrongLink automatically moves data to any storage type based on your policies.
Data classification, data movement, actionable insights, tiering optimization – StrongLink makes it all happen automatically in the background with little or no disruption to users.
#3: StrongLink gives you direct access to your data.
All of your files on any storage type from any vendor can be directly accessed by users and applications via a policy-based global namespace. This access is persistent without changing user or application access paths, even if the data is copied or moved by policy to other storage locations. And this is done without proprietary hooks, like symbolic links, stubs, or agents on the storage.
#4: StrongLink lets you maximize the value of your data.
StrongLink provides you with actionable data insights based upon real-time information about your digital assets, made possible by the aggregation of multiple metadata types, including custom metadata tags. Business needs drive data placement, ensuring users have uninterrupted access even while policy-based placement optimizes utilization across storage types.
In practice at the extremes
A German multinational engineering and technology customer has over 120 petabytes of data on expensive Isilon primary storage. The majority of that data is seldom used but must be maintained and immediately accessible to users and applications.
The problem is that their data growth rate has exploded, approaching 2 PB per day of new data. The cost of expanding the primary arrays to keep up with that accelerated growth was unsustainable.
With StrongLink, they’re able to seamlessly reduce the amount of data on primary storage by automatically moving it to low-cost tape in the open-standard LTFS format, while keeping ahead of the extreme 2 PB-per-day growth by offloading data as quickly as it lands. The data is still immediately accessible to users, and the need to keep expanding the primary storage arrays is gone.
StrongLink lets you effortlessly access, manage, move, protect, and gain insights into your data across any storage type, including cloud and tape. Join over 200 organizations that trust us, including the Library of Congress, NASA, Hasbro, Sony, and more. Book a personal demo today!