How to Overcome the Pull of Data Gravity

Overcoming data gravity is a matter of speed, requiring a move away from 20th-century infrastructure to a modern storage platform.

Legend has it that Sir Isaac Newton discovered gravity when an apple fell from a tree and hit him on the head while he was thinking about the forces of nature. This ultimately led to his theory that every particle attracts every other particle in the universe with a force directly proportional to the product of their masses and inversely proportional to the square of the distance between them.

So, what does this have to do with data? While data doesn’t exert gravitational pull in the scientific sense, gravity is a useful way to think about modern data and data-intensive applications, such as analytics, especially in the context of digital transformation.

A Consequence of Data Volume

First coined by software engineer David McCrory, data gravity refers to the relationship between data and applications. Similar to the attraction between objects explained by the Law of Gravity, data and applications are drawn to each other. As data sets grow larger, they gain more gravity, making them more difficult, inefficient, and expensive to move. The result is that data remains stationary while applications gravitate to it.
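For reference, the physical law the analogy draws on is Newton's law of universal gravitation:

\[ F = G \frac{m_1 m_2}{r^2} \]

where \(F\) is the attractive force, \(m_1\) and \(m_2\) are the two masses, \(r\) is the distance between them, and \(G\) is the gravitational constant. In the data gravity metaphor, a data set's volume plays the role of mass: the larger it grows, the stronger its pull on applications and services. McCrory's analogy is qualitative, and the formula is shown here only to anchor the comparison.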

One example of data gravity in action is Dropbox. It began simply as a file storage service, but as it grew and became ubiquitous, third-party applications eventually had to integrate with it because of the massive amount of structured and unstructured data it hosts.

Big Data Creates Big Challenges

It’s often said that we’re in the information age. Yet, we still sometimes fail to appreciate how our growing reliance on big data is driving data volumes to grow at an exponential rate. According to Forbes, from 2010 to 2020, the amount of data created, captured, copied, and consumed jumped from 1.2 trillion gigabytes to 59 trillion gigabytes, an increase of almost 5,000%. And IDC estimates that more data will be created over the next three years than was created over the past 30.
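As a quick sanity check on that percentage (noting that a trillion gigabytes is one zettabyte), the arithmetic works out as follows; this is just an illustrative calculation, not data from the article's sources:

```python
# Growth in annual data volume, 2010 to 2020, per the Forbes figures above.
start_zb = 1.2   # trillion gigabytes (zettabytes) in 2010
end_zb = 59.0    # trillion gigabytes (zettabytes) in 2020

pct_increase = (end_zb - start_zb) / start_zb * 100
print(f"{pct_increase:.0f}% increase")  # prints: 4817% increase, i.e., almost 5,000%
```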

Mobile devices and the internet of things are generating much of this data, and it all needs to be stored somewhere. Meanwhile, new data-intensive applications like analytics and machine learning depend on the large amounts of data being produced, which increasingly reside in locations that can't always be easily centralized in the cloud.

The simultaneous growth of data production and demand for data has created a data gravity problem that simply can’t be overlooked. Across every industry, IT leaders are struggling with enormous volumes of unstructured data that are becoming increasingly unwieldy to manage and difficult for modern data-intensive applications to leverage. It’s hindering innovation, limiting performance, and reducing productivity.

Since moving data isn’t a simple task, a common strategy is to adapt processes and systems to the gravity data produces. Many traditional infrastructure providers have responded with architectures such as data lakes, but these approaches have focused more on storing data efficiently than on utilizing it optimally, and they aren't a viable long-term strategy. Instead, IT architectures must be designed to reflect the reality of data gravity.

Break Free with Pure 

Overcoming data gravity is a matter of speed. This means moving away from 20th-century infrastructure and sprawling data silos and shrinking the time and distance between data and the applications that process it. A unified fast file and object (UFFO) storage platform eliminates the data gravity challenge by supporting both traditional and next-generation workloads and applications while delivering the scale, performance, and flexibility they need.

A modern storage platform delivers the following benefits:

Speed: A UFFO platform powers analytics tools, from ingest to visualization, for real-time results at any scale. Pure Storage® FlashBlade®, the industry’s leading UFFO storage platform, provides massive throughput for accelerating every aspect of AI workflows and data analytics pipelines.

Simplicity: Unlock big data with scalable, efficient storage that both users and applications can easily leverage. Compute is disaggregated from storage, providing an efficient platform for hosting multiple analytics applications concurrently, supporting large numbers of users, and scaling easily with data growth. In other words, you can break down efficiency-hindering data silos (see the sketch after this list).

Cloud Everywhere: Develop and deploy analytics anywhere for a better ROI on compute and storage. Gain the control and efficiency of on-prem solutions with the ability to run both traditional and cloud-native applications with consistency and performance, all with the economics of the cloud consumption model through Evergreen//One™.

A Flexible Storage Subscription: Speed up innovation by eliminating costly, complex forklift upgrades and data migrations while gaining the flexibility to redirect budgets to compute. Pure’s Evergreen™ storage subscriptions offer an innovation- and cloud-everywhere approach that makes purchasing easier, freeing budget for areas critical to analytics, whether on-premises or in the cloud.
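To make the disaggregation point above concrete, here is a minimal sketch of applications sharing one data set in place over an S3-compatible object interface, which FlashBlade exposes. The endpoint URL, bucket, object keys, and credentials are hypothetical placeholders, and the snippet uses the generic boto3 client rather than any Pure-specific API:

```python
# Minimal sketch: an ingest job writes data once to a shared object store,
# and any number of compute clients (BI, ML training, ad hoc queries) read
# the same object without moving or duplicating it.
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://flashblade.example.com",  # hypothetical S3-compatible endpoint
    aws_access_key_id="ACCESS_KEY",                 # placeholder credentials
    aws_secret_access_key="SECRET_KEY",
)

# Ingest: write the raw events a single time.
s3.put_object(
    Bucket="analytics",
    Key="events/2024/01/events.json",
    Body=b'{"user": 42, "action": "click"}',
)

# Compute: each application reads the shared copy in place.
obj = s3.get_object(Bucket="analytics", Key="events/2024/01/events.json")
print(obj["Body"].read())
```

Because every client addresses the same endpoint and bucket, compute can be scaled, swapped out, or run concurrently without touching the data, which is the property that keeps a large data set from having to move.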