What Will Blockchain Mean for Data Storage?

Blockchains will create immense amounts of immutable data—and how that data is stored can make or break the success of blockchain-based apps. Here’s why.

This is part two in a three-part series covering blockchain technologies. Read part one to learn how blockchain is modernizing enterprise apps.

Emerging technologies almost always raise an important question for companies on the brink of breakthroughs: What will this innovation mean for our existing IT infrastructure? Do we have the foundation to support it?

Blockchain may well be one of these scenarios. Those leveraging it will certainly face new implications for an already complex center of gravity: data management. Getting data dialed in is a foundational step toward improving applications, supply chains, contracts, transactions, processes, and more. Let’s look at why.

Blockchain Data Basics: On-chain Data vs. Off-chain Data

As I noted in part one, blockchains are permanent, uneditable digital records of information, or “immutable ledgers.” (Immutable means you can’t delete or edit them, and ledgers are files where transactions are recorded.) These ledgers are distributed across a collection of decentralized nodes powered by computers around the world, rather than one centralized location, like a bank’s server. And because records exist in so many places, the record isn’t owned by just one entity.

In theory, no one can delete or change records once they’re on the chain. And when data can’t be deleted, it piles up.
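
To make “immutable” a little more concrete, here’s a minimal Python sketch of a hash-chained, append-only ledger. It’s a toy under stated assumptions, not how any particular blockchain is implemented: the `Ledger` class and `block_hash` helper are names invented for this example, and there’s no consensus or network of nodes here, just the linking that makes an edit to history detectable.

```python
import hashlib
import json


def block_hash(index, data, prev_hash):
    """Hash a block's contents together with the previous block's hash."""
    payload = json.dumps(
        {"index": index, "data": data, "prev_hash": prev_hash},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class Ledger:
    """Toy append-only ledger (hypothetical): records are added, never edited."""

    def __init__(self):
        self.blocks = []

    def append(self, data):
        prev_hash = self.blocks[-1]["hash"] if self.blocks else "0" * 64
        index = len(self.blocks)
        block = {
            "index": index,
            "data": data,
            "prev_hash": prev_hash,
            "hash": block_hash(index, data, prev_hash),
        }
        self.blocks.append(block)
        return block

    def is_valid(self):
        """Recompute every hash; any edited record breaks the chain."""
        prev_hash = "0" * 64
        for block in self.blocks:
            if block["prev_hash"] != prev_hash:
                return False
            if block["hash"] != block_hash(block["index"], block["data"], prev_hash):
                return False
            prev_hash = block["hash"]
        return True


ledger = Ledger()
ledger.append({"event": "purchase", "amount": 40})
ledger.append({"event": "purchase", "amount": 15})
print(ledger.is_valid())  # True

ledger.blocks[0]["data"]["amount"] = 4000  # tamper with history
print(ledger.is_valid())  # False
```

Because each block’s hash covers the previous block’s hash, changing an old record means recomputing every block after it. On a real network, every other node would also have to accept that rewrite.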

Blockchains, by design, are not ideal for storing large amounts of data. Instead, when a transaction is logged onto a blockchain—say, a record of purchase—that event is logged across nodes. That’s called “on-chain” data. Any other data related to that transaction—for example, an image of the purchase, a description, etc.—is stored elsewhere. That’s called “off-chain” data.
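
One common way to picture the split is sketched below: the bulky payload lives off-chain, and only a small fingerprint of it goes on-chain. This is an illustration, not a specific product’s API. `off_chain_store` and `on_chain_ledger` are in-memory stand-ins invented for the example, and `record_purchase` and `fetch_verified` are hypothetical helper names.

```python
import hashlib

# Stand-ins for real systems (both hypothetical, in-memory only).
off_chain_store = {}   # e.g., an object store or database
on_chain_ledger = []   # e.g., transactions written to a blockchain


def record_purchase(purchase_id, receipt_image: bytes):
    """Store the large payload off-chain and only its fingerprint on-chain."""
    digest = hashlib.sha256(receipt_image).hexdigest()
    off_chain_store[digest] = receipt_image  # bulky data stays off-chain
    on_chain_ledger.append({"purchase_id": purchase_id, "payload_sha256": digest})
    return digest


def fetch_verified(entry):
    """Fetch the off-chain payload and check it still matches the on-chain hash."""
    payload = off_chain_store[entry["payload_sha256"]]
    if hashlib.sha256(payload).hexdigest() != entry["payload_sha256"]:
        raise ValueError("off-chain data does not match on-chain hash")
    return payload


record_purchase("order-1001", b"...image bytes...")
entry = on_chain_ledger[0]
print(len(entry["payload_sha256"]))  # 64 hex characters, whatever the payload size
print(fetch_verified(entry) == b"...image bytes...")  # True
```

The on-chain entry stays the same size no matter how large the image is, and anyone holding the entry can check that the off-chain copy hasn’t been altered.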

Related reading: What is Web 3.0 and Why Is it Called the Data Web?

How Might Data Flow Through a Blockchain?

Say a blockchain is recording a shipment. When it passes through customs, it’s logged—along with metadata relating to its contents, the date, destination, etc. Then, in the container during transit, IoT sensors record the temperature and humidity, providing permanent proof in the event that there’s a quality concern upon receipt. The beauty of this is that no one party “owns” the data, so no records can be faked or disputed. Delays are immediately traceable.

The shipment’s events are logged on the chain, but the data associated with them is stored in a database off the chain. How are the two connected?

Blockchains on their own are a good home for smart contracts. Some can even carry out simple calculations, but they often lack advanced capabilities and efficiencies. For one, they can’t access off-chain data on their own. Without a way to “plug” them into real-world data and applications, it’s hard to leverage the benefits of blockchain. And hitching a blockchain to a single server, API, or database defeats the purpose because you’re reintroducing centralization.

Blockchains are, by design, decentralized, anonymized, and secure, so storing and retrieving data off-chain without giving up those properties is a unique problem, one that some protocols are specifically designed to solve.

Blockchain Data Storage Solutions

There are a few workarounds to the blockchain data storage conundrum. The first is oracle networks.

Sometimes, a cryptographic hash stored on-chain can direct users to off-chain storage where the data itself is kept. The connection between the two happens via an oracle network. An oracle network, such as Chainlink, is a decentralized third-party technology that connects blockchain ledgers to the real world, including data storage. Oracle networks provide the connective tissue, all while remaining decentralized. (This is not dissimilar to solutions like Portworx® that allow containerized apps to be stateful by connecting them to underlying storage.)
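
Why does it matter that the oracle layer is decentralized too? Here’s a simplified sketch, assuming for illustration that the network takes the median of the values its independent nodes report. Real oracle networks are considerably more involved; the `aggregate_reports` function, the node names, and the readings below are all hypothetical.

```python
from statistics import median


def aggregate_reports(reports, min_reports=3):
    """Combine values reported by independent oracle nodes into one answer.

    Taking the median means a minority of faulty or dishonest nodes
    cannot drag the result to an arbitrary value.
    """
    if len(reports) < min_reports:
        raise ValueError("not enough independent reports to trust the result")
    return median(reports.values())


# Hypothetical temperature readings (degrees C) for one shipping container,
# as relayed by five independent oracle nodes. One node reports a bad value.
reports = {
    "node-a": 4.1,
    "node-b": 4.0,
    "node-c": 4.2,
    "node-d": 4.1,
    "node-e": 40.0,  # outlier
}
print(aggregate_reports(reports))  # 4.1
```

With a single server in place of those five nodes, one bad reading (or one bad actor) would go straight onto the chain, which is exactly the centralization problem described earlier.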

But that can’t be just any storage—especially as blockchain applications scale. To uphold the promise of blockchain’s speed and efficiency, storage has to be fast, incredibly scalable, and able to consolidate diverse types of data. Data pipelines can address the challenge of allowing blockchains to query relational data. Pipelines link and aggregate data across data sources in a decentralized environment, providing the parallelization needed to make data fast and agile.

The Graph is one of the most widely used protocols for indexing blockchain data. It organizes and indexes data and makes it easily accessible through subgraphs: open APIs that anyone can build and publish, and that sit behind many blockchain projects worldwide. The question of decentralization is answered via an open network of participants who make it all possible, incentivized by tokens.
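
Subgraphs are queried with GraphQL, so asking one for data looks much like calling any other GraphQL API over HTTP. The sketch below only shows the shape of such a request. The endpoint URL, the `shipments` entity, and its fields are hypothetical placeholders, not a real subgraph, so swap in a real subgraph’s URL and schema before calling `fetch_shipments`.

```python
import json
import urllib.request

# Hypothetical endpoint and schema: substitute a real subgraph's URL and entities.
SUBGRAPH_URL = "https://example.com/subgraphs/name/shipments"

QUERY = """
query RecentShipments($first: Int!) {
  shipments(first: $first, orderBy: timestamp, orderDirection: desc) {
    id
    destination
    timestamp
  }
}
"""


def build_request(first=5):
    """Build a GraphQL-over-HTTP POST request for the subgraph."""
    body = json.dumps({"query": QUERY, "variables": {"first": first}})
    return urllib.request.Request(
        SUBGRAPH_URL,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_shipments(first=5):
    """Send the query and return the decoded JSON response (needs a real endpoint)."""
    with urllib.request.urlopen(build_request(first)) as response:
        return json.load(response)


request = build_request(first=3)
print(request.get_method())                   # POST
print(json.loads(request.data)["variables"])  # {'first': 3}
```

The application asks for indexed, already-organized records rather than walking the chain block by block, which is the efficiency gap the indexing layer is there to close.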

Is a Blockchain a Replacement for a Database?

Yes and no. Both deal in the storage of data, but they do it differently. And where the blockchain excels in immutability, it lacks efficiency. Many blockchain applications can’t function without oracle networks and protocols that connect them to underlying database storage. You could think of a blockchain as a next-gen database in that it does store data, but with some key differences:

  • Blockchains are distributed, not centralized. Typically, a database exists in one place, with a sole administrator controlling what is written to it. A blockchain exists across many nodes, each owned by a different user.
  • Blockchains are immutable. Once something is stored on the blockchain, it can’t be deleted or changed. It’s a system of record that can only be added to, not edited or deleted. Traditional, transactional databases are designed to be updated. Right away, this makes blockchains ideal for some use cases but not all.
  • Blockchains have many administrators, not just one. This removes the need to trust any single administrator or person on the blockchain. The blockchain itself is the proof of validity and defense against fraud or mistrust.
  • Blockchains aren’t efficient for storing large files. Storing data “on-chain” can be very expensive, so it isn’t a scalable or efficient route for anything more than core ledger data and related hashes. Costs rack up with every transaction that writes data to the chain, and there can be fees each time you want to read that data. Applications with SLAs to meet can’t afford to wait minutes per megabyte, which makes blockchains all but dependent on some sort of off-chain storage.

TL;DR, blockchain is a good fit when you need a system of record wrapped in total security, validity, and traceability. But for storage of larger files and more associated metadata, underlying databases will still be critical.

Blockchain Needs Dedicated, Modern Storage to Deliver

Blockchain is still maturing—good news for enterprises, but challenging news for storage considerations. Unstructured, off-chain data is going to accumulate exponentially, and better data storage platforms must be embedded into these new strategies. They’ll also require modified data management practices, access permissions, data models, and datastores, so they don’t cannibalize storage for existing apps.

“Blockchain won’t be able to disrupt any real-world industry unless the problem of data storage is resolved.” – JaxEnter.com

For blockchain applications to meet their SLAs, off-chain data storage will need to be powerful, elastic, and scalable. Unified fast file and object storage, in particular, will be important for managing data on a distributed system. Enterprises’ best bet as they wade into this new territory is to leverage and connect to existing, proven technologies such as Pure Storage® FlashBlade® with NVMe.

Check out part three, “10 Blockchain Use Cases to Watch.”