
Details continue to emerge around Microsoft SQL Server 2022. For data professionals everywhere, it’s exciting to learn about what Microsoft calls “the most Azure-enabled release of SQL Server yet.”

The latest feature enhancements go to the heart of a problem many organizations face: the exponential growth of data. Relational and unstructured data alike now need to reside at the edge, on premises, and in the cloud, and managing data across all three remains a pervasive challenge.

According to Microsoft’s announcement: “The most transformative companies drive predictive insights on current data, whereas others may struggle to drive even reactive insights to their historical data. Information may be siloed across geographies and divisions.”

Read on to get my insights into the new feature announcements and implications for database administrators and data professionals.

One of the most exciting improvements is the ability to connect SQL Server 2022 directly to S3-compatible object storage. This will improve backups and speed up recovery times. It will also make analytics, big data, and AI solutions easier to develop, test, and implement. Because big data and analytics data are growing much faster than structured data, this is a significant step in helping enterprises execute on their analytics and AI initiatives. Let’s take a step back to see how we arrived at this point with big data.


In 1998, John Mashey, chief scientist at Silicon Graphics (SGI), gave a presentation entitled “Big Data … and the Next Wave of InfraStress” at a USENIX meeting. Mashey used the term “big data” in a number of talks around that time and has, therefore, been credited with coining it. In the mid-2000s, the Apache Hadoop project introduced its big data storage layer, the Hadoop Distributed File System (HDFS). Because Hadoop rose to prominence just as “big data” was catching on, the two terms became nearly synonymous.

As this new scientific approach to data gained momentum, Hadoop and HDFS quickly dominated the market and the attention of big data wranglers. An ecosystem of tools with fun names like Pig and Hive even emerged. This approach worked well for batch processing of big data, but combining that data with the day-to-day relational data in SQL Server was too difficult. Years earlier, engineers at Microsoft had seen this problem coming and had already started working on a solution. What if the data wasn’t stored in SQL Server at all, but read from wherever the unstructured data lived and presented to SQL Server?

This integration complexity drove Microsoft to introduce SQL Server Big Data Clusters with SQL Server 2019. Built on HDFS, Big Data Clusters combine unstructured data with structured data on the fly without moving large amounts of data. The result: DBAs could query unstructured data from more sources and combine it with relational data directly in SQL Server, using the tools they already used every day.
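To make the idea of reading data where it lives a little more concrete, here is a toy Python analogy. It is not how Big Data Clusters work internally: an in-memory SQLite table stands in for relational data in SQL Server, a CSV string stands in for a file sitting in external storage, and the hypothetical virtual_join function combines the two at query time without ever copying the file into the database.

```python
import csv
import io
import sqlite3
from typing import Iterator

# Hypothetical "external" data: a CSV file that stays in external storage.
# An in-memory string stands in for the remote file in this sketch.
EXTERNAL_CLICKS_CSV = """customer_id,page
1,/pricing
2,/docs
1,/checkout
"""


def stream_external_rows(csv_text: str) -> Iterator[dict]:
    """Read the external file row by row instead of loading it into the database."""
    yield from csv.DictReader(io.StringIO(csv_text))


def virtual_join(conn: sqlite3.Connection, csv_text: str) -> list[tuple[str, str]]:
    """Join relational customer rows with external click rows at query time."""
    # Small lookup built from the relational side (the "SQL Server" data).
    customers = dict(conn.execute("SELECT id, name FROM customers"))
    results = []
    for row in stream_external_rows(csv_text):
        name = customers.get(int(row["customer_id"]))
        if name is not None:  # inner join: drop clicks with no matching customer
            results.append((name, row["page"]))
    return results


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO customers VALUES (?, ?)",
                     [(1, "Contoso"), (2, "Fabrikam")])
    print(virtual_join(conn, EXTERNAL_CLICKS_CSV))
    # [('Contoso', '/pricing'), ('Fabrikam', '/docs'), ('Contoso', '/checkout')]
```

The point of the analogy is the shape of the work: the external rows are streamed and joined on demand, so the only data that lands anywhere new is the query result.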

Fast forward a few years. Data growth didn’t slow down, the types of data grew more varied, and the ways that data was used kept changing. This continued growth meant that batch processing, the very thing that made Hadoop robust, wasn’t meeting business requirements quickly enough. New players started entering the game.

In 2006, Amazon introduced Simple Storage Service (S3) and it became a disruptive player. S3 has some advantages over HDFS that appeal to data engineers, including scalability, durability, and persistence. It has since grown in popularity and market share. And Microsoft took notice.

Microsoft is staying true to its core values of meeting its customers where they are and delivering the best tools to meet the challenges of tomorrow, today. This past week at Ignite 2021, Microsoft announced that it will include S3 connectivity in SQL Server 2022. You can read more about it in a post by Bob Ward, a principal architect at Microsoft. He states:

“We have new extensions to the T-SQL language to support data virtualization and backup/restore with S3 compatible storage systems.” – Bob Ward, principal architect at Microsoft
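To give a feel for what those T-SQL extensions look like, here is a small Python sketch that generates the statements for backing up to, and restoring from, S3-compatible storage. The statement shapes (a credential named after the s3://<endpoint>:<port>/<bucket> prefix with IDENTITY = 'S3 Access Key', then BACKUP ... TO URL and RESTORE ... FROM URL) follow Microsoft’s SQL Server 2022 documentation as I understand it, but treat the exact options as assumptions to verify there. The helper names, endpoint, bucket, database name, and keys are all hypothetical.

```python
def _check_literal(name: str, value: str) -> None:
    """Reject characters that would break out of a T-SQL literal or bracketed name."""
    if any(ch in value for ch in "'[]"):
        raise ValueError(f"{name} contains a quote or bracket: {value!r}")


def _stripe_urls(endpoint: str, bucket: str, database: str, stripe_count: int) -> str:
    """Build one URL clause per stripe under s3://<endpoint>/<bucket>."""
    if stripe_count < 1:
        raise ValueError("stripe_count must be at least 1")
    for name, value in (("endpoint", endpoint), ("bucket", bucket), ("database", database)):
        _check_literal(name, value)
    base = f"s3://{endpoint}/{bucket}"
    # Several URLs spread one backup across multiple objects written in parallel.
    return ",\n     ".join(
        f"URL = '{base}/{database}_{i}.bak'" for i in range(1, stripe_count + 1)
    )


def s3_backup_tsql(endpoint: str, bucket: str, database: str,
                   access_key_id: str, secret_key_id: str,
                   stripe_count: int = 1) -> str:
    """Return T-SQL that creates an S3 credential and backs up `database` to S3."""
    _check_literal("access_key_id", access_key_id)
    _check_literal("secret_key_id", secret_key_id)
    urls = _stripe_urls(endpoint, bucket, database, stripe_count)
    # The credential is named after the URL prefix the backup files live under.
    credential = (
        f"CREATE CREDENTIAL [s3://{endpoint}/{bucket}]\n"
        f"WITH IDENTITY = 'S3 Access Key',\n"
        f"     SECRET = '{access_key_id}:{secret_key_id}';"
    )
    backup = (
        f"BACKUP DATABASE [{database}]\n"
        f"TO   {urls}\n"
        f"WITH FORMAT, COMPRESSION, STATS = 10;"
    )
    return f"{credential}\nGO\n{backup}"


def s3_restore_tsql(endpoint: str, bucket: str, database: str,
                    stripe_count: int = 1) -> str:
    """Return T-SQL that restores `database` from the same striped S3 URLs."""
    urls = _stripe_urls(endpoint, bucket, database, stripe_count)
    return (
        f"RESTORE DATABASE [{database}]\n"
        f"FROM {urls}\n"
        f"WITH REPLACE, STATS = 10;"
    )


if __name__ == "__main__":
    # Hypothetical endpoint, bucket, database, and placeholder keys.
    print(s3_backup_tsql("s3.example.com:443", "sqlbackups", "SalesDB",
                         "<AccessKeyID>", "<SecretKeyID>", stripe_count=2))
    print(s3_restore_tsql("s3.example.com:443", "sqlbackups", "SalesDB", stripe_count=2))
```

Generating the statements in one place keeps the credential name, the backup URLs, and the restore URLs in agreement, which is easy to get wrong when they are typed by hand. The quote and bracket check is a minimal guard, not a substitute for keeping real secrets out of scripts.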

As big data clusters and SQL Server change, so do the connections needed to access data, both structured and unstructured, to ensure optimal data insights with the least effort. At first glance, this may look like a great opportunity solely for big data clusters and connectivity, but a connection to S3 also opens the door to fast backup and rapid restore capabilities.
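How fast and how large a backup to object storage can be also depends on how it is split across URLs. Amazon S3 limits a multipart upload to 10,000 parts, each between 5 MiB and 5 GiB (the last part may be smaller), and a single object to 5 TiB, so one very large backup written to a single URL can run out of parts. The sketch below estimates the minimum number of URLs (stripes) a backup needs. It assumes each URL becomes one multipart object written in fixed-size parts, for example parts sized by the backup’s MAXTRANSFERSIZE setting; that mapping, the stripes_needed name, and the 1 TiB example are assumptions, and S3-compatible platforms may enforce different limits.

```python
MIB = 1024 ** 2
TIB = 1024 ** 4

# Amazon S3 multipart upload limits (S3-compatible systems may differ).
S3_MAX_PARTS = 10_000              # parts per object
S3_MIN_PART_SIZE = 5 * MIB         # every part except the last
S3_MAX_PART_SIZE = 5 * 1024 * MIB  # 5 GiB
S3_MAX_OBJECT_SIZE = 5 * TIB       # largest single object


def stripes_needed(backup_bytes: int, part_size_bytes: int) -> int:
    """Minimum number of backup URLs so each stripe fits in one multipart object.

    Assumes each URL is uploaded as one object in parts of part_size_bytes.
    """
    if backup_bytes < 0:
        raise ValueError("backup size cannot be negative")
    if not S3_MIN_PART_SIZE <= part_size_bytes <= S3_MAX_PART_SIZE:
        raise ValueError("part size must be between 5 MiB and 5 GiB")
    # A stripe is capped both by the part limit and by the maximum object size.
    per_stripe = min(part_size_bytes * S3_MAX_PARTS, S3_MAX_OBJECT_SIZE)
    # Integer ceiling division; even an empty backup needs one URL.
    return max(1, -(-backup_bytes // per_stripe))


if __name__ == "__main__":
    # A hypothetical 1 TiB database backup at two candidate part sizes.
    for part in (10 * MIB, 20 * MIB):
        print(f"{part // MIB} MiB parts -> {stripes_needed(TIB, part)} URLs")
    # 10 MiB parts -> 11 URLs
    # 20 MiB parts -> 6 URLs
```

Doubling the part size roughly halves the number of URLs required, while adding URLs beyond the minimum is one way to spread the work across more parallel writers. Integer arithmetic avoids floating-point rounding on very large byte counts.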

This is where it gets nuanced. Just as my daughter, a software engineering student, will tell you that Microsoft Visual Studio Code isn’t the same as SQL Server Management Studio (SSMS), many developers would argue that S3 isn’t the same as HDFS. HDFS is a storage platform you run and administer yourself, which means you’re responsible for handling growth, scaling, and node failures. With S3 on a unified fast file and object (UFFO) platform like Pure Storage® FlashBlade®, the storage platform takes care of those problems, so your team no longer has to.

Data volumes will only keep growing exponentially. Persistent storage options with built-in ransomware safeguards, such as SafeMode™ snapshots, add the security and disaster recovery capabilities needed to move forward confidently. As we continue to build solutions that combine relational and unstructured data, S3 support streamlines the flow of data from SQL Server 2022 to AI by reducing connection friction. It’s exciting to see Microsoft continue to expand the SQL Server ecosystem, and Pure Storage will continue to build and offer value-added integrations and storage platforms to maximize the SQL Server data experience.

Learn how Pure FlashBlade, the industry’s leading unified fast file and object (UFFO) storage platform, can back up your SQL Server databases and meet your most demanding unstructured data needs.