This post is the first of a two-part series on Pure’s storage solutions for SQL Server 2019 big data clusters. The posts can be read independently or together.
Microsoft SQL Server has served the IT community admirably when it comes to processing high-value relational data. However, times have moved on since SQL Server was first developed. The growth of data has been relentless, as has the growth of interest in data science. SQL Server has never been a scale-out analytics platform for processing both the read and write elements of workloads, and many of the tools favored by data scientists are more at home on Linux than on Windows. SQL Server 2017 took SQL Server to places it had never gone before by allowing it to run on Linux. This move opened a new world of open-source software and Linux-based data science tools to SQL Server. Because Linux is more of a first-class citizen in the world of containers than Windows, SQL Server’s ability to run on Linux also broadened its horizons for containerized workloads.
A container engine in isolation can run on a laptop, server, PC, or the public cloud. However, container engines in isolation lack:
- Resilience – i.e., the ability to re-spin up a container if it falls over for any reason,
- The ability to scale out a workload horizontally,
- The ability to schedule a containerized workload across multiple machines,
- Storage orchestration,
- Service discovery,
- ... and much more.
The solution to these shortcomings is a container orchestration platform, and Kubernetes is currently close to being the industry standard for container orchestration. These threads culminated in Microsoft releasing SQL Server 2019 big data clusters, a scale-out platform that runs on Kubernetes and processes both high-value relational data and unstructured data.
The Role of Storage
High-value data requires storage that is reliable, durable, highly available, secure, and consistent in terms of performance. Given that by design, one of Kubernetes’ primary aims is to abstract infrastructure away from users of the platform, how is storage-as-a-service delivered in this brave new world?
Enter Pure Service Orchestrator
Pure’s storage platforms are trusted by numerous organizations to run their most mission-critical SQL Server workloads. But how is this storage consumed in the new world of containers? Enter Pure Service Orchestrator™ (PSO), Pure’s storage plugin for containerized workloads.
Three key areas set Pure Service Orchestrator apart from a plugin that only provides persistence for containers:
- Smart Provisioning
PSO automatically makes the best provisioning decision for each storage request, in real time, by assessing multiple factors such as performance load, the capacity and health of your arrays, and policy tags.
- Elastic Scaling
Uniting all your Pure FlashArray™ and FlashBlade™ arrays on a single shared infrastructure, and supporting file and block as needed, PSO makes adding new arrays effortless, so you can scale as your environment grows.
- Transparent Recovery
To ensure your services stay robust, PSO self-heals – so you’re protected against data corruption caused by issues such as node failure, array performance limits, and low disk space.
Installing and Configuring Pure Service Orchestrator
Once a Kubernetes or OpenShift Container Platform cluster is up and running, Pure Service Orchestrator is installed and configured by using the following three simple steps:
1) Clone the Pure Storage GitHub repo that contains Pure Service Orchestrator:
2) Specify the details of the arrays used by Pure Service Orchestrator in a values.yaml file. A values.yaml file template can be found here. Below is an example of the contents of a values.yaml file:
3) Run the install script to set up the PSO-operator.
install.sh --image=<image> --namespace=<namespace> \
--orchestrator=<orchestrator> -f <values.yaml>
- image is the Pure Storage Flex Operator image. If unspecified, image resolves to the released version at quay.io/purestorage/pso-operator.
- namespace is the namespace/project in which the Pure Flex Operator and its entities will be installed. If unspecified, the operator creates and installs in the pso-operator namespace. The Pure Flex Operator MUST be installed in a new project with no other pods; otherwise, an uninstall may delete pods that are unrelated to the Pure Flex Operator.
- orchestrator should be either k8s or openshift depending on which orchestrator is being used. If unspecified, k8s is assumed.
- values.yaml holds the customized helm-chart configuration parameters. This is a required parameter, and the file must list all backend FlashArray and FlashBlade storage appliances. Every parameter that needs a non-default value must be specified in this file. Refer to Configuration for values.yaml.
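As a rough illustration of steps 2 and 3, the sketch below writes a minimal values.yaml and prints the matching install command instead of running it. The key names (arrays, FlashArrays, FlashBlades, MgmtEndPoint, APIToken, NfsEndPoint) are assumed from the PSO helm chart as I recall it and should be checked against the template in the repo. The IP addresses, API tokens, and the working directory are placeholders invented for this example.

```shell
#!/bin/sh
set -eu

# Hypothetical scratch directory for the example; use your own path in practice.
workdir="$(mktemp -d)"
values_file="$workdir/values.yaml"

# Placeholder endpoints and tokens. Key names are an assumption based on the
# PSO helm chart; verify them against the values.yaml template in the repo.
cat > "$values_file" <<'EOF'
arrays:
  FlashArrays:
    - MgmtEndPoint: "192.0.2.10"
      APIToken: "<flasharray-api-token>"
  FlashBlades:
    - MgmtEndPoint: "192.0.2.20"
      APIToken: "<flashblade-api-token>"
      NfsEndPoint: "192.0.2.21"
EOF

# Print the install command rather than execute it: install.sh lives in the
# cloned repo, and the namespace/orchestrator values shown are the defaults.
echo "install.sh --namespace=pso-operator --orchestrator=k8s -f $values_file"
```

Leaving out --image lets it resolve to the released operator image, as described in the parameter list above.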
Configure the Storage for the Big Data Cluster
SQL Server 2019 big data cluster storage is specified in a ‘Configuration.’ The following instructions assume:
- A Kubernetes cluster is up and running,
- Pure Service Orchestrator is installed and correctly configured,
- azdata is already installed,
- The environment variables specified in the SQL Server 2019 big data cluster Deployment Guidance documentation are set (this is correct as of release candidate 1).
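With those assumptions in place, one way to point a big data cluster at PSO-provisioned storage might look like the dry-run sketch below. It only prints the azdata commands, so it is safe to run anywhere. Several things here are assumptions rather than details from this post: the kubeadm-dev-test source profile, the custom target directory, the pure-block storage class name, the control.json file name, and the flag spellings, which are written as I recall them from the azdata documentation. Confirm them with azdata bdc config --help for your release.

```shell
#!/bin/sh
set -eu

# Illustrative values only (assumptions, not taken from the post).
profile="kubeadm-dev-test"   # built-in deployment profile to start from
target="custom"              # directory that receives the editable config
storage_class="pure-block"   # storage class assumed to be created by PSO

# Print a command instead of executing it, so this stays a dry run.
plan() { printf '%s\n' "$*"; }

# 1) Copy a built-in deployment configuration into a local directory.
plan azdata bdc config init --source "$profile" --target "$target"

# 2) Point both data and log volumes at the PSO storage class.
for kind in data logs; do
  plan azdata bdc config replace --config-file "$target/control.json" \
    --json-values "\$.spec.storage.${kind}.className=$storage_class"
done

# 3) Deploy the cluster from the customized configuration.
plan azdata bdc create --config-profile "$target" --accept-eula yes
```

Once the printed commands have been reviewed, they can be run by hand or by changing plan to execute its arguments.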
Pure has always provided a storage experience that the SQL Server community loves. That trend continues with SQL Server 2019 big data clusters, which get a genuine storage-as-a-service experience on Kubernetes, with elastic scaling, fault tolerance, and smart provisioning.