SQL Server Distributed Availability Groups and Kubernetes

In this article, you’ll find the process for using a distributed availability group to seed a database from a SQL instance in a Windows availability group into a SQL instance running in a pod in a Kubernetes cluster.



This article on SQL Server distributed availability groups originally appeared on Andrew Pruski’s blog. It has been republished with the author’s credit and consent. 

A while back, I wrote about how to use a cross-platform (or clusterless) availability group to seed a database from a Windows SQL instance into a pod in Kubernetes.

I was talking with a colleague last week, and they asked, “What if the existing Windows instance is already in an availability group?”

This is a fair question, as it’s fairly rare (in my experience) to run a standalone SQL instance in production…most instances are in some form of HA setup, be it a failover cluster instance or an availability group.

Failover cluster instances will work with a clusterless availability group, but it’s a different story when it comes to existing availability groups.

A Linux node cannot be added to an existing Windows availability group (trust me, I tried for longer than I’m going to admit), so the only way to do it is to use a distributed availability group.

So let’s run through the process!

The existing Windows availability group is just a standard two-node AG, with one database already synchronized across the nodes. It’s that database we’re going to seed over to a pod running on the Kubernetes cluster using a distributed availability group.

The Kubernetes cluster has four nodes: one control plane node and three worker nodes.

OK, so the first thing to do is deploy a statefulset running one SQL Server pod (using a file called sqlserver-statefulset.yaml): 
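The deployment itself is a single kubectl command (the filename is the one mentioned above):

```bash
kubectl apply -f sqlserver-statefulset.yaml
```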

Here’s the manifest of the statefulset: 
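The original manifest isn’t reproduced here, so here’s a minimal sketch of what it might look like. The names, image tag, storage class, password, and the listener’s IP/hostname are all assumptions:

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mssql-statefulset          # assumed name
spec:
  serviceName: mssql
  replicas: 1
  selector:
    matchLabels:
      app: mssql
  template:
    metadata:
      labels:
        app: mssql
    spec:
      hostAliases:                 # hosts file entry for the Windows AG listener
        - ip: "10.0.0.50"          # assumed listener IP
          hostnames:
            - "ag1-listener"       # assumed listener name
      securityContext:
        fsGroup: 10001
      containers:
        - name: mssql
          image: mcr.microsoft.com/mssql/server:2022-latest
          ports:
            - containerPort: 1433
          env:
            - name: ACCEPT_EULA
              value: "Y"
            - name: MSSQL_SA_PASSWORD
              value: "Testing1122"  # demo only; use a Secret in practice
            - name: MSSQL_ENABLE_HADR
              value: "1"            # required for availability groups on Linux
          volumeMounts:
            - name: system          # system databases
              mountPath: /var/opt/mssql
            - name: user            # user databases
              mountPath: /var/opt/sqlserver
  volumeClaimTemplates:
    - metadata:
        name: system
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: mssql-sc  # assumed, pre-configured storage class
        resources:
          requests:
            storage: 10Gi
    - metadata:
        name: user
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: mssql-sc
        resources:
          requests:
            storage: 10Gi
```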

Like my last post, this is pretty stripped down: no resource limits, tolerations, etc. It has two persistent volumes from a storage class already configured in the cluster: one for the system databases and one for the user databases.

One thing to note: 

An entry in the pod’s hosts file is created for the listener of the Windows availability group, so the SQL instance in the pod can resolve the listener by name.

The next thing to do is deploy two services: one so that we can connect to the SQL instance (on port 1433) and one for the AG (port 5022): 

Here’s the manifest for the services: 
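The services manifest isn’t shown in this version of the article; a sketch of the two services might look like the following (the service name mssql-ha-service and the ports are from the article; the other names, the selector, and the service type are assumptions):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: mssql-service        # assumed name; used to connect to the SQL instance
spec:
  type: LoadBalancer         # assumed; NodePort would also work
  selector:
    app: mssql
  ports:
    - name: sql
      port: 1433
      targetPort: 1433
---
apiVersion: v1
kind: Service
metadata:
  name: mssql-ha-service     # used as the AG endpoint
spec:
  type: LoadBalancer
  selector:
    app: mssql
  ports:
    - name: hadr
      port: 5022
      targetPort: 5022
```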

Note: We could use just one service with multiple ports configured, but I’m keeping them separate here to try and keep things as clear as possible.


Check that everything looks OK: 
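A quick way to confirm the statefulset, pod, and services are all up:

```bash
kubectl get statefulsets
kubectl get pods
kubectl get services
```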

Now, we need to create the master key, login, and user in all instances: 
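Something like the following, run against each instance (both Windows replicas and the pod); the login/user names and passwords are placeholders:

```sql
CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<StrongPassword1>';
CREATE LOGIN dbm_login WITH PASSWORD = '<StrongPassword2>';
CREATE USER dbm_user FOR LOGIN dbm_login;
```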

Then, create a certificate in the SQL instance in the pod: 
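A sketch of the certificate creation, run in the pod’s instance (the certificate name is an assumption carried through the rest of the examples):

```sql
CREATE CERTIFICATE dbm_certificate
    WITH SUBJECT = 'Database mirroring endpoint certificate';
```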

Back up that certificate: 
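For example (paths and password are placeholders):

```sql
BACKUP CERTIFICATE dbm_certificate
TO FILE = '/var/opt/mssql/data/dbm_certificate.cer'
WITH PRIVATE KEY (
    FILE = '/var/opt/mssql/data/dbm_certificate.pvk',
    ENCRYPTION BY PASSWORD = '<PrivateKeyPassword>'
);
```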

Copy the certificate locally: 

And then copy the files to the Windows boxes: 
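Both copy steps can be sketched as follows; the pod name, host names, and destination path are assumptions, and any file-copy mechanism will do for the Windows side:

```bash
# copy the certificate and private key from the pod to the local machine
kubectl cp mssql-statefulset-0:/var/opt/mssql/data/dbm_certificate.cer ./dbm_certificate.cer
kubectl cp mssql-statefulset-0:/var/opt/mssql/data/dbm_certificate.pvk ./dbm_certificate.pvk

# then copy both files to each Windows box (scp shown as one option)
scp dbm_certificate.cer dbm_certificate.pvk ag-node1:"C:/SQLServer/"
scp dbm_certificate.cer dbm_certificate.pvk ag-node2:"C:/SQLServer/"
```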

Once the files are on the Windows boxes, we can create the certificate in each Windows SQL instance: 
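Run on each Windows instance, pointing at wherever the files were copied (path and password are placeholders):

```sql
CREATE CERTIFICATE dbm_certificate
FROM FILE = 'C:\SQLServer\dbm_certificate.cer'
WITH PRIVATE KEY (
    FILE = 'C:\SQLServer\dbm_certificate.pvk',
    DECRYPTION BY PASSWORD = '<PrivateKeyPassword>'
);
```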

OK, great! Now we need to create a mirroring endpoint in the SQL instance in the pod: 
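A sketch of the endpoint creation in the pod’s instance, using the certificate for authentication (endpoint and login names are assumptions):

```sql
CREATE ENDPOINT [Hadr_endpoint]
    STATE = STARTED
    AS TCP (LISTENER_PORT = 5022, LISTENER_IP = ALL)
    FOR DATABASE_MIRRORING (
        AUTHENTICATION = CERTIFICATE dbm_certificate,
        ROLE = ALL,
        ENCRYPTION = REQUIRED ALGORITHM AES
    );

GRANT CONNECT ON ENDPOINT::[Hadr_endpoint] TO dbm_login;
```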

There are already endpoints in the Windows instances, but we need to update them to use the certificate for authentication: 
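Something like this on each Windows instance (assuming the endpoint is named Hadr_endpoint, the usual default for an AG created through the wizard):

```sql
ALTER ENDPOINT [Hadr_endpoint]
    FOR DATABASE_MIRRORING (
        AUTHENTICATION = CERTIFICATE dbm_certificate
    );

GRANT CONNECT ON ENDPOINT::[Hadr_endpoint] TO dbm_login;
```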

Now, we can create a one-node clusterless availability group in the SQL instance in the pod: 
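A sketch of the one-node clusterless AG, run in the pod’s instance. The replica name has to match the instance’s @@SERVERNAME (here assumed to be the pod’s hostname):

```sql
CREATE AVAILABILITY GROUP [AG2]
WITH (CLUSTER_TYPE = NONE)
FOR REPLICA ON 'mssql-statefulset-0' WITH (
    ENDPOINT_URL = 'TCP://mssql-statefulset-0:5022',
    AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,
    FAILOVER_MODE = MANUAL,
    SEEDING_MODE = AUTOMATIC
);

-- allow automatic seeding to create the database on this replica
ALTER AVAILABILITY GROUP [AG2] GRANT CREATE ANY DATABASE;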

No listener here; we’re going to use the mssql-ha-service as the endpoint for the distributed availability group.

OK, so on the primary node of the Windows availability group, we can create the distributed availability group: 

We could use a host file entry for the URL in AG2 (I did that in the previous post), but here, we’ll just use the IP address of the mssql-ha-service.
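A sketch of the distributed AG creation on the Windows primary. The Windows AG name (AG1), listener name, distributed AG name, and the service IP are assumptions; AG2 is the clusterless AG just created in the pod:

```sql
CREATE AVAILABILITY GROUP [DistributedAG]
WITH (DISTRIBUTED)
AVAILABILITY GROUP ON
    'AG1' WITH (
        LISTENER_URL = 'TCP://ag1-listener:5022',
        AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,
        FAILOVER_MODE = MANUAL,
        SEEDING_MODE = AUTOMATIC
    ),
    'AG2' WITH (
        LISTENER_URL = 'TCP://10.0.0.60:5022',  -- IP of the mssql-ha-service (assumed)
        AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,
        FAILOVER_MODE = MANUAL,
        SEEDING_MODE = AUTOMATIC
    );
```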

OK, nearly there! We now have to join the availability group in the SQL instance in the pod: 
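The join statement, run in the pod’s instance, mirrors the create statement (same assumed names and IP as before):

```sql
ALTER AVAILABILITY GROUP [DistributedAG]
JOIN
AVAILABILITY GROUP ON
    'AG1' WITH (
        LISTENER_URL = 'TCP://ag1-listener:5022',
        AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,
        FAILOVER_MODE = MANUAL,
        SEEDING_MODE = AUTOMATIC
    ),
    'AG2' WITH (
        LISTENER_URL = 'TCP://10.0.0.60:5022',
        AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,
        FAILOVER_MODE = MANUAL,
        SEEDING_MODE = AUTOMATIC
    );
```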

And that should be it! If we now connect to the SQL instance in the pod, the database is there!
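One way to verify, after connecting to the instance in the pod on port 1433:

```sql
-- the seeded database should appear as ONLINE
SELECT name, state_desc FROM sys.databases;

-- and its replica state should show as synchronized/synchronizing
SELECT ag.name, drs.synchronization_state_desc
FROM sys.dm_hadr_database_replica_states drs
JOIN sys.availability_groups ag ON drs.group_id = ag.group_id;
```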

There it is! OK, one thing I haven’t gone through here is how to get auto-seeding working from Windows into a Linux SQL instance. I went through how that works in my previous post, but the gist is, as long as the database data and log files are located under the Windows SQL instance’s default data and log path, they’ll auto-seed to the Linux SQL instance’s default data and log paths.

So that’s how to seed a database from a SQL instance that is in a Windows availability group into a SQL instance running in a pod in a Kubernetes cluster using a distributed availability group.
