Dremio S3 and NFS Integration

This article shows how you can use fast NFS and S3 from Pure Storage to power your Dremio Kubernetes deployments.


This blog on Dremio S3 and NFS integration was originally published on Medium. It has been republished with the author’s credit and consent. 

In this blog, I’ll go over how you can use fast NFS and S3 from Pure Storage to power your Dremio Kubernetes deployments.

Dremio Distributed Storage

First, I change the distStorage section in the values.yaml file to reflect my S3 bucket, access and secret keys, as well as the endpoint of the Pure Storage® FlashBlade®:
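A minimal sketch of that section, assuming the dremio-cloud-tools helm chart layout (the bucket name, keys, and endpoint are placeholders for your own values):

```yaml
distStorage:
  type: "aws"           # the S3-compatible backend uses the "aws" storage type
  aws:
    bucketName: "dremio"
    path: "/"
    authentication: "accessKeySecret"
    credentials:
      accessKey: "<S3_ACCESS_KEY>"
      secret: "<S3_SECRET_KEY>"
    # Extra Hadoop S3A properties pointing Dremio at the FlashBlade endpoint
    extraProperties: |
      <property>
        <name>fs.s3a.endpoint</name>
        <value><flashblade-data-vip></value>
      </property>
      <property>
        <name>fs.s3a.path.style.access</name>
        <value>true</value>
      </property>
      <property>
        <name>dremio.s3.compat</name>
        <value>true</value>
      </property>
```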

With the above in place, I deploy to my Dremio namespace using the following helm command:
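The release name and chart path below are placeholders, but the pattern is standard helm:

```bash
# Install the chart into the dremio namespace with the customized values
helm install dremio ./charts/dremio_v2 -n dremio -f values.yaml
```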

Once the pods are up and running, I can connect to the web UI on the service port, and after creating the admin account, I’m presented with the Dremio interface.
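To locate the pods and the client service, something like the following works (the pod and service names assume the chart defaults; Dremio’s web UI listens on port 9047):

```bash
kubectl -n dremio get pods
kubectl -n dremio get service dremio-client   # exposes the web UI on port 9047
```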


Let’s take a moment to check what has been created in the S3 bucket I specified for the distStorage:
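One way is the standard AWS CLI pointed at the FlashBlade data VIP (the bucket name and endpoint are placeholders):

```bash
# Recursively list the objects Dremio has written to its distributed storage
aws s3 ls s3://dremio/ --recursive --endpoint-url http://<flashblade-data-vip>
```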

According to the official documentation, the distributed storage location contains accelerator, tables, job results, downloads, upload data, and scratch data. In my output, the uploads/_staging… objects correspond to the nodes deployed for my test Dremio cluster.

Dremio S3 Source

I then add an S3 source and provide my FlashBlade S3 user access and secret keys, as well as the required additional parameters:
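For reference, these are connection properties set on the source’s Advanced Options page; a sketch of typical values follows (the endpoint is your FlashBlade data VIP, and depending on the Dremio version, compatibility mode may be a checkbox rather than the dremio.s3.compat property):

```
fs.s3a.endpoint          = <flashblade-data-vip>
fs.s3a.path.style.access = true
dremio.s3.compat         = true
```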


Note: I unchecked “encrypt connection” on the S3 source’s General page. Also, fs.s3a.path.style.access can be set to either true or false.


I quickly check the first 10 rows of data with a simple SQL query:
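Something like the following, where the source, bucket, and file names are hypothetical:

```sql
-- Preview the first 10 rows of a dataset exposed through the S3 source
SELECT *
FROM s3source."mybucket"."mydata.csv"
LIMIT 10;
```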


New objects have been created on the distributed storage bucket:
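Re-running the earlier listing shows the additions; per the documentation quoted above, job results land under their own prefix in the bucket:

```bash
aws s3 ls s3://dremio/ --recursive --endpoint-url http://<flashblade-data-vip>
```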


Dremio Metadata Storage

Dremio documentation states that HA Dremio deployments must use NAS for metadata storage. It also provides guidance on the NAS storage characteristics: low latency and high throughput for concurrent streams are must-haves. This is exactly what Pure Storage FlashBlade sets out to deliver!

The helm chart’s executor template already assigns a PVC for the $DREMIO_HOME/data mount point. In my case, the PVC is provisioned from FlashBlade NFS storage:
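Listing the claims shows the bound volumes (in my lab the storage class is FlashBlade-backed; a pure-file class from Pure’s CSI driver is one common setup, and an assumption here):

```bash
kubectl -n dremio get pvc
```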

To simulate a shared volume, I change the mountPath line in the template and edit the helm chart’s values.yaml, adding the following additional volume section for the executors:
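A sketch of such a values.yaml addition, assuming the chart’s extraVolumes/extraVolumeMounts hooks; the claim name and mount path are hypothetical:

```yaml
extraVolumes:
  - name: dremio-metadata
    persistentVolumeClaim:
      claimName: dremio-metadata   # RWX PVC backed by FlashBlade NFS
extraVolumeMounts:
  - name: dremio-metadata
    mountPath: /opt/dremio/data/metadata
```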

I check that the volume is mounted on the executors:
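Using the hypothetical mount path and pod name from above:

```bash
# Confirm the shared NFS volume is mounted inside an executor pod
kubectl -n dremio exec dremio-executor-0 -- df -h /opt/dremio/data/metadata
```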


After running some queries, the new metadata volume shows an increase in used space:
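A quick way to watch this, with the same hypothetical path and pod name as before:

```bash
# Report space consumed by the shared metadata directory
kubectl -n dremio exec dremio-executor-0 -- du -sh /opt/dremio/data/metadata
```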

Conclusion

That covers the three current Dremio integration points with S3 or NFS storage: distributed storage, S3 sources, and metadata storage. As shown, Pure Storage FlashBlade provides the performance and concurrency required, along with seamless S3 and NFS capabilities, to power a Dremio environment.
