This blog on Trino S3 initially appeared on Medium. It was republished with the author’s credit and consent. 

In this blog, I’ll go over how to use S3 storage on a Pure Storage® FlashBlade® with Trino, the fast distributed SQL query engine for big data.

I deploy Trino using the Helm chart and provide a values.yaml file with the following configuration:
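As a minimal sketch of what such a values.yaml can look like, the snippet below writes one with a single hive catalog. The `additionalCatalogs` key and the `hive.s3.*` properties are the ones I know from the Trino Helm chart and Hive connector of this period; newer releases may use different keys, so check the versions you deploy. The metastore URI, the FlashBlade endpoint, and the credentials are placeholder assumptions.

```shell
# Write a minimal values.yaml for the Trino Helm chart.
# All values below are placeholders (assumptions), to be replaced with
# the metastore address, FlashBlade data VIP, and S3 keys of your setup.
cat > values.yaml <<'EOF'
additionalCatalogs:
  hive: |-
    connector.name=hive
    hive.metastore.uri=thrift://hive-metastore:9083
    hive.s3.endpoint=http://flashblade.example.com
    hive.s3.path-style-access=true
    hive.s3.ssl.enabled=false
    hive.s3.aws-access-key=REPLACE_WITH_ACCESS_KEY
    hive.s3.aws-secret-key=REPLACE_WITH_SECRET_KEY
EOF
```

Path-style access is typically what a non-AWS S3 endpoint needs, since there is no per-bucket DNS name to resolve.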

This configuration points to my hive-metastore server. See this blog post for more information on setting that up. I then edit the Trino service to switch its type from ClusterIP to NodePort, which allows access from outside the cluster.
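The service type change can also be made non-interactively once the release is installed. This is only a sketch: it assumes the Service is called `trino` and lives in a `trino` namespace, both of which are hypothetical names (`kubectl get svc` shows the real ones).

```shell
# Hypothetical names; override them to match your cluster.
NAMESPACE="${NAMESPACE:-trino}"
SERVICE="${SERVICE:-trino}"

expose_trino() {
  # Switch the Service from ClusterIP to NodePort.
  kubectl -n "$NAMESPACE" patch svc "$SERVICE" -p '{"spec":{"type":"NodePort"}}'
  # Print the node port that Kubernetes assigned to the first service port.
  kubectl -n "$NAMESPACE" get svc "$SERVICE" -o jsonpath='{.spec.ports[0].nodePort}'
}
```

Calling `expose_trino` prints the port to use, together with any node IP, when connecting from outside the cluster.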

As usual, I use the helm install command:
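A sketch of the install step, assuming the official Trino chart repository and a values.yaml in the current directory. The release name and namespace are hypothetical choices, not necessarily the ones used here.

```shell
# Hypothetical release name and namespace; adjust as needed.
RELEASE="${RELEASE:-trino}"
NAMESPACE="${NAMESPACE:-trino}"

install_trino() {
  # Register the Trino chart repository and refresh the local index.
  helm repo add trino https://trinodb.github.io/charts
  helm repo update
  # Install the chart with the custom catalog configuration.
  helm install "$RELEASE" trino/trino \
    --namespace "$NAMESPACE" --create-namespace \
    -f values.yaml
}
```

Run `install_trino`, then `kubectl -n trino get pods` should eventually show the coordinator and worker pods as Running.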

On a Linux client with the trino-cli installed, I use the following command to connect to my instance running in Kubernetes and list the currently available catalogs:
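A non-interactive sketch of that step, using the CLI's `--server` and `--execute` options. The server URL is a placeholder assumption: substitute the address of any Kubernetes node and the NodePort assigned to the Trino service.

```shell
# Placeholder URL (assumption): <node address>:<NodePort of the Trino service>.
TRINO_SERVER="${TRINO_SERVER:-http://k8s-node.example.com:30080}"

list_catalogs() {
  # Equivalent to typing `show catalogs;` at the interactive trino> prompt.
  trino --server "$TRINO_SERVER" --execute 'SHOW CATALOGS'
}
```

Running `trino --server "$TRINO_SERVER"` without `--execute` opens the interactive prompt instead.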

I can then select my hive source and check the available tables:
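The same can be sketched from the shell by passing the catalog and schema as CLI options. The schema name `default` and the server URL are assumptions for illustration only.

```shell
# Placeholder server URL and hypothetical schema name.
TRINO_SERVER="${TRINO_SERVER:-http://k8s-node.example.com:30080}"
HIVE_SCHEMA="${HIVE_SCHEMA:-default}"

list_tables() {
  # --catalog/--schema set the session defaults, so SHOW TABLES needs no qualifier.
  trino --server "$TRINO_SERVER" --catalog hive --schema "$HIVE_SCHEMA" \
    --execute 'SHOW TABLES'
}
```

At the interactive prompt, the equivalent is `use hive.<schema>;` followed by `show tables;`.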

Note that to see the available schemas, you can use:

trino> show schemas from hive;
(2 rows)
Query 20230810_152954_00003_a45jd, FINISHED, 3 nodes
Splits: 68 total, 68 done (100.00%)
0.61 [2 rows, 35B] [3 rows/s, 57B/s]

To see the table schema, you can use:
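One way to do this, sketched with the standard `DESCRIBE` and `SHOW CREATE TABLE` statements. The server URL is a placeholder, and the table name passed in is whatever fully qualified name exists in your catalog.

```shell
# Placeholder server URL (assumption).
TRINO_SERVER="${TRINO_SERVER:-http://k8s-node.example.com:30080}"

describe_table() {
  # $1: fully qualified table name, e.g. hive.<schema>.<table>
  trino --server "$TRINO_SERVER" --execute "DESCRIBE $1"
}

show_table_ddl() {
  # Prints the full CREATE TABLE statement, including table properties.
  trino --server "$TRINO_SERVER" --execute "SHOW CREATE TABLE $1"
}
```

For example, `describe_table hive.default.my_table` lists column names and types, where `my_table` is a hypothetical name.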

I can now run various queries on the data set. Note that these results come from a very small lab Kubernetes cluster with limited resources and network connectivity, so performance was not the goal: