This blog post on using Trino with S3 originally appeared on Medium. It is republished here with the author's credit and consent.
In this blog, I’ll go over how to use S3 storage on a Pure Storage® FlashBlade® with Trino, the fast distributed SQL query engine for big data.
I deploy Trino using its Helm chart and provide a values.yaml file with the following configuration:
[crayon-68116b752436b529336745/]
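A minimal sketch of such a values.yaml, written out with a shell heredoc, might look like the following. It assumes the trinodb Helm chart's `additionalCatalogs` map, where each key becomes a catalog and its value holds that catalog's properties. The metastore address, FlashBlade endpoint, and keys are hypothetical placeholders, and the `hive.s3.*` property names come from Trino's legacy Hive S3 support (newer Trino releases use the native `s3.*` properties instead). Path-style access is typically needed when the S3 endpoint is addressed by IP, because virtual-hosted bucket names cannot resolve in DNS.

```shell
# Write a values.yaml that registers a "hive" catalog with the Trino chart.
# Every address and credential below is a hypothetical placeholder.
# path-style-access: the endpoint is an IP/VIP, so bucket-in-hostname
# (virtual-hosted) addressing would not resolve.
# ssl.enabled=false: assumes the lab endpoint is plain HTTP.
cat > values.yaml <<'EOF'
additionalCatalogs:
  hive: |
    connector.name=hive
    hive.metastore.uri=thrift://hive-metastore:9083
    hive.s3.endpoint=http://FLASHBLADE_DATA_VIP
    hive.s3.path-style-access=true
    hive.s3.ssl.enabled=false
    hive.s3.aws-access-key=YOUR_ACCESS_KEY
    hive.s3.aws-secret-key=YOUR_SECRET_KEY
EOF
```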
This configuration points Trino at my Hive metastore server; see this blog post for more information on setting that up. After installing, I edit the Trino service and change its type from ClusterIP to NodePort to allow access from outside the cluster.
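The service type change can also be made without an interactive editor by patching the service. In this sketch the service name `my-trino` is a hypothetical assumption (the chart derives the name from the Helm release), and the commands sit inside a function so nothing touches a cluster until it is called.

```shell
# Hypothetical service name; the chart names the service after the release.
TRINO_SVC="${TRINO_SVC:-my-trino}"

# Switch the Trino service from ClusterIP to NodePort, then print the
# node port that Kubernetes allocated for it.
expose_trino() {
  kubectl patch service "$TRINO_SVC" -p '{"spec":{"type":"NodePort"}}' &&
    kubectl get service "$TRINO_SVC" -o jsonpath='{.spec.ports[0].nodePort}'
}
```

Calling `expose_trino` prints the allocated port, which is the port an external client would connect to.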
As usual, I use the helm install command:
[crayon-68116b7524377744573757/]
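For reference, an install from the trinodb chart repository could look like the sketch below. The repository alias `trino`, the release name `my-trino`, and the chart source are assumptions rather than the author's exact command. The function checks for values.yaml first, so a missing file does not silently produce an install with default settings.

```shell
# Assumed: repo alias "trino", release name "my-trino", current namespace.
install_trino() {
  # Stop early if the values file is missing rather than installing defaults.
  if [ ! -f values.yaml ]; then
    echo "values.yaml not found in $(pwd)" >&2
    return 1
  fi
  helm repo add trino https://trinodb.github.io/charts &&
    helm repo update &&
    helm install my-trino trino/trino -f values.yaml
}
```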
On a Linux client with the trino-cli installed, I use the following command to connect to my Trino instance running in Kubernetes and list the available catalogs:
[crayon-68116b752437a835210667/]
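With a NodePort service, the CLI's `--server` URL is built from any node's address plus the allocated port. The sketch below assumes the CLI executable is on the PATH as `trino`; the node IP and port are example values only. `--execute` runs a single statement and exits, which is handy for scripting the same check as an interactive `SHOW CATALOGS`.

```shell
# Build the coordinator URL from a node address and the Trino NodePort.
trino_server_url() {
  printf 'http://%s:%s\n' "$1" "$2"
}

# Example values only: use one of your node IPs and the allocated NodePort.
SERVER=$(trino_server_url 10.0.0.10 30080)

# Non-interactive equivalent of running SHOW CATALOGS in the CLI shell.
show_catalogs() {
  trino --server "$SERVER" --execute 'SHOW CATALOGS'
}
```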
I can then switch to my hive catalog and list the available tables:
[crayon-68116b752437e170218640/]
To see the available schemas, use:
[crayon-68116b7524381139693414/]
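The same metadata browsing can be scripted by putting the statements in a file and passing it to the CLI with `--file`. In this sketch the server URL is a placeholder and `default` is an assumed schema name; fully qualified names (`hive.default`) avoid depending on the session's current catalog and schema.

```shell
# Hypothetical coordinator address; reuse the URL you connected with.
SERVER="${SERVER:-http://10.0.0.10:30080}"

# List schemas in the hive catalog, then the tables in one schema
# ("default" is an assumed schema name).
cat > browse_hive.sql <<'EOF'
SHOW SCHEMAS FROM hive;
SHOW TABLES FROM hive.default;
EOF

browse_hive() {
  trino --server "$SERVER" --file browse_hive.sql
}
```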
I can now run various queries on the data set. Note that these ran on a small lab Kubernetes cluster with limited resources and network connectivity, so performance was not the goal:
[crayon-68116b7524383576498694/]
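To keep a record of test queries like these, one option is to run them in batch mode, save the results, and time them from the shell. The table `hive.default.my_table` and the server URL are hypothetical; `CSV_HEADER` is one of the CLI's `--output-format` choices. Timing uses `date +%s`, so it only resolves whole seconds.

```shell
SERVER="${SERVER:-http://10.0.0.10:30080}"

# Hypothetical query against a placeholder table; replace with your own.
QUERY='SELECT count(*) FROM hive.default.my_table'

# Run the query once, save results as CSV with a header row, and report
# the wall-clock time in whole seconds along with the CLI's exit status.
time_query() {
  start=$(date +%s)
  trino --server "$SERVER" --output-format CSV_HEADER --execute "$QUERY" > result.csv
  status=$?
  end=$(date +%s)
  echo "elapsed: $((end - start))s (exit status $status)" >&2
  return "$status"
}
```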