Scraping FlashBlade Metrics Using a Prometheus Exporter

Prometheus is an open-source time-series database used for monitoring and alerting. The data is scraped in regular intervals from endpoints using client libraries called exporters that expose the metrics in a Prometheus format. A Prometheus server then collects those metrics via HTTP requests and saves them with timestamps in a database.

Prometheus can be used to collect metrics about the CPU, memory, and network utilization of an AI cluster. The short instructions for adding Prometheus to a Rancher-based Kubernetes cluster can be found here. By default, that Prometheus application will collect metrics from the exporters native to the cluster – such as usage stats for all the Kubernetes pods.

However, you might have additional infrastructure in your AI cluster beyond the compute servers that the Kubernetes pods are running on. For example, you might have shared storage that needs to be monitored as well.

Training and inference pipelines often include a variety of applications like Kafka for message queuing and JupyterHub for exploring models. Having a shared data platform like Pure Storage FlashBlade™ is beneficial for these pipelines because it enables:

easy access to shared datasets
seamless collaboration among data scientists
a cost-effective way to scale capacity and performance as needed

In this blog post, we’ll cover the steps to use our Pure Exporter for FlashBlade so you can have a single pane of glass for monitoring your entire AI cluster.

As shown in the diagram above, the Pure Exporter runs in a Kubernetes pod. Kick off a workload with the Pure Exporter docker image, which can be downloaded here.

Once the Pure Exporter is running as a pod, note the cluster-IP and port values for the pure-exporter. You can find them by running “kubectl get svc”:

$ kubectl get svc
NAME                              TYPE           CLUSTER-IP      EXTERNAL-IP    PORT(S)                      AGE
kubernetes                       ClusterIP      10.43.0.1            <none>                443/TCP                      113d
pure-exporter                   ClusterIP      10.43.255.64      <none>                9491/TCP                     98d
pure-exporter-nodeport    NodePort      10.43.95.229     <none>                9491:31252/TCP          98d

$ kubectl get svc

NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE

kubernetes ClusterIP 10.43.0.1 <none> 443/TCP 113d

pure-exporter ClusterIP 10.43.255.64 <none> 9491/TCP 98d

pure-exporter-nodeport NodePort 10.43.95.229 <none> 9491:31252/TCP 98d

Next, edit the Prometheus server configmap to include a new job to scrape the FlashBlade metrics.

- job_name: 'pure_flashblade'
      scrape_timeout: 30s
      metrics_path: /metrics/flashblade
      relabel_configs:
        - source_labels: [__address__]
          target_label: __param_endpoint
        - source_labels: [__pure_apitoken]
          target_label: __param_apitoken
        - source_labels: [__address__]
          target_label: instance
        - target_label: __address__
          replacement: 10.43.255.64:9491   # cluster-IP:port
      static_configs:
        - targets: [ 10.61.169.20 ]   # eternally addressable FlashBlade IP address
          labels:
            __pure_apitoken: <<your_API_token>>

- job_name: 'pure_flashblade'

scrape_timeout: 30s

metrics_path: /metrics/flashblade

relabel_configs:

- source_labels: [__address__]

target_label: __param_endpoint

- source_labels: [__pure_apitoken]

target_label: __param_apitoken

- source_labels: [__address__]

target_label: instance

- target_label: __address__

replacement: 10.43.255.64:9491 # cluster-IP:port

static_configs:

- targets: [ 10.61.169.20 ] # eternally addressable FlashBlade IP address

labels:

__pure_apitoken: <<your_API_token>>

FlashBlade will be now listed as a target in the Prometheus server and ready to scrape. Prometheus now knows where to find the “/metrics/flashblade” endpoint on the Pure Exporter pod.

So, now the overall flow looks like this:

The Prometheus server discovers the FlashBlade through the Pure Exporter and port 9491 in the Kubernetes cluster. The Pure Exporter gathers the FlashBlade metrics data using RESTful APIs . The collected data is then stored in the Prometheus database.

(In fact, that database is stored in the FlashBlade itself. Storing the monitoring data on FlashBlade provides the ability to scale as AI workloads increase. FlashBlade also provides data reduction of ~2:1 for Prometheus database!)

Once the metrics are stored in the Prometheus database, it is easy to query.

Here, we’re visualizing a metric directly from Prometheus, but it’s easy to pull the data forward into Grafana for beautiful, easy-to-create dashboards. We’ll review Grafana integration into the AI Data Hub in our next post in this series.

To summarize, Pure Exporter allows Prometheus to scrape metrics from Pure Storage data platforms using RESTful APIs.

No additional scripting needed to monitor storage along with the rest of the cluster.
FlashBlade is a great place to save a cluster’s Prometheus database since it provides the simplicity of centralized logs and performant scalability.

The next post will demonstrate how to configure and integrate a FlashBlade metrics dashboard into Grafana for the AI cluster. Visit the blog for our next posts in the series over the coming weeks:

Automating an inference pipeline in a Kubernetes Cluster

Tuning networking configuration of a Kubernetes-based AI Data Hub

Integrating Pure RapidFile Toolkit into Jupyter notebooks

See the previous posts in the series, Providing Data Science Environments with Kubernetes and FlashBlade and Storing a Private Docker Registry on FlashBlade S3.

Scraping FlashBlade Metrics Using a Prometheus Exporter

Introducing the Pure//Launch Blog

Pure Introduces First External Block Storage for Azure VMware Solution

Introducing Pure Protect //DRaaS 1.1

User-tailored Notifications through the Pure Notification Center

Introducing the Pure//Launch Blog

Pure Cloud Block Store Now Available for Microsoft Azure VMware Solution

Pure Introduces First External Block Storage for Azure VMware Solution

Introducing Pure Protect //DRaaS 1.1

User-tailored Notifications through the Pure Notification Center

Scraping FlashBlade Metrics Using a Prometheus Exporter

Related Stories

Top Stories