Spark revolutionized large-scale data processing.  The value it provides includes:

  • 100x faster than Hadoop MapReduce
  • Enabling applications to be written in Java, Scala, Python, or R
  • Combine SQL, steaming, and complex analytics to be run on the same stack.
  • Run it standalone, on Hadoop and Mesos with data scores from HDFS, Casandra, HBase, and S3

FlashBlade® is a true cloud-scale, big-data storage platform that provides consumers with a platform to handle the concurrency required to accelerate typical big data processing workloads.  These include:

  • Simple and easy to deploy and manage
  • Provides NFSv3 and S3/Object
  • Centralized cloud mangement via Pure1
  • 17GB/s at 1,000,000 operations per second
  • Consistent linear scaling through metadata hyper-partitioning across blades
  • Scale from 98TB to 1.6PB in 4RU with scalability to 10s of PB with multi-chassis configurations (coming soon)
  • Linear scaling per blade without disruption or downtime
  • Low latency software define networking

FlashBlade and SparkPairing FlashBlade and Spark

  • Ultra low consistent latency for all queries
  • Consolidate data across multiple Hadoop clusters to leverage all compute on all capacity
  • Ability to scale compute and storage separately
  • 6x faster reporting queries
  • 3x faster deep analytics queries
  • 2x faster interactive queries
  • Use any orchestration/file format (Mesos, Kubernetes, Parquet, Hadoop Yarn, and more)
  • Rest API enabling integration into any tool or custom script  (coming soon)

If you are running Spark, you should check out our whitepaper on FlashBlade and Spark.