The ability to build code at scale is extremely important for software and hardware development teams today. Increasingly sophisticated pre-commit and full regression systems are driving this need. Teams are implementing these systems to guarantee code quality and ensure that developer code commits don’t “break” the full build. Tools like Jenkins are constantly initiating full or incremental builds.

As a result, it’s vital that infrastructure keeps up with parallel computation and storage demands, and those demands keep rising. It’s not uncommon for pre-commit smoke tests and mini-regressions to scale to thousands of runs, with full regressions orders of magnitude larger.

Almost all organizations build some kind of software. Build and regression performance is a challenge across all verticals. In particular, organizations involved in electronic design automation (EDA), high performance computing (HPC), autonomous driving, mobile communications, aerospace, automotive, gaming, manufacturing, business software, and operating system design all have enormous codebases. The scale of their challenges is ever-growing.

To accommodate this rapidly increasing scale, organizations are exploring the use of Microsoft Azure in a connected-cloud configuration. Pure Storage®, Azure, and Equinix have collaborated to develop a connected-cloud architecture for EDA and highly parallel HPC workloads, such as software build and testing.

In this architecture, Pure Storage FlashBlade®, a high-performance file and object storage platform, is installed in an Equinix data center with a low latency ExpressRoute connection to an Azure region. There are many advantages to this particular architecture. Read more about them in the blog post, Connected Cloud with FlashBlade and Microsoft Azure HPC for EDA Workloads.

Parallel Build Experiment

We conducted a performance test of this architecture. All tests were run in the US-West-2 Azure region in the western United States. The ExpressRoute connection in this configuration measured approximately 2ms of round-trip latency.

Figure 1: Test configuration using an ExpressRoute connection between an Equinix data center and US-West-2 Azure region.

To test the performance of FlashBlade and Azure compute over ExpressRoute for software builds, we set up an experiment that executes multiple parallel builds of the Linux kernel and measures their execution times. This test also doubles as an excellent benchmark of storage performance for the small-file, metadata-dominant workloads common in software development, EDA/semiconductor chip design, artificial intelligence/deep learning, animation/CGI/rendering, genomics, and fintech.

For this test, 15 E64dsv4 instances were provisioned. The Azure E64dsv4 VM is configured with 64 Intel Xeon® Platinum 8272CL (Cascade Lake) vCPUs, 504GiB of memory, and 30Gbps of network bandwidth. The operating system is CentOS 7.8 with the latest patches and NFS utilities installed.
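On each VM, the FlashBlade filesystem would be mounted over NFS before the builds start. A minimal sketch follows; the data VIP (`10.0.0.100`), export name (`/builds`), and mount options are placeholder assumptions, not the exact configuration used in the test:

```shell
# Mount a FlashBlade NFS export on a build VM (placeholder VIP and export).
# Large rsize/wsize values suit the small-file, metadata-heavy build workload.
sudo mkdir -p /mnt/flashblade
sudo mount -t nfs -o vers=3,tcp,rsize=524288,wsize=524288 \
    10.0.0.100:/builds /mnt/flashblade
```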

We ran two different build experiments:

  1. Two threads per build, with eight builds per instance, scaling up to a total of 120 parallel builds (eight builds/instance * 15 instances)
  2. Eight threads per build, with three builds per instance, scaling up to a total of 45 parallel builds (three builds/instance * 15 instances). Three build types were run, each with a different architecture target (i386, x86_64, and ARM).

In all cases, client caches were dropped before every run to minimize client-side caching effects, since FlashBlade itself doesn’t cache data.
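The per-run procedure above (drop client caches, then launch the parallel builds) can be sketched as a small shell script. The source path, output directories, and the `RUN=echo` dry-run switch are illustrative assumptions, not the exact harness used in the test:

```shell
#!/usr/bin/env sh
# Sketch of one test iteration: drop client caches, then launch N parallel
# kernel builds. Set RUN=echo for a dry run that only prints the commands;
# leave RUN empty (and run as root) to execute them for real.
RUN=${RUN-}

run_builds() {
  builds=$1     # parallel builds on this instance (e.g., 8)
  threads=$2    # make -j threads per build (e.g., 2)
  src=$3        # kernel source tree on the NFS mount (hypothetical path)

  # Drop clean page caches, dentries, and inodes on the client first,
  # since FlashBlade itself doesn't cache data.
  $RUN sync
  $RUN sh -c 'echo 3 > /proc/sys/vm/drop_caches'

  # Start each build in its own output directory, then wait for all of them.
  i=1
  while [ "$i" -le "$builds" ]; do
    $RUN make -C "$src" O="/tmp/build$i" -j"$threads" vmlinux &
    i=$((i + 1))
  done
  wait
}
```

A driver would call, for example, `run_builds 8 2 /mnt/flashblade/linux` on each of the 15 instances for the two-thread experiment.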

Results and Findings

Both build experiments showed exceptional results. The ideal outcome for any parallel software build is that execution time stays flat as load increases, rather than climbing because storage saturates. This is exactly what we observed.

For two threads per build, the results are depicted in Figure 2 below. It clearly shows that from 8 to 120 builds, there was practically no impact on build run time.

Figure 2: Average Build Time (s) vs. Number of Builds (two threads/build)

During this test, the FlashBlade dashboard showed a very linear response. A few other observations can be made based on the dashboard data:

  • The workload is IOPS-intensive. IOPS ramp up to approximately 135K and scale very linearly.
  • Latency is consistently sub-2ms.
  • Bandwidth peaks at ~45MB/s.

Given these data points, we can extrapolate that the FlashBlade storage array could support well over 1,200 parallel Linux kernel builds without any runtime degradation before the ExpressRoute bandwidth limit would start to impact performance.
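As a rough sanity check on that extrapolation, we can take the ~45MB/s peak observed at 120 builds and assume bandwidth scales linearly with build count:

```shell
# Back-of-envelope check: if the ~45MB/s peak seen at 120 builds scales
# linearly, what does the link see at 1,200 builds?
bw_at_120=45                      # MB/s peak from the FlashBlade dashboard
link_capacity=$((10 * 1000 / 8))  # 10Gb/s ExpressRoute ~= 1250 MB/s
projected_bw=$((bw_at_120 * 1200 / 120))
echo "~${projected_bw} MB/s projected at 1,200 builds vs ~${link_capacity} MB/s link capacity"
```

At 1,200 builds the projected ~450MB/s is still well under the link's capacity, which is why build count can grow another order of magnitude before the ExpressRoute link, not the storage, becomes the constraint.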

Figure 3: FlashBlade IOPS, Bandwidth, and Latency results during the build test.

Figures 3 and 4 show that the eight-thread build experiment yielded very similar results, with build times practically unchanged as the number of parallel builds was scaled.

Figure 4: Average Linux Kernel Build Time (eight threads per build) vs. Builds

Performance at Scale

The connected-cloud architecture for software builds, using FlashBlade for the data layer, has proven to deliver excellent performance at scale. We project this scaling behavior can extend at least 10x beyond what we tested, to 1,200 builds across 150 instances. Beyond that point, we would encounter performance degradation due to the bandwidth limit of the 10Gb/s ExpressRoute link; upgrading that link would allow us to scale even further.

For organizations interested in running software or silicon build and regression pipelines, this architecture would provide significant benefits and could scale to large instance counts to securely accelerate product delivery, improve quality, and increase the productivity of development teams. The connected-cloud architecture would also simplify migration and accelerate cloud adoption for teams looking to utilize Azure public-cloud resources.

The next post in this series covers data mobility for HPC and EDA workloads from on-premises to Azure Cloud.