University of Cambridge’s Cumulus Supercomputer

Cumulus’ unique data accelerator helps give the university’s latest supercomputer extremely fast I/O.

At a Glance:

  • The University of Cambridge’s mission is to “contribute to society through the pursuit of education, learning and research at the highest international levels of excellence.”

  • According to the Virtual Institute for I/O*, the Cumulus supercomputer, built with Intel® Xeon® Scalable processors and Cornelis Networks products, now has the fastest I/O on the planet, giving researchers the tools to address the most difficult data-driven problems in the world.

Executive Summary

One of the most recognized institutions in the world, the University of Cambridge pursues a mission to “contribute to society through the pursuit of education, learning and research at the highest international levels of excellence.” The university’s Research Computing Services provides High Performance Computing (HPC) resources to support work conducted across its many colleges and in industry throughout the U.K. To meet the demands of today’s data-driven world, Research Computing Services deployed its latest supercomputer, a 2.27 petaFLOPS system called Cumulus.1 Built to address the I/O challenges of large data in simulation and artificial intelligence (AI), Cumulus uses a unique Data Accelerator (DAC) designed into a network topology built around Cornelis Networks products.2 Cumulus’ DAC earned Cambridge the top performance rating on the latest IO-500 list from the Virtual Institute for I/O.3

Challenge

The University of Cambridge’s Research Computing Services supports all of the university’s colleges and aids research in industry across the U.K. Its users have embraced AI as it has moved into the mainstream of science and industry. University of Cambridge researchers are applying AI across a wide range of physical, material, and social sciences where large data can reveal new insights. However, while the compute resources of supercomputers have advanced at the pace of Moore’s Law, large data has introduced I/O challenges for HPC systems used for simulation and AI. Even the fastest networks become bottlenecks, impacting users’ time-to-solution as massive amounts of data move between the storage and compute clusters.

When planning its next-generation supercomputer acquisition for large-data simulation and AI workloads, Research Computing Services made advancing I/O performance, alongside petaFLOPS-scale compute capability, a critical part of the new system design. Working with Dell EMC, Intel, and StackHPC, its developers created an innovative data acceleration solution, which was built into a new 50,176-core, 2.27 petaFLOPS supercomputer named Cumulus.1

Solution

Cumulus is built on Dell EMC PowerEdge* servers with Intel® Xeon® Gold 6142F processors, each of which includes an integrated Cornelis Networks host fabric adapter, plus a number of nodes with Intel® Xeon Phi™ processors. Cornelis Networks switches and cables form the fabric for the compute cluster and the DAC, which are connected to a Lustre* parallel file system storage cluster. Beyond serving as a key HPC resource for large data workloads, several of Cumulus’ nodes host an OpenStack* cloud that automates system resource partitioning to efficiently support the needs of users. With over 2 petaFLOPS of performance and a ranking among the top 100 supercomputers in the world,1 the Cumulus-UK Science Cloud is the fastest academic supercomputer in the country.

Unique to Cumulus is the 24-node Data Accelerator (DAC), a cluster of storage nodes designed specifically to enhance access to large data on the Lustre file system. The DAC leverages the performance of Dell EMC PowerEdge servers, the NVM Express* interface, Intel Xeon Scalable processors, Intel® SSD DC P4600 Series drives, and software co-developed with the University of Cambridge. It delivers more than 500 GB/s of read bandwidth and nearly 353 kIOPS of overall performance. With a score of 620.69 on the IO-500 list, Cumulus is the fastest I/O supercomputer in the world.4

The DAC leverages the SLURM (Simple Linux* Utility for Resource Management) workload manager’s burst buffer plugin, the performance of the Lustre and BeeGFS* file systems, and Lustre’s Distributed Namespace (DNE) feature to deliver the high performance that researchers’ workloads demand. An orchestrator, designed and developed by Research Computing Services and StackHPC, simplifies building and managing the DAC and enables continued experimentation and development of the DAC for other workflows.
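
In practice, a user’s batch job requests capacity on the DAC through burst buffer directives, and SLURM stages data onto the accelerator’s NVMe storage before the job starts and drains results back to Lustre afterward. The sketch below is illustrative only, not Cumulus’ actual configuration: it assumes a DataWarp-style burst buffer plugin, so the #DW directives, the $DW_JOB_STRIPED variable, the /lustre/project paths, and the ./simulation binary are hypothetical.

# Minimal sketch (hypothetical): submit a SLURM job whose input data is
# staged onto burst buffer scratch before execution and whose output is
# drained back to Lustre afterward.
import subprocess
import tempfile

# A DataWarp-style batch script; directive names vary by site and plugin.
job_script = """#!/bin/bash
#SBATCH --nodes=4
#SBATCH --time=01:00:00
#DW jobdw capacity=1TiB access_mode=striped type=scratch
#DW stage_in source=/lustre/project/input destination=$DW_JOB_STRIPED/input type=directory
#DW stage_out source=$DW_JOB_STRIPED/output destination=/lustre/project/output type=directory

srun ./simulation --in $DW_JOB_STRIPED/input --out $DW_JOB_STRIPED/output
"""

# Write the script to disk and hand it to the scheduler via sbatch.
with tempfile.NamedTemporaryFile("w", suffix=".sh", delete=False) as f:
    f.write(job_script)
    script_path = f.name

subprocess.run(["sbatch", script_path], check=True)

Whatever the exact directive syntax a site uses, the pattern is the same: data lands on fast NVMe scratch before compute begins and is written back when the job ends, which is how a burst buffer keeps the parallel file system from becoming the bottleneck in I/O-heavy jobs.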

Result

Large data projects require fast I/O to and from storage. According to the Virtual Institute for I/O, Cumulus now has the fastest I/O on the planet, as measured by the institute’s IO-500 benchmark. Similar to Top500.org’s list of the 500 fastest supercomputers in the world, the IO-500 ranks HPC systems by storage performance. Cumulus, with its data accelerator, earned the #1 position with a score of 620.69, nearly double that of the #2 system. That gives the Cumulus-UK Science Cloud the tools to address the needs of researchers working on the most difficult data-driven problems in the world.

“We brought together on a single system high levels of compute and I/O with Hadoop and machine learning frameworks in an OpenStack* environment, giving us customizability and security for our users,” stated Dr. Paul Calleja, Director of Research Computing Services. “Combining those capabilities, this machine can be used to deliver data-centric research to new and emerging communities.”

According to Dr. Calleja, many AI projects are already under way with Cambridge scientists involved in medical imaging analysis, genomics, and astronomy.

The Square Kilometre Array (SKA) project, for example, collects data from an advanced global radio telescope that is 100 times more sensitive than earlier generation radio telescopes. It will allow scientists to survey the sky up to 1 million times faster.5

The data the project will gather requires computing power at the 100+ petaFLOPS scale, available today only in the very largest supercomputers. Cumulus and the DAC are currently being used to model and prototype the next-generation HPC systems needed to process data from the SKA.

Research Computing Services is also supporting groundbreaking work in genomics with the UK10K project. Started in 2010, the project uses Hadoop* and massive amounts of sequenced genomic data from 10,000 people to help researchers understand the relationship between rare and low-frequency genetic variants and human disease caused by disruptive changes to the proteins made by the human body.

“These are the kinds of challenges that are best conquered with the processing power of the world’s leading Top500 and IO-500 supercomputers—like the Cumulus-UK Science Cloud,” added Calleja.

Solution Summary

To support the many large-data simulation and AI workloads of University of Cambridge researchers, the university’s Research Computing Services built Cumulus, a 2.27 petaFLOPS supercomputer. Cumulus runs traditional workloads and provides an OpenStack cloud for easy configuration of users’ projects. Unique to the Cumulus-UK Science Cloud is the DAC, a specialized data accelerator that helps make Cumulus the fastest I/O supercomputer (IO-500) in the world. Cumulus and the DAC are helping accelerate insight in many scientific research projects, including the Square Kilometre Array and UK10K.

Solution Ingredients

  • Dell EMC PowerEdge* compute node servers (50,176 cores)
  • Intel Xeon Gold 6142F processors with integrated Cornelis Networks host fabric adapter
  • Intel Xeon Phi 7210 (208 nodes)
  • 24 Dell EMC PowerEdge R740xd Data Accelerator (DAC) nodes
  • Cornelis Networks switches


Product and Performance Information

2 Intel has spun out the Omni-Path business to Cornelis Networks, an independent Intel Capital portfolio company. Cornelis Networks will continue to serve and sell to existing and new customers by delivering leading purpose-built high-performance network products for high performance computing and artificial intelligence. Intel believes Cornelis Networks will expand the ecosystem of high-performance fabric solutions, offering options to customers building clusters for HPC and AI based on Intel® Xeon® processors. Additional details on the divestiture and transition of Omni-Path products can be found at www.cornelisnetworks.com.

5 University of Cambridge Department of Physics, Cavendish Laboratory, Square Kilometre Array (SKA) research.