Data loads are growing exponentially, and data centers, both commercial and research-based, are struggling to scale up without investing in the massive amounts of additional power and cooling that all those extra servers require. For this reason, the massively parallel architectures of Field Programmable Gate Arrays (FPGAs) have become attractive alternatives to power-hungry CPUs.
One very successful example of an FPGA-based cluster was built by the Center for High Performance Reconfigurable Computing (CHREC), a national research center funded by the National Science Foundation. The Center’s goal is to investigate, develop, evaluate, and showcase the most powerful Reconfigurable Computing (RC) machine ever fielded for research. Scalable RC systems are uniquely capable of combining high performance with low energy use, low cooling requirements, and low total cost of ownership (TCO).
Figure: 2x4x4 torus (can be expanded further)
Dr. Alan George, Center Director at CHREC, was looking to partner with a leading board or module vendor in this pursuit. At first his team of researchers wanted to focus on CPU socket accelerators, but they found these technologies to be quirky, unstable, expensive, immature, and short of their promised performance (i.e., I/O throughput was slower than regular PCI). They therefore began investigating the best of the new PCI Express cards.
CHREC researchers evaluated many of the leading FPGA module and board technologies in their labs, including products from many of Gidel’s competitors. After comparing performance vs. cost, ease and stability of use, and technical support, CHREC researchers chose Gidel to build their largest FPGA-based supercomputer, called Novo-G.
CHREC researchers had some primary emphases:
Novo-G was built using several Gidel FPGA cards: PS III/PS IV/ProceV D8 (400+ FPGAs). What makes this FPGA-based cluster unique is the way Gidel’s PCIe cards were interconnected by a specialized high-speed, three-dimensional torus network to provide direct FPGA-to-FPGA computation. Compared with conventional high-end supercomputers, the cluster was thousands of times smaller and less expensive, and it required thousands of times less power and cooling.
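To make the three-dimensional torus idea concrete, the following is a minimal Python sketch of how neighbors and hop counts work in a torus of shape 2x4x4, the shape named in the figure caption. It is an illustration only, not CHREC's or Gidel's implementation: the function names `torus_neighbors` and `torus_hops` are hypothetical, and treating each coordinate as one network node (whether that is an FPGA or a card) is an assumption.

```python
from itertools import product


def torus_neighbors(node, dims):
    """Return the distinct direct neighbors of `node` in a torus of shape `dims`."""
    neighbors = set()
    for axis, size in enumerate(dims):
        for step in (-1, 1):
            coords = list(node)
            # Modulo arithmetic provides the wraparound links that turn a mesh into a torus.
            coords[axis] = (coords[axis] + step) % size
            neighbor = tuple(coords)
            if neighbor != node:  # a dimension of size 1 would wrap onto the node itself
                neighbors.add(neighbor)
    return sorted(neighbors)


def torus_hops(a, b, dims):
    """Minimum hop count between a and b, taking the shorter way around each ring."""
    return sum(
        min(abs(x - y), size - abs(x - y))
        for x, y, size in zip(a, b, dims)
    )


# Assumed shape, taken from the figure caption (2x4x4 = 32 nodes).
DIMS = (2, 4, 4)
nodes = list(product(*(range(size) for size in DIMS)))

print(len(nodes))
print(torus_neighbors((0, 0, 0), DIMS))
print(max(torus_hops((0, 0, 0), n, DIMS) for n in nodes))
```

In this sketch the dimension of size 2 contributes only one distinct neighbor, because stepping forward and backward lands on the same node, so each node has five distinct neighbors rather than six. The wraparound links also keep the worst-case distance small: no node is more than five hops from any other in a 2x4x4 torus.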
Novo-G is the largest and most powerful RC-based research supercomputer in the world, and it won the 2012 Alexander Schwarzkopf Prize for Technological Innovation. For the 3D FFT kernel, it is capable of achieving nearly double the performance of Anton and fifty times that of established clusters like BlueGene/L!