OpenFOAM, the open source CFD toolbox, is renowned for its robust parallel
communication using domain decomposition. In this article we review the parallel
communication in OpenFOAM and describe developments by OpenCFD that will
be available in its next release.
LAM and OpenMPI
OpenFOAM is currently shipped with one public domain MPI implementation,
LAM, which has proven to give good performance and be extremely stable. The
next release of OpenFOAM will additionally be shipped with another public
domain MPI implementation, OpenMPI. OpenMPI is the amalgamation of three
separate MPI projects including LAM. Compared to LAM, OpenMPI is more
configurable, supports more interconnects and automatically uses all available
TCP networks.
GAMMA
In addition, the next release of OpenFOAM will include a completely new
Pstream implementation which uses the Genoa Active Message MAchine
(GAMMA) communication library. GAMMA is a low-latency replacement for
TCP/IP on gigabit Ethernet and is supported for Intel platforms on modern
Linux kernels (both 32 and 64 bit). It completely bypasses the Linux network
stack to produce record-breaking latency figures.
The current OpenFOAM release can be configured to run with the MPI
compatibility layer on top of GAMMA (MPI-GAMMA), and this configuration has
been extensively tested on OpenCFD’s 16 processor cluster. The lower latency
matters most in cases where the number of processors is large and the number
of cells and processor faces is low. In real terms, using GAMMA instead of LAM
or OpenMPI will give improved run times ranging from a few percent for large
cases running on a small number of processors to a few hundred percent for
small cases on a large number of processors (where the other MPI libraries
can actually cause an increase in run time compared to a non-parallel
run).
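The dependence on case size can be illustrated with the standard
latency-bandwidth cost model, in which each message costs a fixed latency plus
its size divided by the bandwidth. The following is a minimal sketch only: the
function names, the two latency values and the message sizes are hypothetical
illustrations, not measurements of LAM, OpenMPI or GAMMA. Only the nominal
gigabit line rate is a known quantity.

```python
# Latency-bandwidth cost model: time = latency + bytes / bandwidth.
# All latencies and message sizes below are ASSUMED, for illustration only.

GIGABIT_BYTES_PER_S = 1e9 / 8  # nominal gigabit line rate in bytes per second


def message_time(n_bytes, latency_s, bandwidth=GIGABIT_BYTES_PER_S):
    """Modelled time in seconds to deliver one message of n_bytes."""
    return latency_s + n_bytes / bandwidth


def latency_share(n_bytes, latency_s, bandwidth=GIGABIT_BYTES_PER_S):
    """Fraction of the modelled message time that is pure latency."""
    return latency_s / message_time(n_bytes, latency_s, bandwidth)


def speedup(n_bytes, slow_latency_s, fast_latency_s):
    """Ratio of modelled message times when only the latency is reduced."""
    return message_time(n_bytes, slow_latency_s) / message_time(n_bytes, fast_latency_s)


if __name__ == "__main__":
    slow, fast = 50e-6, 10e-6        # hypothetical latencies in seconds
    small, large = 800, 800_000      # hypothetical message sizes in bytes
    for label, size in (("small message", small), ("large message", large)):
        print(label,
              "latency share:", round(latency_share(size, slow), 3),
              "speedup from lower latency:", round(speedup(size, slow, fast), 3))
```

In this model a small message is dominated by latency, so reducing the latency
shortens it by a large factor, while a large message is dominated by transfer
time and barely changes. Small cases on many processors exchange many small
messages, which is consistent with the trend described above.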
For the next release of OpenFOAM, OpenCFD have been working closely with
GAMMA developer Giuseppe Ciaccio to implement a direct GAMMA
driver that bypasses the MPI layer. Because of the nature of the GAMMA
protocol, the MPI layer causes a small overhead and bypassing it gives
some speed-up, especially in the transmission of small messages. Taking the
example of an unrealistically small test case (1000 cells per processor, 16
processors), we have seen a 30% improvement in run time. For more realistic
test cases the benefits will be smaller. Apart from the potential performance
improvement, the direct GAMMA driver also has dynamic receive-buffer sizing,
removing the need to adapt the $MPI_BUFFER_SIZE environment variable
for cases with an extreme number of processor faces. We have found the
direct GAMMA driver and MPI-GAMMA to be absolutely stable while
running.
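The difference between a buffer sized up front and one sized on demand can be
shown with a small conceptual sketch. It is not the code of the GAMMA driver
or of OpenFOAM's Pstream: the class names are hypothetical, and the fixed
capacity merely stands in for a size chosen in advance, for example through an
environment variable.

```python
# Conceptual sketch only: fixed versus dynamically sized receive buffers.
# Class names and growth policy are ASSUMED, not taken from GAMMA or OpenFOAM.


class FixedReceiveBuffer:
    """Receive buffer whose capacity is chosen once, before the run starts."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = bytearray(capacity)
        self.length = 0  # number of valid bytes currently stored

    def receive(self, message):
        if len(message) > self.capacity:
            # The user would have to restart with a larger buffer size.
            raise OverflowError(
                f"message of {len(message)} bytes exceeds buffer of {self.capacity}"
            )
        self.data[:len(message)] = message
        self.length = len(message)


class DynamicReceiveBuffer(FixedReceiveBuffer):
    """Receive buffer that grows when an incoming message does not fit."""

    def receive(self, message):
        if len(message) > self.capacity:
            # Grow at least geometrically so repeated growth stays cheap.
            self.capacity = max(len(message), 2 * self.capacity)
            self.data = bytearray(self.capacity)
        super().receive(message)


if __name__ == "__main__":
    big_message = bytes(16)  # larger than the initial capacity of 8 bytes
    dynamic = DynamicReceiveBuffer(8)
    dynamic.receive(big_message)
    print("dynamic buffer capacity after receive:", dynamic.capacity)
    try:
        FixedReceiveBuffer(8).receive(big_message)
    except OverflowError as err:
        print("fixed buffer failed:", err)
```

With the fixed buffer, a case with unusually large messages fails until the
size is raised by hand; with the dynamic buffer no tuning is needed.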
The improvement in communication speed of GAMMA over other MPI
libraries is currently offset somewhat by: (1) problems relating to startup and
shutdown of jobs; and (2) the installation of the GAMMA library being difficult,
requiring additional hardware in the form of a dedicated gigabit connection and
switch, and requiring the Linux kernel to be patched. However, GAMMA is
actively maintained and documentation is available for all steps of the
installation.
Overall, working with GAMMA has been a very interesting experience for the
OpenFOAM developers at OpenCFD. We feel there is a definite need for a public
domain low-latency protocol on commodity hardware and GAMMA is by
far the best candidate. It has been shown to be extremely well suited
to the typical communications pattern of a domain decomposed CFD
code.
We support the project and are reaping the benefit of faster communications
on our own Linux cluster. We invite other OpenFOAM users running on a cluster
to try GAMMA themselves and provide feedback on their experience to the
GAMMA project.