This section describes how to run OpenFOAM in parallel on distributed
processors. The method of parallel computing used by OpenFOAM is known as
domain decomposition, in which the geometry and associated fields are broken
into pieces and allocated to separate processors for solution. The process of
parallel computation involves: decomposition of mesh and fields; running the
application in parallel; and, post-processing the decomposed case as described
in the following sections. The parallel running uses the public domain
openMPI implementation of the standard message passing interface (MPI).
OpenFOAM can also be run using the MPICH implementation of MPI which is
described in B.1.
The mesh and fields are decomposed using the decomposePar utility. The
underlying aim is to break up the domain with minimal effort but in such a way
to guarantee a fairly economic solution. The geometry and fields are broken up
according to a set of parameters specified in a dictionary named decomposeParDict
that must be located in the system directory of the case of interest. An example
decomposeParDict dictionary can be copied from the interFoam/damBreak tutorial
if the user requires one; the dictionary entries within it are reproduced
below:
The user has a choice of four methods of decomposition, specified by the method
keyword as described below.
simple
Simple geometric decomposition in which the domain is split into
pieces by direction, e.g. 2 pieces in the direction, 1 in etc.
hierarchical
Hierarchical geometric decomposition which is the same as
simple except the user specifies the order in which the directional split
is done, e.g. first in the -direction, then the -direction etc.
metis
METIS decomposition which requires no geometric input from the
user and attempts to minimise the number of processor boundaries. The
user can specify a weighting for the decomposition between processors
which can be useful on machines with differing performance between
processors.
manual
Manual decomposition, where the user directly specifies the
allocation of each cell to a particular processor.
For each method there are a set of coefficients specified in a sub-dictionary of
decompositionDict, named <method>Coeffs as shown in the dictionary listing. The
full set of keyword entries in the decomposeParDict dictionary are explained in 3.4.
Compulsory entries
numberOfSubdomains
Total number of subdomains
method
Method of decomposition
simple/
hierarchical/
metis/ manual/
simpleCoeffs entries
n
Number of subdomains in , ,
()
delta
Cell skew factor
Typically,
hierarchicalCoeffs entries
n
Number of subdomains in , ,
()
delta
Cell skew factor
Typically,
order
Order of decomposition
xyz/xzy/yxz. . .
metisCoeffsentries
processorWeights
List of weighting factors for allocation
of cells to processors; <wt1> is the
weighting factor for processor 1, etc.;
weights are normalised so can take any
range of values.
(<wt1>...<wtN>)
manualCoeffsentries
dataFile
Name of file containing data of
allocation of cells to processors
"<fileName>"
Distributed data entries (optional) -- see 3.4.3
distributed
Is the data distributed across several
disks?
yes/no
roots
Root paths to case directories; <rt1>
is the root path for node 1, etc.
(<rt1>...<rtN>)
Table 3.4:
Keywords in decompositionDict dictionary.
The decomposePar utility is executed in the normal manner by typing
decomposePar
On completion, a set of subdirectories will have been created, one for each
processor, in the case directory. The directories are named processor where
represents a processor number and contains a time directory,
containing the decomposed field descriptions, and a constant/polyMesh directory
containing the decomposed mesh description.
A decomposed OpenFOAM case is run in parallel using the openMPI
implementation of MPI (openMPI).
openMPI can be run on a local multiprocessor machine very simply but when
running on machines across a network, a file must be created that contains the
host names of the machines. The file can be given any name and located at any
path. In the following description we shall refer to such a file by the generic name,
including full path, <machines>.
The <machines> file contains the names of the machines listed one machine
per line. The names must correspond to a fully resolved hostname in the
/etc/hosts file of the machine on which the openMPI is run. The list must contain
the name of the machine running the openMPI. Where a machine node contains
more than one processor, the node name may be followed by the entry
cpu= where is the number of processors openMPI should run on that
node.
For example, let us imagine a user wishes to run openMPI from machine aaa
on the following machines: aaa; bbb, which has 2 processors; and ccc. The
<machines> would contain:
where: <nProcs> is the number of processors; <foamExec> is the executable,
e.g.icoFoam; and, the output is redirected to a file named log. For example, if
icoFoam is run on 4 nodes, specified in a file named machines, on the cavity
tutorial in the $FOAM_RUN/tutorials/icoFoam directory, then the following
command should be executed:
Data files may need to be distributed if, for example, if only local disks are
used in order to improve performance. In this case, the user may find that
the root path to the case directory may differ between machines. The
paths must then be specified in the decomposeParDict dictionary using
distributed and roots keywords. The distributed entry should read
distributed yes;
and the roots entry is a list of root paths, <root0>, <root1>, . . . , for each node
roots <nRoots> ( "<root0>" "<root1>" ... );
where <nRoots> is the number of roots.
Each of the processor directories should be placed in the case directory at
each of the root paths specified in the decomposeParDict dictionary. The system
directory and files within the constant directory must also be present in each case
directory. Note: the files in the constant directory are needed, but the polyMesh
directory is not.
After a case has been run in parallel, it can be reconstructed for
post-processing. The case is reconstructed by merging the sets of time directories
from each processor directory into a single set of time directories. The
reconstructPar utility performs such a reconstruction by executing the command:
reconstructPar
When the data is distributed across several disks, it must be first copied to the
local case directory for reconstruction.
The user may post-process decomposed cases using the paraFoam post-processor,
described in 6.1. The whole simulation can be post-processed by reconstructing
the case or alternatively it is possible to post-process a segment of the
decomposed domain individually by simply treating the individual processor
directory as a case in its own right.