The Collections and associated MatrixFree libraries add optimisations that perform certain elemental operations collectively, applying an operator using either matrix-matrix or unrolled matrix-free operations rather than a sequence of matrix-vector multiplications. Certain operators benefit more than others from this treatment, so the following implementations are available:
StdMat: Perform operations using collated matrix-matrix type elemental operations.
SumFac: Perform operations using collated matrix-matrix type sum factorisation (i.e. direction-by-direction) operations.
IterPerExp: Loop through elements, performing matrix-vector operations utilising StdRegions building blocks.
MatrixFree: Call matrix-free implementations that can utilise vectorisation by performing SIMD (single instruction, multiple data) operations over multiple elements concurrently.
NoCollection: Use the original LocalRegions implementation to perform the operation, which involves looping over the elements and may subsequently call the StdRegions implementations.
All configuration relating to Collections is given in the COLLECTIONS XML element within the NEKTAR XML element.
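As a minimal sketch of this nesting, the fragment below places a COLLECTIONS element directly inside NEKTAR. The DEFAULT="auto" value is taken from the auto-tuning option described later in this section; the comment stands in for the remaining session sections, which are omitted here.

```xml
<NEKTAR>
    <!-- Collections configuration lives directly under the NEKTAR root element -->
    <COLLECTIONS DEFAULT="auto" />
    <!-- remaining session sections omitted in this sketch -->
</NEKTAR>
```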
By default we now try to select the optimal choice of implementation when you first run a solver. If you run the solver in verbose mode you will observe an output of the form:
Collection Implementation for Tetrahedron ( 4 4 4 ) for ngeoms = 428
 Op. : opt. Impl. (IterLocExp, IterStdExp, StdMat, SumFac, MatrixFree)
 BwdTrans: MatFree (0.000344303, 0.000336822, 0.000340444, 0.000185503, 6.80494e-05)
 Helmholtz: MatFree (0.00227906, 0.00481378, -- , -- , 0.000374155)
 IPWrtBase: MatFree (0.000364424, 0.000318054, 0.000291705, 0.000138584, 8.37257e-05)
 IPWrtDBase: MatFree (0.00378674, 0.00308545, 0.00100464, 0.000653242, 0.000283372)
 PhysDeriv : MatFree (0.000881537, 0.000774604, 0.00407994, 0.000540257, 0.000185529)
Collection Implementation for Prism ( 4 4 4 ) for ngeoms = 136
 Op. : opt. Impl. (IterLocExp, IterStdExp, StdMat, SumFac, MatrixFree)
 BwdTrans: MatFree (0.000131559, 0.000130099, 0.000237854, 8.40501e-05, 2.78436e-05)
 Helmholtz: MatFree (0.000988519, 0.00133484, -- , -- , 0.000166906)
 IPWrtBase: MatFree (0.000113946, 0.000105544, 0.00022007, 5.74802e-05, 3.18842e-05)
 IPWrtDBase: MatFree (0.00148209, 0.000717362, 0.000885148, 0.000257414, 0.00011241)
 PhysDeriv : MatFree (0.000295485, 0.000247841, 0.00186362, 0.000219107, 7.38712e-05)
This shows the selected implementation for each operator, in this case MatrixFree (reported as MatFree), alongside the timings of the various approaches in the order given in the header line. Note that IterLocExp is equivalent to NoCollection and IterStdExp is directly related to the IterPerExp option.
This choice of optimisation is then written to a file called Session.opt, where Session is the name of the user-defined XML file. The optimal choice is currently based on the volumetric elements of the mesh (i.e. Tris and Quads in 2D and Tets, Pyramids, Prisms and Hexes in 3D) and not on the boundary conditions. In a parallel run the root process writes the file based on the optimisation on that process. If an element type is not present on the root process, the output from the highest-rank process containing that element shape is written instead. Once this file is present it is read directly rather than re-running the auto-tuning. Even if an opt file already exists, you can force a new one to be generated by using the --write-opt-file command-line option.
The Collections configuration can also be set manually within the COLLECTIONS tag, as shown in the following example. Note that this section can be added to either the input Session.xml file or the auto-generated Session.opt file.
Different implementations may be chosen for different element shapes and expansion orders. Specifying * for ORDER sets the default implementation for any expansion orders not explicitly defined.
<COLLECTIONS>
    <OPERATOR TYPE="BwdTrans">
        <ELEMENT TYPE="T" ORDER="*" IMPTYPE="IterPerExp" />
        <ELEMENT TYPE="T" ORDER="1-5" IMPTYPE="StdMat" />
    </OPERATOR>
    <OPERATOR TYPE="IProductWRTBase">
        <ELEMENT TYPE="Q" ORDER="*" IMPTYPE="SumFac" />
    </OPERATOR>
</COLLECTIONS>
The default implementation for all operators may be chosen by setting the DEFAULT attribute of the COLLECTIONS XML element to one of StdMat, SumFac, IterPerExp, NoCollection or MatrixFree. The StdMat option sets up a standard matrix for the elements in the collection as the underlying operator. The following uses the collated matrix-matrix type elemental operation for all operators and expansion orders:
<COLLECTIONS DEFAULT="StdMat" />
The NoCollection option iterates over each expansion in the local region, calling the local operator, which is implemented using sum factorisation within the element. The IterPerExp option holds a standard expansion together with an expanded copy of the geometric factors within the collection operator. SumFac is a sum-factorisation implementation which applies each direction of the method over multiple elements in the collection. Finally, MatrixFree implements a vectorisation-friendly version of the sum factorisation which has minimal memory movement but requires some initial data re-orientation when vectorising over multiple elements.
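The DEFAULT attribute and the per-operator entries described above could plausibly be combined, as in the sketch below. It assumes that an explicit OPERATOR entry takes precedence over the DEFAULT value for the shapes and orders it names. That precedence is an assumption made for illustration and is not stated in this section, so it is worth confirming against the selections reported in verbose mode.

```xml
<!-- Sketch only: assumes explicit OPERATOR entries override DEFAULT -->
<COLLECTIONS DEFAULT="SumFac">
    <OPERATOR TYPE="BwdTrans">
        <!-- use the vectorised implementation for this operator at all orders -->
        <ELEMENT TYPE="T" ORDER="*" IMPTYPE="MatrixFree" />
    </OPERATOR>
</COLLECTIONS>
```

Under that assumption, every operator other than BwdTrans on elements of type T would fall back to SumFac.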
The choice of implementation for each operator, for the given mesh and expansion orders, can also be selected automatically through an attribute in the COLLECTIONS section. To enable this, add the following to the Nektar++ session file:
<COLLECTIONS DEFAULT="auto" />
This will collate elements from the given mesh and expansion orders, run and time each implementation strategy in turn, and select the fastest-performing case. Note that the selections will be mesh- and order-specific. The selections made via auto-tuning are output if the --verbose command-line switch is given.
The maximum number of elements within a single collection can be enforced using the MAXSIZE attribute.
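A minimal sketch of such a limit is given below. It assumes that MAXSIZE is set on the COLLECTIONS element itself and that it can be combined with DEFAULT. The value 64 is purely illustrative and not a recommended setting.

```xml
<!-- Sketch only: MAXSIZE placement and the value 64 are illustrative assumptions -->
<COLLECTIONS DEFAULT="auto" MAXSIZE="64" />
```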