The Collections and associated MatrixFree libraries add optimisations that perform certain elemental operations collectively, applying an operator using either matrix-matrix or unrolled matrix-free operations rather than a sequence of matrix-vector multiplications. Certain operators benefit more than others from this treatment, so the following implementations are available:
StdMat: Perform operations using a collated matrix-matrix type elemental operation.
SumFac: Perform operations using collated matrix-matrix type sum-factorisation (i.e. direction-by-direction) operations.
IterPerExp: Loop through elements, performing a matrix-vector operation utilising StdRegions building blocks.
MatrixFree: Call matrix-free implementations that can utilise vectorisation by performing SIMD (single instruction, multiple data) operations over multiple elements concurrently.
NoCollection: Use the original LocalRegions implementation, which loops over the elements and may subsequently call the StdRegions implementations.
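The difference between a sequence of matrix-vector products and a single collated matrix-matrix product can be sketched in plain Python. This is an illustration only: the helpers `matvec` and `matmat` and the toy operator `B` are hypothetical and are not Nektar++ API, and the operator is assumed to be identical for every element in the collection.

```python
# Illustrative sketch only: plain-Python stand-ins, not Nektar++ code.

def matvec(B, c):
    """Apply operator B (rows x cols) to one element's coefficient vector c."""
    return [sum(b * x for b, x in zip(row, c)) for row in B]


def matmat(B, C):
    """Apply B to all elements at once; column e of C holds element e's coefficients."""
    n_elmt = len(C[0])
    return [[sum(B[i][k] * C[k][e] for k in range(len(C))) for e in range(n_elmt)]
            for i in range(len(B))]


# Toy operator shared by every element in the collection (assumed values).
B = [[1.0, 0.0],
     [0.5, 0.5],
     [0.0, 1.0]]

# Coefficient vectors of three elements.
coeffs = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

# Element by element: one matrix-vector product per element.
per_element = [matvec(B, c) for c in coeffs]

# Collated: stack coefficients as columns and do one matrix-matrix product.
C = [list(col) for col in zip(*coeffs)]   # shape (n_modes, n_elmt)
collated = matmat(B, C)

# Column e of the collated result is the per-element result for element e.
collated_by_element = [list(col) for col in zip(*collated)]
```

Both routes give the same values; the collated form simply exposes one large, regular operation that an optimised matrix-matrix routine can exploit.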
All configuration relating to Collections is given in the COLLECTIONS XML element within the
NEKTAR XML element.
By default, we now try to select the optimal implementation when you first run a solver. If you run the solver in verbose mode, you will observe output of the form:
Collection Implementation for Tetrahedron ( 4 4 4 ) for ngeoms = 428
 Op. : opt. Impl. (IterLocExp, IterStdExp, StdMat, SumFac, MatrixFree)
 BwdTrans: MatFree (0.000344303, 0.000336822, 0.000340444, 0.000185503, 6.80494e-05)
 Helmholtz: MatFree (0.00227906, 0.00481378, -- , -- , 0.000374155)
 IPWrtBase: MatFree (0.000364424, 0.000318054, 0.000291705, 0.000138584, 8.37257e-05)
 IPWrtDBase: MatFree (0.00378674, 0.00308545, 0.00100464, 0.000653242, 0.000283372)
 PhysDeriv : MatFree (0.000881537, 0.000774604, 0.00407994, 0.000540257, 0.000185529)
Collection Implementation for Prism ( 4 4 4 ) for ngeoms = 136
 Op. : opt. Impl. (IterLocExp, IterStdExp, StdMat, SumFac, MatrixFree)
 BwdTrans: MatFree (0.000131559, 0.000130099, 0.000237854, 8.40501e-05, 2.78436e-05)
 Helmholtz: MatFree (0.000988519, 0.00133484, -- , -- , 0.000166906)
 IPWrtBase: MatFree (0.000113946, 0.000105544, 0.00022007, 5.74802e-05, 3.18842e-05)
 IPWrtDBase: MatFree (0.00148209, 0.000717362, 0.000885148, 0.000257414, 0.00011241)
 PhysDeriv : MatFree (0.000295485, 0.000247841, 0.00186362, 0.000219107, 7.38712e-05)
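For each operator, this shows the selected implementation (in this case MatrixFree, printed as MatFree)
followed by the timings of the candidate implementations in the order given in the header line; a -- entry
means that no timing is reported for that implementation. Note that IterLocExp is equivalent to
NoCollection and IterStdExp is directly related to the IterPerExp option.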
This choice of optimisation is then written to a file called Session.opt, where Session is
the name of the user-defined XML file. Note that the optimal choice is currently based on the
volumetric elements of the mesh (i.e. Tris and Quads in 2D and Tets, Pyramids, Prisms and
Hexes in 3D) and not on the boundary conditions. In a parallel run, the root process
writes the file based on the optimisation performed on that process. If an element type
is not present on the root process, the output from the highest-rank process
holding that element shape is written instead. Once this file is present, it is read
directly rather than re-running the auto-tuning. Even if an opt file already exists, you
can force a new one to be generated by using the --write-opt-file command-line
option.
The implementation choices can also be set manually within the COLLECTIONS tag, as shown in the
following example. Note that this section can be added to either the input Session.xml file or the
auto-generated Session.opt file.
Different implementations may be chosen for different element shapes and expansion orders.
Specifying * for ORDER sets the default implementation for any expansion orders not explicitly
defined.
<COLLECTIONS>
  <OPERATOR TYPE="BwdTrans">
    <ELEMENT TYPE="T" ORDER="*" IMPTYPE="IterPerExp" />
    <ELEMENT TYPE="T" ORDER="1-5" IMPTYPE="StdMat" />
  </OPERATOR>
  <OPERATOR TYPE="IProductWRTBase">
    <ELEMENT TYPE="Q" ORDER="*" IMPTYPE="SumFac" />
  </OPERATOR>
</COLLECTIONS>
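The precedence between an explicit ORDER entry and the * default can be made concrete with a small lookup sketch. It uses Python's standard `xml.etree.ElementTree` module to read the example above; the functions `order_matches` and `resolve_impl` are hypothetical helpers written for this illustration, not part of Nektar++, and the sketch assumes that a range such as 1-5 is inclusive at both ends.

```python
import xml.etree.ElementTree as ET

COLLECTIONS_XML = """
<COLLECTIONS>
  <OPERATOR TYPE="BwdTrans">
    <ELEMENT TYPE="T" ORDER="*" IMPTYPE="IterPerExp" />
    <ELEMENT TYPE="T" ORDER="1-5" IMPTYPE="StdMat" />
  </OPERATOR>
  <OPERATOR TYPE="IProductWRTBase">
    <ELEMENT TYPE="Q" ORDER="*" IMPTYPE="SumFac" />
  </OPERATOR>
</COLLECTIONS>
"""


def order_matches(spec, order):
    """Return True if an explicit ORDER spec such as '3' or '1-5' covers `order`.

    Assumption: ranges are inclusive at both ends.
    """
    if "-" in spec:
        lo, hi = (int(s) for s in spec.split("-"))
        return lo <= order <= hi
    return int(spec) == order


def resolve_impl(root, operator, elmt_type, order):
    """Look up IMPTYPE for (operator, element type, order); None if unspecified."""
    default = None
    for op in root.findall("OPERATOR"):
        if op.get("TYPE") != operator:
            continue
        for el in op.findall("ELEMENT"):
            if el.get("TYPE") != elmt_type:
                continue
            spec = el.get("ORDER")
            if spec == "*":
                default = el.get("IMPTYPE")   # fallback for unlisted orders
            elif order_matches(spec, order):
                return el.get("IMPTYPE")      # an explicit entry wins
    return default


root = ET.fromstring(COLLECTIONS_XML)
print(resolve_impl(root, "BwdTrans", "T", 3))          # covered by the range 1-5
print(resolve_impl(root, "BwdTrans", "T", 8))          # falls back to '*'
print(resolve_impl(root, "IProductWRTBase", "Q", 4))   # only '*' is given
```

With the example configuration, triangles of order 1 to 5 use StdMat for BwdTrans, while any other triangle order falls back to IterPerExp.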
The default implementation for all operators may be chosen by setting the
DEFAULT attribute of the COLLECTIONS XML element to one of StdMat, SumFac,
IterPerExp, NoCollection or MatrixFree. The StdMat option sets up a standard matrix for
the element in the collection as the underlying operator. The following uses the
collated matrix-matrix type elemental operation for all operators and expansion
orders:
<COLLECTIONS DEFAULT="StdMat" />
The NoCollection option iterates over each expansion in the local region, calling the local
operator, which is implemented using sum factorisation within the element. The
IterPerExp option holds a standard expansion together with an expanded copy of the
geometric factors within the collection operator. SumFac is a sum-factorisation implementation
that performs each direction of the method over multiple elements in the collection. Finally,
MatrixFree implements a vectorisation-friendly version of the sum factorisation, which has
minimal memory movement but requires some initial data re-orientation when vectorising over
multiple elements.
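As a minimal sketch of what "direction by direction" means, the following compares a direct evaluation of a 2D tensor-product backward transform with its sum-factorised form on a single element. The basis matrices `B0` and `B1`, the coefficient array `c` and both function names are made up for this illustration and do not correspond to Nektar++ data structures.

```python
# Illustrative sketch only: 2D tensor-product backward transform on one element.

def bwd_full(B0, B1, c):
    """Direct evaluation: u[i][j] = sum_p sum_q B0[i][p] * B1[j][q] * c[p][q]."""
    P, Q = len(c), len(c[0])
    return [[sum(B0[i][p] * B1[j][q] * c[p][q]
                 for p in range(P) for q in range(Q))
             for j in range(len(B1))]
            for i in range(len(B0))]


def bwd_sumfac(B0, B1, c):
    """Sum factorisation: contract direction 1 first, then direction 0."""
    P, Q = len(c), len(c[0])
    # Step 1: tmp[p][j] = sum_q B1[j][q] * c[p][q]
    tmp = [[sum(B1[j][q] * c[p][q] for q in range(Q))
            for j in range(len(B1))]
           for p in range(P)]
    # Step 2: u[i][j] = sum_p B0[i][p] * tmp[p][j]
    return [[sum(B0[i][p] * tmp[p][j] for p in range(P))
             for j in range(len(B1))]
            for i in range(len(B0))]


# Toy data (assumed): 3 points x 2 modes in direction 0, 2 points x 2 modes in direction 1.
B0 = [[1, 0], [1, 1], [1, 2]]
B1 = [[1, 0], [1, 1]]
c = [[1, 2], [3, 4]]

u_full = bwd_full(B0, B1, c)
u_sumfac = bwd_sumfac(B0, B1, c)
```

The two functions return the same values, but the factorised version replaces one nested double sum per quadrature point with two smaller one-dimensional contractions, each of which has the shape of a matrix-matrix product.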
The choice of implementation for each operator, for the given mesh and expansion orders, can
be selected automatically through an attribute in the COLLECTIONS section. To enable
this, add the following to the Nektar++ session file:
<COLLECTIONS DEFAULT="auto" />
This will collate elements from the given mesh and given expansion orders, run and time each
implementation strategy in turn, and select the fastest-performing case. Note that the
selections will be mesh- and order-specific. The selections made via auto-tuning are output if
the --verbose command-line switch is given.
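The run, time and select procedure can be sketched as follows. This is not the Nektar++ auto-tuner: `time_impl` and `pick_fastest` are hypothetical helpers, the choice of taking the best of several repeats is an assumption, and the two toy functions merely stand in for competing implementations of one operator. The Helmholtz timings are copied from the Tetrahedron block of the verbose output shown earlier, with `None` standing for the -- entries.

```python
import time


def time_impl(fn, repeats=5):
    """Return the best wall-clock time of `fn()` over `repeats` runs (assumed policy)."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


def pick_fastest(timings):
    """Return the name with the smallest time; None marks an entry without a timing."""
    available = {name: t for name, t in timings.items() if t is not None}
    return min(available, key=available.get)


# Hypothetical stand-ins for two implementation strategies of the same operation.
def slow_impl():
    return sum(i * i for i in range(20000))


def fast_impl():
    return 19999 * 20000 * 39999 // 6   # closed form of the same sum


measured = {"Loop": time_impl(slow_impl), "ClosedForm": time_impl(fast_impl)}
choice = pick_fastest(measured)

# Helmholtz timings for the Tetrahedron collection from the verbose output.
helmholtz = {
    "IterLocExp": 0.00227906,
    "IterStdExp": 0.00481378,
    "StdMat": None,
    "SumFac": None,
    "MatrixFree": 0.000374155,
}
helmholtz_choice = pick_fastest(helmholtz)
```

Because the timings depend on the machine, mesh and expansion order, the selected implementation is only meaningful for the configuration on which it was measured.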
The maximum number of elements within a single collection can be enforced using the
MAXSIZE attribute.
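The effect of a size cap can be pictured with the sketch below. How Nektar++ actually partitions elements into collections is not described in this section, so the consecutive chunking and the function name `split_into_collections` are assumptions made purely for illustration.

```python
def split_into_collections(element_ids, max_size):
    """Split a list of like elements into consecutive groups of at most max_size.

    Assumption: simple consecutive chunking, used here only to illustrate a cap.
    """
    if max_size < 1:
        raise ValueError("max_size must be positive")
    return [element_ids[i:i + max_size]
            for i in range(0, len(element_ids), max_size)]


# Ten elements of the same shape and order, capped at four per collection.
groups = split_into_collections(list(range(10)), 4)
sizes = [len(g) for g in groups]
```

Every element still belongs to exactly one collection; only the number of elements processed together in a single collective operation is bounded.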