15.1 Collections and MatrixFree operations

The Collections and associated MatrixFree libraries add optimisations that perform certain elemental operations collectively, applying an operator using either matrix-matrix or unrolled matrix-free operations rather than a sequence of matrix-vector multiplications. Certain operators benefit more than others from this treatment, so several implementations are available: StdMat, SumFac, IterPerExp, NoCollection and MatrixFree (described in Section 15.1.2.1).

All configuration relating to Collections is given in the COLLECTIONS XML element within the NEKTAR XML element.
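
As a minimal sketch of this layout, the fragment below places a COLLECTIONS element directly inside NEKTAR. The DEFAULT="auto" value is the one described in Section 15.1.2.2. The rest of the session file is omitted, and the position of COLLECTIONS relative to the other sections is an assumption.

```xml
<NEKTAR>
    <!-- Other session sections (geometry, expansions, conditions, ...) are omitted in this sketch. -->
    <!-- Assumption: COLLECTIONS may appear anywhere among the direct children of NEKTAR. -->
    <COLLECTIONS DEFAULT="auto" />
</NEKTAR>
```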

15.1.1 Automatic tuning and the --write-opt-file command line option

By default, the optimal implementation is selected automatically the first time you run a solver. If you run the solver in verbose mode you will observe output of the form:

Collection Implementation for Tetrahedron ( 4 4 4 ) for ngeoms = 428
            Op.    :    opt. Impl.       (IterLocExp,  IterStdExp,  StdMat,     SumFac,      MatrixFree)
         BwdTrans:      MatFree          (0.000344303, 0.000336822, 0.000340444, 0.000185503, 6.80494e-05)
         Helmholtz:     MatFree          (0.00227906, 0.00481378,      --    ,      --    , 0.000374155)
         IPWrtBase:     MatFree          (0.000364424, 0.000318054, 0.000291705, 0.000138584, 8.37257e-05)
         IPWrtDBase:    MatFree          (0.00378674, 0.00308545, 0.00100464, 0.000653242, 0.000283372)
         PhysDeriv :    MatFree          (0.000881537, 0.000774604, 0.00407994, 0.000540257, 0.000185529)
Collection Implementation for Prism ( 4 4 4 ) for ngeoms = 136
            Op.    :    opt. Impl.       (IterLocExp,  IterStdExp,  StdMat,     SumFac,      MatrixFree)
         BwdTrans:      MatFree          (0.000131559, 0.000130099, 0.000237854, 8.40501e-05, 2.78436e-05)
         Helmholtz:     MatFree          (0.000988519, 0.00133484,      --    ,      --    , 0.000166906)
         IPWrtBase:     MatFree          (0.000113946, 0.000105544, 0.00022007, 5.74802e-05, 3.18842e-05)
         IPWrtDBase:    MatFree          (0.00148209, 0.000717362, 0.000885148, 0.000257414, 0.00011241)
         PhysDeriv :    MatFree          (0.000295485, 0.000247841, 0.00186362, 0.000219107, 7.38712e-05)

This shows the implementation selected for each operator, in this case MatrixFree (reported as MatFree), followed in parentheses by the timings recorded for each candidate implementation. Note that IterLocExp is equivalent to NoCollection and IterStdExp is directly related to the IterPerExp option.

This choice of optimisation is then written to a file called Session.opt, where Session is the name of the user-defined XML file. The optimal choice is currently based on the volumetric elements of the mesh (i.e. Tris and Quads in 2D and Tets, Pyramids, Prisms and Hexs in 3D) and not on the boundary conditions. In a parallel run, the root process writes the file based on the optimisation performed on that process. If an element type is not present on the root process, the output from the highest-rank process holding that element shape is written instead. Once this file is present, it is read directly rather than re-running the auto-tuning. Even if an opt file already exists, you can force a new one to be generated by using the --write-opt-file command line option.

15.1.2 Manually selecting the COLLECTIONS section

The COLLECTIONS section can be set manually within the COLLECTIONS tag as shown in the following example. Note this section can be added in either the input Session.xml file or the Session.opt file that is auto-generated.

Different implementations may be chosen for different element shapes and expansion orders. Specifying * for ORDER sets the default implementation for any expansion orders not explicitly defined.

<COLLECTIONS>
    <OPERATOR TYPE="BwdTrans">
        <ELEMENT TYPE="T" ORDER="*"   IMPTYPE="IterPerExp" />
        <ELEMENT TYPE="T" ORDER="1-5" IMPTYPE="StdMat" />
    </OPERATOR>
    <OPERATOR TYPE="IProductWRTBase">
        <ELEMENT TYPE="Q" ORDER="*"   IMPTYPE="SumFac" />
    </OPERATOR>
</COLLECTIONS>

15.1.2.1 Default implementation

The default implementation for all operators may be chosen by setting the DEFAULT attribute of the COLLECTIONS XML element to one of StdMat, SumFac, IterPerExp, NoCollection or MatrixFree. StdMat sets up a standard matrix for the element in the collection as the underlying operator. The following uses the collated matrix-matrix type elemental operation for all operators and expansion orders:

<COLLECTIONS DEFAULT="StdMat" />

The NoCollection option iterates over each expansion in the local region, calling the local operator, which is implemented with a sum factorisation method within the element. IterPerExp holds a standard expansion and also holds an expanded copy of the geometric factors within the collection operator. SumFac is a sum factorisation implementation which undertakes each direction of the method over multiple elements in the collection. Finally, MatrixFree implements a vectorisation-friendly version of the sum factorisation which has minimal memory movement but requires some initial data re-orientation when vectorising over multiple elements.
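
The fragment below is a hedged sketch of how a default might be combined with a per-operator override, reusing only the attribute names and values that appear in this section. It assumes that OPERATOR entries take precedence over the DEFAULT attribute when both are present. This section does not confirm that behaviour, so check it against your Nektar++ version before relying on it.

```xml
<!-- Assumption: explicit OPERATOR entries override the DEFAULT implementation. -->
<COLLECTIONS DEFAULT="SumFac">
    <OPERATOR TYPE="BwdTrans">
        <!-- Use StdMat for triangles at orders 1-5; other cases fall back to SumFac. -->
        <ELEMENT TYPE="T" ORDER="1-5" IMPTYPE="StdMat" />
    </OPERATOR>
</COLLECTIONS>
```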

15.1.2.2 Auto-tuning

The choice of implementation for each operator, for the given mesh and expansion orders, can be selected automatically through an attribute in the COLLECTIONS section. To enable this, add the following to the Nektar++ session file:

<COLLECTIONS DEFAULT="auto" />

This will collate elements from the given mesh and given expansion orders, run and time each implementation strategy in turn, and select the fastest performing case. Note that the selections will be mesh- and order-specific. The selections made via auto-tuning are output if the --verbose command-line switch is given.

15.1.3 Collection size

The maximum number of elements within a single collection can be enforced using the MAXSIZE attribute.
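
A minimal sketch of this setting follows. It assumes that MAXSIZE is given as an attribute of the COLLECTIONS element itself, in the same way as DEFAULT. The value 64 is purely illustrative and is not a recommendation.

```xml
<!-- Sketch only: 64 is an arbitrary illustrative limit on elements per collection. -->
<COLLECTIONS MAXSIZE="64" />
```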