14.1 Operator evaluation strategies

When discretising a PDE using most variants of the spectral element method, the resulting problem is usually expressed as a matrix equation. In traditional linear finite element codes, the matrix is usually represented as a large sparse global matrix, which represents the action of a particular operator such as the Laplacian matrix across the whole domain.

However, when we consider spectral element methods, in which the polynomial order representing the expansion can be far higher, this method becomes far less optimal. We can instead consider the action of an operator locally on each element, and then perform an assembly operation. This is mathematically equivalent to the global matrix approach and gives exactly the same answer, but at high polynomial orders it is far more efficient on modern CPU architectures.

Furthermore, this local approach can be represented in one of two ways: either as a dense matrix for each element, which is typically more efficient at intermediate polynomial orders, or in the hp element case as a tensor product of smaller dense matrices via an approach deemed sum-factorisation, which is used at very high polynomial orders. Figure ?? gives an overview of these three different operator strategies.

A goal of Nektar++ is to support not only high order expansions, but all orders from low (where element size h is the dominant factor) to high (where p dominates); a procedure we have dubbed “from h to p efficiently”.

14.1.1 Selecting an operator strategy

An obvious question is: “which strategy should I select?” The most important factors in this decision are:

  1. what the operator is;
  2. polynomial order p;
  3. element type and dimension of the problem;
  4. underlying hardware architecture;
  5. the number of operator calls in the solver;
  6. BLAS implementation speed.

Generally you can use results from three publications [3365] which outline results for two- and three-dimensional elements.

In general, the best approach is to perform some preliminary timings by changing the appropriate variables in the session file, which is outlined below. As a very rough guide, for 1 ≤ p ≤ 2 you should use the global approach; for 3 ≤ p ≤ 7 use the local approach; and for p ≥ 8 use sum-factorisation. However, these guidelines will vary due to the parameters noted above. In future releases of Nektar++ we hope to tune these variables automatically to make this decision easier to make.

14.1.2 XML syntax

Operator evaluation strategies can be configured in the GLOBALOPTIMISATIONPARAMETERS tag, which lies inside the root NEKTAR tag:

3    <BwdTrans> 
4      <DO_GLOBAL_MAT_OP VALUE="0" /> 
5      <DO_BLOCK_MAT_OP TRI="1" QUAD="1"  TET="1" 
6                       PYR="1" PRISM="1" HEX="1" /> 
7    </BwdTrans> 
8    <IProductWRTBase> 
9      <DO_GLOBAL_MAT_OP VALUE="0" /> 
10      <DO_BLOCK_MAT_OP TRI="1" QUAD="1"  TET="1" 
11                       PYR="1" PRISM="1" HEX="1" /> 
12    </IProductWRTBase> 
13    <HelmholtzMatrixOp> 
14      <DO_GLOBAL_MAT_OP VALUE="0" /> 
15      <DO_BLOCK_MAT_OP TRI="1" QUAD="1"  TET="1" 
16                       PYR="1" PRISM="1" HEX="1" /> 
17    </HelmholtzMatrixOp> 
18    <MassMatrixOp> 
19      <DO_GLOBAL_MAT_OP VALUE="0" /> 
20      <DO_BLOCK_MAT_OP TRI="1" QUAD="1"  TET="1" 
21                       PYR="1" PRISM="1" HEX="1" /> 
22    </MassMatrixOp> 

14.1.3 Selecting different operator strategies

Operator evaluation is supported for four operators: backward transform, inner product, Helmholtz and mass operators. It is possible to specify the following optimisation flags for different operators:

  1. DO_GLOBAL_MAT_OP: If VALUE is 1, the globally assembled system matrix will be used to evaluate the operator. If VALUE is 0, the operator will be evaluated elementally.
  2. DO_BLOCK_MAT_OP: If VALUE is 1, the elemental evaluation will be done using the elemental/local matrices (which are all concatenated in a block matrix, hence the name). If VALUE is 0, the elemental evaluation will be done using the sum-factorisation technique.

    Each element type (triangle, quadrilateral, etc) has its own VALUE, since break-even points for sum-factorisation and the local matrix approach will differ depending on element type. Note that due to a small shortcoming in the code, all element types must be defined; so three-dimensional elements must still be defined even if the simulation is two-dimensional.

Note that global takes precendence over block, so if VALUE is set to 1 for both then the operator will be global.

For very complex operators – in particular HelmholtzMatrixOp – always set DO_BLOCK_MAT_OP to 1 as sum-factorisation for these operator types can be costly.