When discretising a PDE using most variants of the spectral element method, the resulting problem is usually expressed as a matrix equation. In traditional linear finite element codes, the matrix is usually represented as a large sparse global matrix, which represents the action of a particular operator such as the Laplacian matrix across the whole domain.

However, when we consider spectral element methods, in which the polynomial order representing the expansion can be far higher, this method becomes far less optimal. We can instead consider the action of an operator locally on each element, and then perform an assembly operation. This is mathematically equivalent to the global matrix approach and gives exactly the same answer, but at high polynomial orders it is far more efficient on modern CPU architectures.

Furthermore, this local approach can be represented in one of two ways: either as
a dense matrix for each element, which is typically more efficient at intermediate
polynomial orders, or in the hp element case as a tensor product of smaller dense
matrices via an approach deemed *sum-factorisation*, which is used at very high
polynomial orders. Figure ?? gives an overview of these three different operator
strategies.

A goal of Nektar++ is to support not only high order expansions, but *all* orders from low
(where element size h is the dominant factor) to high (where p dominates); a procedure we
have dubbed “from h to p efficiently”.

An obvious question is: “which strategy should I select?” The most important factors in this decision are:

- what the operator is;
- polynomial order p;
- element type and dimension of the problem;
- underlying hardware architecture;
- the number of operator calls in the solver;
- BLAS implementation speed.

Generally you can use results from three publications [44, 7, 6] which outline results for two- and three-dimensional elements.

In general, the best approach is to perform some preliminary timings by changing the appropriate variables in the session file, which is outlined below. As a very rough guide, for 1 ≤ p ≤ 2 you should use the global approach; for 3 ≤ p ≤ 7 use the local approach; and for p ≥ 8 use sum-factorisation. However, these guidelines will vary due to the parameters noted above. In future releases of Nektar++ we hope to tune these variables automatically to make this decision easier to make.

Operator evaluation strategies can be configured in the `GLOBALOPTIMISATIONPARAMETERS`

tag,
which lies inside the root `NEKTAR`

tag:

1<NEKTAR>

2 <GLOBALOPTIMIZATIONPARAMETERS>

3 <BwdTrans>

4 <DO_GLOBAL_MAT_OP VALUE="0" />

5 <DO_BLOCK_MAT_OP TRI="1" QUAD="1" TET="1"

6 PYR="1" PRISM="1" HEX="1" />

7 </BwdTrans>

8 <IProductWRTBase>

9 <DO_GLOBAL_MAT_OP VALUE="0" />

10 <DO_BLOCK_MAT_OP TRI="1" QUAD="1" TET="1"

11 PYR="1" PRISM="1" HEX="1" />

12 </IProductWRTBase>

13 <HelmholtzMatrixOp>

14 <DO_GLOBAL_MAT_OP VALUE="0" />

15 <DO_BLOCK_MAT_OP TRI="1" QUAD="1" TET="1"

16 PYR="1" PRISM="1" HEX="1" />

17 </HelmholtzMatrixOp>

18 <MassMatrixOp>

19 <DO_GLOBAL_MAT_OP VALUE="0" />

20 <DO_BLOCK_MAT_OP TRI="1" QUAD="1" TET="1"

21 PYR="1" PRISM="1" HEX="1" />

22 </MassMatrixOp>

23 </GLOBALOPTIMIZATIONPARAMETERS>

24</NEKTAR>

2 <GLOBALOPTIMIZATIONPARAMETERS>

3 <BwdTrans>

4 <DO_GLOBAL_MAT_OP VALUE="0" />

5 <DO_BLOCK_MAT_OP TRI="1" QUAD="1" TET="1"

6 PYR="1" PRISM="1" HEX="1" />

7 </BwdTrans>

8 <IProductWRTBase>

9 <DO_GLOBAL_MAT_OP VALUE="0" />

10 <DO_BLOCK_MAT_OP TRI="1" QUAD="1" TET="1"

11 PYR="1" PRISM="1" HEX="1" />

12 </IProductWRTBase>

13 <HelmholtzMatrixOp>

14 <DO_GLOBAL_MAT_OP VALUE="0" />

15 <DO_BLOCK_MAT_OP TRI="1" QUAD="1" TET="1"

16 PYR="1" PRISM="1" HEX="1" />

17 </HelmholtzMatrixOp>

18 <MassMatrixOp>

19 <DO_GLOBAL_MAT_OP VALUE="0" />

20 <DO_BLOCK_MAT_OP TRI="1" QUAD="1" TET="1"

21 PYR="1" PRISM="1" HEX="1" />

22 </MassMatrixOp>

23 </GLOBALOPTIMIZATIONPARAMETERS>

24</NEKTAR>

Operator evaluation is supported for four operators: backward transform, inner product, Helmholtz and mass operators. It is possible to specify the following optimisation flags for different operators:

`DO_GLOBAL_MAT_OP`

: If`VALUE`

is`1`

, the globally assembled system matrix will be used to evaluate the operator. If`VALUE`

is`0`

, the operator will be evaluated elementally.`DO_BLOCK_MAT_OP`

: If`VALUE`

is`1`

, the elemental evaluation will be done using the elemental/local matrices (which are all concatenated in a block matrix, hence the name). If`VALUE`

is`0`

, the elemental evaluation will be done using the sum-factorisation technique.Each element type (triangle, quadrilateral, etc) has its own

`VALUE`

, since break-even points for sum-factorisation and the local matrix approach will differ depending on element type. Note that due to a small shortcoming in the code, all element types must be defined; so three-dimensional elements must still be defined even if the simulation is two-dimensional.

Note that global takes precendence over block, so if `VALUE`

is set to `1`

for both then the
operator will be global.

For very complex operators – in particular `HelmholtzMatrixOp`

– always set `DO_BLOCK_MAT_OP`

to
`1`

as sum-factorisation for these operator types can be costly.