Implements communication for populating forward and backwards spaces across processors in the discontinuous Galerkin routines.

#include <AssemblyCommDG.h>

Public Member Functions
	~AssemblyCommDG ()=default
	Default destructor.

	AssemblyCommDG (const ExpList &locExp, const ExpListSharedPtr &trace, const Array< OneD, Array< OneD, LocalRegions::ExpansionSharedPtr > > &elmtToTrace, const Array< OneD, const ExpListSharedPtr > &bndCondExp, const Array< OneD, const SpatialDomains::BoundaryConditionShPtr > &bndCond, const PeriodicMap &perMap)

void	PerformExchange (const Array< OneD, NekDouble > &testFwd, Array< OneD, NekDouble > &testBwd)
	Perform the trace exchange between processors, given the forwards and backwards spaces.

Private Member Functions
void	InitialiseStructure (const ExpList &locExp, const ExpListSharedPtr &trace, const Array< OneD, Array< OneD, LocalRegions::ExpansionSharedPtr > > &elmtToTrace, const Array< OneD, const ExpListSharedPtr > &bndCondExp, const Array< OneD, const SpatialDomains::BoundaryConditionShPtr > &bndCond, const PeriodicMap &perMap, const LibUtilities::CommSharedPtr &comm)
	Initialises the structure for the MPI communication.

Static Private Member Functions
static std::tuple< NekDouble, NekDouble, NekDouble >	Timing (const LibUtilities::CommSharedPtr &comm, const int &count, const int &num, const ExchangeMethodSharedPtr &f)
	Timing of the MPI exchange method.

Private Attributes
ExchangeMethodSharedPtr	m_exchange
	Chosen exchange method (either fastest parallel or serial).

int	m_maxQuad = 0
	Max number of quadrature points in an element.

int	m_nRanks = 0
	Number of ranks/processes/partitions.

std::map< int, std::vector< int > >	m_rankSharedEdges
	Map of process to shared edge IDs.

std::map< int, std::vector< int > >	m_edgeToTrace
	Map of edge ID to quad point trace indices.

Detailed Description

Implements communication for populating forward and backwards spaces across processors in the discontinuous Galerkin routines.

The AssemblyCommDG class constructs various exchange methods for performing the action of communicating trace data from the forwards space of one processor to the backwards space of the corresponding neighbour element, and vice versa.
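To make the forwards/backwards terminology concrete, here is a minimal in-memory sketch. It is not Nektar++ code: it assumes two hypothetical ranks that share a single edge, and `RankTrace`, `exchangeSharedEdge` and the index lists are illustrative names standing in for the trace arrays and the edge-to-trace map.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for one rank's trace data (not a Nektar++ type).
struct RankTrace
{
    std::vector<double> fwd; // forwards space: values this rank sends
    std::vector<double> bwd; // backwards space: values received from neighbours
};

// Copy the forwards values on a shared edge of each rank into the backwards
// space of the other rank. idxA and idxB list the trace indices of the shared
// edge's quadrature points on rank A and rank B, in matching order.
inline void exchangeSharedEdge(RankTrace &a,
                               const std::vector<std::size_t> &idxA,
                               RankTrace &b,
                               const std::vector<std::size_t> &idxB)
{
    assert(idxA.size() == idxB.size());
    for (std::size_t q = 0; q < idxA.size(); ++q)
    {
        a.bwd[idxA[q]] = b.fwd[idxB[q]];
        b.bwd[idxB[q]] = a.fwd[idxA[q]];
    }
}
```

In the real class the two sides live in different processes, so each copy becomes a message; the sketch only shows which entries end up where.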

This class initialises the structure for all exchange methods and then times each one to determine the fastest for the particular system configuration. When running in serial, it assigns the #Serial exchange method instead. It then acts as a pass-through to the chosen exchange method for the PerformExchange function.
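The selection logic can be sketched in isolation as follows. This is a simplified illustration under stated assumptions, not the library implementation: `Candidate`, `pickFastest` and `chooseMethod` are hypothetical names, and the average times are assumed to have been measured already.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical record of one candidate exchange method and its measured
// average time per exchange, in seconds.
struct Candidate
{
    std::string name;
    double avgTime;
};

// Return the index of the candidate with the smallest average time.
inline std::size_t pickFastest(const std::vector<Candidate> &candidates)
{
    assert(!candidates.empty());
    auto it = std::min_element(
        candidates.begin(), candidates.end(),
        [](const Candidate &x, const Candidate &y)
        { return x.avgTime < y.avgTime; });
    return static_cast<std::size_t>(std::distance(candidates.begin(), it));
}

// In serial there is nothing to time, so the serial method is used directly;
// otherwise the fastest timed candidate wins.
inline std::string chooseMethod(bool isSerial,
                                const std::vector<Candidate> &candidates)
{
    if (isSerial)
    {
        return "Serial";
    }
    return candidates[pickFastest(candidates)].name;
}
```

Because the choice is made once at construction, later calls only pay for one indirection through the stored method.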

Definition at line 241 of file AssemblyCommDG.h.

Constructor & Destructor Documentation

◆ ~AssemblyCommDG()

Nektar::MultiRegions::AssemblyCommDG::~AssemblyCommDG ( )

default

Default destructor.

◆ AssemblyCommDG()

Nektar::MultiRegions::AssemblyCommDG::AssemblyCommDG	(	const ExpList &	locExp,
		const ExpListSharedPtr &	trace,
		const Array< OneD, Array< OneD, LocalRegions::ExpansionSharedPtr > > &	elmtToTrace,
		const Array< OneD, const ExpListSharedPtr > &	bndCondExp,
		const Array< OneD, const SpatialDomains::BoundaryConditionShPtr > &	bndCond,
		const PeriodicMap &	perMap
	)

Definition at line 325 of file AssemblyCommDG.cpp.

{
    auto comm = locExp.GetComm()->GetRowComm();
 
    // If serial then skip initialising graph structure and the MPI timing
    if (comm->IsSerial())
    {
        m_exchange =
            ExchangeMethodSharedPtr(MemoryManager<Serial>::AllocateSharedPtr());
    }
    else
    {
        // Initialise graph structure and link processes across partition
        // boundaries
        AssemblyCommDG::InitialiseStructure(locExp, trace, elmtToTrace,
                                            bndCondExp, bndCond, perMap, comm);
 
        // Timing MPI comm methods, warm up with 10 iterations then time over 50
        std::vector<ExchangeMethodSharedPtr> MPIFuncs;
        std::vector<std::string> MPIFuncsNames;
 
        // Toggle off AllToAll/AllToAllV methods if cores greater than 16 for
        // performance reasons unless override solver info parameter is present
        if (locExp.GetSession()->MatchSolverInfo("OverrideMPI", "ON") ||
            m_nRanks <= 16)
        {
            MPIFuncs.emplace_back(ExchangeMethodSharedPtr(
                MemoryManager<AllToAll>::AllocateSharedPtr(
                    comm, m_maxQuad, m_nRanks, m_rankSharedEdges,
                    m_edgeToTrace)));
            MPIFuncsNames.emplace_back("AllToAll");
 
            MPIFuncs.emplace_back(ExchangeMethodSharedPtr(
                MemoryManager<AllToAllV>::AllocateSharedPtr(
                    comm, m_rankSharedEdges, m_edgeToTrace, m_nRanks)));
            MPIFuncsNames.emplace_back("AllToAllV");
        }
 
        MPIFuncs.emplace_back(
            ExchangeMethodSharedPtr(MemoryManager<Pairwise>::AllocateSharedPtr(
                comm, m_rankSharedEdges, m_edgeToTrace)));
        MPIFuncsNames.emplace_back("PairwiseSendRecv");
 
        // Disable neighbor MPI method on unsupported MPI version (below 3.0)
        if (std::get<0>(comm->GetVersion()) >= 3)
        {
            MPIFuncs.emplace_back(ExchangeMethodSharedPtr(
                MemoryManager<NeighborAllToAllV>::AllocateSharedPtr(
                    comm, m_rankSharedEdges, m_edgeToTrace)));
            MPIFuncsNames.emplace_back("NeighborAllToAllV");
        }
 
        int numPoints = trace->GetNpoints();
        int warmup = 10, iter = 50;
        NekDouble min, max;
        std::vector<NekDouble> avg(MPIFuncs.size(), -1);
        bool verbose = locExp.GetSession()->DefinesCmdLineArgument("verbose");
 
        if (verbose && comm->TreatAsRankZero())
        {
            std::cout << "MPI setup for trace exchange: " << std::endl;
        }
 
        // Padding for output
        int maxStrLen = 0;
        for (size_t i = 0; i < MPIFuncs.size(); ++i)
        {
            maxStrLen = MPIFuncsNames[i].size() > maxStrLen
                            ? MPIFuncsNames[i].size()
                            : maxStrLen;
        }
 
        for (size_t i = 0; i < MPIFuncs.size(); ++i)
        {
            Timing(comm, warmup, numPoints, MPIFuncs[i]);
            std::tie(avg[i], min, max) =
                Timing(comm, iter, numPoints, MPIFuncs[i]);
            if (verbose && comm->TreatAsRankZero())
            {
                std::cout << "  " << MPIFuncsNames[i]
                          << " times (avg, min, max)"
                          << std::string(maxStrLen - MPIFuncsNames[i].size(),
                                         ' ')
                          << ": " << avg[i] << " " << min << " " << max
                          << std::endl;
            }
        }
 
        // Gets the fastest MPI method
        int fastestMPI = std::distance(
            avg.begin(), std::min_element(avg.begin(), avg.end()));
 
        if (verbose && comm->TreatAsRankZero())
        {
            std::cout << "  Chosen fastest method: "
                      << MPIFuncsNames[fastestMPI] << std::endl;
        }
 
        m_exchange = MPIFuncs[fastestMPI];
    }
}

References Nektar::MultiRegions::ExpList::GetComm(), Nektar::MultiRegions::ExpList::GetSession(), InitialiseStructure(), m_edgeToTrace, m_exchange, m_maxQuad, m_nRanks, m_rankSharedEdges, and Timing().

Member Function Documentation

◆ InitialiseStructure()

void Nektar::MultiRegions::AssemblyCommDG::InitialiseStructure	(	const ExpList &	locExp,
		const ExpListSharedPtr &	trace,
		const Array< OneD, Array< OneD, LocalRegions::ExpansionSharedPtr > > &	elmtToTrace,
		const Array< OneD, const ExpListSharedPtr > &	bndCondExp,
		const Array< OneD, const SpatialDomains::BoundaryConditionShPtr > &	bndCond,
		const PeriodicMap &	perMap,
		const LibUtilities::CommSharedPtr &	comm
	)

private

Initialises the structure for the MPI communication.

This function sets up the initial structure to allow for the exchange methods to be created. This structure is contained within the member variable m_rankSharedEdges which is a map of rank to vector of the shared edges with that rank. This is filled by:

- Create an edge to trace mapping, and realign periodic edges within this mapping so that they have the same data layout for ranks sharing periodic boundaries.
- Create a list of all local edge IDs and calculate the maximum number of quadrature points used locally, then perform an AllReduce to find the maximum number of quadrature points across all ranks (for the AllToAll method).
- Create a list of all boundary edge IDs except for those which are periodic.
- Using the boundary ID list and the list of all local IDs, construct a unique list of IDs which are on a partition boundary (i.e. an ID that does not occur twice in the local list and does not occur in the boundary list is on a partition boundary). If the edge is periodic, also check whether the other side is local; if it is not, add the minimum of the two periodic IDs to the unique list, as the numbering scheme must be consistent across ranks.
- Send the unique list to all other ranks/partitions. Each rank's unique list is then compared with the local unique edge ID list; if a match is found, the member variable m_rankSharedEdges is filled with the matching rank and unique edge ID.
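The construction of the unique partition-boundary list can be sketched without any Nektar++ types. The names `PeriodicInfo` and `partitionBoundaryIds` are hypothetical, and the periodic map is reduced to one partner per edge for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <set>
#include <vector>

// Hypothetical, simplified periodic entry: the partner edge ID and whether
// that partner lives on this rank.
struct PeriodicInfo
{
    int partnerId;
    bool isLocal;
};

// localEdgeIds holds one entry per (element, trace) pair on this rank, so an
// edge interior to the partition appears twice. bndIds holds the IDs of
// non-periodic boundary edges.
inline std::vector<int> partitionBoundaryIds(
    const std::vector<int> &localEdgeIds, const std::set<int> &bndIds,
    const std::map<int, PeriodicInfo> &periodic)
{
    std::map<int, int> occurrences;
    for (int id : localEdgeIds)
    {
        ++occurrences[id];
    }

    std::vector<int> unique;
    for (const auto &entry : occurrences)
    {
        const int id = entry.first;
        if (entry.second != 1 || bndIds.count(id) > 0)
        {
            continue; // interior to the partition, or a true boundary edge
        }

        auto it = periodic.find(id);
        if (it == periodic.end())
        {
            unique.push_back(id);
        }
        else if (!it->second.isLocal)
        {
            // Both ranks must agree on one ID for the periodic pair.
            unique.push_back(std::min(id, it->second.partnerId));
        }
    }

    std::sort(unique.begin(), unique.end());
    return unique;
}
```

Counting occurrences with a map is a choice made for this sketch; the listing below detects duplicates with a nested loop over the local ID list instead.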

Definition at line 456 of file AssemblyCommDG.cpp.

{
    Array<OneD, int> tmp;
    int quad = 0, nDim = 0, eid = 0, offset = 0;
    const LocalRegions::ExpansionVector &locExpVector = *(locExp.GetExp());
 
    // Assume that each element of the expansion is of the same
    // dimension.
    nDim = locExpVector[0]->GetShapeDimension();
 
    // This sets up the edge to trace mapping and realigns periodic edges
    if (nDim == 1)
    {
        for (size_t i = 0; i < trace->GetExpSize(); ++i)
        {
            eid    = trace->GetExp(i)->GetGeom()->GetGlobalID();
            offset = trace->GetPhys_Offset(i);
 
            // Check to see if this vert is periodic. If it is, then we
            // need use the unique eid of the two points
            auto it = perMap.find(eid);
            if (perMap.count(eid) > 0)
            {
                PeriodicEntity ent = it->second[0];
                if (!ent.isLocal) // Not sure if true in 1D
                {
                    eid = std::min(eid, ent.id);
                }
            }
 
            m_edgeToTrace[eid].emplace_back(offset);
        }
    }
    else
    {
        for (size_t i = 0; i < trace->GetExpSize(); ++i)
        {
            eid    = trace->GetExp(i)->GetGeom()->GetGlobalID();
            offset = trace->GetPhys_Offset(i);
            quad   = trace->GetExp(i)->GetTotPoints();
 
            // Check to see if this edge is periodic. If it is, then we
            // need to reverse the trace order of one edge only in the
            // edge to trace map so that the data are reversed w.r.t each
            // other. We do this by using the minimum of the two IDs.
            auto it      = perMap.find(eid);
            bool realign = false;
            if (perMap.count(eid) > 0)
            {
                PeriodicEntity ent = it->second[0];
                if (!ent.isLocal)
                {
                    realign = eid == std::min(eid, ent.id);
                    eid     = std::min(eid, ent.id);
                }
            }
 
            for (size_t j = 0; j < quad; ++j)
            {
                m_edgeToTrace[eid].emplace_back(offset + j);
            }
 
            if (realign)
            {
                // Realign some periodic edges in m_edgeToTrace
                Array<OneD, int> tmpArray(m_edgeToTrace[eid].size());
                for (size_t j = 0; j < m_edgeToTrace[eid].size(); ++j)
                {
                    tmpArray[j] = m_edgeToTrace[eid][j];
                }
 
                StdRegions::Orientation orient = it->second[0].orient;
 
                if (nDim == 2)
                {
                    AssemblyMapDG::RealignTraceElement(tmpArray, orient, quad);
                }
                else
                {
                    // Orient is going from face 2 -> face 1 but we want face 1
                    // -> face 2; in all cases except below these are
                    // equivalent. However below is not equivalent so we use the
                    // reverse of the mapping.
                    if (orient == StdRegions::eDir1FwdDir2_Dir2BwdDir1)
                    {
                        orient = StdRegions::eDir1BwdDir2_Dir2FwdDir1;
                    }
                    else if (orient == StdRegions::eDir1BwdDir2_Dir2FwdDir1)
                    {
                        orient = StdRegions::eDir1FwdDir2_Dir2BwdDir1;
                    }
 
                    AssemblyMapDG::RealignTraceElement(
                        tmpArray, orient, trace->GetExp(i)->GetNumPoints(0),
                        trace->GetExp(i)->GetNumPoints(1));
                }
 
                for (size_t j = 0; j < m_edgeToTrace[eid].size(); ++j)
                {
                    m_edgeToTrace[eid][j] = tmpArray[j];
                }
            }
        }
    }
 
    // This creates a list of all geometry of problem dimension - 1
    // and populates the maxQuad member variable for AllToAll
    std::vector<int> localEdgeIds;
    for (eid = 0; eid < locExpVector.size(); ++eid)
    {
        LocalRegions::ExpansionSharedPtr locExpansion = locExpVector[eid];
 
        for (size_t j = 0; j < locExpansion->GetNtraces(); ++j)
        {
            int id = elmtToTrace[eid][j]->GetGeom()->GetGlobalID();
            localEdgeIds.emplace_back(id);
        }
 
        quad = locExpansion->GetTotPoints();
        if (quad > m_maxQuad)
        {
            m_maxQuad = quad;
        }
    }
 
    // Find max quadrature points across all processes for AllToAll method
    comm->AllReduce(m_maxQuad, LibUtilities::ReduceMax);
 
    // Create list of boundary edge IDs
    std::set<int> bndIdList;
    for (size_t i = 0; i < bndCond.size(); ++i)
    {
        // Don't add if periodic boundary type
        if ((bndCond[i]->GetBoundaryConditionType() ==
             SpatialDomains::ePeriodic))
        {
            continue;
        }
        else
        {
            for (size_t j = 0; j < bndCondExp[i]->GetExpSize(); ++j)
            {
                eid = bndCondExp[i]->GetExp(j)->GetGeom()->GetGlobalID();
                bndIdList.insert(eid);
            }
        }
    }
 
    // Get unique edges to send
    std::vector<int> uniqueEdgeIds;
    std::vector<bool> duplicated(localEdgeIds.size(), false);
    for (size_t i = 0; i < localEdgeIds.size(); ++i)
    {
        eid = localEdgeIds[i];
        for (size_t j = i + 1; j < localEdgeIds.size(); ++j)
        {
            if (eid == localEdgeIds[j])
            {
                duplicated[i] = duplicated[j] = true;
            }
        }
 
        if (!duplicated[i]) // Not duplicated in local partition
        {
            if (bndIdList.find(eid) == bndIdList.end()) // Not a boundary edge
            {
                // Check if periodic and if not local set eid to other side
                auto it = perMap.find(eid);
                if (it != perMap.end())
                {
                    if (!it->second[0].isLocal)
                    {
                        uniqueEdgeIds.emplace_back(
                            std::min(eid, it->second[0].id));
                    }
                }
                else
                {
                    uniqueEdgeIds.emplace_back(eid);
                }
            }
        }
    }
 
    // Send uniqueEdgeIds size so all partitions can prepare buffers
    m_nRanks = comm->GetSize();
    Array<OneD, int> rankNumEdges(m_nRanks);
    Array<OneD, int> localEdgeSize(1, uniqueEdgeIds.size());
    comm->AllGather(localEdgeSize, rankNumEdges);
 
    Array<OneD, int> rankLocalEdgeDisp(m_nRanks, 0);
    for (size_t i = 1; i < m_nRanks; ++i)
    {
        rankLocalEdgeDisp[i] = rankLocalEdgeDisp[i - 1] + rankNumEdges[i - 1];
    }
 
    Array<OneD, int> localEdgeIdsArray(uniqueEdgeIds.size());
    for (size_t i = 0; i < uniqueEdgeIds.size(); ++i)
    {
        localEdgeIdsArray[i] = uniqueEdgeIds[i];
    }
 
    // Sort localEdgeIdsArray before sending (this is important!)
    std::sort(localEdgeIdsArray.begin(), localEdgeIdsArray.end());
 
    Array<OneD, int> rankLocalEdgeIds(
        std::accumulate(rankNumEdges.begin(), rankNumEdges.end(), 0), 0);
 
    // Send all unique edge IDs to all partitions
    comm->AllGatherv(localEdgeIdsArray, rankLocalEdgeIds, rankNumEdges,
                     rankLocalEdgeDisp);
 
    // Find what edge Ids match with other ranks
    size_t myRank = comm->GetRank();
    for (size_t i = 0; i < m_nRanks; ++i)
    {
        if (i == myRank)
        {
            continue;
        }
 
        for (size_t j = 0; j < rankNumEdges[i]; ++j)
        {
            int edgeId = rankLocalEdgeIds[rankLocalEdgeDisp[i] + j];
            if (std::find(uniqueEdgeIds.begin(), uniqueEdgeIds.end(), edgeId) !=
                uniqueEdgeIds.end())
            {
                m_rankSharedEdges[i].emplace_back(edgeId);
            }
        }
    }
}

References Nektar::StdRegions::eDir1BwdDir2_Dir2FwdDir1, Nektar::StdRegions::eDir1FwdDir2_Dir2BwdDir1, Nektar::SpatialDomains::ePeriodic, Nektar::StdRegions::find(), Nektar::MultiRegions::ExpList::GetExp(), Nektar::MultiRegions::PeriodicEntity::id, Nektar::MultiRegions::PeriodicEntity::isLocal, m_edgeToTrace, m_maxQuad, m_nRanks, m_rankSharedEdges, Nektar::MultiRegions::AssemblyMapDG::RealignTraceElement(), and Nektar::LibUtilities::ReduceMax.

Referenced by AssemblyCommDG().
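The last stage of InitialiseStructure, in which every rank's sorted unique IDs are gathered into one buffer and matched against the local list, can be illustrated without MPI. In this sketch the gathered buffer is passed in directly, and `displacements` and `sharedEdges` are hypothetical helper names. It assumes each rank's block is sorted, which lets it use a binary search where the listing above uses `std::find`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <numeric>
#include <vector>

// Exclusive prefix sum of the per-rank counts: the offset of each rank's
// block inside the gathered buffer.
inline std::vector<int> displacements(const std::vector<int> &counts)
{
    std::vector<int> disp(counts.size(), 0);
    for (std::size_t i = 1; i < counts.size(); ++i)
    {
        disp[i] = disp[i - 1] + counts[i - 1];
    }
    return disp;
}

// gathered is the concatenation, in rank order, of every rank's sorted unique
// edge IDs. Returns, for each other rank, the IDs it shares with myRank.
inline std::map<int, std::vector<int>> sharedEdges(
    int myRank, const std::vector<int> &counts,
    const std::vector<int> &gathered)
{
    const int nRanks = static_cast<int>(counts.size());
    assert(myRank >= 0 && myRank < nRanks);
    assert(std::accumulate(counts.begin(), counts.end(), 0) ==
           static_cast<int>(gathered.size()));

    const std::vector<int> disp = displacements(counts);
    auto myBegin = gathered.begin() + disp[myRank];
    auto myEnd   = myBegin + counts[myRank];

    std::map<int, std::vector<int>> shared;
    for (int r = 0; r < nRanks; ++r)
    {
        if (r == myRank)
        {
            continue;
        }
        for (int j = 0; j < counts[r]; ++j)
        {
            const int id = gathered[disp[r] + j];
            if (std::binary_search(myBegin, myEnd, id))
            {
                shared[r].push_back(id);
            }
        }
    }
    return shared;
}
```

Ranks with no matching IDs get no entry in the returned map, so its keys are exactly the neighbouring ranks.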

◆ PerformExchange()

void Nektar::MultiRegions::AssemblyCommDG::PerformExchange	(	const Array< OneD, NekDouble > &	testFwd,
		Array< OneD, NekDouble > &	testBwd
	)

inline

Perform the trace exchange between processors, given the forwards and backwards spaces.

Parameters

testFwd	Local forwards space of the trace (which will be sent)
testBwd	Local backwards space of the trace (which will receive contributions)

Definition at line 265 of file AssemblyCommDG.h.

    {
        m_exchange->PerformExchange(testFwd, testBwd);
    }

References m_exchange.

◆ Timing()

std::tuple< NekDouble, NekDouble, NekDouble > Nektar::MultiRegions::AssemblyCommDG::Timing	(	const LibUtilities::CommSharedPtr &	comm,
		const int &	count,
		const int &	num,
		const ExchangeMethodSharedPtr &	f
	)

static private

Timing of the MPI exchange method.

Timing of the exchange method f, performing the exchange count times for an array of length num.

Parameters

comm	Communicator
count	Number of timing iterations to run
num	Number of quadrature points to communicate
f	#ExchangeMethod to time

Returns: tuple of loop times {avg, min, max}
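As an illustration of the returned tuple, the sketch below combines one per-exchange time from each rank into {avg, min, max}. It is a serial stand-in for the sum, min and max reductions across ranks, and `summariseTimings` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <tuple>
#include <vector>

// perRank holds one measured time per exchange for each rank. The result is
// {average over ranks, fastest rank, slowest rank}.
inline std::tuple<double, double, double> summariseTimings(
    const std::vector<double> &perRank)
{
    assert(!perRank.empty());
    const double sum = std::accumulate(perRank.begin(), perRank.end(), 0.0);
    const auto mm    = std::minmax_element(perRank.begin(), perRank.end());
    return std::make_tuple(sum / perRank.size(), *mm.first, *mm.second);
}
```

A large gap between the min and max entries would suggest load imbalance between ranks for that exchange method.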

Definition at line 706 of file AssemblyCommDG.cpp.

{
    Array<OneD, NekDouble> testFwd(num, 1);
    Array<OneD, NekDouble> testBwd(num, -2);
 
    LibUtilities::Timer t;
    t.Start();
    for (size_t i = 0; i < count; ++i)
    {
        f->PerformExchange(testFwd, testBwd);
    }
    t.Stop();
 
    // These can just be 'reduce' but need to setup the wrapper in comm.h
    Array<OneD, NekDouble> minTime(1, t.TimePerTest(count));
    comm->AllReduce(minTime, LibUtilities::ReduceMin);
 
    Array<OneD, NekDouble> maxTime(1, t.TimePerTest(count));
    comm->AllReduce(maxTime, LibUtilities::ReduceMax);
 
    Array<OneD, NekDouble> sumTime(1, t.TimePerTest(count));
    comm->AllReduce(sumTime, LibUtilities::ReduceSum);
 
    NekDouble avgTime = sumTime[0] / comm->GetSize();
    return std::make_tuple(avgTime, minTime[0], maxTime[0]);
}

References Nektar::LibUtilities::ReduceMax, Nektar::LibUtilities::ReduceMin, Nektar::LibUtilities::ReduceSum, Nektar::LibUtilities::Timer::Start(), Nektar::LibUtilities::Timer::Stop(), and Nektar::LibUtilities::Timer::TimePerTest().

Referenced by AssemblyCommDG().

Member Data Documentation

◆ m_edgeToTrace

std::map<int, std::vector<int> > Nektar::MultiRegions::AssemblyCommDG::m_edgeToTrace

private

Map of edge ID to quad point trace indices.

Definition at line 281 of file AssemblyCommDG.h.

Referenced by AssemblyCommDG(), and InitialiseStructure().

◆ m_exchange

ExchangeMethodSharedPtr Nektar::MultiRegions::AssemblyCommDG::m_exchange

private

Chosen exchange method (either fastest parallel or serial)

Definition at line 273 of file AssemblyCommDG.h.

Referenced by AssemblyCommDG(), and PerformExchange().

◆ m_maxQuad

int Nektar::MultiRegions::AssemblyCommDG::m_maxQuad = 0

private

Max number of quadrature points in an element.

Definition at line 275 of file AssemblyCommDG.h.

Referenced by AssemblyCommDG(), and InitialiseStructure().

◆ m_nRanks

int Nektar::MultiRegions::AssemblyCommDG::m_nRanks = 0

private

Number of ranks/processes/partitions.

Definition at line 277 of file AssemblyCommDG.h.

Referenced by AssemblyCommDG(), and InitialiseStructure().

◆ m_rankSharedEdges

std::map<int, std::vector<int> > Nektar::MultiRegions::AssemblyCommDG::m_rankSharedEdges

private

Map of process to shared edge IDs.

Definition at line 279 of file AssemblyCommDG.h.

Referenced by AssemblyCommDG(), and InitialiseStructure().
