Nektar++
Nektar::MultiRegions::AssemblyCommDG Class Reference

Implements communication for populating forward and backwards spaces across processors in the discontinuous Galerkin routines.

#include <AssemblyCommDG.h>

Public Member Functions

 ~AssemblyCommDG ()=default
 Default destructor.
 
 AssemblyCommDG (const ExpList &locExp, const ExpListSharedPtr &trace, const Array< OneD, Array< OneD, LocalRegions::ExpansionSharedPtr > > &elmtToTrace, const Array< OneD, const ExpListSharedPtr > &bndCondExp, const Array< OneD, const SpatialDomains::BoundaryConditionShPtr > &bndCond, const PeriodicMap &perMap)
 
void PerformExchange (const Array< OneD, NekDouble > &testFwd, Array< OneD, NekDouble > &testBwd)
 Perform the trace exchange between processors, given the forwards and backwards spaces.
 

Private Member Functions

void InitialiseStructure (const ExpList &locExp, const ExpListSharedPtr &trace, const Array< OneD, Array< OneD, LocalRegions::ExpansionSharedPtr > > &elmtToTrace, const Array< OneD, const ExpListSharedPtr > &bndCondExp, const Array< OneD, const SpatialDomains::BoundaryConditionShPtr > &bndCond, const PeriodicMap &perMap, const LibUtilities::CommSharedPtr &comm)
 Initialises the structure for the MPI communication.
 

Static Private Member Functions

static std::tuple< NekDouble, NekDouble, NekDouble > Timing (const LibUtilities::CommSharedPtr &comm, const int &count, const int &num, const ExchangeMethodSharedPtr &f)
 Timing of the MPI exchange method.
 

Private Attributes

ExchangeMethodSharedPtr m_exchange
 Chosen exchange method (either fastest parallel or serial).
 
int m_maxQuad = 0
 Max number of quadrature points in an element.
 
int m_nRanks = 0
 Number of ranks/processes/partitions.
 
std::map< int, std::vector< int > > m_rankSharedEdges
 Map of process to shared edge IDs.
 
std::map< int, std::vector< int > > m_edgeToTrace
 Map of edge ID to quad point trace indices.
 

Detailed Description

Implements communication for populating forward and backwards spaces across processors in the discontinuous Galerkin routines.

The AssemblyCommDG class constructs various exchange methods for performing the action of communicating trace data from the forwards space of one processor to the backwards space of the corresponding neighbour element, and vice versa.

This class initialises the structure for all exchange methods and then times each of them to determine the fastest for the particular system configuration. If running in serial, it assigns the Serial exchange method instead and skips the timing. It then acts as a pass-through to the chosen exchange method for the PerformExchange function.

Definition at line 241 of file AssemblyCommDG.h.
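
The "time every candidate, keep the fastest" selection described above can be illustrated with a small standalone sketch. It is not the Nektar++ implementation: Candidate, AverageSeconds, FastestIndex and ChooseFastest are hypothetical names introduced here, the timing is purely local (no MPI reduction), and the warm-up and iteration defaults simply mirror the 10 and 50 used in the constructor.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <functional>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical stand-in for an exchange method: a label plus a callable.
struct Candidate
{
    std::string name;
    std::function<void()> run;
};

// Average wall-clock seconds per call of `run` over `iters` calls.
double AverageSeconds(const std::function<void()> &run, int iters)
{
    auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iters; ++i)
    {
        run();
    }
    std::chrono::duration<double> elapsed =
        std::chrono::steady_clock::now() - start;
    return elapsed.count() / iters;
}

// Index of the smallest average time; ties resolve to the first entry.
std::size_t FastestIndex(const std::vector<double> &avg)
{
    return static_cast<std::size_t>(std::distance(
        avg.begin(), std::min_element(avg.begin(), avg.end())));
}

// Warm each candidate up, time it, and return the index of the fastest.
std::size_t ChooseFastest(const std::vector<Candidate> &methods,
                          int warmup = 10, int iters = 50)
{
    std::vector<double> avg;
    avg.reserve(methods.size());
    for (const auto &m : methods)
    {
        AverageSeconds(m.run, warmup); // warm-up result is discarded
        avg.push_back(AverageSeconds(m.run, iters));
    }
    return FastestIndex(avg);
}
```

In the class itself the winning method is stored in m_exchange, so later calls to PerformExchange pay no selection cost.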

Constructor & Destructor Documentation

◆ ~AssemblyCommDG()

Nektar::MultiRegions::AssemblyCommDG::~AssemblyCommDG ( )
default

Default destructor.

◆ AssemblyCommDG()

Nektar::MultiRegions::AssemblyCommDG::AssemblyCommDG ( const ExpList &  locExp,
const ExpListSharedPtr &  trace,
const Array< OneD, Array< OneD, LocalRegions::ExpansionSharedPtr > > &  elmtToTrace,
const Array< OneD, const ExpListSharedPtr > &  bndCondExp,
const Array< OneD, const SpatialDomains::BoundaryConditionShPtr > &  bndCond,
const PeriodicMap &  perMap 
)

Definition at line 325 of file AssemblyCommDG.cpp.

{
    auto comm = locExp.GetComm()->GetRowComm();

    // If serial then skip initialising graph structure and the MPI timing
    if (comm->IsSerial())
    {
        m_exchange =
            // ...
    }
    else
    {
        // Initialise graph structure and link processes across partition
        // boundaries
        AssemblyCommDG::InitialiseStructure(locExp, trace, elmtToTrace,
                                            bndCondExp, bndCond, perMap, comm);

        // Timing MPI comm methods, warm up with 10 iterations then time over 50
        std::vector<ExchangeMethodSharedPtr> MPIFuncs;
        std::vector<std::string> MPIFuncsNames;

        // Toggle off AllToAll/AllToAllV methods if cores greater than 16 for
        // performance reasons unless override solver info parameter is present
        if (locExp.GetSession()->MatchSolverInfo("OverrideMPI", "ON") ||
            m_nRanks <= 16)
        {
            MPIFuncs.emplace_back(ExchangeMethodSharedPtr(
                // ...
                    m_edgeToTrace)));
            MPIFuncsNames.emplace_back("AllToAll");

            MPIFuncs.emplace_back(ExchangeMethodSharedPtr(
                // ...
            MPIFuncsNames.emplace_back("AllToAllV");
        }

        MPIFuncs.emplace_back(
            // ...
        MPIFuncsNames.emplace_back("PairwiseSendRecv");

        // Disable neighbor MPI method on unsupported MPI version (below 3.0)
        if (std::get<0>(comm->GetVersion()) >= 3)
        {
            MPIFuncs.emplace_back(ExchangeMethodSharedPtr(
                // ...
            MPIFuncsNames.emplace_back("NeighborAllToAllV");
        }

        int numPoints = trace->GetNpoints();
        int warmup = 10, iter = 50;
        NekDouble min, max;
        std::vector<NekDouble> avg(MPIFuncs.size(), -1);
        bool verbose = locExp.GetSession()->DefinesCmdLineArgument("verbose");

        if (verbose && comm->TreatAsRankZero())
        {
            std::cout << "MPI setup for trace exchange: " << std::endl;
        }

        // Padding for output
        int maxStrLen = 0;
        for (size_t i = 0; i < MPIFuncs.size(); ++i)
        {
            maxStrLen = MPIFuncsNames[i].size() > maxStrLen
                            ? MPIFuncsNames[i].size()
                            : maxStrLen;
        }

        for (size_t i = 0; i < MPIFuncs.size(); ++i)
        {
            Timing(comm, warmup, numPoints, MPIFuncs[i]);
            std::tie(avg[i], min, max) =
                Timing(comm, iter, numPoints, MPIFuncs[i]);
            if (verbose && comm->TreatAsRankZero())
            {
                std::cout << " " << MPIFuncsNames[i]
                          << " times (avg, min, max)"
                          << std::string(maxStrLen - MPIFuncsNames[i].size(),
                                         ' ')
                          << ": " << avg[i] << " " << min << " " << max
                          << std::endl;
            }
        }

        // Gets the fastest MPI method
        int fastestMPI = std::distance(
            avg.begin(), std::min_element(avg.begin(), avg.end()));

        if (verbose && comm->TreatAsRankZero())
        {
            std::cout << " Chosen fastest method: "
                      << MPIFuncsNames[fastestMPI] << std::endl;
        }

        m_exchange = MPIFuncs[fastestMPI];
    }
}

References Nektar::MultiRegions::ExpList::GetComm(), Nektar::MultiRegions::ExpList::GetSession(), InitialiseStructure(), m_edgeToTrace, m_exchange, m_maxQuad, m_nRanks, m_rankSharedEdges, and Timing().

Member Function Documentation

◆ InitialiseStructure()

void Nektar::MultiRegions::AssemblyCommDG::InitialiseStructure ( const ExpList &  locExp,
const ExpListSharedPtr &  trace,
const Array< OneD, Array< OneD, LocalRegions::ExpansionSharedPtr > > &  elmtToTrace,
const Array< OneD, const ExpListSharedPtr > &  bndCondExp,
const Array< OneD, const SpatialDomains::BoundaryConditionShPtr > &  bndCond,
const PeriodicMap &  perMap,
const LibUtilities::CommSharedPtr &  comm 
)
private

Initialises the structure for the MPI communication.

This function sets up the initial structure to allow for the exchange methods to be created. This structure is contained within the member variable m_rankSharedEdges which is a map of rank to vector of the shared edges with that rank. This is filled by:

  • Create an edge-to-trace mapping, and realign periodic edges within this mapping so that they have the same data layout for ranks sharing periodic boundaries.
  • Create a list of all local edge IDs and calculate the maximum number of quadrature points used locally, then perform an AllReduce to find the maximum number of quadrature points across all ranks (for the AllToAll method).
  • Create a list of all boundary edge IDs except for those which are periodic.
  • Using the boundary ID list and the list of all local IDs, construct a unique list of IDs which are on a partition boundary (i.e. an ID that does not occur twice in the local list and does not occur in the boundary list is on a partition boundary). For a periodic edge, also check whether the other side is local; if it is not, add the minimum of the two periodic IDs to the unique list, as the numbering scheme must be consistent across ranks.
  • Send the unique list to all other ranks/partitions. Each rank's unique list is then compared with the local unique edge ID list; if a match is found, the member variable m_rankSharedEdges is filled with the matching rank and unique edge ID.
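
The fourth step is the least obvious one, so here is a minimal sketch of it using only standard containers. It is an illustration under assumptions, not the library code: PeriodicSide and PartitionBoundaryEdges are hypothetical names, the periodic map is reduced to a single entry per edge (the real PeriodicMap stores a vector of PeriodicEntity), and occurrences are counted with a map instead of the pairwise duplicate scan used in the source.

```cpp
#include <algorithm>
#include <map>
#include <set>
#include <vector>

// Hypothetical, simplified record for the far side of a periodic edge.
struct PeriodicSide
{
    int id;       // global ID of the matching edge
    bool isLocal; // true if that edge lives on this rank
};

std::vector<int> PartitionBoundaryEdges(
    const std::vector<int> &localEdgeIds, // one entry per element trace
    const std::set<int> &bndIds,          // non-periodic boundary edges
    const std::map<int, PeriodicSide> &periodic)
{
    // An edge interior to this rank is listed by both adjacent elements.
    std::map<int, int> count;
    for (int id : localEdgeIds)
    {
        ++count[id];
    }

    std::vector<int> unique;
    for (const auto &entry : count)
    {
        const int id = entry.first;
        if (entry.second > 1 || bndIds.count(id) > 0)
        {
            continue; // interior to this rank, or a physical boundary
        }

        auto it = periodic.find(id);
        if (it == periodic.end())
        {
            unique.push_back(id); // ordinary partition-boundary edge
        }
        else if (!it->second.isLocal)
        {
            // Both ranks must agree on one ID for the periodic pair.
            unique.push_back(std::min(id, it->second.id));
        }
    }

    // Sorted so that every rank presents its list in the same order.
    std::sort(unique.begin(), unique.end());
    return unique;
}
```

Taking the minimum of the two periodic IDs means both ranks sharing the periodic pair derive the same key without any extra communication.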

Definition at line 456 of file AssemblyCommDG.cpp.

{
    Array<OneD, int> tmp;
    int quad = 0, nDim = 0, eid = 0, offset = 0;
    const LocalRegions::ExpansionVector &locExpVector = *(locExp.GetExp());

    // Assume that each element of the expansion is of the same
    // dimension.
    nDim = locExpVector[0]->GetShapeDimension();

    // This sets up the edge to trace mapping and realigns periodic edges
    if (nDim == 1)
    {
        for (size_t i = 0; i < trace->GetExpSize(); ++i)
        {
            eid = trace->GetExp(i)->GetGeom()->GetGlobalID();
            offset = trace->GetPhys_Offset(i);

            // Check to see if this vert is periodic. If it is, then we
            // need use the unique eid of the two points
            auto it = perMap.find(eid);
            if (perMap.count(eid) > 0)
            {
                PeriodicEntity ent = it->second[0];
                if (!ent.isLocal) // Not sure if true in 1D
                {
                    eid = std::min(eid, ent.id);
                }
            }

            m_edgeToTrace[eid].emplace_back(offset);
        }
    }
    else
    {
        for (size_t i = 0; i < trace->GetExpSize(); ++i)
        {
            eid = trace->GetExp(i)->GetGeom()->GetGlobalID();
            offset = trace->GetPhys_Offset(i);
            quad = trace->GetExp(i)->GetTotPoints();

            // Check to see if this edge is periodic. If it is, then we
            // need to reverse the trace order of one edge only in the
            // edge to trace map so that the data are reversed w.r.t each
            // other. We do this by using the minimum of the two IDs.
            auto it = perMap.find(eid);
            bool realign = false;
            if (perMap.count(eid) > 0)
            {
                PeriodicEntity ent = it->second[0];
                if (!ent.isLocal)
                {
                    realign = eid == std::min(eid, ent.id);
                    eid = std::min(eid, ent.id);
                }
            }

            for (size_t j = 0; j < quad; ++j)
            {
                m_edgeToTrace[eid].emplace_back(offset + j);
            }

            if (realign)
            {
                // Realign some periodic edges in m_edgeToTrace
                Array<OneD, int> tmpArray(m_edgeToTrace[eid].size());
                for (size_t j = 0; j < m_edgeToTrace[eid].size(); ++j)
                {
                    tmpArray[j] = m_edgeToTrace[eid][j];
                }

                StdRegions::Orientation orient = it->second[0].orient;

                if (nDim == 2)
                {
                    AssemblyMapDG::RealignTraceElement(tmpArray, orient, quad);
                }
                else
                {
                    // Orient is going from face 2 -> face 1 but we want face 1
                    // -> face 2; in all cases except below these are
                    // equivalent. However below is not equivalent so we use the
                    // reverse of the mapping.
                    if (orient == StdRegions::eDir1FwdDir2_Dir2BwdDir1)
                    {
                        orient = StdRegions::eDir1BwdDir2_Dir2FwdDir1;
                    }
                    else if (orient == StdRegions::eDir1BwdDir2_Dir2FwdDir1)
                    {
                        orient = StdRegions::eDir1FwdDir2_Dir2BwdDir1;
                    }

                    AssemblyMapDG::RealignTraceElement(
                        tmpArray, orient, trace->GetExp(i)->GetNumPoints(0),
                        trace->GetExp(i)->GetNumPoints(1));
                }

                for (size_t j = 0; j < m_edgeToTrace[eid].size(); ++j)
                {
                    m_edgeToTrace[eid][j] = tmpArray[j];
                }
            }
        }
    }

    // This creates a list of all geometry of problem dimension - 1
    // and populates the maxQuad member variable for AllToAll
    std::vector<int> localEdgeIds;
    for (eid = 0; eid < locExpVector.size(); ++eid)
    {
        LocalRegions::ExpansionSharedPtr locExpansion = locExpVector[eid];

        for (size_t j = 0; j < locExpansion->GetNtraces(); ++j)
        {
            int id = elmtToTrace[eid][j]->GetGeom()->GetGlobalID();
            localEdgeIds.emplace_back(id);
        }

        quad = locExpansion->GetTotPoints();
        if (quad > m_maxQuad)
        {
            m_maxQuad = quad;
        }
    }

    // Find max quadrature points across all processes for AllToAll method
    comm->AllReduce(m_maxQuad, LibUtilities::ReduceMax);

    // Create list of boundary edge IDs
    std::set<int> bndIdList;
    for (size_t i = 0; i < bndCond.size(); ++i)
    {
        // Don't add if periodic boundary type
        if ((bndCond[i]->GetBoundaryConditionType() ==
             SpatialDomains::ePeriodic))
        {
            continue;
        }
        else
        {
            for (size_t j = 0; j < bndCondExp[i]->GetExpSize(); ++j)
            {
                eid = bndCondExp[i]->GetExp(j)->GetGeom()->GetGlobalID();
                bndIdList.insert(eid);
            }
        }
    }

    // Get unique edges to send
    std::vector<int> uniqueEdgeIds;
    std::vector<bool> duplicated(localEdgeIds.size(), false);
    for (size_t i = 0; i < localEdgeIds.size(); ++i)
    {
        eid = localEdgeIds[i];
        for (size_t j = i + 1; j < localEdgeIds.size(); ++j)
        {
            if (eid == localEdgeIds[j])
            {
                duplicated[i] = duplicated[j] = true;
            }
        }

        if (!duplicated[i]) // Not duplicated in local partition
        {
            if (bndIdList.find(eid) == bndIdList.end()) // Not a boundary edge
            {
                // Check if periodic and if not local set eid to other side
                auto it = perMap.find(eid);
                if (it != perMap.end())
                {
                    if (!it->second[0].isLocal)
                    {
                        uniqueEdgeIds.emplace_back(
                            std::min(eid, it->second[0].id));
                    }
                }
                else
                {
                    uniqueEdgeIds.emplace_back(eid);
                }
            }
        }
    }

    // Send uniqueEdgeIds size so all partitions can prepare buffers
    m_nRanks = comm->GetSize();
    Array<OneD, int> rankNumEdges(m_nRanks);
    Array<OneD, int> localEdgeSize(1, uniqueEdgeIds.size());
    comm->AllGather(localEdgeSize, rankNumEdges);

    Array<OneD, int> rankLocalEdgeDisp(m_nRanks, 0);
    for (size_t i = 1; i < m_nRanks; ++i)
    {
        rankLocalEdgeDisp[i] = rankLocalEdgeDisp[i - 1] + rankNumEdges[i - 1];
    }

    Array<OneD, int> localEdgeIdsArray(uniqueEdgeIds.size());
    for (size_t i = 0; i < uniqueEdgeIds.size(); ++i)
    {
        localEdgeIdsArray[i] = uniqueEdgeIds[i];
    }

    // Sort localEdgeIdsArray before sending (this is important!)
    std::sort(localEdgeIdsArray.begin(), localEdgeIdsArray.end());

    Array<OneD, int> rankLocalEdgeIds(
        std::accumulate(rankNumEdges.begin(), rankNumEdges.end(), 0), 0);

    // Send all unique edge IDs to all partitions
    comm->AllGatherv(localEdgeIdsArray, rankLocalEdgeIds, rankNumEdges,
                     rankLocalEdgeDisp);

    // Find what edge Ids match with other ranks
    size_t myRank = comm->GetRank();
    for (size_t i = 0; i < m_nRanks; ++i)
    {
        if (i == myRank)
        {
            continue;
        }

        for (size_t j = 0; j < rankNumEdges[i]; ++j)
        {
            int edgeId = rankLocalEdgeIds[rankLocalEdgeDisp[i] + j];
            if (std::find(uniqueEdgeIds.begin(), uniqueEdgeIds.end(), edgeId) !=
                uniqueEdgeIds.end())
            {
                m_rankSharedEdges[i].emplace_back(edgeId);
            }
        }
    }
}

References Nektar::StdRegions::eDir1BwdDir2_Dir2FwdDir1, Nektar::StdRegions::eDir1FwdDir2_Dir2BwdDir1, Nektar::SpatialDomains::ePeriodic, Nektar::StdRegions::find(), Nektar::MultiRegions::ExpList::GetExp(), Nektar::MultiRegions::PeriodicEntity::id, Nektar::MultiRegions::PeriodicEntity::isLocal, m_edgeToTrace, m_maxQuad, m_nRanks, m_rankSharedEdges, Nektar::MultiRegions::AssemblyMapDG::RealignTraceElement(), and Nektar::LibUtilities::ReduceMax.

Referenced by AssemblyCommDG().
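
The last stage of InitialiseStructure(), where every rank's list is gathered into one buffer and matched against the local list, can be tried out without MPI. The sketch below assumes the gathered buffer and per-rank counts are already available as plain vectors; Displacements and SharedEdges are hypothetical helper names, not part of Nektar++.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <vector>

// Exclusive prefix sum: disp[i] is where rank i's block starts in the
// concatenated buffer.
std::vector<int> Displacements(const std::vector<int> &counts)
{
    std::vector<int> disp(counts.size(), 0);
    for (std::size_t i = 1; i < counts.size(); ++i)
    {
        disp[i] = disp[i - 1] + counts[i - 1];
    }
    return disp;
}

// For every other rank, collect the gathered IDs that also occur locally.
std::map<int, std::vector<int>> SharedEdges(
    const std::vector<int> &gathered, // all ranks' IDs, rank 0 first
    const std::vector<int> &counts,   // number of IDs sent by each rank
    const std::vector<int> &myIds,    // this rank's unique edge IDs
    int myRank)
{
    const std::vector<int> disp = Displacements(counts);
    std::map<int, std::vector<int>> shared;
    for (std::size_t r = 0; r < counts.size(); ++r)
    {
        if (static_cast<int>(r) == myRank)
        {
            continue; // a rank never exchanges with itself
        }
        for (int j = 0; j < counts[r]; ++j)
        {
            const int id = gathered[disp[r] + j];
            if (std::find(myIds.begin(), myIds.end(), id) != myIds.end())
            {
                shared[static_cast<int>(r)].push_back(id);
            }
        }
    }
    return shared;
}
```

Because each rank sorts its list before sending, the shared IDs recorded for a pair of ranks appear in the same order on both sides, which is presumably why the source comment flags the sort as important.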

◆ PerformExchange()

void Nektar::MultiRegions::AssemblyCommDG::PerformExchange ( const Array< OneD, NekDouble > &  testFwd,
Array< OneD, NekDouble > &  testBwd 
)
inline

Perform the trace exchange between processors, given the forwards and backwards spaces.

Parameters
testFwd  Local forwards space of the trace (which will be sent)
testBwd  Local backwards space of the trace (which will receive contributions)

Definition at line 265 of file AssemblyCommDG.h.

{
    m_exchange->PerformExchange(testFwd, testBwd);
}

References m_exchange.

◆ Timing()

std::tuple< NekDouble, NekDouble, NekDouble > Nektar::MultiRegions::AssemblyCommDG::Timing ( const LibUtilities::CommSharedPtr &  comm,
const int &  count,
const int &  num,
const ExchangeMethodSharedPtr &  f 
)
static private

Timing of the MPI exchange method.

Timing of the exchange method f, performing the exchange count times for array of length num.

Parameters
comm  Communicator
count  Number of timing iterations to run
num  Number of quadrature points to communicate
f  ExchangeMethod to time
Returns
tuple of loop times {avg, min, max}
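
To make the returned tuple concrete: each rank measures its own time per iteration, and the three AllReduce calls (ReduceSum, ReduceMin, ReduceMax) combine those values, with the sum divided by the number of ranks to give the average. The sketch below reproduces that arithmetic on a plain vector holding one value per rank; ReduceTimes is a hypothetical helper and no MPI is involved.

```cpp
#include <algorithm>
#include <numeric>
#include <tuple>
#include <vector>

// Combine one per-iteration time from each rank into {avg, min, max}.
// Assumes perRank is non-empty.
std::tuple<double, double, double> ReduceTimes(
    const std::vector<double> &perRank)
{
    const double sum = std::accumulate(perRank.begin(), perRank.end(), 0.0);
    const auto mm = std::minmax_element(perRank.begin(), perRank.end());
    return std::make_tuple(sum / perRank.size(), *mm.first, *mm.second);
}
```

The constructor compares methods on the average only; the minimum and maximum are reported in verbose output.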

Definition at line 706 of file AssemblyCommDG.cpp.

{
    Array<OneD, NekDouble> testFwd(num, 1);
    Array<OneD, NekDouble> testBwd(num, -2);

    LibUtilities::Timer t;
    t.Start();
    for (size_t i = 0; i < count; ++i)
    {
        f->PerformExchange(testFwd, testBwd);
    }
    t.Stop();

    // These can just be 'reduce' but need to setup the wrapper in comm.h
    Array<OneD, NekDouble> minTime(1, t.TimePerTest(count));
    comm->AllReduce(minTime, LibUtilities::ReduceMin);

    Array<OneD, NekDouble> maxTime(1, t.TimePerTest(count));
    comm->AllReduce(maxTime, LibUtilities::ReduceMax);

    Array<OneD, NekDouble> sumTime(1, t.TimePerTest(count));
    comm->AllReduce(sumTime, LibUtilities::ReduceSum);

    NekDouble avgTime = sumTime[0] / comm->GetSize();
    return std::make_tuple(avgTime, minTime[0], maxTime[0]);
}

References Nektar::LibUtilities::ReduceMax, Nektar::LibUtilities::ReduceMin, Nektar::LibUtilities::ReduceSum, Nektar::LibUtilities::Timer::Start(), Nektar::LibUtilities::Timer::Stop(), and Nektar::LibUtilities::Timer::TimePerTest().

Referenced by AssemblyCommDG().

Member Data Documentation

◆ m_edgeToTrace

std::map<int, std::vector<int> > Nektar::MultiRegions::AssemblyCommDG::m_edgeToTrace
private

Map of edge ID to quad point trace indices.

Definition at line 281 of file AssemblyCommDG.h.

Referenced by AssemblyCommDG(), and InitialiseStructure().
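
As an illustration of what this map holds before any periodic realignment, the sketch below builds it from a list of trace expansions, each contributing the contiguous index range starting at its offset. TraceSegment and BuildEdgeToTrace are hypothetical names; in InitialiseStructure() the corresponding values come from trace->GetPhys_Offset(i) and GetTotPoints().

```cpp
#include <map>
#include <vector>

// Hypothetical description of one trace expansion.
struct TraceSegment
{
    int edgeId; // global edge ID
    int offset; // index of its first quadrature point in the trace array
    int nQuad;  // number of quadrature points on the edge
};

// Map each edge ID to the trace-array indices of its quadrature points.
std::map<int, std::vector<int>> BuildEdgeToTrace(
    const std::vector<TraceSegment> &segments)
{
    std::map<int, std::vector<int>> edgeToTrace;
    for (const auto &s : segments)
    {
        for (int j = 0; j < s.nQuad; ++j)
        {
            edgeToTrace[s.edgeId].push_back(s.offset + j);
        }
    }
    return edgeToTrace;
}
```

An exchange method can then pack and unpack per-edge data by walking these index lists rather than recomputing offsets.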

◆ m_exchange

ExchangeMethodSharedPtr Nektar::MultiRegions::AssemblyCommDG::m_exchange
private

Chosen exchange method (either fastest parallel or serial)

Definition at line 273 of file AssemblyCommDG.h.

Referenced by AssemblyCommDG(), and PerformExchange().

◆ m_maxQuad

int Nektar::MultiRegions::AssemblyCommDG::m_maxQuad = 0
private

Max number of quadrature points in an element.

Definition at line 275 of file AssemblyCommDG.h.

Referenced by AssemblyCommDG(), and InitialiseStructure().

◆ m_nRanks

int Nektar::MultiRegions::AssemblyCommDG::m_nRanks = 0
private

Number of ranks/processes/partitions.

Definition at line 277 of file AssemblyCommDG.h.

Referenced by AssemblyCommDG(), and InitialiseStructure().

◆ m_rankSharedEdges

std::map<int, std::vector<int> > Nektar::MultiRegions::AssemblyCommDG::m_rankSharedEdges
private

Map of process to shared edge IDs.

Definition at line 279 of file AssemblyCommDG.h.

Referenced by AssemblyCommDG(), and InitialiseStructure().