How to add a new scheme to Inciter
Contents
- Rationale and plan
- 1. Add a new keyword
- 2. Add new option switch
- 3. Add new Charm++ chare proxy in Scheme
- 4. Add new Charm++ chare array
-
5. New C++ class
- ALECG::ALECG – Constructor
- Transporter::comfinal() – Complete communication maps
- ALECG::setup() – Set initial conditions and compute the left hand side
- ALECG::lhs() – Compute own and send lhs on chare-boundary
- ALECG::comlhs() – Receive left hand side on chare boundary
- ALECG::lhsmerge() – Merge left hand side and continue
- ALECG::dt() – Start time step
- ALECG::rhs() & ALECG::comrhs() – Compute and communicate right hand side
- ALECG::solve() – Solve, diagnostics, refine
- ALECG::refine() – Optionally refine mesh
- ALECG::resize() – Resize data after mesh refinement
- 6. Making it all work
- 7. Augment unit tests for Scheme
- 8. Add new regression tests
Inciter supports multiple discretization schemes. This page describes how to add a scheme of your choice by walking through an example of adding a new one. We also discuss the main steps of the execution logic, which, at a high level, is the same for all discretization schemes.
Rationale and plan
Similar to the existing discretization schemes, DiagCG
, or DG
, the new scheme, ALECG
(short for Arbitrary Lagrangian-Eulerian Continuous Galerkin), will interact with Discretization
in a child-base fashion, e.g., will directly access (and reuse) its member data and functions. It will also intereact with Refiner
, for mesh refinement, and will also be migratable to enable dynamic load balancing. In essence, it will have everything an existing scheme has. However, we will not implement the low-level details of the actual numerical method, only the glue-code necessary to interact with the rest of the code and we make it ready to start implementing the low-level details of a particular discretization, done by a PDE
class, held behind a derived class of, e.g., CGPDE
or DGPDE
. For more details on how these classes interact, see also the Inciter software design page.
1. Add a new keyword
A specific discretization scheme is selected by the user in the control (input) file via the scheme
keyword, e.g., scheme diagcg
. We add the new keyword, alecg
, which then can be recognized by the control file parser, in src/Control/Keywords.h
by adding the following code block:
Control/Keywords.h
$ git diff src/Control/Keywords.h diff --git a/src/Control/Keywords.h b/src/Control/Keywords.h index 002869cb..c18f193a 100644 --- a/src/Control/Keywords.h +++ b/src/Control/Keywords.h @@ -4607,6 +4607,19 @@ struct diagcg_info { }; using diagcg = keyword< diagcg_info, TAOCPP_PEGTL_STRING("diagcg") >; +struct alecg_info { + static std::string name() { return "ALE-CG with RK"; } + static std::string shortDescription() { return "Select continuous Galerkin " + "with ALE + Runge-Kutta"; } + static std::string longDescription() { return + R"(This keyword is used to select the continuous Galerkin finite element + scheme in the arbitrary Lagrangian-Eulerian (ALE) reference frame combined + with Runge-Kutta (RK) time stepping. See Control/Inciter/Options/Scheme.h + for other valid options.)"; } +}; +using alecg = keyword< alecg_info, TAOCPP_PEGTL_STRING("alecg") >; + struct dg_info { static std::string name() { return "DG(P0) + RK"; } static std::string shortDescription() { return
We also add the new keyword to inciter's grammar's keywords pool:
Control/Inciter/InputDeck/InputDeck.h
$ git diff src/Control/Inciter/InputDeck/InputDeck.h diff --git a/src/Control/Inciter/InputDeck/InputDeck.h b/src/Control/Inciter/InputDeck/InputDeck.h index 83572480..20ce8975 100644 --- a/src/Control/Inciter/InputDeck/InputDeck.h +++ b/src/Control/Inciter/InputDeck/InputDeck.h @@ -144,6 +144,7 @@ class InputDeck : kw::scheme, kw::matcg, kw::diagcg, + kw::alecg, kw::dg, kw::dgp1, kw::flux,
This is required so that the compiler can generate a database containing the help for all the keywords in the grammar understood by inciter's control file parser. The above changes not only add the keyword but also some documentation that gets displayed when passing the -C
or -H
command line arguments to the inciter executable, so quick help is available at the user's fingertips:
$ inciter -C inciter Control File Keywords: advdiff string Specify the advection + diffusion physics configuration for a PDE advection string Specify the advection physics configuration for a PDE alecg Select continuous Galerkin with ALE + Runge-Kutta algorithm string Select mesh partitioning algorithm alpha real Set PDE parameter(s) alpha ... $ inciter -H alecg inciter control file keyword 'alecg' Select continuous Galerkin with ALE + Runge-Kutta (RK) This keyword is used to select the continuous Galerkin finite element scheme in the arbitrary Lagrangian-Eulerian (ALE) reference frame combined with Runge-Kutta (RK) time stepping. See Control/Inciter/Options/Scheme.h for other valid options.
2. Add new option switch
Next is to add a new state to the existing Scheme option switch. This "option
switch" is really only a fancy enum, used to store the user's choice of the discretization scheme after parsing the control file in a type-safe manner. This fancy enum is an option switch because it inherits from tk::src/Control/Inciter/Options/Scheme.h
:
Control/Inciter/Options/Scheme.h
$ git diff src/Control/Inciter/Options/Scheme.h diff --git a/src/Inciter/SchemeBase.h b/src/Inciter/SchemeBase.h index 61510d01..0cb3e9e8 100644 --- a/src/Inciter/SchemeBase.h +++ b/src/Inciter/SchemeBase.h @@ -22,6 +22,7 @@ #include "NoWarning/matcg.decl.h" #include "NoWarning/diagcg.decl.h" +#include "NoWarning/alecg.decl.h" #include "NoWarning/distfct.decl.h" #include "NoWarning/dg.decl.h" #include "NoWarning/discretization.decl.h" @@ -52,8 +53,11 @@ class SchemeBase { proxy = static_cast< CProxy_DiagCG >( CProxy_DiagCG::ckNew(m_bound) ); fctproxy= CProxy_DistFCT::ckNew(m_bound); } else if (scheme == ctr::SchemeType::DG || - scheme == ctr::SchemeType::DGP1) { + scheme == ctr::SchemeType::DGP1) + { proxy = static_cast< CProxy_DG >( CProxy_DG::ckNew(m_bound) ); + } else if (scheme == ctr::SchemeType::ALECG) { + proxy = static_cast< CProxy_ALECG >( CProxy_ALECG::ckNew(m_bound) ); } else Throw( "Unknown discretization scheme" ); } @@ -75,11 +79,12 @@ class SchemeBase { const CkArrayOptions& arrayoptions() { return m_bound; } //! Variant type listing all chare proxy types modeling the same concept - using Proxy = boost::variant< CProxy_DiagCG, CProxy_DG >; + using Proxy = + boost::variant< CProxy_DiagCG, CProxy_DG, CProxy_ALECG >; //! Variant type listing all chare element proxy types (behind operator[]) using ProxyElem = boost::variant< CProxy_DiagCG::element_t, - CProxy_DG::element_t >; + CProxy_DG::element_t, CProxy_ALECG::element_t >; protected: //! Variant storing one proxy to which this class is configured for
3. Add new Charm++ chare proxy in Scheme
Scheme
is a class that, together with its base, SchemeBase
, implements concept-based runtime polymorphism for migratable Charm++ chare arrays using value semantics. Client code, e.g., Transporter
, interacts with Discretization
and its children via a uniform interface provided by Scheme
, which dispatches entry method calls to the correct class instance, the base or the child, and is capable of performing broadcasts as well as addressing a particular chare array element. Read more details at src/Inciter/Scheme.h. To teach it to dispatch to our new ALECG
scheme, besides the existing ones, we make the following changes:
Inciter/SchemeBase.h
$ git diff src/Inciter/SchemeBase.h diff --git a/src/Inciter/SchemeBase.h b/src/Inciter/SchemeBase.h index 61510d01..dea3d78a 100644 --- a/src/Inciter/SchemeBase.h +++ b/src/Inciter/SchemeBase.h @@ -22,6 +22,7 @@ #include "NoWarning/matcg.decl.h" #include "NoWarning/diagcg.decl.h" +#include "NoWarning/alecg.decl.h" #include "NoWarning/distfct.decl.h" #include "NoWarning/dg.decl.h" #include "NoWarning/discretization.decl.h" @@ -51,6 +52,8 @@ class SchemeBase { } else if (scheme == ctr::SchemeType::DiagCG) { proxy = static_cast< CProxy_DiagCG >( CProxy_DiagCG::ckNew(m_bound) ); fctproxy= CProxy_DistFCT::ckNew(m_bound); + } else if (scheme == ctr::SchemeType::ALECG) { + proxy = static_cast< CProxy_ALECG >( CProxy_ALECG::ckNew(m_bound) ); } else if (scheme == ctr::SchemeType::DG || scheme == ctr::SchemeType::DGP1) { proxy = static_cast< CProxy_DG >( CProxy_DG::ckNew(m_bound) ); @@ -75,11 +78,12 @@ class SchemeBase { const CkArrayOptions& arrayoptions() { return m_bound; } //! Variant type listing all chare proxy types modeling the same concept - using Proxy = boost::variant< CProxy_DiagCG, CProxy_DG >; + using Proxy = + boost::variant< CProxy_DiagCG, CProxy_ALECG, CProxy_DG >; //! Variant type listing all chare element proxy types (behind operator[]) using ProxyElem = boost::variant< CProxy_DiagCG::element_t, - CProxy_DG::element_t >; + CProxy_ALECG::element_t, CProxy_DG::element_t >; protected: //! Variant storing one proxy to which this class is configured for
4. Add new Charm++ chare array
Next is to add a new class, ALECG
, which will serve as the glue between Transporter
, Refiner
, and CGPDE
. These classes, respectively, are the driver, the mesh refiner, and the polymorphic vector of PDE discretization class objects that hold the low-level details of the numerical implementation of spatial discretizations, dispatching to multiple specific systems of equations, e.g., cg::
or cg::
.
We create the following new files:
- Inciter/
alecg.ci, Charm++ interface file for ALECG, - NoWarning/
alecg.decl.h and NoWarning/ alecg.def.h, which help ignore compiler warnings in Charm++-generated code, and - Inciter/ALECG.h and Inciter/
ALECG.cpp, header and implementation of ALECG.
Before we discuss the details of the above new files, let's get a couple of simple things out of the way. We also need to add the new include to Refiner.h
so, e.g., it can call back to ALECG::resize() after a mesh refinement step:
Inciter/Refiner.h
$ git diff src/Inciter/Refiner.h diff --git a/src/Inciter/Refiner.h b/src/Inciter/Refiner.h index dfcb1ffd..4fe743a4 100644 --- a/src/Inciter/Refiner.h +++ b/src/Inciter/Refiner.h @@ -29,6 +29,7 @@ #include "SchemeBase.h" #include "DiagCG.h" +#include "ALECG.h" #include "DG.h" #include "NoWarning/transporter.decl.h"
We also tell the build system about our new ALECG
class and its Charm++ module:
Inciter/CMakeLists.txt
$ gd src/Inciter/CMakeLists.txt diff --git a/src/Inciter/CMakeLists.txt b/src/Inciter/CMakeLists.txt index 141055ec..e339b65b 100644 --- a/src/Inciter/CMakeLists.txt +++ b/src/Inciter/CMakeLists.txt @@ -14,6 +14,7 @@ add_library(Inciter Sorter.cpp DiagCG.cpp + ALECG.cpp DG.cpp FluxCorrector.cpp DistFCT.cpp @@ -74,6 +75,7 @@ addCharmModule( "refiner" "Inciter" ) addCharmModule( "sorter" "Inciter" ) addCharmModule( "matcg" "Inciter" ) addCharmModule( "diagcg" "Inciter" ) +addCharmModule( "alecg" "Inciter" ) addCharmModule( "distfct" "Inciter" ) addCharmModule( "dg" "Inciter" )
The addCharmModule
cmake macro above, defined in cmake/charm.cmake
, ensures that build target Inciter
will properly depend on our new alecg
Charm++ module, defined in Inciter/
. The macro also tells cmake how the two files, alecg.decl.h
and alecg.def.h
, are generated from alecg.ci
: using charmc
, a compiler wrapper that generates Charm++-code to make the ALECG
from an ordinary C++ class into a Charm++ chare array, with entry methods callable across the network, make it migratable, enable its structured DAGger, etc. See also the Charm++ manual.
Now to the new files. First is the new Charm++ interface file, Inciter/
Inciter/alecg.ci
This is the file that is parsed by Charm++'s compiler which then generates additional code that makes ALECG a Charm++ chare array, makes it migratable, etc. The full listing is at Inciter/
Inciter/alecg.ci – External modules and header includes
extern module transporter; extern module discretization; include "UnsMesh.hpp"; include "PUPUtil.hpp";
First we declare some external Charm++ modules that ALECG needs to interact with and thus from where we need type information. The extern module
statements are followed by some usual C++ include
s (without the #
): these are in the Charm++ interface file because the Charm++ code below requires type information from them.
Inciter/alecg.ci – 1D Charm++ chare array
array [1D] ALECG {
Next comes the specification of the ALECG Charm++ chare array. This is a 1D array whose elements at runtime will be distributed across the available processing elements and compute nodes. If load balancing is enabled, the array elements (C++ objects) are migrated to homogenize load across a simulation. Because the array is 1D, we use a single integer index to address a particular array element. Charm++ also allows multi-dimensional arrays which can be useful if the problem naturally maps to a multi-dimensional notion, e.g., partitioning a 3D Cartesian mesh, so index calculations to address array elements (and thus work-units) become cleaner.
Inciter/alecg.ci – Entry methods
entry ALECG( const CProxy_Discretization& disc, const std::map< int, std::vector< std::size_t > >& bface, const std::map< int, std::vector< std::size_t > >& bnode, const std::vector< std::size_t >& triinpoel ); initnode void registerReducers(); entry void setup(); entry void box( tk::real v ); entry void resizeComm(); entry void nodeNeighSetup(); entry void start(); entry void refine( const std::vector< tk::real >& l2ref ); entry [reductiontarget] void advance( tk::real newdt ); entry void comdfnorm( const std::unordered_map< tk::UnsMesh::Edge, std::array< tk::real, 3 >, tk::UnsMesh::Hash<2>, tk::UnsMesh::Eq<2> >& dfnorm ); entry void comnorm( const std::unordered_map< int, std::unordered_map< std::size_t, std::array< tk::real, 4 > > >& innorm ); entry void comlhs( const std::vector< std::size_t >& gid, const std::vector< std::vector< tk::real > >& L ); entry void comChBndGrad( const std::vector< std::size_t >& gid, const std::vector< std::vector< tk::real > >& G ); entry void comrhs( const std::vector< std::size_t >& gid, const std::vector< std::vector< tk::real > >& R ); entry void resized(); entry void lhs(); entry void step(); entry void next(); entry void stage(); entry void evalLB( int nrestart );
We simply list those member functions of ALECG as entry methods, e.g., ALECG::ALECG::
. Note also that there is an initnode
entry method, ALECG::
which is a special member function that is also declared as static in the C++ sense (see ALECG.h). This is static because the runtime system must be able to call this function without creating an object and a lot earlier than the actual ALECG chare array elements are created. This is how custom reducers can be associated in Charm++ to a chare array. Such custom reducers are an excellent way to rely on the asynchronous, tree-based implementation of parallel reductions in Charm++ yet still do it on custom, arbitrarily complex data types, e.g., a hash-map that holds vectors, as long as one defines how aggregation is to be performed when merging such data. Such an example is given in Inciter/
Inciter/alecg.ci – Structured DAG
entry void wait4lhs() { when ownlhs_complete(), comlhs_complete(), ownnorm_complete(), comnorm_complete(), owndfnorm_complete(), comdfnorm_complete(), transfer_complete() serial "lhs" { lhsmerge(); } } entry void wait4grad() { when owngrad_complete(), comgrad_complete() serial "grad" { rhs(); } } entry void wait4rhs() { when ownrhs_complete(), comrhs_complete() serial "rhs" { solve(); } } entry void wait4trans() { when lhs_complete(), resize_complete() serial "trans" { transfer(); } } entry void ownnorm_complete(); entry void comnorm_complete(); entry void owndfnorm_complete(); entry void comdfnorm_complete(); entry void ownlhs_complete(); entry void comlhs_complete(); entry void transfer_complete(); entry void ownrhs_complete(); entry void comrhs_complete(); entry void owngrad_complete(); entry void comgrad_complete(); entry void lhs_complete(); entry void resize_complete();
The entry methods, defined in the .ci
file and with when
keywords, form a structured directed acyclic graph (DAG). These specify logical relations among tasks and execution logic within the class. For example, wait4lhs
tells the runtime system that only when ownlhs_complete()
and comlhs_complete()
are both done will lhsmerge()
be called. In this case, this construct ensures that the runtime system will call a member function that operates on the left-hand side matrix, when both the local and external contributions are complete. Note that this logic only relates to a given array element, say with index 2. Another one, say index 3, may perform this operation at a very different time and independently, thus computation and communication can overlap. The entry methods listed at the bottom, e.g., ownlhs_complete()
can be thought of as "labels" to the runtime systems that help define the task logic. These labels are functions that the runtime system defines and we call them when the given task is complete. Note that the construct we used here, when A and B are both complete then do C, is probably the simplest task-logic Charm++ allows prescribing. There are many more advanced ways of expressing such logic, e.g., using loops. For more details, see Section Structured Control Flow: Structured Dagger in the Charm++ manual.
NoWarning/alecg.decl.h and NoWarning/alecg.def.h
The newly added files to the NoWarning/
directory simply include the Charm++-generated alecg.decl.h
and alecg.def.h
files and locally, around the include, turn off specific compiler warnings for various compilers – we will not discuss them here further. Full listings are at NoWarning/alecg.decl.h and NoWarning/alecg.def.h.
5. New C++ class
Next are the newly added Inciter/ALECG.h and Inciter/
ALECG::ALECG – Constructor
{ usesAtSync = true; // enable migration at AtSync auto d = Disc(); // Perform optional operator-access-pattern mesh node reordering if (g_inputdeck.get< tag::discr, tag::operator_reorder >()) { // Create new local ids based on access pattern of PDE operators std::unordered_map< std::size_t, std::size_t > map; std::size_t n = 0; for (std::size_t p=0; p<m_u.nunk(); ++p) { // for each point p if (map.find(p) == end(map)) map[p] = n++; for (auto q : tk::Around(m_psup,p)) { // for each edge p-q if (map.find(q) == end(map)) map[q] = n++; } } Assert( map.size() == d->Gid().size(), "Map size mismatch" ); // Remap data in bound Discretization object d->remap( map ); // Recompute elements surrounding points m_esup = tk::genEsup( d->Inpoel(), 4 ); // Recompute points surrounding points m_psup = tk::genPsup( d->Inpoel(), 4, m_esup ); // Remap boundary triangle face connectivity tk::remap( m_triinpoel, map ); } // Activate SDAG wait for initially computing the left-hand side and normals thisProxy[ thisIndex ].wait4lhs(); // Generate callbacks for solution transfers we are involved in // Always add a callback to be used when we are not involved in any transfers std::vector< CkCallback > cb; auto c = CkCallback(CkIndex_ALECG::transfer_complete(), thisProxy[thisIndex]); cb.push_back( c ); // Generate a callback for each transfer we are involved in (either as a // source or a destination) auto meshid = d->MeshId(); for (const auto& t : d->Transfers()) if (meshid == t.src || meshid == t.dst) cb.push_back( c ); // Send callbacks to base d->transferCallback( cb ); }
As discussed in Section Creating workers on the Inciter software design page, the worker chare array elements, such as ALECG, are created using Charm++'s dynamic array insertion feature. This is an asynchronous call, issued from Sorter::
In the constructor's body, listed above, we first enable migration for the class, then the local communication buffers are initialized by sizing them for the first time. Finally, we signal the runtime system that extra communication buffers, specific to this particular discretization scheme, have been created. This is a reduction call, issued by all array elements, eventually calling the reduction target Transporter::
a single time.
Transporter::comfinal() – Complete communication maps
{ auto meshid = tk::cref_find( m_meshid, static_cast<std::size_t>(summeshid) ); if (initial > 0) { m_scheme[meshid].bcast< Scheme::setup >(); // Turn on automatic load balancing if (++m_ncom == m_nelem.size()) { // all worker arrays have finished m_ncom = 0; auto print = printer(); m_progWork.end( print ); tk::CProxy_LBSwitch::ckNew(); print.diag( "Load balancing on (if enabled in Charm++)" ); } } else { m_scheme[meshid].bcast< Scheme::lhs >(); } }
Though asynchronously executed, the reduction operation targeting Transporter::
is a global synchronization point: all chares arrive in that function body, synchronized, and all continue from there again by calling ALECG::
Transporter::setup()
member function can be invoked. This is because setup()
starts using those communication maps, e.g., when it starts computing and assembling the left hand side matrix or vector. When a chare receives data from others this data must be correctly sized and ready on all chares before these data containers can be used. If a global synchronization point did not precede setup()
, chares that finish their constructor early might go ahead all the way to calling ALECG::
ALECG::setup() – Set initial conditions and compute the left hand side
void ALECG::start() // ***************************************************************************** // Start time stepping // ***************************************************************************** { // Start timer measuring time stepping wall clock time Disc()->Timer().zero(); // Zero grind-timer Disc()->grindZero(); // Continue to next time step next(); }
In the ALECG::wait4lhs()
, which activates the relevant part of the DAG, discussed above in Section Inciter/alecg.ci – Structured DAG. Then
ALECG::lhs() – Compute own and send lhs on chare-boundary
void ALECG::lhs() // ***************************************************************************** // Compute the left-hand side of transport equations //! \details Also (re-)compute all data structures if the mesh changed. // ***************************************************************************** { auto d = Disc(); // Compute lumped mass lhs m_lhs = tk::lump( m_u.nprop(), d->Coord(), d->Inpoel() ); if (d->NodeCommMap().empty()) // in serial we are done comlhs_complete(); else // send contributions of lhs to chare-boundary nodes to fellow chares for (const auto& [c,n] : d->NodeCommMap()) { std::vector< std::vector< tk::real > > l( n.size() ); std::size_t j = 0; for (auto i : n) l[ j++ ] = m_lhs[ tk::cref_find(d->Lid(),i) ]; thisProxy[c].comlhs( std::vector<std::size_t>(begin(n),end(n)), l ); } ownlhs_complete(); // (Re-)compute boundary point-, and dual-face normals norm(); }
As the above ALECG::comlhs_complete()
right away. If the map is not empty, we loop through the map and for each chare the given chare shares a node (or multiple nodes) with we collect the values of the lhs in those nodes into a vector and send them to the given destination chare. The send is done via the entry method function call thisProxy[ targetchare ].comlhs()
, which sends its arguments to chare id 'target'chare' in a point-point fashion.
ALECG::comlhs() – Receive left hand side on chare boundary
void ALECG::comlhs( const std::vector< std::size_t >& gid, const std::vector< std::vector< tk::real > >& L ) // ***************************************************************************** // Receive contributions to left-hand side diagonal matrix on chare-boundaries //! \param[in] gid Global mesh node IDs at which we receive LHS contributions //! \param[in] L Partial contributions of LHS to chare-boundary nodes //! \details This function receives contributions to m_lhs, which stores the //! diagonal (lumped) mass matrix at mesh nodes. While m_lhs stores //! own contributions, m_lhsc collects the neighbor chare contributions during //! communication. This way work on m_lhs and m_lhsc is overlapped. The two //! are combined in lhsmerge(). // ***************************************************************************** { Assert( L.size() == gid.size(), "Size mismatch" ); using tk::operator+=; auto d = Disc(); for (std::size_t i=0; i<gid.size(); ++i) { m_lhsc[ gid[i] ] += L[i]; } // When we have heard from all chares we communicate with, this chare is done if (++m_nlhs == d->NodeCommMap().size()) { m_nlhs = 0; comlhs_complete(); } }
The above code snippet from ALECG::gid
the list of global node IDs and in L
the list of lhs values, one for each scalar component of the number of equations solved (in a system of systems). The sizes of the two vectors must equal – this is the number of nodes we receive data for from one other chare. Then we loop over all incoming data, find the local IDs for the global IDs and store them by adding their contributions at each node received. As the comment says, when this chare has received all contributions it supposed to receive, we tell the runtime system that on this chare communication of the lhs is finished by calling comlhs_cmplete()
. The completion condition is implemented via a counter, whose value is incremented upon each call to this function and testing its equality with the size of the symmetric node communication map (Discretization::m_msum), which has data for as many chares as many other chare a given chare must communicate with. As discussed in Inciter/alecg.ci – Structured DAG, when both own and communicated parts of the lhs are complete, the runtime system calls lhsmerge
().
ALECG::lhsmerge() – Merge left hand side and continue
void ALECG::lhsmerge() // ***************************************************************************** // The own and communication portion of the left-hand side is complete // ***************************************************************************** { // Combine own and communicated contributions to left hand side auto d = Disc(); // Combine own and communicated contributions to LHS and ICs for (const auto& b : m_lhsc) { auto lid = tk::cref_find( d->Lid(), b.first ); for (ncomp_t c=0; c<m_lhs.nprop(); ++c) m_lhs(lid,c,0) += b.second[c]; } // Clear receive buffer tk::destroy(m_lhsc); // Combine own and communicated contributions of normals and apply boundary // conditions on the initial conditions normfinal(); // Continue after lhs is complete if (m_initial) { // Output initial conditions to file writeFields( CkCallback(CkIndex_ALECG::start(), thisProxy[thisIndex]) ); } else { lhs_complete(); } }
When the own and communicated contributions to the lhs are in place, the communication buffer for the lhs, ALECG::
, is merged into ALECG::
. As the above code snippet of ALECG::lhsmerge()
was called during the first step, we call ALECG::wait4out
, waiting for multiple overlapping tasks, required for continuing.) ALECG::
ALECG::dt() – Start time step
if (g_inputdeck.get< tag::discr, tag::steady_state >()) { // compute new dt for each mesh point for (const auto& eq : g_cgpde) eq.dt( d->It(), d->Vol(), m_u, m_dtp ); // find the smallest dt of all nodes on this chare mindt = *std::min_element( begin(m_dtp), end(m_dtp) ); } else { // compute new dt for this chare // find the smallest dt of all equations on this chare for (const auto& eq : g_cgpde) { auto eqdt = eq.dt( d->Coord(), d->Inpoel(), d->T(), m_u ); if (eqdt < mindt) mindt = eqdt; } }
The above code snippet from ALECG::for
loop that calls the the dt()
member function of all types of PDEs configured by the user and finds the minimum size of the time step.
// Actiavate SDAG waits for next time step stage thisProxy[ thisIndex ].wait4grad(); thisProxy[ thisIndex ].wait4rhs(); thisProxy[ thisIndex ].wait4trans(); // Contribute to minimum dt across all chares the advance to next step contribute( sizeof(tk::real), &mindt, CkReduction::min_double, CkCallback(CkReductionTarget(ALECG,advance), thisProxy) );
Once we have the time step size, we enable a couple of SDAG waits and issue a reduction to Transporter::advance() which yields the global minimum across all chares then issues a broadcast to ALECG::advance()
saves the new time step in Discretization::
, which is the master copy, then calls ALECG::
, which starts computing the right hand sides of all PDEs integrated.
ALECG::rhs() & ALECG::comrhs() – Compute and communicate right hand side
Computing the right hand sides (rhs) of all PDE operators and communicating the rhs values in chare boundary nodes look exactly the same as the analogous functions for the lhs. When both the own and communicated contributions are complete on a chare, the runtime system calls ALECG::
ALECG::solve() – Solve, diagnostics, refine
if (m_stage < 2) { // Activate SDAG wait for next time step stage thisProxy[ thisIndex ].wait4grad(); thisProxy[ thisIndex ].wait4rhs(); // continue to mesh-to-mesh transfer (if coupled) transfer(); } else { // Compute diagnostics, e.g., residuals auto diag_computed = m_diag.compute( *d, m_u, m_un, m_bnorm, m_symbcnodes, m_farfieldbcnodes ); // Increase number of iterations and physical time d->next(); // Advance physical time for local time stepping if (steady) for (std::size_t i=0; i<m_u.nunk(); ++i) m_tp[i] += m_dtp[i]; // Continue to mesh refinement (if configured) if (!diag_computed) refine( std::vector< tk::real >( m_u.nprop(), 1.0 ) ); }
The above code snippet shows what happens immediately after solving the linear system on a chare. First we compute diagnostics, which is a catch-all phrase for various norms and integral quantities, see Inciter/m_diag.compute()
returns true, diagnostics have been computed in this time step. If diagnostics have been computed, their correct values require global reduction operations, performing different aggregation operations depending on the value. As all reductions, diagnostics are also collected by Transporter, this time in target Transporter::
, which calls back, via a broadcast, to ALECG::diag(), which signals, on the ALECG chare, that diagnostics have been computed. If diagnostics have not been computed in this time step, we call ALECG::diag() right away. Next we increase the number of iterations taken and update physical time on the master copies, Discretization::
ALECG::refine() – Optionally refine mesh
auto d = Disc(); const auto nstep = g_inputdeck.get< tag::discr, tag::nstep >(); const auto term = g_inputdeck.get< tag::discr, tag::term >(); const auto steady = g_inputdeck.get< tag::discr, tag::steady_state >(); const auto residual = g_inputdeck.get< tag::discr, tag::residual >(); const auto rc = g_inputdeck.get< tag::discr, tag::rescomp >() - 1; const auto eps = std::numeric_limits< tk::real >::epsilon(); if (steady) { // this is the last time step if max time of max number of time steps // reached or the residual has reached its convergence criterion if (std::abs(d->T()-term) < eps || d->It() >= nstep || l2res[rc] < residual) m_finished = 1; } else { // this is the last time step if max time or max iterations reached if (std::abs(d->T()-term) < eps || d->It() >= nstep) m_finished = 1; } auto dtref = g_inputdeck.get< tag::amr, tag::dtref >(); auto dtfreq = g_inputdeck.get< tag::amr, tag::dtfreq >(); // if t>0 refinement enabled and we hit the frequency if (dtref && !(d->It() % dtfreq)) { // refine // Activate SDAG waits for re-computing the left-hand side thisProxy[ thisIndex ].wait4lhs(); d->startvol(); d->Ref()->dtref( {}, m_bnode, {} ); d->refined() = 1; } else { // do not refine d->refined() = 0; lhs_complete(); resized(); }
The above snippet shows that mesh refinement happens only at every few time step with its frequency configured by the user. If the mesh is not refined, we simply enable the SDAG waits associated to the tasks of the mesh refinement step. If the mesh is refined, we call a member function of the mesh refiner object held by Discretization, Refiner::
ALECG::resize() – Resize data after mesh refinement
void ALECG::resizePostAMR( const std::vector< std::size_t >& /*ginpoel*/, const tk::UnsMesh::Chunk& chunk, const tk::UnsMesh::Coords& coord, const std::unordered_map< std::size_t, tk::UnsMesh::Edge >& addedNodes, const std::unordered_map< std::size_t, std::size_t >& /*addedTets*/, const tk::NodeCommMap& nodeCommMap, const std::map< int, std::vector< std::size_t > >& bface, const std::map< int, std::vector< std::size_t > >& bnode, const std::vector< std::size_t >& triinpoel ) // ***************************************************************************** // Receive new mesh from Refiner //! \param[in] ginpoel Mesh connectivity with global node ids //! \param[in] chunk New mesh chunk (connectivity and global<->local id maps) //! \param[in] coord New mesh node coordinates //! \param[in] addedNodes Newly added mesh nodes and their parents (local ids) //! \param[in] addedTets Newly added mesh cells and their parents (local ids) //! \param[in] nodeCommMap New node communication map //! \param[in] bface Boundary-faces mapped to side set ids //! \param[in] bnode Boundary-node lists mapped to side set ids //! \param[in] triinpoel Boundary-face connectivity // ***************************************************************************** { auto d = Disc(); // Set flag that indicates that we are during time stepping m_initial = 0; // Zero field output iteration count between two mesh refinement steps d->Itf() = 0; // Increase number of iterations with mesh refinement ++d->Itr(); // Resize mesh data structures d->resizePostAMR( chunk, coord, nodeCommMap ); // Resize auxiliary solution vectors auto npoin = coord[0].size(); auto nprop = m_u.nprop(); m_u.resize( npoin, nprop ); m_un.resize( npoin, nprop ); m_lhs.resize( npoin, nprop ); m_rhs.resize( npoin, nprop ); m_chBndGrad.resize( d->Bid().size(), nprop*3 ); // Update solution on new mesh for (const auto& n : addedNodes) for (std::size_t c=0; c<nprop; ++c) m_u(n.first,c,0) = (m_u(n.second[0],c,0) + m_u(n.second[1],c,0))/2.0; // Update physical-boundary node-, face-, and element lists m_bnode = bnode; m_bface = bface; m_triinpoel = tk::remap( triinpoel, d->Lid() ); auto meshid = d->MeshId(); contribute( sizeof(std::size_t), &meshid, CkReduction::nop, CkCallback(CkReductionTarget(Transporter,resized), d->Tr()) ); }
The above snippet shows ALECG::resize() called by Refiner when it finished mesh refinement. Besides resizing the mesh-related data held locally by ALECG, e.g., ALECG::wait4lhs()
. When all of this is done, we issue a reduction to Transporter::workresized(), which when called will mean that all workers have resized their data after mesh refinement. However, this is only one concurrent (asynchronous) thread of execution. Another one is started within d->resize()
, which calls Discretization::resize(), which eventually reduces to Transporter::discresized(). When both of these independent threads finished, Transporter::Transporter::wait4resize()
. Transporter::lhs_complete()
, while Transporter calls ALECG::resize_complete()
.
The end-of-time-step threads are (1) computing diagnostics, discussed above, (2) mesh refinement (and resizing of some of the mesh data structures), (3) resized (complete resizeing of mesh data structures, i.e., also those that require communication, e.g., nodal volumes), and (4) recomputing the lhs. The code-snippet on Inciter/
When all the end-of-time-step, independent threads have finished, we call ALECG::
6. Making it all work
Only a couple of minor, but important, steps remain. First we add the new Charm++ module as an external module in inciter's Charm++ module. This is required so that all Charm++ code that references the new ALECG Charm++ chare array is visible and can correctly interact with Inciter's main charm chare.
Main/inciter.ci
$ git diff src/Main/inciter.ci diff --git a/src/Main/inciter.ci b/src/Main/inciter.ci index bf7eac98..e9b114b6 100644 --- a/src/Main/inciter.ci +++ b/src/Main/inciter.ci @@ -14,6 +14,7 @@ mainmodule inciter { extern module partitioner; extern module matcg; extern module diagcg; + extern module alecg; extern module dg; extern module charestatecollector;
The second, and final, step is to enable triggering the instantiation of specialized CGPDE class objects for our new ALECG scheme when the system of systems is instantiated. This associates the type of generic PDE systems that is used to instantiate the PDE classes, selected by user configuration. Since ALECG will be a node-centered scheme, we assign it to use the CGPDE polymorphic interface (instead of DGPDE, which is tailored for cell-centered discretizations).
PDE/PDEStack.cpp
$ git diff src/PDE/PDEStack.cpp diff --git a/src/PDE/PDEStack.cpp b/src/PDE/PDEStack.cpp index 438cb5e3..9b2e14e7 100644 --- a/src/PDE/PDEStack.cpp +++ b/src/PDE/PDEStack.cpp @@ -108,7 +108,9 @@ PDEStack::selectedCG() const std::vector< CGPDE > pdes; // will store instantiated PDEs const auto sch = g_inputdeck.get< tag::discr, tag::scheme >(); - if (sch == ctr::SchemeType::DiagCG) { + if (sch == ctr::SchemeType::DiagCG || sch == ctr::SchemeType::ALECG) { for (const auto& d : g_inputdeck.get< tag::selected, tag::pde >()) { if (d == ctr::PDEType::TRANSPORT) pdes.push_back( createCG< tag::transport >( d, cnt ) ); else if (d == ctr::PDEType::COMPFLOW) pdes.push_back( createCG< tag::compflow >( d, cnt ) ); else Throw( "Can't find selected CGPDE" ); } }
7. Augment unit tests for Scheme
Though this is not strictly necessary, we also augment the unit tests of Scheme exercising our new discretization scheme:
$ git diff develop src/UnitTest/TUTSuite.h src/UnitTest/tests/Inciter/TestScheme.cpp diff --git a/src/UnitTest/TUTSuite.h b/src/UnitTest/TUTSuite.h index 191b3972..dd904b02 100644 --- a/src/UnitTest/TUTSuite.h +++ b/src/UnitTest/TUTSuite.h @@ -61,7 +61,7 @@ class TUTSuite : public CBase_TUTSuite { { "Base/Factory", 2 } , { "Base/PUPUtil", 14 } , { "Base/Timer", 1 } - , { "Inciter/Scheme", 3 } + , { "Inciter/Scheme", 4 } }; // Tests that must be run on PE 0 diff --git a/src/UnitTest/tests/Inciter/TestScheme.cpp b/src/UnitTest/tests/Inciter/TestScheme.cpp index 6dc48c75..e4acfce4 100644 --- a/src/UnitTest/tests/Inciter/TestScheme.cpp +++ b/src/UnitTest/tests/Inciter/TestScheme.cpp @@ -84,6 +84,8 @@ void Scheme_object::test< 1 >() { ensure_equals( "Underlying type", c.which(), 1 ); inciter::Scheme d( inciter::ctr::SchemeType::DG ); ensure_equals( "Underlying type", d.which(), 2 ); + inciter::Scheme a( inciter::ctr::SchemeType::ALECG ); + ensure_equals( "Underlying type", a.which(), 3 ); } //! Test if operator[] returns the correct underlying type @@ -97,6 +99,8 @@ void Scheme_object::test< 2 >() { ensure_equals( "Underlying element type", c.which_element(), 1 ); inciter::Scheme d( inciter::ctr::SchemeType::DG ); ensure_equals( "Underlying element type", d.which_element(), 2 ); + inciter::Scheme a( inciter::ctr::SchemeType::ALECG ); + ensure_equals( "Underlying element type", a.which_element(), 3 ); } @@ -162,6 +166,27 @@ void Scheme_object::test< 5 >() { inciter::Scheme( inciter::ctr::SchemeType::DG ), 2, "DG" ); } +//! Test Pack/Unpack of Scheme holding CProxy_AELCG +//! \details Every Charm++ migration test, such as this one, consists of two +//! unit tests: one for send and one for receive. Both trigger a TUT test, +//! but the receive side is created manually, i.e., without the awareness of +//! the TUT library. Unfortunately thus, there is no good way to count up +//! these additional tests, and thus if a test such as this is added to the +//! suite this number must be updated in UnitTest/TUTSuite.h in +//! unittest::TUTSuite::m_migrations. +template<> template<> +void Scheme_object::test< 6 >() { + // This test spawns a new Charm++ chare. The "1" at the end of the test name + // signals that this is only the first part of this test: the part up to + // firing up an asynchronous Charm++ chare. The second part creates a new test + // result, sending it back to the suite if successful. If that chare never + // executes, the suite will hang waiting for that chare to call back. + set_test_name( "Charm:migrate Scheme(ALECG) 1" ); + + CProxy_Receiver::ckNew( + inciter::Scheme( inciter::ctr::SchemeType::ALECG ), 3, "ALECG" ); +}
Now that we will test ALECG using the unit test harness, UnitTest, we also have to make the UnitTest build target depend on the new ALECG Charm++ module:
$ git diff src/UnitTest/CMakeLists.txt diff --git a/src/UnitTest/CMakeLists.txt b/src/UnitTest/CMakeLists.txt index bb740285..e0ea47fe 100644 --- a/src/UnitTest/CMakeLists.txt +++ b/src/UnitTest/CMakeLists.txt @@ -48,6 +48,7 @@ add_dependencies("UnitTest" "unittestCharmModule") if (ENABLE_INCITER) add_dependencies("UnitTest" "matcgCharmModule") add_dependencies("UnitTest" "diagcgCharmModule") + add_dependencies("UnitTest" "alecgCharmModule") add_dependencies("UnitTest" "distfctCharmModule") add_dependencies("UnitTest" "dgCharmModule") add_dependencies("UnitTest" "discretizationCharmModule")
8. Add new regression tests
Finally, we also add a bunch of new regression tests that stress-test the asynchronous logic in the discretization scheme classes:
$ git diff tests/regression/inciter/transport/SlotCyl/asynclogic/CMakeLists.txt index b54a207d..62732129 100644 --- a/tests/regression/inciter/transport/SlotCyl/asynclogic/CMakeLists.txt +++ b/tests/regression/inciter/transport/SlotCyl/asynclogic/CMakeLists.txt @@ -1,7 +1,7 @@ # See cmake/add_regression_test.cmake for documentation on the arguments to # add_regression_test(). -foreach(scheme matcg diagcg dg) +foreach(scheme matcg diagcg dg alecg) foreach(virt 0.0 0.5 0.9) foreach(npes RANGE 1 8) add_regression_test(asynclogic_${scheme}_${virt} ${INCITER_EXECUTABLE}