TN-12-2_EllipticSolversWithinNektar
===================================

.. meta::
   :description: technical note
   :keywords: NEPTUNE, Nektar++, elliptic solvers, spectral/hp element method, preconditioning

Report 2047356-TN-12 (D1.3): Elliptic solvers within Nektar++

David Moxey, King’s College London

Ben Dudson, Peter Hill, Ed Higgins, David Dickinson, & Steven Wright, University of York

March 30, 2022

Contents:

1. Executive summary
2. Introduction
3. Formulation
4. Implementation strategies & operator evaluation (global matrix approach; local matrices; static condensation; matrix-free operations; solvers in Nektar++)
5. Preconditioning (diagonal; linear space; block; low energy preconditioner)
6. Performance (large-scale parallel testing; fluid dynamics simulations)
7. Ongoing developments in preconditioning
8. Conclusions

1 Executive summary
-------------------

This report focuses on the performance of the elliptic solver within Nektar++ and its scaling in the limit of large processor counts. Building on report 2047356-TN-02-2, which introduces a number of sample test cases for the elliptic solver, we test both the correctness and the functionality of the elliptic solver. This solver has been used within the anisotropic diffusion solver provided under project 2048465, whose proxyapp code can be found in the NEPTUNE GitHub repository. We discuss the formulation of an elliptic solver within a high-order finite element setting, the mapping of this formulation onto performant software, and the preconditioning strategies that may be employed to solve such systems. Implementations can be found in:

- the solution of the electrostatic potential for the Hasegawa-Wakatani proxyapp: https://github.com/ExCALIBUR-NEPTUNE/nektar-driftwave
- the solution of the Laplacian term as part of the anisotropic heat proxyapp: https://github.com/ExCALIBUR-NEPTUNE/nektar-diffusion
- the main Nektar++ code base: https://gitlab.nektar.info/nektar/nektar

2 Introduction
--------------

In this report, we briefly summarise the discretisation, performance and implementation of elliptic solvers within the Nektar++ framework. These examples will mostly focus on the ‘slab’-type evaluations suggested in report 2047356-TN-02-2 as evaluation cases for elliptic problems, and/or problems found in the computational fluid dynamics applications within which Nektar++ is typically used. However, they may equally well be generalised to other problems, such as the solution for the electrostatic potential within plasma-based problems. As such, we will consider elliptic problems of the type

.. math::

   \nabla \cdot (D \nabla u) = f \qquad (1)

where D is a d × d tensor and d is the spatial dimension. In the setting of a high-order finite element discretisation, there are a number of problem-specific challenges to overcome when considering a high-performance computing implementation of an elliptic solver:
- In parallel execution the sparse matrix representing the discretised operator is not usually explicitly constructed. This is because, at higher orders, a more computationally-efficient implementation strategy is typically to evaluate the operator on a per-element basis, and to combine this with an assembly operation to mimic the action of the global operator on a given mesh. Indeed, the evaluation of the operator can be done either by constructing local elemental matrices explicitly, or by evaluating their action via local matrix-free operations (which is the focus of call 2048465).
- Because of this, the preconditioning of such problems becomes challenging. Although it is well known that multigrid-type preconditioning works very effectively on elliptic problems when coupled with linear C^0 discretisations, its use in high-order simulations is not very effective.

To this end, in this report we outline the approaches being considered within Nektar++ to address these challenges: in particular, the methods that are already available, but also those presently in development. The report is therefore structured as follows. In section 3, we consider the formulation of such an elliptic problem in a finite element setting. Section 4 briefly discusses the operator evaluation techniques that can be used in this setting. Then in section 5 we discuss some of the preconditioning options that may be used in this environment, followed by an examination of test cases in section 6. Section 7 outlines current work being undertaken on the development of multigrid preconditioning strategies for Nektar++. Finally, section 8 presents conclusions and other development directions.

3 Formulation
-------------

This section gives a very brief overview of the formulation of an elliptic equation in the spectral/hp element method. Further details can be found in e.g. Karniadakis & Sherwin [1]. Consider the elliptic equation

.. math::

   \nabla \cdot (D \nabla u) - \lambda u = -f \qquad (2)

where D is a d × d tensor on a domain Ω ⊂ R^d. The starting point for any finite element solution is to express the equation in weak form by multiplying by a test function v from a suitable trial space V, considering functions u in a test space U, integrating the resulting equality and applying integration by parts to yield

.. math::

   (D \nabla u, \nabla v)_\Omega - \langle D \nabla u \cdot \vec{n}, v \rangle_{\partial\Omega} + \lambda (u, v)_\Omega = (f, v) \qquad (3)

where the inner products (·, ·) and ⟨·, ·⟩ are defined as

.. math::

   (u, v) = \int_\Omega u v \, dx, \qquad \langle u, v \rangle = \int_{\partial\Omega} u v \, ds \qquad (4)

and :math:`\vec{n}` is the outward-facing normal on the boundary ∂Ω. In the Galerkin approach, we select U = V, and then select a finite-dimensional subspace of U given a mesh of N non-overlapping elements :math:`\Omega^e` such that :math:`\Omega = \bigcup_{e}^{N} \Omega^e`. In a continuous C^0 high-order setting this space is the piecewise-polynomial space

.. math::

   D_k(\Omega) = \{ v_h \in C^0(\Omega) \;|\; v_h|_{\Omega^e} \in P_k(\Omega^e) \} \qquad (5)

where P_k denotes a space of polynomials of order less than or equal to k. The standard spectral/hp approach to discretise (2) starts with an expansion of u and v in terms of global basis functions :math:`\Phi_n \in D_k`, which are consequently represented as elemental basis functions :math:`\phi^e_n`:

.. math::

   u^\delta(x) = \sum_{n=0}^{N_{\mathrm{dof}}-1} \hat{u}_n \Phi_n(x) = \sum_{e=1}^{N} \sum_{n=1}^{k} \hat{u}^e_n \phi^e_n(x) \qquad (6)

where :math:`\phi^e_n(x)` is the n-th local expansion mode within the element :math:`\Omega^e` and :math:`\hat{u}^e_n` is the n-th local expansion coefficient within the element :math:`\Omega^e`.
Approximating u and v in this manner (and assuming homogeneous Neumann boundary conditions for simplicity), we adopt a Galerkin discretisation of equation (2) in which we find an approximate solution :math:`u^\delta \in D_k(\Omega)` such that

.. math::

   \int_\Omega \left( D \nabla u^\delta \cdot \nabla v^\delta + \lambda u^\delta v^\delta \right) dx = \int_\Omega v^\delta f \, dx \quad \forall v^\delta \in D_k(\Omega). \qquad (7)

This can be formulated in matrix terms as

.. math::

   H \hat{u} = f \qquad (8)

where H = L + λM represents the Helmholtz matrix, L is the Laplacian matrix, λ is a positive constant and M the mass matrix. :math:`\hat{u}` are the unknown global coefficients and f is the inner product of the expansion basis with the forcing function. The solution of the discrete elliptic problem is therefore equivalent to the solution of this matrix problem.

4 Implementation strategies & operator evaluation
-------------------------------------------------

In this section we briefly outline the implementation strategies that may be employed in the solution of the elliptic matrix equation (8), with an emphasis on parallel execution. In this setting, we expect that the mesh is partitioned as a preprocessing step into R subdomains (weighted appropriately by the number of degrees of freedom, so as to load-balance the problem). Furthermore, the solution of the matrix equation is obtained through an appropriate Krylov solver such as the conjugate gradient method. The dominant cost in this process is therefore the computation of the action of H at the n-th iteration of such methods, i.e. :math:`H\hat{u}_n`. The subsections below outline the general approaches that may be taken in a high-order setting.

4.1 Global matrix approach
~~~~~~~~~~~~~~~~~~~~~~~~~~

Mathematically the easiest approach is to assemble the global system H. To do this, we first ignore the mesh as a whole and construct instead a local elemental matrix :math:`H^e` by considering the left-hand integral of equation (7) on a single element :math:`\Omega^e`. Then, with knowledge of the mesh topology and the C^0 connectivity between neighbouring elements (and therefore their basis functions), we form an assembly matrix A which maps the vector of global coefficients :math:`\hat{u}` to a vector of local coefficients :math:`\hat{u}_l`. In this manner, we may express the global matrix as

.. math::

   H = A^\top \left( \bigoplus_{e=1}^{N} H^e \right) A. \qquad (9)

In a parallel environment, the sub-matrix of H is constructed for each rank, and the action of the gather-scatter operations applied by :math:`A^\top` and A is implemented via the external third-party library gslib [2] (https://github.com/Nek5000/gslib). From a practical perspective, the construction of H as a global sparse matrix is typically not recommended at any polynomial order k > 1, since the matrix itself admits a rich structure based on the locally-dense high-order element matrices which is disregarded in this setting. Attaining sufficient arithmetic intensity in operator evaluation is also highly challenging, due to the large amount of indirection incurred via the use of standard sparse matrix data structures (e.g. compressed row/column format).

4.2 Local matrices
~~~~~~~~~~~~~~~~~~

Instead, we may elect to separate the operation v = Hu into three steps:

1. obtain the vector of local coefficients from the scatter operation :math:`\hat{u}_l = A\hat{u}`;
2. compute the matrix-vector product

   .. math::

      \hat{v}_l = \left( \bigoplus_{e=1}^{N} H^e \right) \hat{u}_l \qquad (10)

3. gather the local coefficients as :math:`\hat{v} = A^\top \hat{v}_l`.

Mathematically this process is identical in nature to eq. (9). However, step 2 comes with the distinct advantage that the matrix-vector product is block diagonal: we may therefore precompute the element matrices :math:`H^e` as small dense matrices, which consequently greatly increases the arithmetic intensity of the method (even allowing for the gather/scatter operations), despite the complexity remaining unchanged.
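The following minimal sketch illustrates this three-step evaluation on a single rank; in parallel, the gather step additionally requires a communication stage (performed by gslib in Nektar++). The ``Element`` structure, the dense row-major storage and the local-to-global index map used in place of the assembly matrix A are illustrative assumptions rather than the Nektar++ data structures.

.. code-block:: cpp

   // Minimal sketch (not the Nektar++ API) of the action v = A^T (sum_e H^e) A u,
   // using a local-to-global index map in place of the assembly matrix A and
   // dense, row-major elemental matrices H^e.
   #include <cstddef>
   #include <vector>

   struct Element
   {
       std::vector<double> He;        // dense nloc x nloc elemental matrix
       std::vector<std::size_t> l2g;  // local-to-global map, one entry per local dof
   };

   std::vector<double> ApplyHelmholtz(const std::vector<Element> &elements,
                                      const std::vector<double> &u)
   {
       std::vector<double> v(u.size(), 0.0);
       for (const Element &el : elements)
       {
           const std::size_t n = el.l2g.size();

           // Scatter: extract the local coefficients u_l = A u for this element.
           std::vector<double> ul(n), vl(n, 0.0);
           for (std::size_t i = 0; i < n; ++i)
               ul[i] = u[el.l2g[i]];

           // Dense elemental matrix-vector product v_l = H^e u_l.
           for (std::size_t i = 0; i < n; ++i)
               for (std::size_t j = 0; j < n; ++j)
                   vl[i] += el.He[i * n + j] * ul[j];

           // Gather (assemble): v += A^T v_l, summing shared boundary contributions.
           for (std::size_t i = 0; i < n; ++i)
               v[el.l2g[i]] += vl[i];
       }
       return v;
   }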
This has been the standard approach employed within many high-order codes. However, for large k in three dimensions, even the local matrices start to increase greatly in size. We should therefore consider strategies for reducing the dimensionality or complexity of the problem in some manner. This is the consideration in the following two sections.

4.3 Static condensation
~~~~~~~~~~~~~~~~~~~~~~~

The spectral/hp expansion basis is obtained by considering interior modes, which have support only in the interior of the element, separately from boundary modes, which are non-zero on the boundary of the element. We align the boundary modes across the interfaces of the elements to obtain a continuous global solution. For a three-dimensional problem, the boundary modes can be further decomposed into vertex, edge and face modes, defined as follows:

- vertex modes have support on a single vertex and the three adjacent edges and faces, as well as the interior of the element;
- edge modes have support on a single edge and the two adjacent faces, as well as the interior of the element;
- face modes have support on a single face and the interior of the element.

It is worth emphasising that only certain choices of basis functions support such decompositions. In the modified basis of Karniadakis & Sherwin, such a structure is readily admitted for all elemental shape types. The classical Lagrange interpolant basis also admits the same structure for quadrilateral and hexahedral elements. However, other orthogonal bases, such as the Legendre basis for triangular elements, do not support this structure (which also complicates the imposition of C^0 conditions between elements).

When the discretisation is continuous, this strong coupling between vertices, edges and faces leads to a matrix of high condition number κ. Our aim is to reduce this condition number by applying specialised preconditioners. However, we can also leverage this structure to reduce the dimension of the parallel problem solved across the entire grid via static condensation. Utilising the above-mentioned decomposition, we can write the matrix equation as

.. math::

   \begin{bmatrix} H_{bb} & H_{bi} \\ H_{ib} & H_{ii} \end{bmatrix}
   \begin{bmatrix} \hat{u}_b \\ \hat{u}_i \end{bmatrix} =
   \begin{bmatrix} \hat{f}_b \\ \hat{f}_i \end{bmatrix} \qquad (11)

where the subscripts b and i denote the boundary and interior degrees of freedom respectively. This system can then be statically condensed, allowing us to solve for the boundary and interior degrees of freedom in a decoupled manner. The statically condensed system is given by

.. math::

   \begin{bmatrix} H_{bb} - H_{bi} H_{ii}^{-1} H_{ib} & 0 \\ H_{ib} & H_{ii} \end{bmatrix}
   \begin{bmatrix} \hat{u}_b \\ \hat{u}_i \end{bmatrix} =
   \begin{bmatrix} \hat{f}_b - H_{bi} H_{ii}^{-1} \hat{f}_i \\ \hat{f}_i \end{bmatrix}. \qquad (12)

This is highly advantageous since, by definition, the interior expansion vanishes on the boundary, and so :math:`H_{ii}` is block diagonal and can thus be trivially inverted as a preprocessing step. The above sub-structuring has reduced our problem to solving the boundary problem

.. math::

   S_1 \hat{u} = \hat{f}_1 \qquad (13)

where :math:`S_1 = H_{bb} - H_{bi} H_{ii}^{-1} H_{ib}` and :math:`\hat{f}_1 = \hat{f}_b - H_{bi} H_{ii}^{-1} \hat{f}_i`. Although this new system typically has better convergence properties (i.e. lower κ), it is still ill-conditioned, leading to a convergence rate of iterative-type solvers, such as the conjugate gradient method, that is prohibitively slow. For this reason we need to precondition :math:`S_1`. To do this we solve an equivalent system of the form

.. math::

   M^{-1} \left( S_1 \hat{u} - \hat{f}_1 \right) = 0 \qquad (14)

where the preconditioning matrix M is such that :math:`\kappa(M^{-1} S_1)` is less than :math:`\kappa(S_1)` and speeds up the convergence rate. Within the conjugate gradient routine, the same preconditioner M is applied to the residual vector :math:`\hat{r}_{k+1}` at every iteration:

.. math::

   \hat{z}_{k+1} = M^{-1} \hat{r}_{k+1}. \qquad (15)

We discuss specific preconditioning strategies in section 5.
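To make the role of eqs. (13)-(15) concrete, the following is a minimal sketch of a preconditioned conjugate gradient iteration in which both the operator and the preconditioner are supplied as callables, so that either can be applied matrix-free. The names, dense vector type and default tolerances are illustrative assumptions, not the Nektar++ interface; in Nektar++ the equivalent role is played by the iterative solvers of section 4.5.

.. code-block:: cpp

   // Minimal sketch of a preconditioned conjugate gradient solve of S1 u = f1,
   // following eqs. (13)-(15). The operator and preconditioner are passed as
   // callables so that S1 can be applied matrix-free and M^{-1} can be any of
   // the preconditioners of section 5.
   #include <cmath>
   #include <cstddef>
   #include <functional>
   #include <vector>

   using Vec = std::vector<double>;
   using Op  = std::function<Vec(const Vec &)>;

   static double Dot(const Vec &a, const Vec &b)
   {
       double s = 0.0;
       for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
       return s;
   }

   Vec PCG(const Op &ApplyS1, const Op &ApplyMinv, const Vec &f1,
           double tol = 1e-9, unsigned maxIter = 5000)
   {
       Vec u(f1.size(), 0.0), r = f1;  // u_0 = 0, so r_0 = f1 - S1 u_0 = f1
       Vec z = ApplyMinv(r), p = z;    // z_0 = M^{-1} r_0
       double rz = Dot(r, z);

       for (unsigned k = 0; k < maxIter && std::sqrt(Dot(r, r)) > tol; ++k)
       {
           Vec Sp       = ApplyS1(p);
           double alpha = rz / Dot(p, Sp);
           for (std::size_t i = 0; i < u.size(); ++i)
           {
               u[i] += alpha * p[i];
               r[i] -= alpha * Sp[i];
           }
           z = ApplyMinv(r);           // z_{k+1} = M^{-1} r_{k+1}, eq. (15)
           double rzNew = Dot(r, z);
           double beta  = rzNew / rz;
           rz = rzNew;
           for (std::size_t i = 0; i < p.size(); ++i)
               p[i] = z[i] + beta * p[i];
       }
       return u;
   }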
4.4 Matrix-free operations
~~~~~~~~~~~~~~~~~~~~~~~~~~

A final option is to omit the construction of local matrices entirely and instead evaluate their action via their summation forms. This is called the matrix-free approach and helps to further increase arithmetic intensity, since matrix construction and storage are no longer required [3]. To demonstrate this approach, consider a straightforward quadrilateral element and further assume that D is the identity for simplicity. On this element type we use an expansion basis of the form :math:`\{\phi_{pq}(\xi_1, \xi_2);\; p = 1, \dots, P;\; q = 1, \dots, P\}`. The approximated function is then given by the summation

.. math::

   u(\xi) = \sum_{p=1}^{P} \sum_{q=1}^{P} \hat{u}_{pq} \phi_{pq}(\xi_1, \xi_2) \qquad (16)

and the two-dimensional discrete Helmholtz operator now reads

.. math::

   H^e_\delta[m][n] = \lambda \sum_i \sum_j w_i w_j \, \phi_{pq}(\xi_{1i}, \xi_{2j}) \, \phi_{rs}(\xi_{1i}, \xi_{2j}) \, |J^e_{ij}|
      + \sum_i \sum_j w_i w_j \, (J^e_{ij})^{-\top} \nabla\phi_{pq}(\xi_{1i}, \xi_{2j}) \cdot (J^e_{ij})^{-1} \nabla\phi_{rs}(\xi_{1i}, \xi_{2j}) \, |J^e_{ij}| \qquad (17)

where the indices i and j span the quadrature points, and the indices m and n are obtained as linear combinations of the indices i, j, p, q, r and s that depend on the type of element used in the approximation. For specific details of this calculation the interested reader may consult Section 4.1.5 of the textbook [1].

The discrete Helmholtz operator can be further split into the mass operator :math:`M^e_\delta` and the Laplacian operator :math:`L^e_\delta`, yielding

.. math::

   H^e_\delta = \lambda M^e_\delta + L^e_\delta. \qquad (18)

The mass operator can be rewritten in matrix form as

.. math::

   M^e_\delta = B^\top W B, \qquad (19)

where the dense basis matrix B contains terms of the form :math:`\phi_{pq}(\xi_{1i}, \xi_{2j})`, and the components of the diagonal matrix W are the weights :math:`w_i w_j |J^e_{ij}|`. Noting that the gradient of the basis functions with respect to the local coordinates for two-dimensional expansions is

.. math::

   \nabla\phi(\xi_{1i}, \xi_{2j}) = \begin{bmatrix} \dfrac{\partial \phi}{\partial \xi_1} \\[4pt] \dfrac{\partial \phi}{\partial \xi_2} \end{bmatrix}_{(\xi_{1i},\, \xi_{2j})} \qquad (20)

and the inverse of the Jacobian matrix :math:`(J^e)^{-1}` is

.. math::

   (J^e)^{-1} = \begin{bmatrix} \dfrac{\partial \xi_1}{\partial x_1} & \dfrac{\partial \xi_2}{\partial x_1} \\[4pt] \dfrac{\partial \xi_1}{\partial x_2} & \dfrac{\partial \xi_2}{\partial x_2} \end{bmatrix} \qquad (21)

the Laplacian operator can be expanded as

.. math::

   L^e_\delta = \sum_i \sum_j w_i w_j \Bigg[
   \left( \frac{\partial \xi_1}{\partial x_1} \frac{\partial \phi_{pq}}{\partial \xi_1} + \frac{\partial \xi_2}{\partial x_1} \frac{\partial \phi_{pq}}{\partial \xi_2} \right)
   \left( \frac{\partial \xi_1}{\partial x_1} \frac{\partial \phi_{rs}}{\partial \xi_1} + \frac{\partial \xi_2}{\partial x_1} \frac{\partial \phi_{rs}}{\partial \xi_2} \right)
   + \left( \frac{\partial \xi_1}{\partial x_2} \frac{\partial \phi_{pq}}{\partial \xi_1} + \frac{\partial \xi_2}{\partial x_2} \frac{\partial \phi_{pq}}{\partial \xi_2} \right)
   \left( \frac{\partial \xi_1}{\partial x_2} \frac{\partial \phi_{rs}}{\partial \xi_1} + \frac{\partial \xi_2}{\partial x_2} \frac{\partial \phi_{rs}}{\partial \xi_2} \right)
   \Bigg] |J^e_{ij}|. \qquad (22)

We can now rewrite the Laplacian operator in matrix form as

.. math::

   L^e_\delta = U^\top W U + V^\top W V, \qquad (23)

with U and V being

.. math::

   U = \left[ \Lambda\!\left( \frac{\partial \xi_1}{\partial x_1} \right) D_{\xi_1} + \Lambda\!\left( \frac{\partial \xi_2}{\partial x_1} \right) D_{\xi_2} \right] B \qquad (24)

.. math::

   V = \left[ \Lambda\!\left( \frac{\partial \xi_1}{\partial x_2} \right) D_{\xi_1} + \Lambda\!\left( \frac{\partial \xi_2}{\partial x_2} \right) D_{\xi_2} \right] B. \qquad (25)

Here Λ(f) is a diagonal matrix containing terms of the form :math:`f(\xi_{1i}, \xi_{2j})`, and :math:`D_{\xi_i}` is a block-diagonal differentiation matrix with entries :math:`\left. \frac{d h_p}{d \xi_1} \right|_{\xi_{1i}}`, where :math:`h_p(\xi)` is the one-dimensional Lagrange polynomial.

In this form, we have written the Helmholtz matrix mostly in terms of element-independent matrices: the :math:`D_\xi` and the quadrature locations and weights are entirely defined on the standard element. The only element-specific information required is therefore the geometric factors and their associated Jacobian. When combined further with sum-factorisation, to reduce the cost of the summation from a complexity of :math:`\mathcal{O}(P^{2d})` to :math:`\mathcal{O}(P^{d+1})`, matrix-free evaluation gives a performant and vectorisation-amenable approach to operator evaluation (as shown in [3]).
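As an illustration of how such an operator can be applied without forming B or W explicitly, the sketch below evaluates the mass-operator action :math:`v = B^\top W B \hat{u}` on one quadrilateral element using sum-factorisation, exploiting the tensor-product structure :math:`\phi_{pq}(\xi_1, \xi_2) = \psi_p(\xi_1)\psi_q(\xi_2)`; the Laplacian term follows the same pattern with one-dimensional differentiation tables in place of the basis tables. The data layout and function names are assumptions for illustration and are not the Nektar++ kernels described in [3].

.. code-block:: cpp

   // Sketch of the sum-factorised mass-operator action v = B^T W B u on a single
   // quadrilateral, exploiting phi_pq(xi1, xi2) = psi_p(xi1) psi_q(xi2).
   // B1[i][p] = psi_p(xi_1i), B2[j][q] = psi_q(xi_2j) are one-dimensional basis
   // tables at the quadrature points and w[i][j] = w_i w_j |J^e_ij|.
   #include <cstddef>
   #include <vector>

   using Mat = std::vector<std::vector<double>>;

   // u at the quadrature grid: uq(i,j) = sum_p sum_q B1(i,p) B2(j,q) uhat(p,q),
   // performed as two one-dimensional contractions (sum factorisation).
   Mat EvaluateAtQuadrature(const Mat &B1, const Mat &B2, const Mat &uhat)
   {
       const std::size_t Q1 = B1.size(), Q2 = B2.size();
       const std::size_t P1 = B1[0].size(), P2 = B2[0].size();

       Mat tmp(P1, std::vector<double>(Q2, 0.0)); // tmp(p,j) = sum_q uhat(p,q) B2(j,q)
       for (std::size_t p = 0; p < P1; ++p)
           for (std::size_t j = 0; j < Q2; ++j)
               for (std::size_t q = 0; q < P2; ++q)
                   tmp[p][j] += uhat[p][q] * B2[j][q];

       Mat uq(Q1, std::vector<double>(Q2, 0.0));  // uq(i,j) = sum_p B1(i,p) tmp(p,j)
       for (std::size_t i = 0; i < Q1; ++i)
           for (std::size_t j = 0; j < Q2; ++j)
               for (std::size_t p = 0; p < P1; ++p)
                   uq[i][j] += B1[i][p] * tmp[p][j];
       return uq;
   }

   // Mass-operator action: multiply pointwise by the weights, then take inner
   // products with the basis by applying the transposed contractions.
   Mat MassOperator(const Mat &B1, const Mat &B2, const Mat &w, const Mat &uhat)
   {
       Mat uq = EvaluateAtQuadrature(B1, B2, uhat);
       const std::size_t Q1 = uq.size(), Q2 = uq[0].size();
       const std::size_t P1 = B1[0].size(), P2 = B2[0].size();

       for (std::size_t i = 0; i < Q1; ++i)
           for (std::size_t j = 0; j < Q2; ++j)
               uq[i][j] *= w[i][j];

       Mat tmp(P1, std::vector<double>(Q2, 0.0)); // tmp(p,j) = sum_i B1(i,p) uq(i,j)
       for (std::size_t p = 0; p < P1; ++p)
           for (std::size_t j = 0; j < Q2; ++j)
               for (std::size_t i = 0; i < Q1; ++i)
                   tmp[p][j] += B1[i][p] * uq[i][j];

       Mat v(P1, std::vector<double>(P2, 0.0));   // v(p,q) = sum_j tmp(p,j) B2(j,q)
       for (std::size_t p = 0; p < P1; ++p)
           for (std::size_t q = 0; q < P2; ++q)
               for (std::size_t j = 0; j < Q2; ++j)
                   v[p][q] += tmp[p][j] * B2[j][q];
       return v;
   }

Each contraction is a small dense matrix product, which is what makes this formulation both cache-friendly and amenable to vectorisation.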
4.5 Solvers in Nektar++
~~~~~~~~~~~~~~~~~~~~~~~

Nektar++ exposes some of these approaches within the GlobalSysSoln classes, which implement the following strategies:

- Full solves operate on the full operator;
- StaticCond solves operate on the statically-condensed system;
- MultiLevelStaticCond solves apply the static condensation approach repeatedly to further reduce the system size.

The precise evaluation strategy depends on the actual solver being used to solve the resulting systems:

- Direct solves use LAPACK to solve the matrix system, and can be used in serial only;
- Iterative solves use a conjugate gradient (symmetric) or GMRES (non-symmetric) method to solve the system, either in serial or in parallel;
- Xxt uses the XX^T direct parallel solver [4] to solve the system;
- PETSc uses the PETSc linear algebra package for the system.

The combination of these strategies results in a concrete implementation. For example:

- DirectFull solves the full matrix system with LAPACK.
- IterativeFull does the same, but using CG/GMRES. In this case the matrix evaluation would be done via a matrix-free approach.
- IterativeStaticCond solves the statically-condensed system using an iterative approach. This is the default method in parallel.
- DirectMultiLevelStaticCond solves the multi-level statically-condensed system using a direct solver. This is the default method in serial execution.

5 Preconditioning
-----------------

Within the Nektar++ framework a number of preconditioners are available to speed up the convergence rate of the conjugate gradient routine. The table below summarises each method, the dimensions of elements which are supported, and also the discretisation types supported, which can be either continuous (CG) or discontinuous (hybridizable DG).

.. list-table::
   :header-rows: 1

   * - Name
     - Dimensions
     - Discretisations
   * - Null
     - All
     - All
   * - Diagonal
     - All
     - All
   * - FullLinearSpace
     - 2/3D
     - CG
   * - LowEnergyBlock
     - 3D
     - CG (static cond.)
   * - Block
     - 2/3D
     - All
   * - FullLinearSpaceWithDiagonal
     - All
     - CG
   * - FullLinearSpaceWithLowEnergyBlock
     - 2/3D
     - CG (static cond.)
   * - FullLinearSpaceWithBlock
     - 2/3D
     - CG

The default is the Diagonal preconditioner. The above preconditioners are specified through the Preconditioner option of the SOLVERINFO section in the session file. For example, to enable FullLinearSpace one can use an entry of the form sketched below.
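A minimal session-file sketch is shown here, assuming the standard ``CONDITIONS``/``SOLVERINFO`` layout of a Nektar++ session file; the ``GlobalSysSoln`` property of section 4.5 is included alongside for context, and the user guide remains the authoritative reference for this syntax.

.. code-block:: xml

   <CONDITIONS>
     <SOLVERINFO>
       <!-- global solution strategy (section 4.5) -->
       <I PROPERTY="GlobalSysSoln"  VALUE="IterativeStaticCond" />
       <!-- preconditioner applied within the iterative solve -->
       <I PROPERTY="Preconditioner" VALUE="FullLinearSpace" />
     </SOLVERINFO>
   </CONDITIONS>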
Alternatively, one can have more control, selecting different preconditioners for each solution field, by using the GlobalSysSoln section of the XML file (for more details, consult the user guide). The following sections specify the details of each method.

5.1 Diagonal
~~~~~~~~~~~~

Diagonal (or Jacobi) preconditioning is amongst the simplest preconditioning strategies. In this scheme one takes the global matrix :math:`H = (h_{ij})` and computes the diagonal terms :math:`h_{ii}`. The preconditioner is then formed as the diagonal matrix :math:`M^{-1} = (h_{ii}^{-1})`.

5.2 Linear space
~~~~~~~~~~~~~~~~

The linear space (or coarse space) of the matrix system is that containing degrees of freedom corresponding only to the vertex modes of the high-order system. Preconditioning of this space is achieved by forming the matrix corresponding to the coarse space and inverting it, so that

.. math::

   M^{-1} = (S_1^{-1})_{vv}. \qquad (26)

Since the mesh associated with higher-order methods is relatively coarse compared with traditional finite element discretisations, the linear space can usually be directly inverted without memory issues. However, such a methodology can be prohibitive on large parallel systems, due to a bottleneck in communication. In Nektar++ the inversion of the linear space is handled using the XX^T library [4]. XX^T is a parallel direct solver for problems of the form :math:`A\hat{x} = \hat{b}`, based around a sparse factorisation of the inverse of A. To precondition utilising this methodology, the linear subspace is gathered from the expansion and the preconditioned residual within the CG routine is determined by solving

.. math::

   (S_1)_{vv} \hat{z} = \hat{r}. \qquad (27)

The preconditioned residual :math:`\hat{z}` is then scattered back to the respective locations in the global degrees of freedom.

5.3 Block
~~~~~~~~~

Block preconditioning of the C^0 continuous system is defined by

.. math::

   M^{-1} = \begin{bmatrix} \mathrm{diag}[(S_1)_{vv}]^{-1} & 0 & 0 \\ 0 & (S_1)_{eb}^{-1} & 0 \\ 0 & 0 & (S_1)_{fb}^{-1} \end{bmatrix} \qquad (28)

where :math:`\mathrm{diag}[(S_1)_{vv}]` is the diagonal of the vertex modes, and :math:`(S_1)_{eb}` and :math:`(S_1)_{fb}` are block diagonal matrices corresponding to the coupling of an edge (or face) with itself, i.e. ignoring the coupling to other edges and faces. This preconditioner is best suited to two-dimensional problems.

5.4 Low energy preconditioner
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Low energy basis preconditioning follows the methodology proposed by Sherwin & Casarin. In this method a new basis is numerically constructed from the original basis, which allows the Schur complement matrix to be preconditioned using a block preconditioner. The method is outlined briefly in the following.

Elementally, the local approximation :math:`u^\delta` can be expressed as different expansions lying in the same discrete space :math:`V^\delta`:

.. math::

   u^\delta(x) = \sum_{i}^{\dim(V^\delta)} \hat{u}_{1i} \phi_{1i}(x) = \sum_{i}^{\dim(V^\delta)} \hat{u}_{2i} \phi_{2i}(x). \qquad (29)

Since both expansions lie in the same space, it is possible to express one basis in terms of the other via a transformation, i.e.

.. math::

   \phi_2 = C \phi_1 \implies \hat{u}_1 = C^\top \hat{u}_2. \qquad (30)

Applying this to the Helmholtz operator, it is possible to show that

.. math::

   H_2 = C H_1 C^\top. \qquad (31)

For sub-structured (statically condensed) matrices, the transformation matrix C becomes

.. math::

   C = \begin{bmatrix} R & 0 \\ 0 & I \end{bmatrix}. \qquad (32)

Hence the transformation in terms of the Schur complement matrices is

.. math::

   S_2 = R S_1 R^\top. \qquad (33)

Typically the choice of expansion basis :math:`\phi_1` can lead to a Helmholtz matrix that has undesirable properties, i.e. a poor condition number. By choosing a suitable transformation matrix C it is possible to construct a new basis, numerically, that is amenable to block diagonal preconditioning, given by

.. math::

   S_1 = \begin{bmatrix} S_{vv} & S_{ve} & S_{vf} \\ S_{ve}^\top & S_{ee} & S_{ef} \\ S_{vf}^\top & S_{ef}^\top & S_{ff} \end{bmatrix}
       = \begin{bmatrix} S_{vv} & S_{v,ef} \\ S_{v,ef}^\top & S_{ef,ef} \end{bmatrix}. \qquad (34)

Applying the transformation :math:`S_2 = R S_1 R^\top` leads to the following matrix

.. math::

   S_2 = \begin{bmatrix}
   S_{vv} + R_v S_{v,ef}^\top + S_{v,ef} R_v^\top + R_v S_{ef,ef} R_v^\top &
   \left[ S_{v,ef} + R_v S_{ef,ef} \right] A^\top \\
   A \left[ S_{v,ef}^\top + S_{ef,ef} R_v^\top \right] &
   A S_{ef,ef} A^\top
   \end{bmatrix} \qquad (35)

where :math:`A S_{ef,ef} A^\top` is given by

.. math::

   A S_{ef,ef} A^\top = \begin{bmatrix}
   S_{ee} + R_{ef} S_{ef}^\top + S_{ef} R_{ef}^\top + R_{ef} S_{ff} R_{ef}^\top &
   S_{ef} + R_{ef} S_{ff} \\
   S_{ef}^\top + S_{ff} R_{ef}^\top &
   S_{ff}
   \end{bmatrix}. \qquad (36)

To decouple the edge and face modes, it can be seen from the above that we require

.. math::

   R_{ef}^\top = -S_{ff}^{-1} S_{ef}^\top \qquad (37)

and, to decouple the vertex modes from the edge and face modes,

.. math::

   R_v^\top = -S_{ef,ef}^{-1} S_{v,ef}^\top. \qquad (38)

Here it is important to consider the form of the expansion basis, since the presence of :math:`S_{ff}^{-1}` will lead to a new basis which has support on all other faces; this is problematic when creating a C^0 continuous global basis. To circumvent this problem when forming the new basis, the decoupling is only performed between a specific edge and the two adjacent faces in a symmetric standard region. Since the decoupling is performed in a rotationally symmetric standard region, the basis does not take into account the Jacobian mapping between the local element and global coordinates, hence the final expansion will not be completely orthogonal.

The low energy basis creates a Schur complement matrix that, although not completely orthogonal, can be spectrally approximated by its block diagonal contribution. The final form of the preconditioner is

.. math::

   M^{-1} = \begin{bmatrix} \mathrm{diag}[(S_2)_{vv}] & 0 & 0 \\ 0 & (S_2)_{eb} & 0 \\ 0 & 0 & (S_2)_{fb} \end{bmatrix}^{-1} \qquad (39)

where :math:`\mathrm{diag}[(S_2)_{vv}]` is the diagonal of the vertex modes, and :math:`(S_2)_{eb}` and :math:`(S_2)_{fb}` are block diagonal matrices corresponding to the coupling of an edge (or face) with itself, i.e. ignoring the coupling to other edges and faces.
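Operationally, both eq. (28) and eq. (39) are applied in the same way: the preconditioned residual is obtained by applying a pre-inverted 1x1 block per vertex mode and a pre-inverted dense block per edge and per face (for the low energy variant, after transforming the residual by R and transforming the result back by its transpose). The sketch below shows this block application; the ``Block`` structure and index lists are illustrative assumptions, not the Nektar++ implementation.

.. code-block:: cpp

   // Minimal sketch of applying a block-diagonal preconditioner z = M^{-1} r of
   // the form of eqs. (28) and (39): a 1x1 block per vertex mode and a dense,
   // pre-inverted block per edge and per face.
   #include <cstddef>
   #include <vector>

   struct Block
   {
       std::vector<std::size_t> dofs; // global indices of the modes in this block
       std::vector<double> inv;       // pre-inverted dense block, row-major
   };

   std::vector<double> ApplyBlockPreconditioner(const std::vector<Block> &blocks,
                                                const std::vector<double> &r)
   {
       std::vector<double> z(r.size(), 0.0);
       for (const Block &b : blocks)
       {
           const std::size_t n = b.dofs.size();
           for (std::size_t i = 0; i < n; ++i)
           {
               double zi = 0.0;
               for (std::size_t j = 0; j < n; ++j)
                   zi += b.inv[i * n + j] * r[b.dofs[j]];
               z[b.dofs[i]] = zi; // blocks are disjoint, so plain assignment suffices
           }
       }
       return z;
   }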
Figure 1: Strong scaling of the preconditioned conjugate gradient solver, run on a mesh of a Formula 1 front wing section (pictured right), consisting of 3.3M tetrahedral and prismatic elements. Speedup is calculated relative to runs on 8192 processors.

6 Performance
-------------

In this section we briefly outline the performance of the elliptic solver from the perspective of parallel efficiency for the IterativeStaticCond solver. We note that other NEPTUNE deliverables (particularly those from project 2048465) focus on the performance properties of matrix-free operators.

Nektar++ has been used on a wide variety of HPC platforms. We report on two of the largest simulations to date:

- A diffusion problem over a complex geometry of a Formula 1 car on the Mira supercomputer, which shows the general speedup of the elliptic solver under simple preconditioning strategies.
- A more realistic fluid dynamics simulation of an Elemental RP1 car on the ARCHER2 supercomputer.

6.1 Large-scale parallel testing
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

As part of the validation of the solver at large scales, simulations were performed on both the BG/Q at ANL and the Cray XK7 at ORNL for preliminary testing of Nektar++ on extremely large core counts. In Fig. 1, we present the results of strong scaling simulations up to 131K processors on the BG/Q, where we measure relative speedup when compared to a simulation on 8192 processors. The starting point for these tests is a complex topology of a Formula 1 front wing section, which is 6th-order accurate in space. We then measure the mean execution time for a single iteration of the conjugate gradient solver, which represents the most processor- and communication-intensive part of the solver. At 131K processors, there are only 25 elements per process. This of course is not a representative example of a full complex fluid dynamics solver, since the setup is deliberately simple: a diffusion equation with simple Jacobi preconditioning. However, it demonstrates that the core solve itself is readily scalable to high core counts.

6.2 Fluid dynamics simulations
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Results are shown in Figure 2 for a strong scaling test using early access to the ARCHER2 facility, using between 2k and 50k cores. The test case under investigation is a fifth-order incompressible fluid dynamics simulation of a track car configuration at a Reynolds number of approximately 1 million, on a mesh of mixed tetrahedral/prismatic elements, so that at 50k cores we observe around 70 elements per computing core. This therefore represents a mesh of around double the density of the test case above. Preconditioning in this case uses the low-energy block preconditioner for all simulation variables. At lower core counts, relatively high memory usage per node for this more realistic case means that superlinear scaling is observed. However, beyond around 20,000 cores, memory per node drops sufficiently that cache usage provides more effective performance, thereby bringing scaling into a more linear regime. We note that these results were obtained very shortly after early access was made available to the 4-cabinet system in November 2021, and so may differ from results available now. However, this also highlights the general performance capability of the code for a more realistic problem.

Figure 2: Strong scaling of the Elemental RP1 track car simulation on the ARCHER2 facility.
7 Ongoing developments in preconditioning
-----------------------------------------

In the test cases discussed above, one potential limitation to scalability is the preconditioning strategies being used, particularly when considering Poisson problems of the form

.. math::

   \nabla^2 p = -f \qquad (40)

in solving for the pressure p or, in the case of plasma-related applications, the electrostatic potential φ. It is well known that multigrid methods are the preferred method for solving the resulting symmetric linear systems with optimal and mesh-independent convergence. However, when used with unstructured meshes, the use of geometric multigrid (GMG) is not straightforward, requiring the use of algebraic multigrid (AMG) instead. Unlike GMG, where the coarse-grid operators are typically obtained via geometric coarsening and rediscretisation, AMG does not use the geometric information but instead builds the coarse-grid operators algebraically. Algebraic coarsening results in a loss of sparsity, leading to poor performance and scalability, especially for large-scale problems. This is usually addressed by employing sparsity control techniques, such as dropping values below a specified threshold at coarser levels. Such measures can address the sparsity issues, albeit with some loss of convergence rate. Additionally, they introduce an additional parameter that needs to be tuned, making the overall solver or preconditioner less robust. This issue is exacerbated for high-order operators, which are denser to begin with and become even denser on coarser grids, leading to poor scalability.

In some sense the linear space preconditioner outlined in section 5 can be viewed as a form of multigrid, where in a single-level V-cycle the restriction operator is the extraction of linear modes and no smoothing is performed. However, the reliance on XX^T imposes natural scalability restrictions beyond around 10^4 cores, and the lack of smoothing does not guarantee many of the convergence properties that a multigrid method should theoretically provide.

In work currently being developed [#giamg]_, to circumvent this problem we are considering a geometrically informed algebraic multigrid approach (GIAMG), in which we perform two phases:

- p-coarsening, in which the problem is coarsened by reducing the polynomial order of the basis functions and degrees of freedom within each element, resulting in a local coarsening; followed by
- algebraic h-coarsening, also known as algebraic multigrid (AMG), in which the mesh is coarsened based on the specific matrix connectivity properties.

This is visualised in figure 3.

Figure 3: Illustration of the coarsening strategy of GIAMG for quadrilateral elements with a nodal basis of polynomial order p = 4. First, we p-coarsen from p = 4 to p = 2, and then again from p = 2 to p = 1. The total number of elements, and therefore the mesh, does not change during p-coarsening. Once we reach linear elements, we perform h-coarsening to obtain a coarser mesh and a smaller system.

We note that for the grids obtained using p-coarsening, we effectively increase the sparsity as we go to coarser levels. This enables us to offset the loss of sparsity encountered when generating coarser levels using AMG, and plays a major part in ensuring that the overall scalability remains good. In addition, if the high-order system is generated using a modal basis, then the restriction operation is simply an injection, i.e. a lower-order mode is injected to the coarser level. This means that the prolongation and restriction operators will only have zero- or single-valued entries. As a result, it is extremely efficient to perform p-coarsening for such high-order systems. Note that when using a modal basis, the sparsity of the coarser-level matrices is further reduced, due to the injection during interpolation, compared to using a nodal basis.
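Because the retained low-order modes of a hierarchical (modal) basis are a subset of the fine-level modes, the p-coarsening transfer operators reduce to index selection, as the following sketch illustrates. The ``keep`` index map, which lists the fine-level position of each retained coarse mode, is an assumed illustrative structure rather than part of GIAMG or Nektar++.

.. code-block:: cpp

   // Sketch of p-coarsening transfer operators for a hierarchical (modal) basis:
   // restriction keeps the modes of order <= p_coarse, so both operators contain
   // only zero or unit entries.
   #include <cstddef>
   #include <vector>

   std::vector<double> Restrict(const std::vector<double> &fine,
                                const std::vector<std::size_t> &keep)
   {
       std::vector<double> coarse(keep.size());
       for (std::size_t i = 0; i < keep.size(); ++i)
           coarse[i] = fine[keep[i]];   // injection: copy the retained modes
       return coarse;
   }

   std::vector<double> Prolong(const std::vector<double> &coarse,
                               const std::vector<std::size_t> &keep,
                               std::size_t fineSize)
   {
       std::vector<double> fine(fineSize, 0.0);
       for (std::size_t i = 0; i < keep.size(); ++i)
           fine[keep[i]] = coarse[i];   // transpose of the injection
       return fine;
   }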
To test this approach, we consider the sample pin-type breeder blanket configuration used for reactor cooling, shown in figure 4. We focus on an under-resolved DNS at Re = 10^4 with a coarse mesh consisting of around 150,000 tetrahedral and prismatic elements. The coarse nature of these simulations, together with large changes in velocity across a small region at the end of the pin and high aspect-ratio elements at the walls, makes the systems particularly challenging to precondition at high order.

Figure 4: Geometry and convergence properties for the sample case.

We plot the convergence results using GIAMG for p from 1 to 5, as shown in Figure 4. For this more complex operator, we set the convergence criterion to be a relative residual of less than 10^-6. In this case, we only reduce the basis order by 1 at each level during p-coarsening, followed by 6 additional levels of coarsening using smoothed aggregation. We perform 2 Chebyshev iterations in pre- and post-smoothing. The sizes and numbers of nonzeros of the coarsest-level matrices in these cases are slightly different, varying from 303 × 303 to 315 × 315 and from 32338 to 34191 respectively, which are all reasonably small at the coarsest level for the superLU solver. We can see that the number of iterations does not significantly increase as we increase the order p. The convergence of the p = 3 case looks slightly inferior to the other cases; however, it still behaves within a reasonable range, and the convergence pattern is similar to that of the other cases.

When compared against the standard Jacobi-type preconditioning, which is extremely cheap and communication-free to evaluate, the GIAMG approach is clearly more expensive per conjugate gradient iteration. However, GIAMG significantly reduces the number of iterations required to converge, from roughly 3500 iterations for Jacobi-preconditioned solves to around 80-90 iterations. Scaling of the V-cycle from 16 to 128 nodes (with 56 cores per node, so 7168 MPI ranks at the larger end) is shown in Figure 5, which demonstrates the potential of this method for larger-scale problems.

Figure 5: Strong scaling of the GIAMG approach.

It should be noted that this approach is not yet widely available within Nektar++, but we are working to integrate it over the coming months to improve elliptic solver performance in a wide range of problems. There are also challenges to be overcome in terms of greater performance of the solver in parallel execution and the reduction of setup times in the construction of the matrix systems, as well as other potential improvements such as the use of matrix-free methods within the V-cycle.
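For orientation, the shape of the preconditioning V-cycle described above (p-coarsened levels followed by algebraically coarsened levels, with smoothing at each level and a direct solve at the coarsest) is sketched generically below. This is an illustrative outline only, using assumed callables rather than the GIAMG implementation; it would be invoked as the M^{-1} application inside the preconditioned CG loop of section 4.3, i.e. ``z = VCycle(levels, 0, r)``.

.. code-block:: cpp

   // Generic multigrid V-cycle outline. Each level holds callables for the
   // operator, the smoother, restriction and prolongation; the coarsest-level
   // 'smooth' stands in for a direct solve.
   #include <cstddef>
   #include <functional>
   #include <vector>

   using Vec = std::vector<double>;

   struct Level
   {
       std::function<Vec(const Vec &)> A;                    // operator at this level
       std::function<Vec(const Vec &, const Vec &)> smooth;  // improved x for (x, b)
       std::function<Vec(const Vec &)> restrictOp;           // fine residual -> coarse rhs
       std::function<Vec(const Vec &)> prolong;              // coarse correction -> fine
   };

   Vec VCycle(const std::vector<Level> &levels, std::size_t l, const Vec &b)
   {
       const Level &lvl = levels[l];
       Vec x = lvl.smooth(Vec(b.size(), 0.0), b);  // pre-smoothing (or coarse solve)
       if (l + 1 == levels.size())
           return x;

       Vec Ax = lvl.A(x);                          // residual r = b - A x
       Vec r(b.size());
       for (std::size_t i = 0; i < b.size(); ++i)
           r[i] = b[i] - Ax[i];

       Vec e  = VCycle(levels, l + 1, lvl.restrictOp(r)); // coarse-grid correction
       Vec Pe = lvl.prolong(e);
       for (std::size_t i = 0; i < x.size(); ++i)
           x[i] += Pe[i];

       return lvl.smooth(x, b);                    // post-smoothing
   }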
8 Conclusions
-------------

This report has briefly outlined the approaches being undertaken within Nektar++ and NEPTUNE for the solution of elliptic problems. The core performance of the solver itself demonstrates high performance for both on-node and parallel execution. Some of the challenges facing further development have been outlined, with a particular emphasis on the development of multigrid methods. The further development of these techniques, and particularly their use on different architectures such as GPUs, is also a key focus of the development team, which we discuss in other NEPTUNE deliverables.

.. [#giamg] A geometrically informed algebraic multigrid preconditioned iterative approach for solving high-order finite element systems. S. Xu, M. Rasouli, R. M. Kirby, D. Moxey and H. Sundar, under review in *Numerical Linear Algebra with Applications*.

References
----------

[1] George Karniadakis and Spencer Sherwin. *Spectral/hp Element Methods for Computational Fluid Dynamics*. Oxford University Press, 2013.

[2] Jing Gong, Stefano Markidis, Erwin Laure, Matthew Otten, Paul Fischer, and Misun Min. Nekbone performance on GPUs with OpenACC and CUDA Fortran implementations. *The Journal of Supercomputing*, 72(11):4160–4180, 2016.

[3] David Moxey, Roman Amici, and Mike Kirby. Efficient Matrix-Free High-Order Finite Element Evaluation for Simplicial Elements. *SIAM J. Sci. Comput.*, 42(3):C97–C123, January 2020.

[4] Nicolas Offermans, Adam Peplinski, Oana Marin, Elia Merzari, and Philipp Schlatter. Performance of Preconditioners for Large-Scale Simulations Using Nek5000. In *Spectral and High Order Methods for Partial Differential Equations ICOSAHOM 2018*, pages 263–272. Springer, Cham, 2020.

:pdfembed:`src:_static/TN-12-2_EllipticSolversWithinNektar.pdf, height:1600, width:1100, align:middle`