TN-03_TimeSteppingTechniquesPreconditioningHyperbolicAnisotropicEllipticProblems ================================================================================ .. meta:: :description: technical note :keywords: Time,Stepping,Techniques,and,Preconditioning,for,Hyperbolic,and,Anisotropic,Elliptic,Problems:,Numerical,Results,Technical,Report,2060049-TN-03,Hussam,Al,Daas*,Niall,Bootland*,Tyrone,Rees*,Sue,Thorne†,September,2022,1,Introduction,Exascale,targeted,plasma,modelling,will,require,the,efficient,and,accurate,solution,of,systems,of,hyperbolic,partial,differential,equations,(PDEs),,and,corresponding,elliptic,problems,,in,the,presence,of,highly,anisotropic,dynamics.,The,techniques,used,to,solve,such,problems,must,scale,with,the,computing,power,available,and,be,robust,enough,to,be,portable,to,any,emerging,hardware.,Due,to,their,high,arithmetic,intensity,,high,order,finite,elements/spectral,element,codes,generally,employ,matrix-free,techniques,and,are,one,of,the,few,methods,that,allow,the,data,movement,to,be,limited,to,a,level,that,would,be,amenable,to,exascale,computing.,For,that,reason,,in,this,report,we,focus,on,linear,systems,arising,from,such,discretisations.,It,is,important,that,these,systems,are,solved,matrix-free,,as,the,matrices,arising,from,high,order,finite,elements,are,significantly,denser,than,their,low,order,counterparts;,this,limits,the,choices,of,preconditioner,available.,Furthermore,,preconditioners,that,are,extremely,successful,for,low,order,finite,elements,usually,do,not,give,satisfactory,performance,out-of-the-box,as,the,polynomial,degree,increases.,In,this,report,,we,take,the,results,of,our,literature,and,software,review,[13],and,examine,the,methods,of,interest,on,a,range,of,test,problems,that,have,been,designed,to,incorporate,the,challenges,that,will,be,present,when,simulating,plasma,within,a,tokamak.,We,first,describe,,in,Section,2,,the,software,test,platform,we,use,for,our,computations.,In,Section,3,,we,compare,preconditioning,strategies,for,highly,anisotropic,elliptic,problems,,where,anisotropy,is,introduced,within,the,problem,definition,,the,underlying,mesh,and,a,combination,of,both.,We,then,move,to,consider,stationary,hyperbolic,problems,in,Section,4,,using,our,findings,when,looking,at,high,order,discretisations,of,elliptic,problems,to,inform,the,choice,of,preconditioners,we,test.,Time-dependant,hyperbolic,problems,are,covered,in,Section,5,where,we,investigate,various,implicit,time-stepping,schemes,and,how,they,perform,on,several,different,examples,,along,with,preconditioning,for,these,problems.,Finally,,we,draw,conclusions,in,Section,6.,2,Test,platform,The,majority,of,our,tests,were,run,on,STFC,Hartree,Centre’s,Scafell,Pike,,which,is,a,Bull,Sequana,X1000,su-,percomputer.,We,ran,our,test,problems,on,Scafell,Pike’s,‘Regular’,compute,nodes,,composed,of,2x,Intel,Xeon,Gold,E5-6142,v5,(also,known,as,Skylake),16,core,2.6GHz,(up,to,3.7GHz).,For,these,nodes,,the,system,has,a,total,of,192,GB,of,RAM,across,the,27,000,cores.,The,system,uses,a,Mellanox,EDR,Infiniband,interconnect.,Scafell,Pike,uses,IBM’s,LSF,scheduler:,we,requested,exclusive,access,to,the,nodes,being,used,for,our,test,runs.,In,our,results,,we,provide,“wall,clock”,times,in,seconds.,The,majority,of,our,test,problems,were,run,using,MFEM,with,PETSC.,In,Table,1,,we,provide,the,software,versions,and,compilation,details.,*Scientific,Computing,Department,,STFC,Rutherford,Appleton,Laboratory,,Harwell,Campus,,Didcot,,OX11,0QX,,UK.,†Hartree,Centre,,STFC,Rutherford,Appleton,Laboratory,,Harwell,Campus,,Didcot,,OX11,0QX,,UK.,Email,contact:,sue.thorne@stfc.,ac.uk,1,Software,MFEM,PETSc,Intel,MPI,Version,4.4,3.17,2022,Compiler,intel,2022,intel,2022,intel,2022,Flags,-O3,-fp-model,fast,-mavx2,-O3,-fp-model,fast,-mavx2,default,Table,1:,Software,version,,compiler,and,flags,used,for,MFEM-based,numerical,examples.,3,Highly,anisotropic,elliptic,problems,We,first,consider,solving,elliptic,problems,,which,are,PDEs,without,real,characteristic,surfaces,,and,thus,have,an,infinite,speed,of,propagation.,As,such,,these,problems,typically,model,steady-state,phenomena,,which,may,correspond,to,long-time,behaviour,after,transients,have,dissipated,,or,equilibrium,solutions,of,hyperbolic,problems.,Further,,elliptic,equations,tend,to,have,smooth,solutions,and,so,are,ideally,suited,to,the,use,of,high,order,finite,element/spectral,element,methods,(FEM/SEM),for,their,approximation.,The,numerical,solution,of,such,problems,requires,the,solution,of,a,large,linear,system,of,equations,,Ax,=,b,,(1),where,A,is,an,n×n,matrix,,and,x,and,b,are,n-vectors.,Naive,methods,for,solving,such,systems,take,O,(n3),opera-,tions,and,,hence,,are,infeasible,for,large,problems.,A,large,proportion,of,the,work,in,a,finite,element,simulation,is,in,solving,this,linear,system,,and,so,it,is,vital,that,a,robust,and,scalable,method,is,chosen;,identifying,such,a,method,is,the,target,of,this,report.,Typically,,for,such,large,linear,systems,,much,of,this,work,is,in,finding,an,effective,preconditioner,[36].,The,presence,of,anisotropy,,where,properties,depend,upon,the,direction,in,which,they,are,being,measured,,in,the,equations,considered,in,fusion,is,an,additional,challenge.,We,will,investigate,anisotropy,in,two,guises:,1.,Anisotropy,intrinsic,to,the,model,through,physical,parameters,in,the,problem.,Namely,,through,the,diffusion,coefficient,,which,can,vary,highly,in,different,directions.,2.,Anisotropy,stemming,from,how,we,discretise,the,model.,In,particular,,we,consider,that,the,meshes,upon,which,we,approximate,the,solution,can,be,highly,stretched,in,one,direction,compared,to,another.,While,such,meshes,may,be,inadvisable,from,an,approximation,theory,perspective,,in,practice,the,constraints,imposed,by,geometric,and,physical,considerations,along,with,ease,of,use,mean,such,aspect,ratios,can,arise,in,challenging,applications,such,as,fusion,modelling.,Anisotropic,problems,can,cause,particular,challenges,for,numerical,solvers,,where,the,scales,involved,can,vary,by,many,orders,of,magnitude,within,a,single,problem,and,result,in,a,lack,of,robustness,for,many,standard,solution,techniques.,We,will,see,this,later,in,our,numerical,results.,In,our,study,,we,will,consider,both,2D,and,3D,highly,anisotropic,elliptic,problems.,For,spatial,dimension,d,=,2,or,3,,let,Ω,=,(0,,1)d,and,Γ,be,the,boundary,of,Ω.,We,consider,the,model,problem,−∇,·,α(x)∇u,=,f,,,u,=,0,,x,∈,Ω,,x,∈,Γ.,(2a),(2b),This,Poisson,problem,represents,diffusion,within,a,box.,Following,Xu,and,Zikatanov,[37],,we,take,the,diffusion,coefficient,α,to,be,a,diagonal,matrix,with,differing,scales,depending,on,directionality.,For,example,,for,d,=,2,we,choose,α(x),=,(cid:183)1,0,(cid:184),0,ϵ,while,,for,d,=,3,,we,choose,α(x),=,,,1,0,0,0,ϵ,0,,,0,0,1,where,ϵ,>,0,is,a,parameter,which,can,be,made,increasingly,small.,2,(3),(4),Figure,1:,3D,Kershaw,mesh,with,ϵy,=,ϵz,=,1,(left),and,ϵy,=,ϵz,=,0.3,(right).,Following,the,suggestion,of,Farrell,,we,also,solve,this,problem,on,anisotropic,Kershaw,meshes,[24,,25].,In,Figure,1,,we,show,a,3D,Kershaw,mesh,for,ϵy,=,ϵz,=,1,and,ϵy,=,ϵz,=,0.3,,where,decreasing,the,values,of,ϵy,and,ϵz,from,1,will,introduce,anisotropy,in,the,y,and,z,dimension,meshes,,respectively.,The,case,ϵy,=,ϵz,=,0.3,is,considered,“hard”,in,[25].,We,remark,that,the,primary,challenges,stem,,firstly,,from,the,high,order,discretisation,and,,secondly,,the,anisotropy,present.,In,considering,preconditioning,,we,focus,on,approaches,to,tackle,the,use,of,high-order,FEM,and,explore,how,they,perform,in,the,presence,of,anisotropy.,Typically,,a,solver,will,first,seek,to,ameliorate,the,intrinsic,difficulties,of,the,high,order,discretisation,,reducing,it,to,a,lower,order,problem,,and,then,apply,a,more,well-established,approach,to,treat,this,problem.,We,will,consider,two,types,of,such,high,order,solvers,along,with,the,option,of,tackling,the,high,order,system,directly,with,standard,low,order,solvers.,All,of,the,solvers,we,consider,below,will,be,applied,as,preconditioners,for,a,Krylov,subspace,method.,For,symmetric,positive,definite,(SPD),matrices,with,an,SPD,preconditioner,,the,well-known,Conjugate,Gradient,(CG),algorithm,[20],is,the,method,of,choice,,due,to,its,efficient,application,via,a,three-term,recurrence,,and,well-understood,convergence,properties.,For,non-symmetric,linear,systems,or,SPD,linear,systems,with,a,non-,SPD,preconditioner,,we,use,the,Generalised,Minimal,Residual,(GMRES),method,[31].,Other,Krylov,methods,are,available;,see,,e.g.,,Greenbaum,[17],for,an,introduction.,3.1,Preconditioning,for,elliptic,problems,We,first,give,a,brief,overview,of,the,methods,that,we,will,compare,in,this,section.,For,a,more,detailed,compar-,ison,of,the,state,of,the,art,,including,alternative,methods,,see,the,earlier,survey,reports,[3,,5].,3.1.1,(Algebraic),Multigrid,Multigrid,methods,[10],are,often,amongst,the,most,favourable,methods,used,to,solve,elliptic,problems,,and,can,be,highly,efficient,for,solving,large-scale,problems.,As,the,name,suggests,,multigrid,methods,are,based,around,a,hierarchy,of,multiple,grids.,Such,methods,exploit,the,fact,that,the,high-frequency,components,of,the,error,are,damped,effectively,after,only,a,few,applications,of,a,local,relaxation,method,(for,instance,,classical,iterative,methods,such,as,Jacobi,or,Gauss-Seidel),,while,low-frequency,components,are,relatively,smooth,and,so,can,be,well-approximated,on,a,coarser,grid.,The,local,relaxation,method,is,often,termed,the,“smoother”,for,this,reason.,By,projecting,onto,a,coarser,grid,,some,of,the,original,low-frequency,components,become,high,frequency,on,this,grid,and,so,can,be,treated,by,the,smoother,with,low-frequency,components,then,able,to,be,transferred,to,a,yet,coarser,mesh.,Recursion,on,a,hierarchy,of,grids,yields,a,multilevel,iterative,approach.,Since,these,grids,are,a,geometric,construction,,typically,given,by,a,nested,set,of,grids,where,one,is,a,refinement,of,the,next,,such,an,approach,is,called,geometric,multigrid,(GMG).,3,Where,available,,geometric,multigrid,methods,can,be,very,powerful,tools.,However,,they,rely,on,a,hier-,archy,of,grids,being,given,or,easily,constructed.,In,practice,,defining,and,realising,such,a,set,of,grids,may,be,difficult,or,impractical,,especially,in,complex,domains,where,meshing,in,itself,is,challenging.,To,this,end,,alge-,braic,multigrid,(AMG),methods,[30,,37],have,been,developed,which,aim,to,provide,a,more,black-box,solution,method,,avoiding,the,requirement,for,a,predefined,grid,structure.,These,methods,have,been,generalised,in,many,ways,using,heuristic,arguments,aimed,at,improving,efficiency,and,robustness.,Modern,software,imple-,mentations,of,AMG,are,complex,but,have,been,shown,to,provide,efficient,black-box,solvers,for,certain,types,of,problems,,especially,those,stemming,from,second,or,fourth-order,elliptic,PDEs.,One,of,the,most,widely,used,sets,of,AMG,solvers,is,available,through,software,library,HYPRE,,developed,at,Lawrence,Livermore,National,Laboratory,,and,in,particular,for,our,tests,we,will,investigate,BoomerAMG,[19].,Note,that,this,tackles,the,high,order,system,directly,,without,aiming,to,reduce,it,to,a,lower,order,problem,first.,3.1.2,p-multigrid,The,above,discussion,on,multigrid,details,the,classical,approach,,which,can,be,thought,of,as,reducing,the,number,of,DOFs,on,each,grid,by,increasing,the,mesh,spacing,h;,thus,these,are,h-multigrid,methods.,For,higher,order,FEM,,another,possibility,for,reducing,DOFs,is,to,coarsen,by,reducing,the,polynomial,degree,,rather,than,the,size,of,the,mesh:,this,is,known,as,p-multigrid.,Note,that,a,clear,nested,set,of,discretisations,is,given,by,simply,reducing,the,polynomial,degree,on,the,same,mesh,,alleviating,the,geometric,difficulties,arising,with,h-,multigrid.,The,familiar,multigrid,recipe,can,be,followed,by,smoothing,,coarsening,in,p,,and,repeating,typically,until,the,lowest,order,discretisation,(here,p,=,1),which,can,be,solved,via,standard,approaches,,such,as,AMG,,before,propagating,back,up,to,the,original,problem.,We,make,use,of,the,p-multigrid,method,available,in,the,MFEM,implementation,of,CEED,BPS3,[25],,which,uses,Chebyshev,smoothing,,and,coarsen,in,powers,of,2,from,the,problem,order,p,to,the,lowest,order,(p,=,1),dis-,cretisation,(for,instance,,using,p,=,8,,4,,2,and,1).,To,solve,the,coarse,(low-order),problem,we,use,BoomerAMG.,Note,that,,since,we,make,use,of,h-multigrid,on,the,coarse,problem,,this,constitutes,a,hp-multigrid,method,,though,more,broadly,such,methods,can,interleave,coarsening,in,h,with,coarsening,in,p.,Recently,,a,new,hp-multigrid,method,was,proposed,in,[11],based,on,vertex-star,relaxation,which,was,proved,to,be,robust,for,high,order,FEM,for,the,isotropic,problem,on,regular,Cartesian,meshes.,Note,that,for,this,method,it,is,sufficient,to,coarsen,straight,from,the,high,order,problem,to,the,lowest,p,=,1,problem.,This,approach,is,implemented,in,the,software,Firedrake,and,is,not,yet,available,in,MFEM.,3.1.3,LOR,preconditioning,One,of,the,primary,difficulties,with,using,high-order,finite,elements,is,that,the,system,matrix,has,increased,coupling,between,degrees,of,freedom,as,p,increases,and,,therefore,,there,is,an,increasing,number,of,non-zero,entries.,The,number,of,couplings,,that,is,the,number,of,non-zero,entries,per,row,,increases,as,O,(p,d,),and,so,the,memory,requirement,for,the,storage,of,the,system,matrix,,if,assembled,,increases,as,O,(p,2d,).,This,motivates,matrix-free,approaches,,where,the,use,of,sum,factorisation,techniques,on,tensor,product,elements,can,reduce,the,computational,complexity,down,to,O,(d,p,d,+1);,see,[27].,However,,in,terms,of,preconditioning,,this,prohibits,methods,which,require,the,matrix,to,be,built,so,as,to,access,individual,entries,,such,as,AMG,methods,[30,,37].,One,way,to,mitigate,these,issues,is,to,base,the,preconditioner,on,a,low-order,discretisation,of,the,problem,,typically,p,=,1.,This,will,yield,a,much,sparser,matrix,which,has,a,fixed,cost,to,build,,independent,of,p,,thus,al-,lowing,access,to,the,wealth,of,experience,in,preconditioning,low-order,methods.,The,low-order,refined,(LOR),preconditioning,strategy,exploits,this,idea,by,discretising,on,a,refined,mesh,,namely,given,by,a,structured,grid,of,Gauss–Lobatto,points,within,each,high-order,element.,Note,that,these,points,are,clustered,towards,the,boundary,of,the,element,and,so,yield,an,anisotropic,refined,mesh,which,is,not,shape-regular,with,respect,to,p;,see,Figure,2.,The,purpose,of,this,particular,refined,mesh,is,to,ensure,the,spectral,equivalence,between,the,dif-,ferent,discretisations,,known,as,finite,element,method–spectral,element,method,(FEM–SEM),equivalence,[12].,In,essence,,this,says,that,a,good,preconditioner,for,the,LOR,discretisation,will,also,be,a,good,preconditioner,for,the,high-order,discretisation.,While,many,choices,could,be,made,to,precondition,the,LOR,discretisation,of,the,problem,,see,[28],and,references,therein,,we,stick,to,algebraic,approaches,which,can,be,used,out-of-the-box.,In,particular,,we,will,apply,BoomerAMG,to,the,LOR,system,and,use,this,precondition,the,high-order,problem.,4,Figure,2:,An,example,of,a,low-order,refined,(LOR),mesh,on,a,quadrilateral,element,for,p,=,10.,This,uses,11,Gauss–Lobatto,points,in,each,dimension.,3.1.4,Overlapping,Domain,Decomposition,Methods,Similar,to,multigrid,methods,,overlapping,domain,decomposition,methods,(ODDM),[14],are,widely,used,for,solving,linear,systems,arising,from,the,discretisation,of,PDEs.,Although,the,method,was,originally,designed,as,a,solver,on,its,own,,it,is,most,frequently,used,as,a,preconditioner,for,an,appropriate,Krylov,method.,As,the,name,suggests,,the,domain,in,which,the,PDE,is,solved,is,decomposed,into,multiple,overlapping,subdomains,,and,in,each,of,these,subdomains,the,same,PDE,is,defined,,possibly,with,different,boundary,conditions.,One,iteration,of,the,ODDM,method,solves,the,local,PDEs,in,different,subdomains,concurrently,and,then,glues,the,update,solution,by,using,a,partition,of,unity,that,handles,the,overlapping,regions.,The,method,can,be,defined,in,an,algebraic,way,,only,requiring,a,sparse,matrix,,by,partitioning,the,nodes,in,the,sparsity,graph,of,the,matrix.,The,concurrent,solution,of,local,problems,makes,ODDM,very,attractive,for,parallel,computing.,However,,increasing,the,number,of,subdomains,yields,a,higher,number,of,iterations,to,reach,convergence.,Spectral,coarse,space,(or,second,level),techniques,are,therefore,used,as,a,remedy.,The,idea,behind,a,spectral,coarse,space,is,to,judiciously,define,an,artificial,subdomain,that,encapsulates,the,information,required,to,ensure,fast,convergence.,Recent,advances,in,ODDM,allow,the,construction,of,coarse,spaces,to,be,done,efficiently,and,algebraically,[14,,2].,A,generalised,eigenvalue,problem,is,solved,in,each,subdomain,concurrently,to,compute,local,requirements,for,an,effective,coarse,space.,For,symmetric,positive,definite,(SPD),matrices,(e.g.,,those,arising,from,elliptic,PDEs),,the,coarse,space,can,be,constructed,such,that,the,condition,number,of,the,pre-,conditioned,matrix,is,upper,bounded,by,any,number,specified,by,the,user.,Since,the,convergence,of,Krylov,methods,for,SPD,matrices,is,dependent,on,the,condition,number,of,the,preconditioned,matrix,,this,allows,for,well,understood,convergence,behaviour.,It,is,worth,mentioning,that,ODDM,with,spectral,coarse,space,can,be,seen,as,a,two-grid,method,where,the,smoothing,is,the,local,solve,in,the,subdomain.,However,,the,key,difference,with,respect,to,multigrid,methods,is,the,coarsening,strategy.,The,coarsening,factor,(the,ratio,between,two,consecutive,levels),in,multigrid,methods,is,very,small,compared,to,that,in,ODDM.,Recently,,an,algebraic,recursive,approach,of,effective,spectral,coarse,spaces,was,proposed,to,produce,an,efficient,multilevel,ODDM,[1].,The,software,package,HPDDM,[22],implements,all,up-to-date,spectral,multilevel,ODDM.,Although,HPDDM,optimises,many,computational,kernels,for,multilevel,ODDM,,there,is,still,a,large,room,for,improvement—more,details,are,discussed,in,Section,3.2.7.,Therefore,,we,will,only,present,a,synthetic,strong,scalability,comparison,against,an,algebraic,multigrid,preconditioner,to,illustrate,the,potential,of,ODDM,when,employed,on,exascale,architectures.,3.1.5,Sparse,approximate,inverse,preconditioning,The,sparse,approximate,inverse,(SPAI),algorithm,[18],aims,to,provide,an,approximation,to,the,inverse,of,the,system,matrix,,then,used,as,a,preconditioner,,under,the,condition,that,the,approximation,should,be,sparse.,This,is,done,by,minimising,the,error,between,the,preconditioned,matrix,and,the,identity,in,the,Frobenius,norm.,This,choice,of,norm,allows,for,some,inherent,parallelisation,since,columns,(or,rows),of,the,approximate,5,inverse,can,be,calculated,independently,,each,column,then,requiring,the,solution,of,a,least-squares,problem.,Much,of,the,effort,is,then,in,determining,,adaptively,,what,sparsity,pattern,each,column,should,possess,in,order,to,provide,a,good,approximate,inverse,without,including,too,many,non-zero,entries.,To,test,this,approach,we,utilise,the,version,of,SPAI,available,in,PETSc,[9],,which,interfaces,to,code,writ-,ten,by,Stephen,Barnard.,As,well,as,testing,this,method,in,its,own,right,,we,use,this,as,a,proxy,for,how,well,the,Markov,Chain,Monte,Carlo,Matrix,Inversion,(MCMCMI),method,[6],would,perform,on,such,problems.,MCMCMI,computes,a,sparse,approximate,inverse,by,performing,a,random,walk,on,a,graph.,While,it,may,be,expected,that,the,set,up,cost,of,MCMCMI,could,be,more,efficient,than,SPAI,on,large-scale,hardware,[4],,the,behaviour,as,h,scales,can,be,expected,to,be,broadly,similar.,Since,MCMCMI,cannot,be,applied,matrix-free,and,cannot,be,interfaced,directly,to,a,finite,element,package,at,this,time,,this,test,informed,us,whether,this,would,be,a,worthwhile,use,of,developer,effort.,3.2,Numerical,results,for,the,elliptic,problem,In,Section3.2.1,we,present,some,initial,results,that,were,obtained,using,Firedrake1,We,use,the,results,of,these,tests,to,inform,the,methods,implemented,in,MFEM,,and,run,at,a,larger,scale,,both,in,terms,of,the,problem,size,and,the,number,of,processors,,on,Scafell,Pike.,In,all,tests,,unless,otherwise,stated,,we,use,the,preconditioned,−8,and,a,maximum,number,of,500,iterations;,conjugate,gradient,method,with,a,relative,residual,tolerance,of,10,if,this,is,reached,before,convergence,we,denote,this,in,tables,by,“×”.,Should,a,particular,test,fail,,we,indicate,this,by,marking,“−”,in,the,relevant,table,entry.,3.2.1,2D,results,using,Firedrake,In,this,section,,we,provide,results,in,Firedrake.,These,results,are,obtained,on,a,desktop,computer,using,8,processors.,We,solve,(2),with,the,right-hand,side,f,=,1.,Firstly,,in,Table,2,we,detail,iteration,counts,for,BoomerAMG,[19],and,the,hp-multigrid,method,[11],for,1,050,625,degrees,of,freedom,(DOFs).,We,see,that,the,hp-multigrid,method,is,robust,in,the,isotropic,case,,but,robustness,is,lost,when,introducing,anisotropy,in,either,the,diffusion,coefficient,or,the,mesh.,On,the,basis,of,this,experience,,we,conclude,this,approach,will,struggle,when,applied,to,highly,anisotropic,problems.,BoomerAMG,performs,a,little,better,in,most,cases,,although,still,has,issues,when,applied,to,high,degrees,of,anisotropy.,ϵ,1,−3,10,−6,10,p,1,2,4,8,1,2,4,8,1,2,4,8,1,6,7,9,8,44,38,36,9,×,373,257,6,ϵy,0.7,8,10,11,14,47,52,85,124,197,195,190,267,0.3,19,25,32,70,56,58,134,242,155,144,189,334,ϵ,1,−3,10,−6,10,p,1,2,4,8,1,2,4,8,1,2,4,8,1,6,6,6,6,140,124,114,89,×,×,478,253,ϵy,0.7,18,18,17,16,168,134,130,115,×,×,×,400,0.3,97,97,94,77,202,182,222,206,×,×,×,399,(a),AMG,(b),hp-multigrid,Table,2:,Iteration,counts,using,BoomerAMG,or,hp-multigrid,in,Firedrake,for,the,2D,anisotropic,elliptic,prob-,lem,with,Kershaw,meshes,while,varying,the,diffusion,parameter,ϵ,,mesh,parameter,ϵy,and,FEM,order,p.,Here,the,mesh,refinement,is,changed,with,p,so,that,each,problem,instance,has,the,same,number,of,DOFs,,namely,1,050,625.,The,computations,use,8,processors.,Note,that,the,problem,size,is,smaller,than,in,the,subsequent,MFEM,tests.,1Thanks,to,Patrick,Farrell,and,Pablo,Brubeck,for,providing,the,initial,code,on,which,these,tests,were,based.,6,When,we,consider,the,algebraic,SPAI,preconditioner,,we,found,that,for,large,problems,this,approach,con-,sistently,fails,to,converge,,and,so,can,only,give,results,for,significantly,smaller,problems.,Further,,as,there,is,no,guarantee,that,the,preconditioner,is,symmetric,positive,definite,(SPD),we,must,use,GMRES,(we,note,that,in,our,experience,CG,only,works,for,p,=,1).,Results,on,a,small,problem,with,just,4,225,DOFs,are,given,in,Table,3a,in,order,to,assess,robustness,to,anisotropy.,We,find,that,iteration,counts,are,large,,even,for,this,small,problem,,and,increase,slightly,as,we,decrease,ϵy,but,can,decrease,as,we,decrease,ϵ.,Nonetheless,,the,approach,gener-,ally,deteriorates,when,anisotropy,is,present,in,both,the,diffusion,coefficient,and,mesh.,We,note,that,iteration,counts,are,broadly,stable,with,p,in,this,regime,,increasing,only,mildly,in,most,cases.,However,,the,primary,difficulty,with,this,method,is,that,is,not,robust,as,we,increase,the,number,of,DOFs,,by,mesh,refinement,or,increasing,p,for,a,fixed,mesh.,This,is,observed,in,the,results,of,Table,3b,where,we,see,a,large,growth,in,iteration,counts,whenever,we,decrease,h,or,increase,p.,We,highlight,,too,,that,this,behaviour,aligns,with,that,reported,when,the,MCMCMI,was,applied,to,realistic,problems,in,a,recent,study,[32].,Overall,,the,SPAI,preconditioner,lacks,the,robustness,required,to,be,considered,as,a,solver,at,exascale,for,problems,of,this,type.,ϵ,1,−3,10,−6,10,p,1,2,4,8,1,2,4,8,1,2,4,8,1,279,389,237,324,155,110,129,132,189,186,200,163,ϵy,0.7,348,357,365,355,358,359,351,376,367,356,330,396,0.3,296,379,426,486,474,476,×,×,480,470,×,×,p,1,2,4,8,1/8,9,40,74,324,1/16,20,101,237,×,h,1/32,55,389,×,×,1/64,279,×,×,×,1/128,×,×,×,×,(a),Results,for,the,2D,anisotropic,elliptic,problem,with,Kershaw,meshes,while,varying,the,diffusion,parameter,ϵ,,mesh,parameter,ϵy,and,FEM,order,p,for,a,fixed,number,of,DOFs,,namely,4,225.,(b),Results,for,the,2D,isotropic,elliptic,problem,with,regular,meshes,while,varying,the,mesh,spacing,h,and,FEM,order,p.,Note,that,the,number,of,DOFs,is,constant,along,diagonals,from,lower,left,to,upper,right.,Table,3:,Iteration,counts,using,SPAI,in,Firedrake,for,the,2D,elliptic,problem.,On,the,left,,we,vary,the,anisotropy,while,keeping,the,number,of,DOFs,fixed,at,4,225.,On,the,right,,we,vary,the,mesh,spacing,h,and,FEM,order,p.,The,computations,use,8,processors.,Note,that,problem,sizes,are,much,smaller,than,in,other,tables.,3.2.2,2D,anisotropic,elliptic,problem,with,regular,mesh,Since,high,order,FEM,offers,advantages,for,future,computing,directions,and,research,on,solver,technology,is,ongoing,,the,Poisson,problem,on,Kershaw,meshes,has,been,posed,as,a,benchmark,problem,by,the,Center,for,Efficient,Exascale,Discretizations,(CEED),based,in,the,USA.,In,particular,,in,[25],it,is,posed,as,a,bake-off,problem,for,solvers,,denoted,CEED,BPS3.,An,implementation,of,this,problem,within,the,software,MFEM,is,part,of,the,CEED,suite,https://github.com/CEED,,although,this,is,in,a,private,repository,at,the,time,of,writing.,We,have,adapted,this,implementation,for,our,anisotropic,test,problems,in,order,to,run,our,solver,tests.,The,problem,setup,in,BPS3,consists,of,that,given,in,(2),posed,on,straight-sided,Kershaw,meshes,with,a,polychromatic,right-,hand,side,f,with,a,known,solution,(in,particular,,we,use,the,default,structure,level,n,=,3;,see,[25],for,details);,this,is,the,setup,we,use,in,the,remainder,of,this,section.,In,this,section,,we,present,numerical,results,for,the,2D,elliptic,problem,on,a,regular,Cartesian,mesh,with,anisotropy,arising,from,the,diffusion,coefficient,α,,as,given,in,(3).,We,choose,the,mesh,spacing,so,that,it,varies,with,the,FEM,polynomial,order,p,such,that,each,problem,instance,has,the,same,fixed,number,of,degrees,of,freedom,(DOFs),,namely,67,125,249.,For,p,=,1,this,corresponds,to,an,equispaced,grid,with,213,+,1,=,8,193,points,in,each,dimension.,We,utilise,128,processors,in,each,case,giving,approximately,half,a,million,DOFs,per,processor.,In,Table,4,,we,provide,iteration,counts,,the,total,computational,run,time,(in,seconds),along,with,the,true,7,error,achieved,in,the,solution,by,comparing,it,to,the,exact,analytical,solution.,We,vary,the,anisotropy,through,the,diffusion,parameter,ϵ,in,(3),along,with,the,FEM,order,p.,We,first,note,that,,for,the,same,number,of,DOFs,,the,error,reduces,as,we,increase,p,and,this,reduction,can,be,several,orders,of,magnitude,compared,to,p,=,1.,Considering,AMG,on,the,full,system,,we,see,that,iteration,counts,are,low,in,the,isotropic,case,(ϵ,=,1),but,−6,,AMG,begins,to,really,struggle,with,degrade,as,the,anisotropy,increases.,In,the,highly,anisotropic,case,ϵ,=,10,large,iteration,counts,observed,for,all,but,the,case,p,=,8,and,we,note,that,the,error,in,the,solution,can,be,larger,than,when,using,the,LOR,approach.,The,p-multigrid,method,performs,even,better,and,is,the,most,efficient,solver,in,the,isotropic,case,with,iteration,counts,and,time,taken,which,are,independent,of,p.,However,,iteration,counts,become,much,worse,in,the,anisotropic,case,and,the,solver,fails,to,converge,within,500,iterations,when,−6.,These,results,suggest,an,overall,lack,of,robustness,for,the,AMG,and,p-multigrid,solvers,for,highly,ϵ,=,10,anisotropic,problems.,The,LOR,approach,proves,more,robust,with,relatively,low,iteration,counts,in,all,cases,and,we,further,note,that,the,total,run,time,remains,small,with,very,little,increase,as,we,increase,p,,unlike,AMG,on,the,full,system,where,larger,run,times,are,observed.,It,is,interesting,to,note,from,Table,4,that,LOR,is,far,more,robust,than,AMG,for,anisotropic,problems,of,this,type,,even,for,low,orders,of,p.,We,highlight,that,the,poor,performance,of,AMG,for,lower,orders,corresponds,with,the,results,obtained,in,Firedrake,,reported,in,Table,2a,,and,we,conclude,this,is,a,feature,of,the,solver,,not,of,the,implementation.,On,the,other,hand,,AMG,appears,to,be,fairly,robust,to,this,form,of,anisotropy,for,p,=,8,,which,is,intriguing.,ϵ,1,−3,10,−6,10,p,1,2,4,8,1,2,4,8,1,2,4,8,It.,9,8,13,12,39,40,40,11,463,463,397,15,AMG,Tot.,time,6.9,7.8,16.9,35.9,24.1,28.7,43.2,34.6,200.4,246.2,326.9,40.0,Error,5.7e-06,2.5e-07,3.3e-09,1.0e-09,4.8e-06,2.5e-07,3.2e-09,3.0e-10,4.5e-06,2.8e-07,1.5e-07,1.0e-09,It.,9,8,9,89,105,144,×,×,×,p-MG,Tot.,time,6.6,4.7,5.1,44.3,44.2,61.5,×,×,×,Error,2.5e-07,3.1e-09,3.3e-10,2.5e-07,3.4e-09,1.6e-09,×,×,×,It.,13,14,15,19,12,17,17,17,12,17,18,18,LOR,Tot.,time,10.1,9.4,12.3,14.3,11.3,11.8,12.4,12.4,11.4,11.8,12.7,12.6,Error,5.7e-06,2.5e-07,3.2e-09,6.6e-10,4.8e-06,2.5e-07,3.1e-09,3.5e-10,4.5e-06,2.5e-07,2.9e-09,2.7e-10,Table,4:,Iteration,counts,,total,time,of,computation,(in,seconds),,and,error,in,solution,for,the,2D,anisotropic,elliptic,problem,with,regular,meshes,while,varying,the,diffusion,parameter,ϵ,and,FEM,order,p.,Here,the,mesh,refinement,is,changed,with,p,so,that,each,problem,instance,has,the,same,number,of,DOFs,,namely,67,125,249.,The,computations,use,128,processors.,3.2.3,2D,elliptic,problem,with,Kershaw,meshes,We,turn,now,to,the,case,of,anisotropy,through,the,mesh.,In,this,section,,we,provide,results,for,the,isotropic,diffusion,problem,(ϵ,=,1),but,discretised,on,an,anisotropic,Kershaw,mesh,[24].,The,mesh,anisotropy,is,con-,trolled,by,the,Kershaw,parameter,ϵy,,,with,ϵy,=,1,being,isotropic,and,decreasing,ϵy,giving,increasingly,stretched,elements,and,thus,a,more,anisotropic,mesh.,We,run,tests,analogous,to,the,previous,section,,now,varying,ϵy,and,p,with,a,fixed,number,of,DOFs.,Results,are,given,in,Table,5,and,we,first,note,that,AMG,on,the,full,system,is,somewhat,more,robust,for,the,parameters,tested,than,in,the,previous,example,in,Table,4.,Similarly,,p-multigrid,now,converges,in,all,cases,,however,,there,is,still,a,large,increase,in,iteration,counts,and,thus,time,taken,as,the,anisotropy,increases.,Both,AMG,and,LOR,now,have,some,small,degradation,in,iteration,counts,as,p,increases.,While,AMG,on,the,full,system,generally,has,a,lower,iteration,count,over,LOR,,we,see,that,the,total,run,time,is,somewhat,larger,for,higher,p.,The,run,time,for,LOR,remains,low,but,is,now,seen,to,slowly,increase,as,p,increases.,We,also,note,that,anisotropy,in,the,mesh,appears,to,have,a,larger,impact,on,the,error,achieved,in,the,solution,for,this,problem,,nonetheless,,good,accuracy,is,still,achieved,for,higher-order,FEM.,8,ϵ,1,0.7,0.3,p,1,2,4,8,1,2,4,8,1,2,4,8,It.,9,8,13,12,10,14,15,18,19,27,37,32,AMG,Tot.,time,6.9,7.8,16.9,35.9,10.7,12.9,25.6,55.5,14.7,26.4,55.4,88.3,Error,5.7e-06,2.5e-07,3.3e-09,1.0e-09,1.3e-05,3.8e-07,8.5e-09,1.2e-09,1.9e-04,7.4e-06,6.5e-07,1.7e-07,It.,9,8,9,14,18,26,66,108,169,p-MG,Tot.,time,6.6,4.7,5.1,8.8,8.7,12.3,32.7,45.1,72.1,Error,2.5e-07,3.1e-09,3.3e-10,3.8e-07,8.5e-09,9.4e-10,7.4e-06,6.5e-07,1.7e-07,It.,13,14,15,19,13,16,19,24,16,23,32,43,LOR,Tot.,time,10.1,9.4,12.3,14.3,14.8,15.1,15.6,18.3,16.0,17.7,20.8,26.5,Error,5.7e-06,2.5e-07,3.2e-09,6.6e-10,1.3e-05,3.8e-07,8.6e-09,6.3e-10,1.9e-04,7.4e-06,6.5e-07,1.7e-07,Table,5:,Iteration,counts,,total,time,of,computation,(in,seconds),,and,error,in,solution,for,the,2D,isotropic,elliptic,problem,(ϵ,=,1),with,Kershaw,meshes,while,varying,the,mesh,parameter,ϵy,and,FEM,order,p.,Here,the,mesh,refinement,is,changed,with,p,so,that,each,problem,instance,has,the,same,number,of,DOFs,,namely,67,125,249.,The,computations,use,128,processors.,3.2.4,2D,anisotropic,elliptic,problem,with,Kershaw,mesh,We,now,give,results,for,both,AMG,on,the,full,system,and,LOR,when,anisotropy,is,present,in,both,the,diffusion,coefficient,and,the,mesh.,In,this,case,,the,anisotropy,combines,to,make,the,problem,particularly,challenging.,−6,and,ϵy,=,0.3,the,solution,error,is,relatively,large,due,to,the,poor,approximation,Indeed,,we,note,that,for,ϵ,=,10,properties,of,approximating,a,highly,anisotropic,problem,on,a,highly,anisotropic,mesh.,It,is,therefore,unsur-,prising,that,solvers,will,begin,to,struggle.,In,Table,6,,we,provide,our,results,and,first,note,that,neither,AMG,nor,LOR,are,robust,in,this,regime.,In,all,cases,,iteration,counts,and,timings,increase,with,p.,Nonetheless,,LOR,remains,the,fastest,due,to,the,lower,setup,costs.,ϵ,ϵy,0.7,−3,10,0.7,−6,10,0.3,−3,10,0.3,−6,10,p,1,2,4,8,1,2,4,8,1,2,4,8,1,2,4,8,It.,55,149,170,247,430,×,×,×,70,220,418,465,458,×,×,×,AMG,Tot.,time,30.2,106.2,216.6,557.2,182.4,×,×,×,36.7,146.8,468.0,1020.0,196.1,×,×,×,Error,1.4e-03,2.4e-05,2.4e-07,7.8e-08,2.7e-03,×,×,×,7.4e-03,5.5e-04,3.4e-05,3.7e-06,1.0e-02,×,×,×,It.,55,110,216,297,85,314,×,×,85,191,394,×,108,401,×,×,LOR,Tot.,time,30.0,47.9,97.2,131.1,41.4,121.3,×,×,41.5,77.9,167.5,×,50.5,154.1,×,×,Error,1.4e-03,2.4e-05,2.3e-07,2.5e-08,2.7e-03,1.9e-04,×,×,7.4e-03,5.5e-04,3.4e-05,×,1.0e-02,2.3e-03,×,×,Table,6:,Iteration,counts,,total,time,of,computation,(in,seconds),,and,error,in,solution,for,the,2D,anisotropic,elliptic,problem,with,Kershaw,meshes,while,varying,the,diffusion,parameter,ϵ,,mesh,parameter,ϵy,and,FEM,order,p.,Here,the,mesh,refinement,is,changed,with,p,so,that,each,problem,instance,has,the,same,number,of,DOFs,,namely,67,125,249.,The,computations,use,128,processors.,9,3.2.5,3D,anisotropic,elliptic,problem,with,regular,mesh,We,move,now,to,the,3D,anisotropic,problem,,where,the,diffusion,coefficient,is,given,by,(4).,We,present,similar,results,to,the,2D,case,,now,for,a,3D,problem,of,size,16,974,593,DOFs.,In,Table,7,,we,see,that,both,AMG,and,p-multigrid,lack,robustness,as,we,increase,the,anisotropy,in,the,diffusion,coefficient,while,LOR,remains,fairly,robust.,Further,,outside,of,the,isotropic,case,,LOR,gives,both,the,lowest,iteration,counts,and,the,fastest,total,time,to,solution.,We,also,note,that,for,some,tests,AMG,failed,to,build,a,preconditioner,and,,thus,,the,test,did,not,complete:,these,are,marked,“−”.,ϵ,1,−3,10,−6,10,p,1,2,4,8,1,2,4,8,1,2,4,8,It.,−,7,9,−,50,52,61,−,181,163,124,−,AMG,Tot.,time,−,17.6,97.8,−,37.4,56.7,120.4,−,66.1,124.3,194.6,−,Error,−,3.4e-02,5.2e-02,−,5.5e-03,4.1e-02,6.6e-02,−,5.5e-03,4.1e-02,6.8e-02,−,It.,8,7,11,94,97,120,134,151,181,p-MG,Tot.,time,42.5,5.6,3.6,34.0,21.9,24.6,46.1,34.3,36.4,Error,3.4e-02,5.2e-02,6.4e-02,4.1e-02,6.6e-02,8.5e-02,4.1e-02,6.8e-02,8.7e-02,It.,19,27,24,22,20,29,26,23,20,30,27,24,LOR,Tot.,time,28.4,27.7,29.7,35.7,10.3,9.5,11.1,10.9,10.2,9.4,10.6,10.4,Error,5.6e-03,3.4e-02,5.2e-02,6.4e-02,5.5e-03,4.1e-02,6.6e-02,8.5e-02,5.5e-03,4.1e-02,6.8e-02,8.7e-02,Table,7:,Iteration,counts,,total,time,of,computation,(in,seconds),,and,error,in,solution,for,the,3D,anisotropic,elliptic,problem,with,regular,meshes,while,varying,the,diffusion,parameter,ϵ,and,FEM,order,p.,Here,the,mesh,refinement,is,changed,with,p,so,that,each,problem,instance,has,the,same,number,of,DOFs,,namely,16,974,593.,The,computations,use,128,processors.,3.2.6,3D,elliptic,problem,with,Kershaw,meshes,In,Table,8,we,detail,results,on,3D,Kershaw,meshes.,Again,,we,reach,similar,conclusions,to,the,2D,case,,namely,that,no,method,is,fully,robust,but,the,most,promising,approach,is,LOR,which,takes,the,lowest,total,time,and,has,only,mildly,increasing,iteration,counts,as,the,anisotropy,in,the,mesh,increases.,As,such,,we,consider,LOR,to,the,method,of,choice,for,treating,high,order,discretisation,of,highly,anisotropic,elliptic,problems,and,will,focus,on,this,approach,in,subsequent,sections.,ϵy,=,ϵz,1.0,0.7,0.3,p,1,2,4,8,1,2,4,8,1,2,4,8,It.,−,7,9,−,9,11,17,−,17,28,41,−,AMG,Tot.,time,−,17.6,97.8,−,250.0,59.8,149.7,−,155.4,111.0,254.9,−,Error,−,3.4e-02,5.2e-02,−,1.2e-02,2.8e-02,4.6e-02,−,2.0e-02,4.2e-02,6.4e-02,−,It.,8,7,11,21,30,48,162,207,313,p-MG,Tot.,time,42.5,5.6,3.6,48.9,13.0,10.9,79.5,48.8,63.2,Error,3.4e-02,5.2e-02,6.4e-02,2.8e-02,4.6e-02,4.5e-02,4.9e-02,1.3e-01,1.7e-01,It.,19,27,24,22,21,29,28,32,22,31,40,56,LOR,Tot.,time,28.4,27.7,29.7,35.7,61.3,63.8,47.0,47.7,48.2,50.0,46.6,53.0,Error,5.6e-03,3.4e-02,5.2e-02,6.4e-02,1.2e-02,2.8e-02,4.6e-02,4.5e-02,2.0e-02,4.2e-02,6.4e-02,6.4e-02,Table,8:,Iteration,counts,,total,time,of,computation,(in,seconds),,and,error,in,solution,for,the,3D,isotropic,elliptic,problem,(ϵ,=,1),with,Kershaw,meshes,while,varying,the,mesh,parameter,ϵy,=,ϵz,and,FEM,order,p.,Here,the,mesh,refinement,is,changed,with,p,so,that,each,problem,instance,has,the,same,number,of,DOFs,,namely,16,974,593.,The,computations,use,128,processors.,10,3.2.7,Inner,solver:,Multilevel,Domain,Decomposition,or,AMG,In,the,previous,sections,,we,have,seen,that,LOR,is,the,most,robust,method,in,2D,and,3D,for,dealing,with,anisotropy,in,elliptic,equations,discretised,by,high-order,finite,elements.,We,now,explore,alternative,methods,to,solve,the,low,order,system,in,the,FEM–SEM,equivalence.,In,particular,,we,present,a,strong,scaling,com-,parison,between,a,spectral,multilevel,DD,preconditioner,(HPDDM),and,an,algebraic,multigrid,preconditioner,(BoomerAMG),as,the,low-order,solver,for,an,LOR,preconditioner.,The,test,case,is,the,2D,elliptic,problem,with,Kershaw,mesh,discretised,on,a,grid,of,size,256,×,256,using,8th,order,FEM,resulting,in,a,system,of,size,≈,4,×,106.,The,Kershaw,parameter,is,ϵy,=,0.7.,The,test,case,is,chosen,to,assess,the,strong,scalability,of,the,setup,and,solve,phases,when,each,processor,owns,a,small,number,of,DOFs,without,the,necessity,for,employing,a,very,large,number,of,processors.,The,DD,preconditioner,constructs,two,grids,on,32–512,processors,and,three,grids,on,1024,processors.,32,AMG,51,68,DD,64,51,61,128,50,54,256,52,47,512,51,50,1024,52,54,Table,9:,Iteration,count,for,AMG,and,multilevel,DD.,Table,9,presents,the,iteration,count,required,to,reach,a,relative,residual,norm,10,−8,when,using,each,pre-,conditioner.,These,results,show,that,both,preconditioners,are,robust,with,respect,to,the,number,of,processors,on,this,test,problem.,AMG,DD,AMG,DD,lev,1,2,3,4,5,6,7,8,9,DOFs,4198401,574905,162123,39601,8784,1581,231,30,4,nnz,36923049,5265911,3136361,1038607,274304,50063,6361,562,16,DOFs,4198401,5120,-,-,-,-,-,-,-,nnz,36923049,750400,-,-,-,-,-,-,-,lev,1,2,3,4,5,6,7,8,9,DOFs,4198401,569530,159448,38404,7552,1198,174,22,4,nnz,36923049,5192580,3093544,1032022,228534,35398,4490,346,16,DOFs,4198401,40960,80,-,-,-,-,-,-,nnz,36923049,6505600,2100,-,-,-,-,-,-,(a),Grid,information,on,128,processors,(b),Grid,information,on,1024,processors,Table,10:,Grid,and,subgrid,information,for,different,architectures.,Table,10,contains,the,grid,sizes,along,with,the,number,of,nonzero,elements,per,grid,for,both,AMG,and,the,DD,preconditioner.,It,is,clear,that,the,coarsening,factors,for,DD,are,very,large,compared,to,the,ones,result-,ing,from,AMG.,The,largest,coarsening,factor,for,AMG,is,7.7,while,for,the,multilevel,DD,it,reaches,820,on,128,processors,and,512,on,1024,processors.,The,large,coarsening,factors,obtained,by,the,multilevel,DD,method,allow,us,to,reduce,significantly,the,number,of,levels,while,maintaining,the,efficiency,of,the,preconditioner.,We,also,observe,that,the,DD,preconditioner,produces,grid,matrices,that,are,slightly,denser,than,the,ones,obtained,from,AMG.,This,,in,turn,,has,consequences,on,the,time,required,to,assemble,these,matrices,during,the,setup,phase,and,operate,with,them,during,the,solve,phase.,Figure,3,shows,the,total,time,required,to,solve,the,linear,system,up,to,a,relative,residual,norm,10,−8,using,the,multilevel,DD,and,AMG,preconditioners.,For,each,preconditioner,,the,total,time,is,broken,down,into,the,time,spent,during,the,setup,phase,and,during,the,solve,phase,(the,preconditioned,Krylov,subspace,solver).,Furthermore,,the,setup,time,for,the,DD,preconditioner,is,broken,down,into,two,parts,,the,time,required,to,assemble,the,operators,on,levels,>,1,and,the,rest,of,the,setup,phase.,We,notice,from,Figure,3,that,the,bottleneck,for,scalability,for,both,preconditioners,is,the,setup,phase.,Start-,ing,from,256,processors,(approximately,5000,DOFs,per,processor),,the,setup,phase,for,AMG,struggles,to,scale,and,dominates,the,run,time.,Concerning,the,DD,preconditioner,,the,computation,of,the,matrices,on,levels,>,1,dominates,the,run,time,(73%,of,the,setup,time).,As,expected,,the,rest,of,the,computations,during,the,setup,scale,very,well.,We,give,some,details,of,this,in,Appendix,A,,and,we,will,implement,the,suggested,improvements,in,computing,the,matrices,on,levels,>,1,in,future,work,,which,we,expect,to,yield,a,more,scalable,setup,time.,11,Figure,3:,Strong,scalability,comparison,between,BoomerAMG,and,spectral,multilevel,DD,(HPDDM).,4,Stationary,hyperbolic,problems,As,a,model,for,stationary,hyperbolic,problems,,we,use,the,advection-diffusion,equation.,In,our,examples,,we,consider,the,square,domain,Ω,=,(−1,,1)2,and,utilise,Dirichlet,boundary,conditions,on,the,boundary,Γ,=,∂Ω,so,that,the,problem,is,given,by,−ϵ∇2u,+,w(x),·,∇u,=,0,,u,=,uD,,,x,∈,Ω,,x,∈,Γ,,where,w(x),is,the,so-called,wind,vector,defining,the,advection,field,and,uD,defines,the,values,on,the,boundary,,both,of,which,must,be,specified.,When,the,parameter,ϵ,is,small,,the,problem,is,dominated,by,advection,and,solutions,may,often,exhibit,steep,gradients,(boundary,layers),in,some,parts,of,the,domain.,This,presents,challenges,both,in,discretising,and,solving,such,problems.,The,degree,to,which,the,problem,is,advection-dominated,is,typically,expressed,by,the,Peclet,number,P,=,|w|L,ϵ,,,where,L,is,the,characteristic,length,scale,of,the,domain,Ω,and,|w|,is,a,measure,of,the,size,of,the,wind.,The,larger,the,Peclet,number,P,,,the,more,advection,dominates,and,so,the,more,challenging,the,problem,becomes.,In,our,examples,,the,wind,is,chosen,such,that,|w|,=,O(1),and,the,domain,is,fixed,so,that,by,varying,ϵ,we,vary,P,.,To,discretise,,we,use,high-order,continuous,Galerkin,finite,elements,and,make,use,of,a,sufficiently,fine,mesh,resolution.,When,P,is,large,and,advection,is,dominant,,the,continuous,Galerkin,solution,may,exhibit,spurious,oscillations,if,the,mesh,is,not,sufficiently,refined,to,capture,any,layers,in,the,solution.,This,is,quanti-,fied,by,the,mesh,Peclet,number,P,h,=,P,h,2L,=,|w|h,2ϵ,,,which,should,be,lower,than,the,unity,to,avoid,such,oscillations.,One,way,to,mitigate,this,lack,of,stability,is,to,use,a,streamline,diffusion,discretisation,[21,,15],instead,,which,formulates,a,Petrov–Galerkin,method.,However,,MFEM,does,not,support,bilinear,form,integrators,required,for,the,second,order,streamline,upwind,12,32641282565121024#,Processors10-1100101Time,(s)Perfect,ScalingDD,TotalDD,Set,Up,(PtAP)DD,Set,Up,(else)DD,SolveAMG,TotalAMG,Set,UpAMG,Solve,Petrov–Galerkin.,Nonetheless,,we,employ,a,first,order,streamline,diffusion,stabilization,scheme,when,neces-,sary,,i.e.,,when,P,h,<,1.,An,alternative,approach,is,to,use,a,discontinuous,Galerkin,(DG),method,but,we,shall,not,consider,such,an,approach,in,this,section.,In,the,rest,of,this,section,,we,consider,two,test,cases,corresponding,to,two,different,wind,vectors.,In,each,case,,we,compare,two,variants,of,algebraic,multigrid,preconditioners:,BoomerAMG,and,ℓ-AIR,[26].,The,ℓ-,AIR,approach,is,tailored,to,non-symmetric,problems,such,as,advection-diffusion.,These,preconditioners,are,used,either,directly,or,as,the,low-order,solver,within,a,LOR,preconditioner.,When,LOR,is,employed,,the,matrix,is,partially,assembled,(matrix-free),and,the,algebraic,multigrid,preconditioner,is,set,up,using,the,discretised,PDE,on,the,refined,mesh,with,order,p,=,1,finite,elements.,4.1,Advection–diffusion,with,a,shear,wind,We,first,consider,the,case,of,a,constant,shear,wind,,as,in,Example,6.1.3,of,[15],,where,w(x),=,(−,sin(π/6),,cos(π/6))T,,,and,uD,(x),=,,,,if,x,=,1,,if,y,=,−1,and,0,≤,x,≤,1,,1,1,0,otherwise.,(cid:112),ϵ),,emanating,from,the,discontinuity,of,the,boundary,The,solution,u,features,an,internal,layer,of,width,O,(,data,at,(0,,−1),and,following,the,wind,vector,w,,as,well,as,an,exponential,boundary,layer,of,width,O,(ϵ),along,the,top,of,the,domain.,In,the,results,that,follow,,we,vary,the,Peclet,number,P,by,reducing,the,value,of,ε,from,10,the,robustness,of,the,algebraic,multigrid,and,the,LOR,preconditioners,with,respect,to,P,.,−1,to,10,−4,to,assess,ε,p,Full,LOR,AMG,Tot.,time,1.4,1.9,4.9,10.7,1.3,1.7,4.3,8.4,1.4,1.8,4.8,8.0,3.3,4.1,6.5,12.1,It.,6,7,11,10,5,6,8,7,5,6,6,6,6,7,7,8,It.,×,×,×,37,38,22,54,12,31,34,×,8,40,6,14,×,1,2,4,8,1,2,4,8,1,2,4,8,1,2,4,8,−1,10,−2,10,−3,10,−4,10,ℓ-AIR,Tot.,time,×,×,×,31.7,5.4,5.3,20.2,12.6,5.2,7.5,×,9.8,12.1,3.2,9.8,×,AMG,Tot.,time,2.2,2.3,3.6,5.0,2.0,2.1,3.7,4.0,2.2,2.0,3.6,4.3,3.9,3.9,4.4,4.9,It.,10,12,13,15,8,10,11,12,7,9,10,10,8,9,9,11,It.,×,×,×,×,34,35,27,×,22,21,11,12,19,35,29,17,ℓ-AIR,Tot.,time,×,×,×,×,6.0,5.5,6.1,×,4.1,3.7,3.4,3.8,5.2,9.3,7.7,5.3,Table,11:,Iteration,counts,for,the,2D,advection-diffusion,equation,with,a,shear,wind.,Here,the,mesh,refinement,is,changed,with,p,so,that,each,problem,instance,has,the,same,number,of,DOFs,,namely,67,125,249.,Table,11,presents,the,iteration,count,required,to,a,reach,relative,residual,norm,smaller,than,10,−8,as,well,as,the,total,run,time.,The,polynomial,order,of,the,finite,element,method,is,doubled,starting,from,p,=,1,up,to,p,=,8,and,the,mesh,is,coarsened,correspondingly,so,that,the,number,of,degrees,of,freedom,is,fixed,at,67,125,249.,We,notice,that,for,all,preconditioning,variants,the,iteration,count,is,small,,showing,their,effectiveness.,Although,using,AMG,on,the,full,system,requires,the,fewest,number,of,iterations,,the,increased,computational,cost,in-,curred,by,increasing,the,polynomial,order,prohibits,the,Full-AMG,variant,from,being,the,fastest.,The,variant,13,Figure,4:,Strong,scalability,of,the,advection-diffusion,with,shear,wind,problem,and,LOR,preconditioner.,The,viscosity,parameter,is,ε,=,10,−4.,LOR-AMG,has,slightly,larger,iteration,counts,(at,most,5,extra,iterations,compared,to,Full-AMG),but,requires,the,least,amount,of,time,for,each,order.,Furthermore,,the,total,run,time,increases,only,mildly,when,increasing,the,order,while,the,iteration,count,is,robust,with,respect,to,the,order.,We,also,observe,robustness,with,respect,to,the,Peclet,number,P,as,it,increases,by,decreasing,ε,and,so,making,the,problem,more,advection-dominated.,Figure,4,provides,results,for,a,strong,scalability,test,with,BoomerAMG,and,ℓ-AIR,when,used,as,part,of,a,LOR,preconditioner,for,the,linear,system,arising,from,discretising,the,2D,advection-diffusion,problem,with,a,−4,on,a,Cartesian,grid,of,size,1024,×,1024,with,8th,order,FEM.,shear,wind.,In,particular,,we,use,the,case,ε,=,10,The,number,of,iterations,to,reach,convergence,is,11,for,AMG,and,ranges,between,17,and,20,for,ℓ-AIR.,We,can,observe,that,both,preconditioners,achieve,scalability,for,the,setup,and,solve,phases.,4.2,Advection–diffusion,with,a,recirculating,wind,As,a,second,example,,we,consider,the,case,of,a,recirculating,wind,,as,in,Example,6.1.4,of,[15],,where,and,w(x),=,(cid:161)2y(1,−,x2),,−2x(1,−,y,2)(cid:162)T,,,uD,(x),=,(cid:189),1,if,x,=,1,,0,otherwise.,This,provides,a,simple,model,of,heat,in,a,cavity,where,the,right-hand,wall,is,kept,hot,(uD,=,1),and,the,heat,is,advected,by,a,recirculating,current.,The,discontinuities,in,boundary,data,at,the,right-hand,corners,yield,nearby,boundary,layers,in,the,solution.,14,326412825651210242048#,Processors10-1100101102Time,(s)Perfect,scalingl-AIR,Totall-AIR,Set,Upl-AIR,SolveAMG,TotalAMG,Set,UpAMG,Solve,ε,p,Full,LOR,AMG,Tot.,time,1.4,2.4,4.9,10.9,1.3,1.7,4.6,10.1,1.9,2.9,7.2,13.3,×,5.7,7.4,12.8,It.,6,7,10,10,5,6,9,9,8,9,11,12,×,9,9,9,It.,×,×,×,43,68,38,×,16,47,62,37,13,17,36,17,44,1,2,4,8,1,2,4,8,1,2,4,8,1,2,4,8,−1,10,−2,10,−3,10,−4,10,ℓ-AIR,Tot.,time,×,×,×,36.6,10.3,8.0,×,16.2,8.2,15.0,18.4,13.6,8.0,11.4,10.7,43.6,AMG,Tot.,time,2.3,2.3,3.6,4.5,2.0,2.1,3.5,4.0,2.5,2.6,4.2,4.9,×,4.0,4.4,4.5,It.,10,12,13,15,8,10,11,12,10,12,13,15,×,10,10,11,It.,×,×,×,×,42,42,42,×,15,15,26,18,11,13,15,×,ℓ-AIR,Tot.,time,×,×,×,×,7.4,6.7,8.6,×,3.4,3.1,6.9,5.1,4.4,3.7,4.8,×,Table,12:,Iteration,counts,for,the,2D,advection-diffusion,equation,with,a,recirculating,wind.,Here,the,mesh,re-,finement,is,changed,with,p,so,that,each,problem,instance,has,the,same,number,of,DOFs,,namely,=,67,125,249.,Table,12,contains,the,iteration,count,required,to,reach,a,relative,residual,norm,smaller,than,10,−8,as,well,as,the,total,run,time.,The,setup,is,similar,to,that,for,Table,11,in,the,previous,section,and,we,broadly,see,similar,trends,in,the,results.,Namely,,LOR-AMG,is,the,fastest,approach,for,higher-order,p,and,provides,the,most,robust,solver.,5,Time-dependent,hyperbolic,problems,In,this,section,,we,turn,our,attention,to,time-dependent,hyperbolic,problems,,we,must,consider,how,to,ad-,vance,solutions,in,time.,We,first,discretise,in,space,to,obtain,a,semi-discrete,problem,,turning,the,problem,into,an,ordinary,differential,equation,(ODE),in,time.,To,numerically,solve,the,ODE,,we,use,a,time-stepping,scheme,which,can,either,be,explicit,or,implicit.,Explicit,schemes,update,the,solution,based,solely,on,values,at,previous,time-steps,and,so,are,relatively,cheap,to,use,,however,,they,typically,suffer,from,stability,issues,which,enforce,a,constraint,on,how,large,the,time-step,can,be.,This,is,particularly,true,of,so-called,stiff,ODEs,,which,often,occur,in,physically,relevant,applications.,Since,performing,many,small,time,steps,becomes,costly,and,can,accumulate,errors,,implicit,methods,are,often,favourable,when,high,solution,accuracy,is,required.,In,an,implicit,scheme,,the,solution,at,the,next,time-step,is,computed,through,solving,linear,systems,rather,than,each,degree,of,freedom,only,depending,on,past,values.,This,provides,more,stable,time-stepping,schemes,but,comes,with,the,additional,cost,of,solving,the,associated,linear,systems.,Here,we,consider,the,most,promising,types,of,implicit,time-stepping,schemes.,Most,of,these,approaches,belong,to,a,broad,class,of,ODE,solvers,called,Runge–Kutta,(RK),methods,,which,are,popular,and,include,many,well-known,classical,schemes,such,as,Crank–Nicolson.,Before,continuing,,we,note,that,there,are,two,notions,of,stability,which,can,be,helpful,in,characterising,time-stepping,schemes,based,on,solving,the,stiff,problem,∂t,u(t,),−,ku(t,),=,0,,u(0),=,1,,where,k,∈,C,with,Re(k),<,0.,If,,for,a,fixed,time-step,,the,numerical,scheme,approaches,0,as,t,→,∞,then,the,scheme,is,said,to,be,A-stable.,If,,in,addition,,this,holds,in,a,single,time-step,as,the,time-step,tends,to,infinity,then,the,scheme,is,said,to,be,L-stable.,We,note,that,L-stability,is,a,stronger,property,than,A-stability.,5.1,Implicit,time-stepping,schemes,We,shall,consider,several,classes,of,implicit,time-stepping,methods.,Within,each,class,,there,can,be,many,schemes.,Primarily,they,are,categorised,by,the,order,they,achieve,,stability,properties,(A-stable,or,L-stable),15,and,then,other,identifying,characteristics,,such,as,an,underlying,approach,they,are,based,on.,We,only,consider,approaches,that,solve,over,a,single,time-step,rather,than,multistep,methods,which,,as,described,in,[13],,have,several,drawbacks,such,as,the,fact,that,they,are,more,memory,intensive,due,to,requiring,the,storage,of,the,solution,at,several,past,time-steps,,and,that,there,are,no,implicit,multistep,methods,that,are,A-stable,and,of,order,greater,than,two.,5.1.1,Classical,implicit,schemes,The,most,well-known,time-stepping,schemes,are,classical,methods,,which,typically,have,relatively,low,order.,The,simplest,implicit,scheme,is,the,backward,Euler,method,,which,uses,a,first-order,finite,difference,to,ap-,proximate,the,time,derivative,with,all,other,terms,being,prescribed,at,the,current,time-step,,hence,requiring,the,solution,of,a,linear,system.,Backward,Euler,is,only,first,order,,and,higher,order,schemes,can,also,be,devised.,One,of,the,most,popular,classical,schemes,is,the,second,order,Crank–Nicolson,method,,which,can,be,derived,from,the,trapezium,rule.,We,will,consider,a,similar,second-order,method,,known,as,the,implicit,midpoint,rule,,which,is,a,geometric,integrator,and,within,the,family,of,Gauss–Legendre,Runge–Kutta,methods.,Note,that,both,backwards,Euler,and,the,implicit,midpoint,rule,are,one-stage,RK,methods:,we,go,straight,to,the,solution,at,the,next,time,step,rather,than,having,internal,stages,to,compute,first.,To,achieve,higher,order,,the,RK,methods,we,now,discuss,utilise,more,than,one,stage.,5.1.2,SDIRK,schemes,Diagonally,implicit,Runge–Kutta,(DIRK),methods,,see,[23],,are,RK,methods,that,have,a,lower-triangular,Runge–,Kutta,matrix,,which,describes,the,set,of,s,stages,to,be,performed.,Such,methods,are,attractive,since,this,reduces,the,implicit,solves,to,a,series,of,n,×,n,systems,rather,than,a,larger,ns,×,ns,system.,In,particular,,we,consider,singly,diagonally,implicit,Runge–Kutta,(SDIRK),methods,which,,in,addition,,have,an,RK,matrix,with,a,constant,diagonal.,These,methods,are,typically,differentiated,by,their,order,,whether,they,are,A-stable,or,L-stable,,and,the,number,of,stages,s,which,they,use.,5.1.3,Fully,implicit,RK,schemes,As,noted,in,[23,,13],,while,DIRK,methods,are,enticing,due,to,their,simplified,structure,,there,are,a,number,of,disadvantages,of,such,schemes,,especially,for,stiff,problems,where,they,may,suffer,from,so-called,order,reduc-,tion,,resulting,in,lower,accuracy,than,their,formal,order,would,suggest.,Fully,implicit,Runge–Kutta,methods,[34,,33],,which,utilise,the,full,RK,matrix,,can,overcome,these,limitations.,However,,due,to,the,challenging,lin-,ear,systems,to,be,solved,they,have,generally,been,considered,impractical.,Nonetheless,,with,recent,advances,,such,schemes,look,more,promising,when,the,linear,algebra,is,treated,carefully,and,their,ability,to,provide,a,high-order,robust,solver,is,worthy,of,investigation.,Common,families,of,fully,implicit,RK,methods,include,Gauss–Legendre,,Gauss–Lobatto,and,Gauss–Radau,schemes.,Such,families,also,encompass,classical,techniques,such,as,backward,Euler,(equivalent,to,RadauIIA),and,Crank–Nicolson,(equivalent,to,LobattoIIIA).,In,particular,,we,consider,methods,in,the,Gauss–Legendre,family,,denoted,IRK-Gauss,,and,the,Gauss-Lobatto,family,,denoted,IRK-LobIIIC.,5.1.4,IMEX,schemes,The,main,motivation,behind,implicit-explicit,(IMEX),schemes,[8],is,that,the,time,scale,of,different,terms,in,the,underlying,equation,may,be,quite,different;,some,terms,may,be,well,approximated,by,an,explicit,time-,stepping,scheme,,while,others,require,an,implicit,method,for,stability,and,accuracy.,Hence,,if,we,split,such,terms,and,treat,one,set,with,an,explicit,method,and,the,other,set,with,an,implicit,method,,then,we,may,hope,to,gain,the,best,of,each,method.,However,,the,split,is,not,always,clear,and,can,be,somewhat,problem,dependent,,even,for,the,same,PDE,model,,as,it,may,have,differing,physics,regimes.,IMEX,methods,may,also,suffer,from,accuracy,issues,and,leak,stiffness.,While,they,can,be,applied,successfully,,see,examples,discussed,in,[13],,for,these,reasons,we,have,chosen,not,to,pursue,these,schemes,for,the,test,problems,we,consider,here.,We,feel,that,they,are,best,when,targeted,to,a,particular,equation,or,model.,We,envisage,that,in,more,challenging,applications,and,multiphysics,simulations,,IMEX,schemes,should,be,investigated,when,a,specific,set,of,equations,and,parameter,regimes,has,been,established.,16,5.2,Time-dependent,advection,problem,To,start,our,study,of,time-dependent,hyperbolic,problems,,we,examine,the,basic,case,of,an,advection,equation,on,the,square,domain,Ω,=,(−1,,1)2,,given,by,∂t,+,w(x),·,∇u,=,0,,u(0,,x),=,u0(x),,x,∈,Ω,,t,∈,(0,,T,],,x,∈,Ω,,t,=,0,,(5a),(5b),with,periodic,boundary,conditions,in,space.,We,consider,the,simple,example,of,a,constant,advection,field,w(x),=,(1,,0)T,so,that,the,exact,solution,consists,of,the,initial,condition,u0(x),travelling,to,the,right,and,,due,to,the,periodic,domain,,returning,to,the,initial,state,when,t,is,a,multiple,of,2.,That,is,,u(2,j,,,x),=,u0(x),for,all,j,∈,N,for,the,exact,solution.,This,allows,us,to,straightforwardly,measure,the,error,in,the,time-stepping,solution,at,T,=,2,j,by,comparing,it,with,the,(discrete),initial,condition.,To,implement,this,problem,,we,modify,MFEM,Example,9,[7],and,use,the,initial,condition,u0(x),=,1,16,erfc(10(x,−,0.45)),erfc(−10(x,+,0.45)),erfc(10(y,−,0.25)),erfc(−10(y,+,0.245)),,where,erfc,is,the,complementary,error,function.,This,represents,a,“bump”,in,the,centre,of,the,domain,which,is,then,advected,to,the,right.,To,begin,our,exploration,of,time-stepping,schemes,,we,consider,the,classical,approaches,of,the,backwards,Euler,scheme,and,the,Implicit,Midpoint,Rule,along,with,several,higher,order,methods,in,the,family,of,implicit,Runge–Kutta,methods.,In,particular,,we,consider,singly,diagonally,implicit,Runge–Kutta,(SDIRK),methods,as,given,in,Table,13,(see,,e.g.,,[16,,Appendix,B],for,further,details),,which,include,the,aforementioned,classical,approaches.,Scheme,Backwards,Euler,SDIRK22,SDIRK33,Implicit,Midpoint,Rule,SDIRK23,SDIRK24,Stability,Order,L,L,L,A,A,A,1,2,3,2,3,4,Table,13:,Time-stepping,schemes,used,for,the,advection,problem,tests.,To,discretise,in,space,,we,use,discontinuous,Galerkin,(DG),finite,elements,of,order,equal,to,the,order,of,the,time-stepping,scheme,used,,namely,in,the,range,p,∈,[1,,4].,To,precondition,the,linear,systems,that,arise,,we,consider,two,approaches:,a,simple,block,ILU,preconditioner,and,the,specialist,ℓ-AIR,preconditioner,tailored,for,non-symmetric,problems,such,as,advection.,The,time-step,∆t,is,chosen,of,the,same,order,as,∆x,,in,partic-,−r,with,r,being,a,mesh,refinement,parameter,for,the,regular,Cartesian,ular,given,by,∆t,=,3,2,mesh,used.,∆x,,where,∆x,=,1,6,2,In,Table,14,we,tabulate,for,the,L-stable,time-stepping,schemes,the,total,time,to,solution,,the,L2-error,in,the,solution,at,time,T,∈,{2,,8},and,the,average,number,of,iterations,for,each,linear,system,solved,,each,for,a,sequence,of,mesh,refinements.,Table,15,provides,similar,results,for,the,A-stable,methods.,We,first,note,that,−8,in,a,single,iteration,and,,aside,from,the,smallest,ℓ-AIR,always,converges,to,within,the,relative,tolerance,of,10,problem,sizes,,gives,the,fastest,time,to,solution,,with,block,ILU,typically,requiring,two,or,three,iterations.,In,terms,of,the,error,,the,higher,order,schemes,typically,give,smaller,errors,for,the,same,mesh,refinement,,as,expected,,but,also,yield,a,larger,run,time.,Further,,the,A-stable,methods,are,faster,than,their,equivalent,order,L-stable,method,but,are,slightly,less,accurate.,If,we,consider,which,scheme,is,the,fastest,to,reach,a,given,error,,however,,we,see,that,the,extra,work,re-,quired,in,the,fourth,order,method,SDIRK34,is,not,worth,the,effort,in,terms,of,the,run,time,,with,SDIRK33,giving,a,similar,error,in,the,solution,more,cheaply.,Conversely,,the,third,order,SDIRK23,scheme,manages,to,achieve,an,order,of,magnitude,smaller,error,on,the,finest,meshes,used,compared,to,the,second,order,SDIRK22,scheme,and,so,can,make,the,extra,work,worthwhile.,If,we,consider,which,method,attains,an,error,below,,for,−3,at,T,=,2,in,the,fastest,time,we,find,that,SDIRK23,is,quickest,at,11.65s,followed,by,SDIRK33,at,example,,10,16.97s.,If,we,change,to,consider,a,final,time,T,=,8,,however,,then,we,see,that,SDIRK33,is,quickest,at,65.93s,followed,by,SDIRK34,at,147.43s.,Thus,,in,this,case,,we,see,that,the,extra,stability,of,SDIRK33,or,higher,order,of,SDIRK34,can,pay,off.,As,a,general,rule,,the,higher,order,methods,are,more,efficient,to,attain,a,given,accuracy,17,compared,to,lower,order,methods,such,as,the,classical,Backwards,Euler,scheme,or,Implicit,Midpoint,Rule,,but,this,should,be,balanced,by,the,consideration,of,stability,,especially,when,solving,over,a,long,time,interval.,On,balance,,the,L-stable,method,SDIRK33,performs,best,over,the,range,of,tests,here,,being,favourable,for,larger,T,and,only,marginally,slower,than,SDIRK23,for,small,T,.,T,∆t,2,8,1/32,1/64,1/128,1/256,1/32,1/64,1/128,1/256,r,3,4,5,6,3,4,5,6,Prec.,ILU,ℓ-AIR,ILU,ℓ-AIR,ILU,ℓ-AIR,ILU,ℓ-AIR,ILU,ℓ-AIR,ILU,ℓ-AIR,ILU,ℓ-AIR,ILU,ℓ-AIR,Backwards,Euler,(p,=,1),Error,Time,1.6e-01,0.04,1.6e-01,0.09,1.1e-01,0.16,1.1e-01,0.15,7.6e-02,1.86,7.6e-02,0.79,4.8e-02,9.75,4.8e-02,6.52,It.,3.8,1,3.5,1,3,1,2.4,1,0.09,0.13,0.58,0.43,5.08,2.89,39.46,25.28,3.1e-01,3.1e-01,2.2e-01,2.2e-01,1.6e-01,1.6e-01,1.1e-01,1.1e-01,3.2,1,3.8,1,3,1,2.4,1,SDIRK22,(p,=,2),SDIRK33,(p,=,3),Time,0.09,0.13,0.95,0.64,7.43,4.36,60.47,38.25,0.28,0.29,3.61,2.14,29.00,16.87,239.12,150.11,Error,2.2e-02,2.2e-02,6.1e-03,6.1e-03,1.6e-03,1.6e-03,3.9e-04,3.9e-04,6.5e-02,6.5e-02,2.3e-02,2.3e-02,6.2e-03,6.2e-03,1.6e-03,1.6e-03,It.,2,1,2.3,1,2,1,2,1,2,1,2.3,1,2,1,2,1,Time,0.43,0.35,4.60,2.19,28.29,16.97,219.91,141.42,1.53,1.09,17.41,8.05,111.16,65.93,869.66,558.15,Error,7.6e-03,7.6e-03,1.3e-03,1.3e-03,1.7e-04,1.7e-04,2.2e-05,2.2e-05,2.1e-02,2.1e-02,4.5e-03,4.5e-03,6.7e-04,6.7e-04,8.7e-05,8.7e-05,It.,2,1,2.9,1,2,1,2,1,2,1,2.9,1,2,1,2,1,Table,14:,Total,time,to,solution,,error,in,the,solution,and,average,iteration,counts,for,the,preconditioned,linear,system,solves,for,L-stable,time-stepping,schemes,on,the,advection,problem,with,final,time,T,,,time-step,∆t,and,mesh,refinement,parameter,r,.,The,computations,use,64,processors.,T,∆t,2,8,1/32,1/64,1/128,1/256,1/32,1/64,1/128,1/256,r,3,4,5,6,3,4,5,6,Prec.,ILU,ℓ-AIR,ILU,ℓ-AIR,ILU,ℓ-AIR,ILU,ℓ-AIR,ILU,ℓ-AIR,ILU,ℓ-AIR,ILU,ℓ-AIR,ILU,ℓ-AIR,Implicit,Midpoint,(p,=,2),Error,Time,4.2e-02,0.07,4.2e-02,0.08,1.2e-02,0.61,1.2e-02,0.35,3.2e-03,3.77,3.2e-03,2.28,8.1e-04,30.26,8.1e-04,19.78,It.,3,1,3,1,2,1,2,1,0.20,0.17,2.22,1.14,14.52,8.50,120.20,76.00,1.0e-01,1.0e-01,4.3e-02,4.3e-02,1.3e-02,1.3e-02,3.2e-03,3.2e-03,3,1,3,1,2,1,2,1,SDIRK23,(p,=,3),SDIRK34,(p,=,4),Time,0.38,0.28,3.14,1.57,23.24,11.65,146.59,96.73,1.37,0.79,11.92,5.66,90.56,44.91,581.34,380.21,Error,1.8e-02,1.8e-02,3.9e-03,3.9e-03,5.8e-04,5.8e-04,7.5e-05,7.5e-05,4.1e-02,4.1e-02,1.2e-02,1.2e-02,2.2e-03,2.2e-03,3.0e-04,3.0e-04,It.,3,1,3,1,2.7,1,2,1,3,1,3,1,2.7,1,2,1,Time,1.57,0.72,11.54,4.91,79.97,38.00,559.75,311.01,5.99,2.42,45.02,18.50,314.59,147.43,2230.29,1231.55,Error,1.5e-02,1.5e-02,2.4e-03,2.4e-03,2.1e-04,2.1e-04,1.4e-05,1.4e-05,3.2e-02,3.2e-02,7.5e-03,7.5e-03,8.1e-04,8.1e-04,5.6e-05,5.6e-05,It.,4,1,3.5,1,3,1,2.5,1,4,1,3.5,1,3,1,2.5,1,Table,15:,Total,time,to,solution,,error,in,the,solution,and,average,iteration,counts,for,the,preconditioned,linear,system,solves,for,A-stable,time-stepping,schemes,on,the,advection,problem,with,final,time,T,,,time-step,∆t,and,mesh,refinement,parameter,r,.,The,computations,use,64,processors.,18,5.3,Time-dependent,advection–diffusion,problem,We,consider,in,this,section,the,scalar,advection-diffusion,equation,,∂t,u,+,α,·,∇,f,(u),=,∇,·,(µ,·,∇u),+,s(t,,,x,,y),,u(0,,x,,y),=,u0(x,,y),(t,,,x,,y),∈,(0,,T,],×,(−1,,1)2,(x,,y),∈,(−1,,1)2,(6a),(6b),for,constant,vectors,α,=,(0.85,,1.0),⊤,and,µ,=,(0.3,,0.25),⊤,and,final,time,T,>,0.,An,initial,condition,is,given,as,u(0,,x,,y),=,u0(x,,y),=,(cid:179),sin(,π,2,(x,−,1)),(cid:180)4,(cid:179),sin(,(y,−,1)),(cid:180)4,,,π,2,and,periodic,spatial,boundaries,are,used.,Two,options,for,the,advection,flux,f,are,considered,,a,linear,function,f,(u),=,u,that,yields,a,linear,PDE,,and,a,quadratic,function,f,(u),=,u2,that,yields,a,nonlinear,PDE.,Test,problems,with,manufactured,solutions,are,implemented,to,facilitate,convergence,studies.,The,manufactured,solution,is,given,as:,u(t,,,x,,y),=,(cid:179),sin,(x,−,1,−,α1t,),(cid:180),sin,(cid:161)y,−,1,−,α2t,(cid:162)(cid:180)(cid:180)4,−(µ1+µ2)t,,,e,(cid:179),π,2,(cid:179),π,2,so,that,the,test,problem,represents,a,propagation,and,dissipation,of,the,initial,condition,with,wave,speed,α,and,diffusivity,µ.,The,solution-independent,term,s,is,computed,correspondingly,to,the,exact,solution,and,the,function,f,.,We,use,am,arbitrarily,high-order,finite,difference,method,in,2D.,In,all,of,the,experiments,,we,use,the,pre-,conditioner,BoomerAMG,for,solving,the,arising,linear,systems.,As,the,actual,implementation,of,the,fully,im-,plicit,RK,methods,does,not,support,using,LOR,,we,employ,BoomerAMG,on,the,fully,assembled,system.,In,order,to,compare,the,different,families,of,time-stepping,schemes,using,different,orders,,we,choose,to,adjust,the,spatial,discretisation,so,that,all,methods,end,up,with,a,fixed,number,of,DOF,per,stage.,Therefore,,we,had,only,selected,the,time-stepping,schemes,having,one,of,the,orders,{2,,4,,8}.,That,is,,we,do,not,include,results,for,the,Gauss-Radau,family.,5.3.1,The,linear,case,Table,16,presents,the,weighted,average,Krylov,iteration,count,per,time,step,to,solve,(using,a,weight,of,1,for,a,1,×,1,block,diagonal,system,corresponding,to,a,real,eigenvalue,of,the,Butcher,table,matrix,,and,a,weight,of,2,for,2,×,2,block,diagonal,system,corresponding,to,the,pair,of,complex,eigenvalues,of,the,Butcher,table,matrix),,the,total,run,time,,and,the,L2-error,norm.,The,results,in,the,table,demonstrate,that,combining,high-,order,discretisation,in,space,and,time,achieves,its,most,benefit,when,the,time,stepping,scheme,is,fully,implicit.,Indeed,,for,orders,2,,4,,and,8,,fully,implicit,RK,methods,(IRK-Gauss,and,IRK-LobIIIC),achieve,the,best,accuracy.,Furthermore,,the,IRK-Gauss,method,is,the,fastest,for,all,orders.,Scheme,SDIRK,SDIRK,SDIRK,IRK-Gauss,IRK-Gauss,IRK-Gauss,IRK-LobIIIC,IRK-LobIIIC,IRK-LobIIIC,Stability,Order,A,L,L,L,L,L,L,L,L,4,2,4,2,4,8,2,4,8,Avg.,It.,21,14,35,7,18,44,32,31,58,Error,2.83692e-07,4.65815e-06,2.72948e-09,8.38129e-06,2.64616e-09,1.35212e-11,1.37162e-05,4.03505e-09,7.00163e-12,Tot.,time,26.688,68.837,41.372,35.141,18.690,13.814,88.737,31.775,17.723,Table,16:,Time,window,=,2,seconds.,L2-error,norm,and,total,time,comparison,between,different,time-stepping,schemes.,Linear,advection–diffusion,test,case,discretised,with,higher,order,finite,difference,in,space,(order,equal,to,the,order,of,the,time,stepping,scheme).,The,time,step,is,chosen,as,∆t,=,2h,,where,h,is,the,mesh,size.,The,number,of,DOFs,in,space,is,the,same,for,all,cases,and,equal,4,,194,,304.,All,tests,run,on,64,processors.,Table,17,presents,a,scalability,study,of,the,fully,implicit,RK,method,IRK-Gauss,of,order,8,applied,to,the,lin-,ear,time-dependent,advection-diffusion,equation.,The,grid,size,is,1024,×,1024.,The,size,of,the,problem,is,fixed,and,the,number,of,processors,is,quadrupled,starting,from,16,up,to,1024.,Although,the,number,of,iterations,is,19,#,Processors,Tot.,time,Gain,factor,Avg.,It.,16,1432.4,1,42,64,486.0,2.95,42,256,143.1,10.0,42,1024,84.7,16.9,44,Table,17:,Number,of,time,steps,=,512.Total,time,is,reported,for,the,strong,scalability,test,study,using,IRK-,Gauss,(L-stable),of,order,8.,The,linear,problem,is,discretised,on,a,grid,of,size,1024×1024,with,the,order,8,finite,difference,method.,The,number,of,processors,increased,by,a,factor,of,4.,The,gain,factor,is,the,proportion,of,the,total,time,on,P,processors,divided,by,the,total,time,on,16,processors.,The,L2-error,=,6.2,×,10,−12.,robust,with,respect,to,the,number,of,processors,,the,scalability,is,not,perfect;,the,most,probable,reason,is,the,overhead,incurred,by,assembling,the,full,matrix.,As,mentioned,earlier,,the,current,implementation,of,the,fully,implicit,RK,method,does,not,allow,the,use,of,a,LOR,preconditioner.,We,expect,that,scalability,can,be,achieved,once,a,matrix-free,method,is,used,and,the,LOR,preconditioner,is,combined,with,AMG.,We,have,come,to,this,conclusion,for,two,reasons.,First,,in,Section,4,we,demonstrated,that,the,LOR-AMG,preconditioner,scales,very,well,for,stationary,problems.,Second,,in,Table,18,we,give,the,results,of,a,scalability,study,of,the,IRK-Gauss,method,of,order,8,,but,using,a,discretisation,in,space,of,order,2;,in,this,regime,the,assembly,of,the,matrix,and,the,set,up,of,the,preconditioner,do,not,suffer,from,the,overhead,of,the,high-order,discretisation,in,space,,and,the,gain,factor,demonstrates,this,accordingly.,#,Processors,Tot.,time,Gain,factor,Avg.,It.,16,469.1,1,38,64,146.7,3.2,40,256,33.3,14.0,40,1024,9.3,50.4,42,Table,18:,Number,of,time,steps,=,21.,Total,time,is,reported,for,the,strong,scalability,test,study,using,IRK-Gauss,(L-stable),of,order,8.,The,linear,problem,is,discretised,on,a,grid,of,size,4096,×,4096,with,the,order,2,finite,difference,method.,The,number,of,processors,increased,by,a,factor,of,4.,The,gain,factor,is,the,proportion,of,the,total,time,on,P,processors,divided,by,the,total,time,on,16,processors.,The,L2-error,=,3.3,×,10,−8.,5.3.2,The,nonlinear,case,Table,19,presents,the,weighted,average,Krylov,iteration,count,per,time,step,to,solve,(using,a,weight,of,1,for,1,×,1,block,diagonal,system,corresponding,to,a,real,eigenvalue,of,the,Butcher,table,matrix,,and,a,weight,of,2,for,2,×,2,block,diagonal,system,corresponding,to,the,pair,of,complex,eigenvalues,of,the,Butcher,table,matrix),,the,total,run,time,,and,the,L2-error,norm,for,our,non-linear,test,case.,As,for,the,linear,case,,the,results,in,the,table,demonstrate,that,combining,high-order,discretisation,in,space,and,time,achieves,its,most,benefit,when,the,time,stepping,scheme,is,fully,implicit.,The,results,here,follow,the,same,pattern,as,in,the,linear,case:,for,orders,2,,4,,and,8,,fully,implicit,RK,methods,(IRK-Gauss,and,IRK-LobIIIC),achieve,the,best,accuracy,,and,the,IRK-Gauss,method,is,again,the,fastest,for,all,orders.,Table,20,contains,a,scalability,study,of,the,fully,implicit,RK,method,IRK-Gauss,of,order,8,applied,to,the,nonlinear,time-dependent,advection-diffusion,equation.,The,grid,size,is,1024,×,1024.,The,size,of,the,problem,is,fixed,and,the,number,of,processors,is,quadrupled,starting,from,16,up,to,1024.,Although,the,number,of,iterations,is,robust,with,respect,to,the,number,of,processors,,the,scalability,is,not,perfect.,As,with,the,linear,case,,the,reason,for,not,achieving,perfect,scalability,is,probably,due,to,the,overhead,incurred,by,assembling,the,full,matrix,,and,we,expect,that,scalability,can,be,achieved,once,a,matrix-,free,method,is,used,and,the,LOR,preconditioner,is,combined,with,AMG.,We,again,run,perform,a,scalability,study,of,the,IRK-Gauss,method,of,order,8,,but,using,a,discretisation,in,space,of,order,2,,giving,the,results,in,Table,21.,As,we,saw,in,the,linear,case,,the,scalability,is,more,favourable,in,this,regime,,where,the,assembly,of,the,matrix,and,the,set,up,of,the,preconditioner,do,not,suffer,from,the,overhead,of,the,high-order,discretisation,in,space.,20,Scheme,SDIRK,SDIRK,SDIRK,IRK-Gauss,IRK-Gauss,IRK-Gauss,IRK-LobIIIC,IRK-LobIIIC,IRK-LobIIIC,Stability,Order,A,L,L,L,L,L,L,L,L,4,2,4,2,4,8,2,4,8,Avg,Newton,It.,2.1,2,2,2,2,2.6,2,2,2.7,Avg.,Krylov,It.,41,24,64,12,34,94,40,57,132,Error,1.74236e-07,3.02704e-06,1.96809e-09,5.63955e-06,2.24606e-09,7.59273e-09,9.63594e-06,2.73404e-09,7.34143e-09,Tot.,time,36.2014,94.3833,53.5953,47.8599,25.3041,18.9245,130.601,44.7401,27.0011,Table,19:,Time,window,=,2,seconds.,L2-error,norm,and,total,time,comparison,between,different,time-stepping,schemes.,Nonlinear,advection–diffusion,test,case,discretised,with,higher,order,finite,difference,in,space,(order,equal,to,the,order,of,the,time,stepping,scheme).,The,time,step,is,chosen,as,∆t,=,2h,,where,h,is,the,mesh,size.,The,number,of,DOFs,in,space,is,the,same,for,all,cases,and,equal,4,,194,,304.,All,tests,run,on,64,processors.,#,Processors,Tot.,time,Gain,factor,Avg.,It.,16,2069.2,1,72,64,722.2,2.9,74,256,199.3,10.4,72,1024,113.8,18.2,78,Table,20:,Number,of,time,steps,=,512.,Total,time,is,reported,for,the,strong,scalability,test,study,using,IRK-,Gauss,(L-stable),of,order,8.,The,nonlinear,problem,is,discretised,on,a,grid,of,size,1024,×,1024.,The,number,of,processors,increased,by,a,factor,of,4.,The,gain,factor,is,the,proportion,of,the,total,time,on,P,processors,divided,−11.,The,number,of,Newton,iterations,per,time,by,the,total,time,on,16,processors.,The,L2-error,norm,is,3.1,×,10,step,is,2.,#,Processors,Tot.,time,Gain,factor,Avg.,It.,16,665.5,1,68,64,220.5,3.0,70,256,49.6,13.4,70,1024,13.8,48.2,72,Table,21:,Number,of,time,steps,=,21.,Total,time,is,reported,for,the,strong,scalability,test,study,using,IRK-Gauss,(L-stable),of,order,8.,The,nonlinear,problem,is,discretised,on,a,grid,of,size,4096,×,4096,with,the,order,2,finite,difference,method.,The,number,of,processors,increased,by,a,factor,of,4.,The,gain,factor,is,the,proportion,of,the,total,time,on,P,processors,divided,by,the,total,time,on,16,processors.,The,L2-error,=,4.8,×,10,−8.,6,Conclusions,In,this,report,,we,presented,a,study,around,preconditioning,linear,systems,arising,from,high-order,discreti-,sation,(in,space,and,time),of,stationary,and,time-dependent,PDEs.,Through,the,extensive,numerical,experi-,ments,we,performed,,we,observed,the,following:,regarding,spatial,discretisation,,recent,techniques,for,matrix-,free,preconditioning,are,effective,,efficient,,and,scalable.,The,p-MG,method,is,the,most,effective,for,isotropic,diffusion,equations.,However,,once,anisotropy,is,present,either,in,the,mesh,or,in,the,PDE,coefficient,,p-MG,struggles.,The,LOR,preconditioner,combined,with,any,available,efficient,preconditioner,for,linear,systems,arising,from,low-order,discretised,PDEs,proved,to,be,scalable,and,efficient,for,highly,anisotropic,diffusion,equations,and,advection-diffusion,ones.,Concerning,temporal,discretisation,,we,observed,that,high-order,fully,implicit,time-stepping,schemes,yield,high,accuracy,when,combined,with,high-order,spatial,discretisation.,The,Gauss-Legendre,family,with,orders,2,,4,and,8,was,the,most,efficient,in,terms,of,run,time,and,accuracy,when,compared,to,other,time-stepping,families,with,the,same,order.,21,Summary,•,The,most,promising,method,for,solving,matrices,from,high,order,elliptic,PDEs,is,to,precondition,with,a,low,order,finite,element,operator:,low-order,solvers,have,become,robust,enough,to,solve,the,resulting,(challenging),linear,systems[29].,•,Multigrid,and,Domain,Decomposition-based,preconditioners,are,the,leading,contenders,for,per-,formant,parallel,iterative,linear,solvers,on,anisotropic,elliptic,problems,of,low,order.,•,There,are,several,excellent,algebraic,multigrid,packages,available,that,are,highly,parallel.,However,,to,get,good,performance,,it,is,vital,to,tune,the,parameters,for,a,given,problem.,A,well-implemented,geometric,multigrid,method,would,give,superior,performance,to,an,algebraic,multigrid.,•,The,HPDDM,domain,decomposition,method,,based,on,the,GenEO,method,[35],,seemed,promising,with,a,good,trade-off,between,performance,and,ease,of,setup;,the,recent,development,of,fully,al-,gebraic,variations,[2,,3],make,this,even,more,the,case.,However,,implementation,improvements,for,these,methods,are,needed,for,them,to,compete,against,highly-tuned,multigrid,implementations.,•,Higher,order,fully,implicit,time-stepping,schemes,show,spectacular,performance,in,terms,of,run,time,and,accuracy,when,compared,to,other,higher-order,time-stepping,schemes.,References,[1],H.,Al,Daas,,L.,Grigori,,P.,Jolivet,,and,P.-H.,Tournier.,A,multilevel,Schwarz,preconditioner,based,on,a,hi-,erarchy,of,robust,coarse,spaces.,SIAM,Journal,on,Scientific,Computing,,43(3):A1907–A1928,,2021/07/01,2021.,[2],H.,Al,Daas,,P.,Jolivet,,and,T.,Rees.,Efficient,algebraic,two-level,schwarz,preconditioner,for,sparse,matrices,,2022.,[3],H.,Al,Daas,,P.,Jolivet,,and,J.,A.,Scott.,A,robust,algebraic,domain,decomposition,preconditioner,for,sparse,normal,equations,,2022.,In,press.,preprint:https://arxiv.org/abs/2107.09006.,[4],V.,Alexandrov,and,O.,A.,Esquivel-Flores.,Towards,Monte,Carlo,preconditioning,approach,and,hybrid,Monte,Carlo,algorithms,for,Matrix,Computations.,Computers,&,Mathematics,with,Applications,,70(11):,2709–2718,,Dec.,2015.,ISSN,0898-1221.,doi:,10.1016/j.camwa.2015.08.035.,[5],V.,Alexandrov,,A.,Lebedev,,E.,Sahin,,and,S.,Thorne.,Linear,systems,of,equations,and,preconditioners,relat-,ing,to,the,NEPTUNE,Programme.,NEPTUNE,Technical,Report,2047353-TN-04,,CCFE,Culham,,Apr.,2021.,[6],V.,N.,Alexandrov.,Efficient,parallel,Monte,Carlo,methods,for,matrix,computations.,Mathematics,and,computers,in,Simulation,,47(2-5):113–122,,1998.,[7],R.,Anderson,,J.,Andrej,,A.,Barker,,J.,Bramwell,,J.-S.,Camier,,J.,C.,V.,Dobrev,,Y.,Dudouit,,A.,Fisher,,T.,Kolev,,W.,Pazner,,M.,Stowell,,V.,Tomov,,I.,Akkerman,,J.,Dahm,,D.,Medina,,and,S.,Zampini.,MFEM:,A,modular,finite,element,methods,library.,Computers,&,Mathematics,with,Applications,,81:42–74,,2021.,doi:,10.,1016/j.camwa.2020.06.009.,[8],U.,M.,Ascher,,S.,J.,Ruuth,,and,R.,J.,Spiteri.,Implicit-Explicit,Runge-Kutta,methods,for,time-dependent,partial,differential,equations.,Applied,Numerical,Mathematics,,25(2):151–167,,Nov.,1997.,ISSN,0168-9274.,doi:,10.1016/S0168-9274(97)00056-1.,[9],S.,Balay,,S.,Abhyankar,,M.,Adams,,J.,Brown,,P.,Brune,,K.,Buschelman,,L.,Dalcin,,A.,Dener,,V.,Eijkhout,,W.,Gropp,,D.,Karpeyev,,D.,Kaushik,,M.,Knepley,,D.,May,,L.,Curfman,McInnes,,R.,Mills,,T.,Munson,,K.,Rupp,,P.,Sanan,,B.,Smith,,S.,Zampini,,H.,Zhang,,and,H.,Zhang.,PETSc,Users,Manual.,2019.,[10],W.,L.,Briggs,,V.,E.,Henson,,and,S.,F.,McCormick.,A,multigrid,tutorial.,SIAM,,second,edition,,2000.,ISBN,978-0-898714-62-3.,22,[11],P.,D.,Brubeck,and,P.,E.,Farrell.,A,scalable,and,robust,vertex-star,relaxation,for,high-order,FEM,,2021.,[12],C.,Canuto,,M.,Y.,Hussaini,,A.,Quarteroni,,and,T.,A.,Zang.,Spectral,Methods:,Evolution,to,Complex,Ge-,ometries,and,Applications,to,Fluid,Dynamics.,Springer,Science,&,Business,Media,,2007.,doi:,10.1007/,978-3-540-30728-0.,[13],H.,A.,Daas,,T.,Rees,,E.,Sahin,,and,H.,S.,Thorne.,A,Review,of,Time,Stepping,Techniques,and,Preconditioning,for,Hyperbolic,and,Anisotropic,Elliptic,Problems.,Technical,Report,2060049-TN-01,,UKAEA,,2022.,[14],V.,Dolean,,P.,Jolivet,,and,F.,Nataf.,An,Introduction,to,Domain,Decomposition,Methods:,Algorithms,,Theory,,and,Parallel,Implementation.,SIAM,,Dec.,2015.,ISBN,978-1-61197-405-8.,[15],H.,Elman,,D.,Silvester,,and,A.,Wathen.,Finite,Elements,and,Fast,Iterative,Solvers:,With,Applications,in,Incompressible,Fluid,Dynamics.,Numerical,Mathematics,and,Scientific,Computation.,Oxford,University,Press,,Oxford,,second,edition,,2014.,ISBN,0-19-967880-X.,[16],S.,Friedhoff,and,B.,S.,Southworth.,On,“optimal”,h-independent,convergence,of,parareal,and,multigrid-,reduction-in-time,using,runge-kutta,time,integration.,Numerical,Linear,Algebra,with,Applications,,28(3):,e2301,,2021.,[17],A.,Greenbaum.,Iterative,Methods,for,Solving,Linear,Systems.,SIAM,,Philadelphia,,1997.,[18],M.,J.,Grote,and,T.,Huckle.,Parallel,preconditioning,with,sparse,approximate,inverses.,SIAM,Journal,on,Scientific,Computing,,18(3):838–853,,1997.,[19],V.,E.,Henson,and,U.,M.,Yang.,BoomerAMG:,A,parallel,algebraic,multigrid,solver,and,preconditioner.,Ap-,plied,Numerical,Mathematics,,41(1):155–177,,Apr.,2002.,ISSN,0168-9274.,doi:,10.1016/S0168-9274(01),00115-5.,[20],M.,R.,Hestenes,and,E.,Stiefel.,Methods,of,Conjugate,Gradients,for,Solving,Linear,Systems.,J.,Res.,Nat.,Bur.,Stand.,,49:409–436,,1952.,[21],T.,J.,R.,Hughes,and,A.,N.,Brooks.,A,multi-dimensional,upwind,scheme,with,no,crosswind,diffusion.,In,T.,J.,R.,Hughes,,editor,,Finite,Element,Methods,for,Convection,Dominated,Flows,,volume,34,of,AMD,,pages,19–35.,ASME,,New,York,,1979.,[22],P.,Jolivet,,J.,E.,Roman,,and,S.,Zampini.,KSPHPDDM,and,PCHPDDM:,Extending,PETSc,with,advanced,Krylov,methods,and,robust,multilevel,overlapping,Schwarz,preconditioners.,Computers,&,Mathematics,with,Applications,,84:277–295,,Feb.,2021.,ISSN,0898-1221.,doi:,10.1016/j.camwa.2021.01.003.,[23],C.,A.,Kennedy,and,M.,H.,Carpenter.,Diagonally,Implicit,Runge-Kutta,Methods,for,Ordinary,Differential,Equations.,A,Review.,Technical,Report,NF1676L-19716,,Mar.,2016.,[24],D.,S.,Kershaw.,Differencing,of,the,diffusion,equation,in,Lagrangian,hydrodynamic,codes.,Journal,of,Com-,putational,Physics,,39(2):375–395,,Feb.,1981.,ISSN,0021-9991.,doi:,10.1016/0021-9991(81)90158-3.,[25],T.,Kolev.,CEED-MS36:,High-order,algorithmic,developments,and,optimizations,for,large-scale,GPU-,accelerated,simulations.,Technical,Report,LLNL-TR-821011,,Lawrence,Livermore,National,Lab.,(LLNL),,Livermore,,CA,(United,States),,Mar.,2021.,[26],T.,Manteuffel,,J.,Ruge,,and,B.,Southworth.,Nonsymmetric,algebraic,multigrid,based,on,local,approximate,ideal,restriction,(l,air).,SIAM,Journal,on,Scientific,Computing,,6(40):4105–4130,,2018.,[27],S.,A.,Orszag.,Spectral,methods,for,problems,in,complex,geometries.,Journal,of,Computational,Physics,,37,(1):70–92,,Aug.,1980.,ISSN,0021-9991.,doi:,10.1016/0021-9991(80)90005-4.,[28],W.,Pazner.,Efficient,low-order,refined,preconditioners,for,high-order,matrix-free,continuous,and,discon-,tinuous,galerkin,methods.,SIAM,Journal,on,Scientific,Computing,,42(5):A3055–A3083,,2022/03/17,2020.,[29],W.,Pazner,,T.,Kolev,,and,C.,Dohrmann.,Low-order,preconditioning,for,the,high-order,finite,element,de,rham,complex,,2022.,23,[30],J.,W.,Ruge,and,K.,Stüben.,Algebraic,multigrid.,In,Multigrid,Methods,,chapter,4,,pages,73–130.,SIAM,,1987.,doi:,10.1137/1.9781611971057.ch4.,[31],Y.,Saad,and,M.,H.,Schultz.,GMRES:,A,Generalized,Minimal,Residual,Algorithm,for,Solving,Nonsymmetric,Linear,Systems.,SIAM,J.,Sci.,Stat.,Comput.,,7(3):856–869,,1986.,doi:,10.1137/0907058.,[32],E.,Sahin,,A.,Lebedev,,M.,Abal,enkovs,,and,V.,Alexandrov.,Usability,of,Markov,Chain,Monte,Carlo,Precon-,In,2021,12th,Workshop,on,Latest,Advances,in,Scalable,Algorithms,for,ditioners,in,Practical,Problems.,Large-Scale,Systems,(ScalA),,pages,44–49,,Nov.,2021.,doi:,10.1109/ScalA54577.2021.00011.,[33],B.,S.,Southworth,,O.,Krzysik,,and,W.,Pazner.,Fast,solution,of,fully,implicit,runge–kutta,and,discontinuous,galerkin,in,time,for,numerical,pdes,,part,ii:,Nonlinearities,and,daes.,SIAM,Journal,on,Scientific,Comput-,ing,,44(2):A636–A663,,2022.,[34],B.,S.,Southworth,,O.,Krzysik,,W.,Pazner,,and,H.,D.,Sterck.,Fast,Solution,of,Fully,Implicit,Runge–Kutta,and,Discontinuous,Galerkin,in,Time,for,Numerical,PDEs,,Part,I:,the,Linear,Setting.,SIAM,Journal,on,Scientific,Computing,,44(1):A416–A443,,2022.,[35],N.,Spillane,,V.,Dolean,,P.,Hauret,,F.,Nataf,,C.,Pechstein,,and,R.,Scheichl.,Abstract,robust,coarse,spaces,for,systems,of,PDEs,via,generalized,eigenproblems,in,the,overlaps.,Numerische,Mathematik,,126(4):741–770,,2014.,ISSN,0945-3245.,[36],A.,J.,Wathen.,Preconditioning.,Acta,Numerica,,24:329–376,,May,2015.,ISSN,0962-4929,,1474-0508.,doi:,10.1017/S0962492915000021.,[37],J.,Xu,and,L.,Zikatanov.,Algebraic,multigrid,methods.,Acta,Numerica,,26:591–721,,May,2017.,ISSN,0962-,4929,,1474-0508.,doi:,10.1017/S0962492917000083.,A,Appendix:,HPDDM,In,this,appendix,,we,describe,some,of,the,technical,details,of,the,setup,of,the,HPDDM,preconditioner.,In,particular,,we,describe,the,cause,of,the,lack,of,scalability,observed,in,Figure,3,and,provide,a,suggestion,about,how,it,can,be,overcome.,The,current,implementation,of,HPDDM,associates,a,subdomain,on,the,finest,level,with,each,processor.,For,subsequent,levels,,the,user,has,to,specify,the,number,of,processors,handling,each,level.,Using,L,+,1,levels,,L,>,0,,each,processor,handling,a,subdomain,on,level,l,≤,L,performs,the,following:,1.,Factor,the,local,matrix,and,the,local,auxiliary,matrix,using,a,sparse,direct,solver.,2.,Compute,a,specified,number,of,generalised,eigenvectors,locally.,3.,Assemble,the,local,part,of,the,next,level,matrix.,4.,If,involved,in,handling,the,next,level,,receive,information,from,processors,in,the,same,aggregate.,Other-,wise,,send,data,to,the,representing,processor,on,the,next,level.,5.,If,l,<,L,,compute,the,local,auxiliary,matrices,required,in,the,next,level,6.,and,if,l,=,L,and,the,processor,is,involved,in,handling,the,following,level,,it,participates,in,factoring,the,coarse,matrix,at,level,L,+,1.,The,first,three,steps,are,entirely,local,and,so,they,scale,optimally.,The,third,step,involves,local,computation,and,moving,data,with,neighbouring,subdomains,and,should,scale,optimally,as,well.,However,,we,noticed,that—especially,when,the,number,of,DOFs,per,processor,is,small—this,very,step,is,dominating,the,computa-,tion,time,during,the,setup,phase.,The,main,reason,for,this,step,being,not,scalable,is,purely,implementation,related.,The,PETSc,interface,to,HPDDM,does,not,,for,the,time,being,,allow,data,gathering,to,be,performed,optimally,,namely,via,a,binary,tree,communication.,Furthermore,,the,computations,involved,in,this,step,,as,well,as,the,previous,step,,(P,T,AP,where,A,and,P,are,sparse),are,optimised,based,on,the,sparsity,pattern,only.,However,,in,practice,the,column,vectors,in,P,stem,from,local,generalised,eigenvalue,problems,involving,the,diagonal,blocks,in,A.,Therefore,,if,D,is,the,block,diagonal,matrix,obtained,from,A,and,A,=,D,+,B,,we,have,24,P,T,AP,=,P,T,DP,+,P,T,B,P,,,where,P,T,DP,is,a,diagonal,matrix,in,exact,arithmetic,and,P,T,B,P,is,a,hugely-sparse,matrix,containing,a,small,number,of,dense,blocks.,As,for,the,current,implementation,the,matrix,P,T,AP,stores,entries,based,on,the,sparsity,patterns,of,A,,since,the,sparsity,pattern,of,P,is,tightly,related,to,that,of,A.,The,upshot,of,this,is,that,a,huge,reduction,in,the,density,of,the,matrices,on,levels,>,1,can,be,achieved,by,considering,the,origin,of,the,column,vectors,in,P,.,This,would,lead,to:,•,a,reduction,in,the,computation,cost,in,steps,1,,2,,and,3,in,the,next,level,as,well,as,steps,5,and,6,on,the,same,level,,and,•,a,reduction,in,the,amount,of,data,sent/received,in,step,4.,We,believe,this,would,make,the,HPDDM,implementation,more,efficient,when,applied,to,large,numbers,of,processors.,25 :pdfembed:`src:_static/TN-03_TimeSteppingTechniquesPreconditioningHyperbolicAnisotropicEllipticProblems.pdf, height:1600, width:1100, align:middle`