TN-03_ImplicitFactorizationPreconditionersNeptuneProgramme ========================================================== .. meta:: :description: technical note :keywords: Implicit-factorization,preconditioners,for,NEPTUNE,Programme,Technical,Report,2047353-TN-03,Maksims,Abal¸enkovs∗,Vassil,Alexandrov∗,Anton,Lebedev∗,Emre,Sahin∗,Sue,Thorne∗∗,July,2021,1,Introduction,In,simulations,relating,to,plasma,physics,,and,more,generally,,the,dominate,component,in,terms,of,simulation,time,is,normally,the,time,to,solve,the,underlying,linear,systems,Ax,=,b.,In,this,report,,we,will,assume,that,A,∈,Rn×n,is,non-singular,,the,right-hand,side,b,∈,Rn,is,provided,and,we,wish,to,compute,an,(approximate),solution,x,∈,Rn.,These,systems,are,usually,solved,via,an,iterative,method,such,a,Krylov,subspace,method.,Typically,,the,nature,of,these,systems,mean,that,a,large,number,of,iterations,are,required,to,reach,the,desired,level,of,accuracy.,To,reduce,the,number,of,iterations,,we,aim,to,choose,a,preconditioner,P,∈,Rn×n,such,that,applying,the,iterative,method,to,the,equivalent,system,P,−1Ax,=,P,−1b,results,in,a,reduction,in,the,number,of,iterations,and,the,time,to,set-up,P,and,to,solve,with,P,during,each,iterations,is,such,that,there,is,an,overall,reduction,in,solution,time.,Note,,in,the,above,,we,have,described,left,preconditioning.,For,right,preconditioning,,one,is,solving,the,equivalent,system,It,is,also,possible,to,use,a,combination,of,left,and,right,preconditioners,together:,AP,−1y,=,b,,x,=,P,−1y.,1,AP,−1,P,−1,2,y,=,P,−1,1,b,,x,=,P,−1,2,y.,2,Software,for,Plasma,Physics,Modelling:,Overview,In,the,following,,we,describe,our,use,of,BOUT++,and,Nektar++,,which,have,been,identified,by,UKAEA,to,be,the,modelling,packages,of,interest,for,the,NEPTUNE,Project.,As,part,of,the,descriptions,,we,include,how,preconditioners,are,currently,incorporated,into,these,libraries.,∗The,authors,are,with,the,Hartree,Centre,,STFC,Daresbury,Laboratory,,Sci-Tech,Daresbury,,Keckwick,,Daresbury,,Warrington,,WA4,4AD,,UK.,Email,contact:,maksims.abalenkovs@stfc.ac.uk,∗∗Sue,Thorne,is,with,the,STFC,Rutherford,Appleton,Laboratory,,Harwell,Campus,,Didcot,,OX11,0QX,,UK.,Email,contact:,sue.thorne@stfc.ac.uk,1,2.1,BOUT++,BOUT++,[5],is,very,much,designed,for,matrix-free,calculations,,which,use,external,packages,PETSc,or,SUNDIALS,to,solve,the,more,complex,problems.,In,themselves,,these,external,packages,use,matrix-free,implementations,by,default.,This,means,that,custom,preconditioners,either,need,to,be,developed,in,a,matrix-free,,operator-based,manner,and,provided,via,BOUT++,or,,for,preconditioners,that,can,act,in,a,matrix-free,manner,,but,require,access,to,the,action,of,the,linear,system,being,solved,,it,would,need,implementing,within,the,under-lying,library.,As,part,of,this,project,,we,used,a,nonlinear,diffusion,test,case,that,simulates,the,effects,of,heat,conduction,in,a,plasma.,This,test,case,is,provided,as,part,of,BOUT++.,At,the,moment,,BOUT++,relies,on,the,PETSc,package,for,the,preconditioning,work,within,this,test,case.,There,are,multiple,ways,to,execute,the,heat,conduction,example.,In,the,simplest,possible,form,the,code,is,launched,with,./diffusion-nl,IMEX-BDF2,multistep,scheme,is,launched,with,./diffusion-nl,solver:type=imexbdf2,Preconditioning,is,enabled,with,the,following,flag,./diffusion-nl,solver:use_precon=true,Finally,,the,Jacobian,colouring,with,the,IMEX-BDF2,is,enabled,with,./diffusion-nl,solver:type=imexbdf2,solver:matrix_free=false,The,end,command,used,in,the,experiments,is,./diffusion-nl,solver:type=imexbdf2,solver:matrix_free=false,solver:use_precon=true,Extracting,and,saving,the,system,matrix,was,done,via,PETSc,routines.,In,order,to,save,the,matrix,imex-bdf2.cxx,source,code,file,was,modified.,The,precon,routine,was,supplemented,with,the,following,commands,(Figure,1):,PetscViewer,viewer;,PetscViewerASCIIOpen(MPI,COMM,WORLD,,"A.mtx",,&viewer);,PetscViewerPushFormat(viewer,,PETSC,VIEWER,ASCII,MATRIXMARKET);,1:,2:,3:,4:,MatView(Jmf,,viewer);,Figure,1:,C++,source,code,for,system,matrix,extraction,In,this,report,,we,also,consider,the,SD1D,test,case,[4],,which,uses,BOUT++,to,simulate,a,plasma,fluid,in,one,dimension,(along,the,magnetic,field),that,interacts,with,a,neutral,gas,fluid.,Unlike,the,nonlinear,diffusion,example,,this,test,case,provides,a,preconditioner,as,part,of,the,model,in,an,operator-based,manner.,2.2,Nektar++,Nektar++,[2,,6],is,designed,to,provide,discretisation,and,solution,of,partial,differential,equations,using,the,spectral/hp,element,method.,Nektar++,accepts,*.xml,or,similar,formats,as,inputs,,where,the,input,file,provides,finite-element,mesh,and,the,specifications,to,solve,the,specific,PDE,problem.,The,specification,of,the,mesh,format,within,Nektar++,is,hierarchically,defined,as:,1D,edges,as,connection,of,vertices,,2D,faces,as,bounding,edges,and,3D,elements,as,bounding,faces.,Nektar++’s,MultiRegions,library,derives,mapping,from,these,meshes,and,uses,these,mappings,for,different,Galerkin,projection,methods,to,assembly,a,global,linear,system.,Afterwards,,constructed,global,linear,system,may,be,solved,by,using,direct,Cholesky,factorisation,or,iterative,preconditioned,conjugate,gradient.,Nektar++,provides,range,selection,of,the,preconditioners,such,as,classical,Jacobi,preconditioner,,coarse-space,preconditioner,,block,preconditioner,and,low-energy,preconditioner.,It,is,also,providing,interface,to,use,PETSc,for,additional,solvers.,Nektar++,evaluates,L2,and,Linf,errors,against,the,provided,exact,solution.,2,Due,to,the,structure,of,the,MCMCMI,algorithm,,it,is,not,feasible,to,adapt,it,as,a,preconditioner,module,for,the,Nektar++,in,the,timeline,of,the,project,according,to,the,steps,described,in,[2].,Therefore,,Nektar++,is,edited,accordingly,to,extract,full,system,matrices,as,Matrix,Market,format,*.mtx,to,use,it,separately,in,the,MCMCMI,algorithm.,Nektar++,provides,Advection-Diffusion-Reaction,solver,to,solve,partial,differential,equations,of,the,form,(1),in,either,discontinuous,or,continuous,projections,of,the,solution,field,[6].,α,∂u,∂t,+,λu,+,v∇u,+,(cid:15)∇,·,(D∇u),=,f,(1),However,,it,is,not,possible,to,construct,linear,systems,for,the,discontinuous,Galerkin.,Hence,,application,cases,with,continuous,Galerkin,were,chosen,from,’Nektar++/Solvers/ADRSolver/Tests’,accordingly,to,the,prioritised,equations,as,described,in,[1].,2.3,Testing,Environment,Numerical,experiments,were,run,on,the,SKL,(Skylake),nodes,of,the,Scafell,Pike,system,which,consists,of,nodes,fitted,with,2x,XEON,gold,E5-6142,v5,processors,resulting,in,32,cores,per,node,and,,due,to,HyperThreading,,64,threads,per,node.,3,Numerical,Results,for,Sparse,Approximate,Inverse,Precondi-,tioners,MCMCMI,experiments,were,performed,on,four,system,matrices,extracted,from,the,BOUT++,software.,These,matrices,were,created,for,one,and,the,same,one-dimensional,“Non-linear,diffusion”,test,problem,called,diffusion-nl.,The,main,BOUT++,source,code,was,modified,to,enforce,PETSc,,powering,the,solution,of,diffusion-nl,problem,,to,output,system,matrices,into,the,Matrix,Market,*.mtx,format.,Matrix,orders,of,the,extracted,BOUT++,system,matrices,were,n,=,128,,512,,2048,,8192.,The,ever-,increasing,system,matrices,were,obtained,by,increasing,the,resolution,over,the,y,axis.,Unfortunately,,it,was,not,possible,to,extend,the,problem,even,further,since,PETSc,started,to,crash,for,ny,>,8192.,Figure,2:,Matrix,order,vs,GMRES,and,BiCGStab,iteration,steps,,BOUT++.,In,Figure,2,,we,compare,the,matrix,order,vs,GMRES,and,BiCGStab,iteration,steps,for,the,BOUT++,examples.,In,the,case,of,GMRES,method,and,small,matrices,(n,=,128,,512),,the,MCMCMI,preconditioner,P,decreases,the,number,of,iterations,required,for,the,linear,system,solution,by,18%,and,72%,respectively.,In,GMRES,method,applied,to,larger,matrix,orders,(n,=,2048,,8192),MCMCMI,preconditioner,shows,a,dramatic,improvement,of,94%.,On,the,other,hand,,using,the,BiCGStab,solver,,MCMCMI-based,preconditioner,provides,performance,comparable,with,the,reference,solution,for,small,matrices,(n,=,3,10,100,1000,10000,10000012851220488192IterationsMatrix,ordergmres(A)gmres(P·A)bicgstab(A)bicgstab(P·A),128,,512).,For,larger,system,matrices,(n,=,2048,,8192),MCMCMI,preconditioner,reduces,the,number,of,iterations,by,26%,and,27%,respectively.,Figure,3:,Matrix,order,vs,L1,norm,values,in,GMRES,and,BiCGStab,,BOUT++.,We,compare,the,L1,norm,values,for,our,approximate,solutions,from,GMRES,and,BiCGStab,for,the,BOUT++,test,problems,in,Figure,3.,The,highest,values,of,L1,norm,are,provided,by,the,preconditioned,linear,system,solve,in,the,GMRES,method.,L1,norm,values,for,non-preconditioned,solutions,in,GMRES,and,BiCGStab,are,comparable.,Finally,,the,smallest,values,of,L1,norm,are,obtained,by,the,preconditioned,solution,with,the,BiCGStab.,One,exception,is,the,L1,norm,values,for,the,smallest,test,matrix,(n,=,128).,Figure,4:,Matrix,order,vs,L2,norm,values,in,GMRES,and,BiCGStab,,BOUT++.,In,Figure,4,,we,compare,the,L2,norm,values,for,our,approximate,solutions,from,GMRES,and,BiCGStab,for,the,BOUT++,test,problems.,The,L2,norm,values,follow,the,same,tendency,as,the,L1,norms,described,above,(See,Fig.,3,for,details).,The,highest,values,are,produced,by,the,preconditioned,solution,with,the,GMRES,method,and,the,lowest—by,the,preconditioned,solution,with,the,BiCGStab,algorithm.,The,L∞,norm,values,are,compared,in,Figure,5.,L∞,norms,exhibit,behaviour,similar,to,the,L1,and,L2,norms.,Overall,,the,L∞,norms,are,the,smallest,amongst,all,norms.,With,rare,exceptions,the,preconditioned,system,solutions,resulted,in,the,highest,(GMRES),and,the,lowest,(BiCGStab),norm,values.,We,note,that,a,left,preconditioner,is,being,used,within,the,GMRES,method,and,,hence,,at,each,4,0.00000010.00000100.00001000.00010000.00100000.01000000.10000001.000000010.0000000100.000000012851220488192L1,normMatrix,ordergmres(A)gmres(P·A)bicgstab(A)bicgstab(P·A)0.000000010.000000100.000001000.000010000.000100000.001000000.010000000.100000001.0000000012851220488192L2,normMatrix,ordergmres(A)gmres(P·A)bicgstab(A)bicgstab(P·A),iteration,k,,the,preconditioned,residual,||P,−1,(b,−,Axk),||2,is,minimimised,,where,xk,is,a,member,of,the,Krylov,subspace,defined,by,span,(cid:110),P,−1r0,,P,−1AP,−1r0,,.,.,.,,,(cid:0)P,−1A(cid:1)k,P,−1r0,(cid:111),with,r0,=,b,−,Ax0.,Now,,||P,−1,(b,−,Axk),||2,||P,−1||2,≤,||b,−,Axk||2,≤,||P,||2||P,−1,(b,−,Axk),||2,and,,hence,,left,preconditioned,GMRES,is,not,minimising,the,L2,norm,of,the,residual,and,the,values,of,||P,||2,and,||P,−1||2,are,likely,to,be,such,that,whilst,||P,−1,(b,−,Axk),||2,can,be,small,,the,value,of,||b,−,Axk||2,can,be,large.,There,is,no,minimisation,property,for,the,iterations,of,BiCGStab,but,these,results,lead,to,the,question:,would,right,preconditioned,GMRES,lead,to,similar,results,to,BiCGStab?,In,the,future,,we,would,like,to,investigate,this,with,larger,problem,sizes.,Figure,5:,Matrix,order,vs,L∞,norm,values,in,GMRES,and,BiCGStab,,BOUT++.,In,Table,1,,we,provide,the,details,for,the,matrices,that,were,extracted,from,Nektar++.,We,compare,the,number,of,GMRES,and,BiCGSTAB,steps,in,Figure,6.,Since,the,iterations,are,higher,than,non-,preconditioned,GMRES,method,,the,preconditioner,failed,for,the,GMRES,method,for,all,except,the,Helmholtz,problems.,However,,the,preconditioner,is,successful,for,the,BICGSTAB,method.,Higher,L2,and,L∞,values,also,supports,the,failure,of,the,preconditioner,for,the,GMRES,method,,likewise,success,for,the,BICGSTAB,method,as,shown,in,the,Figures,7,and,8.,In,the,future,,we,would,like,to,investigate,the,use,of,the,MCMCMI,preconditioner,with,test,problems,that,arise,from,anisotropic,problems,to,see,if,the,performance,is,markedly,different.,5,0.000000010.000000100.000001000.000010000.000100000.001000000.010000000.1000000012851220488192L∞,normMatrix,ordergmres(A)gmres(P·A)bicgstab(A)bicgstab(P·A),Matrix,name,Label,n,Mode,—,Order,Int,timestep,H,SAD,Helmholtz,Steady,Adv-Diff,Unsteady,Adv-Diff,1,UAD1,Unsteady,Adv-Diff,2,UAD2,Unsteady,Adv-Diff,3,UAD3,Unsteady,Adv-Diff,4,UAD4,2D,modal,2D,modal,1312,2281,225,Order,1,225,Order,1,225,Order,2,225,Order,2,001,0001,001,0001,Table,1:,Nektar++,matrices,used,in,MCMCMI,preconditioner,experiments.,Figure,6:,Matrix,order,vs,GMRES,and,BiCGStab,iteration,steps,,Nektar++.,Figure,7:,Matrix,order,vs,L2,norm,values,in,GMRES,and,BiCGStab,,Nektar++.,6,100,1000,10000H,1312SAD,2281UAD1,225UAD2,225UAD3,225UAD4,225IterationsMatrix,name,ordergmres(A)gmres(P·A)bicgstab(A)bicgstab(P·A)0.000000010.000000100.000001000.000010000.000100000.001000000.010000000.100000001.0000000010.00000000H,1312SAD,2281UAD1,225UAD2,225UAD3,225UAD4,225L2,normMatrix,name,ordergmres(A)gmres(P·A)bicgstab(A)bicgstab(P·A),Figure,8:,Matrix,order,vs,L∞,norm,values,in,GMRES,and,BiCGStab,,Nektar++.,7,0.000000010.000000100.000001000.000010000.000100000.001000000.010000000.100000001.0000000010.00000000H,1312SAD,2281UAD1,225UAD2,225UAD3,225UAD4,225L∞,normMatrix,name,ordergmres(A)gmres(P·A)bicgstab(A)bicgstab(P·A),4,Numerical,results,for,operator-based,preconditioners,Here,,we,focus,on,operator-based,preconditioners.,An,understanding,of,the,underlying,mathematical,operators,and,the,numerical,properties,of,discretised,version,can,provide,great,insight,into,choice,of,preconditioner.,For,example,,a,standard,finite-element,discretisation,of,the,Laplacian,operator,will,,in,general,,have,condition,number,that,is,inversely,proportion,to,h2,for,2D,problems,and,h3,for,3D,,where,h,is,the,mesh,size.,Thus,,halving,h,will,increase,the,condition,number,of,factors,of,4,or,8,,respectively.,This,can,cause,a,dramatic,increase,in,iterations,if,no,preconditioner,,or,an,ineffective,preconditioner,,is,used.,Suppose,we,have,a,problem,that,couples,together,3,different,variables,,then,the,matrix,A,will,naturally,split,into,a,block,3,×,3,format:,,,A,=,A1,1,A1,2,A1,3,A2,1,A2,2,A2,3,A3,1,A3,2,A3,3,,,For,such,problems,,it,will,be,natural,to,use,a,block-diagonal,preconditioner,of,the,form:,P,=,,,P1,1,0,0,0,P2,2,0,0,0,P3,3,,,,,(2),(3),where,the,dimension,of,each,block,directly,tallies,with,the,dimension,of,the,associated,block,in,(2).,Alternatively,,P,could,be,block,upper,or,lower,triangular,,or,take,a,constraint,preconditioner,format:,,,P,=,P1,1,P1,2,A1,3,P2,1,P2,2,A2,3,A3,1,A3,2,A3,3,,,.,For,symmetric,A,,Wathen,[8],provides,a,good,overview,of,block,preconditioners,and,we,note,that,fac-,torizations,of,constraint,preconditioners,can,be,generated,implicitly,[3].,Constraint,preconditioners,have,been,extended,for,non-symmetric,problems,in,[7],and,,due,to,time,restrictions,,are,not,considered,for,the,test,case,considered,in,this,report.,Here,,we,consider,the,SD1D,test,case,[4],,which,uses,BOUT++,to,simulate,a,plasma,fluid,in,one,dimension,(along,the,magnetic,field),that,interacts,with,a,neutral,gas,fluid.,Unlike,the,nonlinear,diffusion,example,,this,test,case,provides,a,preconditioner,as,part,of,the,model,in,an,operator-based,manner.,SD1D,has,a,number,of,different,cases.,For,Case-03,,the,equations,for,the,plasma,density,n,,pressure,p,and,momentum,minV||i,are,evolved:,∂n,∂t,(cid:19),p,(cid:18),3,∂,∂t,2,(cid:0)minV||,∂,∂t,=,−∇,·,(cid:0)bV||n(cid:1),+,Sn,−,S,=,−∇,·,q,+,V||∂||p,+,Sp,−,E,−,R,(cid:1),=,−∇,·,(cid:0)minV||bV||,j||,=,0,(cid:1),−,∂||p,−,F,(4),(5),(6),Which,has,a,conserved,energy:,Ti,=,Te,=,1,2,p,en,q,=,5,2,pbV||,−,κ||e∂||Te,(cid:90),V,(cid:20),1,2,minV,2,||i,+,(cid:21),p,dV,3,2,The,heat,conduction,coefficient,κ||e,is,a,nonlinear,function,of,temperature,Te:,κ||e,=,κ0T,5/2,e,8,where,κ0,is,a,constant.,See,[4],for,further,details.,Operators,are:,∂||f,=,b,·,∇f,∇||f,=,∇,·,(bf,),(7),This,nonlinear,problem,is,solved,by,using,CVODE,from,the,SUNDIALS,library,[9],,which,uses,a,Newton,method.,At,the,heart,of,the,simulation,,a,large,number,of,systems,of,the,form,(2),are,solved,,where,A,=,I,−,γJ,and,J,are,Jacobian,matrices,for,a,computed,value,of,γ,:,for,clarity,,we,will,assume,that,J,has,the,same,block,structure,as,A,and,J1,j,are,derived,via,(4),,J2,j,are,derived,via,(5),,J3,j,are,derived,via,(6),,Jj,1,are,formed,by,taking,the,derivative,with,respect,to,n,,Jj,2,are,formed,by,taking,the,derivative,with,respect,to,p,and,Jj,3,are,formed,by,taking,the,derivative,with,respect,to,minV||.,We,note,that,J2,1,and,J2,2,contain,terms,that,including,∂2,||.,The,preconditioner,provided,within,SD1D,takes,the,following,block-diagonal,,operator,form:,P0,=,,,I,0,0,0,I,−,γ,2,3,∂2,||,0,,,.,0,0,I,(8),We,note,that,5,shows,that,∂2,diagonal,,operator-form,preconditioner:,||,is,applied,to,1,2,Te,and,not,p,,and,hence,,we,propose,the,following,block-,P1,=,,,I,0,0,0,(I,−,γ,1,3n,∂2,0,0,||)n,0,I,,,.,(9),We,note,that,application,of,the,preconditioner,P1,will,be,more,expensive,than,P0.,Finally,,for,Case-03,,we,also,try,a,block,lower,triangular,preconditioner,that,additionally,incorporates,the,∂||,from,(6):,P2,=,,,I,0,0,0,(I,−,γ,1,3n,∂2,−γ∂||,0,||)n,0,I,,,.,(10),Case-04,from,SD1D,additionally,couples,the,above,plasma,equations,to,a,similar,set,of,equations,for,the,neutral,gas,density,,pressure,,and,parallel,momentum.,A,fixed,particle,and,power,source,is,used,here,,and,a,20%,recycling,fraction.,Exchange,of,particles,,momentum,and,energy,between,neutrals,and,plasma,occurs,through,ionisation,,recombination,and,charge,exchange.,If,we,sub-divide,A,according,to,the,variables,that,are,being,evolved,,we,now,have,a,block,6,×,6,structure.,The,provided,preconditioner,P0,is,now,,P0,=,,,,,,,,I,0,0,0,0,0,0,I,−,γ,2,3,∂2,||,0,0,0,0,0,0,I,0,0,0,0,0,0,I,0,0,Using,the,properties,of,the,neutral,,we,instead,propose,P1,=,,,,,,,,,I,0,0,0,0,0,0,I,−,γ,2,3,∂2,||,0,0,0,0,0,0,I,0,0,0,0,0,0,I,0,0,0,0,0,0,,0,0,0,0,.,,,,,,,,,,,,,,,,,,0,0,0,0,I,(11),(12),I,−,γ,2,3,∂2,||,I,0,0,0,0,0,I,−,γDn,0,2,3,∂2,||,where,Dn,is,a,diffusion,term.,As,for,case-03,,we,also,propose,a,block,lower-triangular,preconditioner,that,incorporates,the,∂||,from,(6),and,the,∂||,from,the,corresponding,equation,for,neutral,momentum:,P2,=,,,,,,,,,I,0,0,0,0,0,0,3,∂2,||,I,−,γ,2,−γ∂||,0,0,0,0,0,0,I,0,0,0,0,I,0,0,0,9,0,0,0,0,2,3,∂2,||,I,−,γDn,−γ∂||,,,,,,,,,.,0,0,0,0,I,(13),Finally,,we,consider,a,preconditioner,that,is,identical,to,P2,but,the,∂||,relating,to,the,neutral,momentum,equation,is,dropped:,,P3,=,,,,,,,,I,0,0,0,0,0,0,3,∂2,||,I,−,γ,2,−γ∂||,0,0,0,0,0,I,0,0,0,0,0,0,I,0,0,0,0,0,0,I,−,γDn,0,2,3,∂2,||,0,0,0,0,I,,.,,,,,,,,(14),In,Figure,9,,we,compare,the,number,of,times,the,time,derivatives,(right-hand,sides),were,computed,for,Case-03,and,Case-04,for,the,different,preconditioners.,We,also,compare,what,happens,if,no,preconditioner,is,used,for,the,smaller,problems.,The,smaller,matrix,orders,are,from,the,default,problem,set-up,(i.e.,,200,mesh,points,for,each,variable).,For,Case-03,,we,also,compare,the,preconditioners,for,the,discretization,with,1000,mesh,points,per,variable,and,,for,Case-04,,400,mesh,points,per,variable.,If,we,consider,the,smaller,version,of,Case-03,,all,of,the,preconditioners,reduce,the,number,of,right-hand,side,evaluations,by,roughly,a,factor,of,36,with,P2,producing,the,lowest,number,of,evaluations;,for,the,larger,problem,,we,observe,that,P2,also,has,the,lowest,number,of,evaluations,with,an,18%,reduction,compared,to,P0.,In,Figure,10,,we,compare,the,wall,clock,time,to,run,the,simulations.,Just,one,MPI,process,was,used,with,one,OpenMP,thread,because,of,the,relatively,small,problem,sizes.,For,the,larger,version,of,Case-,03,,we,see,a,10%,improvement,,with,respect,to,wall-clock,time,,when,using,the,block,lower-triangular,preconditioner,compared,to,the,original,preconditioner.,These,savings,are,relatively,modest,because,the,density,n,remains,close,to,uniform,throughout,the,simulation.,For,Case-04,,we,observe,that,all,of,the,preconditioners,substantially,decrease,the,number,of,evaluations,and,the,wall,clock,time,compared,to,using,no,preconditioner.,For,both,problem,sizes,,compared,to,the,original,preconditioner,,preconditioners,P1,,P2,and,P3,are,reducing,the,number,of,evaluations,and,wall,clock,time,by,a,third.,Thus,,for,the,expert,user,,it,is,possible,to,incorporate,more,sophisticated,operator-,based,preconditioners,within,BOUT++.,Figure,9:,Total,number,of,right-hand,side,evaluations,performed,during,the,whole,simulation,for,Case-03,and,Case-04,with,different,preconditioners.,For,Case-03,,results,for,the,default,discretization,with,200,mesh,points,per,variable,(matrix,order,600),and,a,larger,problem,size,with,1000,mesh,points,per,variable,(matrix,order,3000),are,provided.,For,Case-04,,results,for,the,default,discretization,with,200,mesh,points,per,variable,(matrix,order,600),and,a,larger,problem,size,with,1000,mesh,points,per,variable,(matrix,order,3000),are,provided.,10,100001000001x1061x107600,(3)3000,(3)1200,(4)2400,(4)Total,RHS,evaluationsMatrix,order,(case)AP0·AP1·AP2·AP3·A,Figure,10:,Total,wall,clock,time,(seconds),for,the,whole,simulation,for,Case-03,and,Case-04,with,different,preconditioners.,For,Case-03,,results,for,the,default,discretization,with,200,mesh,points,per,variable,(matrix,order,600),and,a,larger,problem,size,with,1000,mesh,points,per,variable,(matrix,order,3000),are,provided.,For,Case-04,,results,for,the,default,discretization,with,200,mesh,points,per,variable,(matrix,order,600),and,a,larger,problem,size,with,1000,mesh,points,per,variable,(matrix,order,3000),are,provided.,5,Proposed,roadmap,for,including,new,preconditioners,within,BOUT++,and,Nektar++,Prior,to,including,custom,new,preconditioners,into,BOUT++,and,Nektar++,[2],it,would,be,the,best,to,have,the,exact,testing,scenarios,of,interest,to,UKAEA,ready.,Once,the,testing,scenarios,are,available,and,working,at,scale,close,to,the,real,problems,UKAEA,want,to,solve,,it,would,be,the,right,time,to,investigate,preconditioner,deployment.,Ideally,the,new,preconditioners,need,to,be,reformulated,in,a,matrix-free,manner.,Once,these,formulations,are,available,they,could,be,integrated,into,PETSc,,SUNDIALS,and,other,solver,packages,powering,BOUT++,and,Nektar++.,This,way,it,would,bring,more,benefit,to,the,user,community.,The,user,base,of,PETSc,and,SUNDIALS,is,likely,to,be,much,larger,than,that,of,BOUT++,and,Nektar++.,This,would,also,ensure,that,only,minimal,changes,would,be,required,to,BOUT++,and,Nektar++,in,order,to,leverage,computational,advantages,of,new,preconditioners.,In,summary,the,following,steps,are,required,for,a,successful,deployment,of,new,preconditioners,into,BOUT++,and,Nektar++:,1.,Design,testing,scenarios,in,native,BOUT++,and,Nektar++.,These,scenarios,should,reflect,real-,world,problems,UKAEA,solves,at,the,moment.,Pay,attention,to,scale,at,which,simulations,should,be,performed.,2.,Identify,(matrix-free),methods,and,underlying,solver,packages,(PVODE,,CVODE,,PETSc,,SUN-,DIALS),that,BOUT++,and,Nektar++,employ,to,solve,those,problems.,3.,Reformulate,MCMCMI-based,preconditioner,in,the,operator,(matrix-free),form.,4.,Implement,MCMCMI,preconditioner,in,operator,form,and,test,it,thoroughly.,5.,Integrate,operator-based,MCMCMI,preconditioner,into,solver,packages,identified,in,Step,2.,6.,Solve,testing,scenarios,identified,in,Step,1,using,standard,(native),BOUT++,and,Nektar++,meth-,ods,as,well,as,new,MCMCMI,preconditioners.,7.,Compare,computational,performance,of,native,vs,MCMCMI-based,preconditioners.,11,10100100010000600,(3)3000,(3)1200,(4)2400,(4)Wall,clock,time,,sMatrix,order,(case)AP0·AP1·AP2·AP3·A,6,Conclusion,Including,MCMCMI-based,preconditioners,into,either,BOUT++,or,Nektar++,simulation,software,is,difficult,at,the,moment.,First,,the,MCMCMI,code,needs,to,be,reformulated,in,the,matrix-free,man-,ner.,Then,it,can,be,integrated,into,solver,packages,(PETSc,,SUNDIALS),powering,solving,abilities,of,BOUT++,and,Nektar++.,Reformulation,and,implementation,of,MCMCMI,in,the,operator,form,will,take,3–6,months.,Integration,of,operator-based,MCMCMI,into,PETSc,and,SUNDIALS,will,take,another,3–6,months.,Therefore,,to,obtain,preliminary,results,of,MCMCMI,performance,over,the,test,problems,the,STFC,team,decided,to,extract,the,system,matrices,from,the,relevant,BOUT++,and,Nektar++,testing,scenarios.,In,the,process,the,team,discovered,the,(i),it,was,not,possible,to,extract,the,matrices,for,some,scenarios,due,to,their,inherent,matrix-free,nature,,(ii),some,testing,scenarios,of,interest,to,UKAEA,are,not,available,yet,in,either,BOUT++,or,Nektar++.,These,need,to,be,designed,and,developed,further.,Acknowledgements,The,team,would,like,to,thank,Dr.,Benjamin,Dudson,from,the,University,of,York,and,Dr.,Chris,Cantwell,from,the,Imperial,College,London,for,their,time,,help,and,guidance,in,technical,aspects,of,BOUT++,and,Nektar++,functionality.,References,[1],V.,Alexandrov,,A.,Lebedev,,E.,Sahin,,and,S.,Thorne.,Linear,systems,of,equations,and,preconditioners,relating,to,the,NEPTUNE,Programme:,a,brief,overview.,Technical,Report,2047353-TN-02,,UKAEA,,2021.,[2],C.,D.,Cantwell,,D.,Moxey,,A.,Comerford,,A.,Bolis,,G.,Rocco,,G.,Mengaldo,,D.,D.,Grazia,,S.,Yakovlev,,J.-E.,Lombard,,D.,Ekelschot,,B.,Jordi,,H.,Xu,,Y.,Mohamied,,C.,Eskilsson,,B.,Nelson,,P.,Vos,,C.,Biotto,,R.,M.,Kirby,,and,S.,J.,Sherwin.,Nektar++:,An,open-source,spectral/hp,element,framework.,Computer,physics,communications,,192:205–219,,2015.,[3],H.,S.,Dollar,,N.,I.,Gould,,W.,H.,Schilders,,and,A.,J.,Wathen.,Implicit-factorization,preconditioning,and,iterative,solvers,for,regularized,saddle-point,systems.,SIAM,Journal,on,Matrix,Analysis,and,Applications,,28(1):170–189,,2006.,[4],B.,Dudson.,SD1D.,online,repository,,2016.,URL,https://github.com/boutproject/SD1D.,[5],B.,Dudson,,P.,Hill,,and,J.,Parker.,BOUT++.,online,repository,,2020.,URL,http://boutproject.,github.io.,[6],S.,Sherwin,,M.,Kirby,,C.,Cantwell,,and,D.,Moxey.,Nektar++.,online,,2021.,URL,https://www.,nektar.info.,[7],S.,Thorne.,Implicit-factorization,preconditioners,for,non-symmetric,problems.,Technical,Report,2047353-TN-04,,UKAEA,,2021.,[8],A.,Wathen.,Preconditioning.,Acta,Numerica,,24:329,–,376,,2015.,[9],C.,S.,Woodward,,D.,R.,Reynolds,,A.,C.,Hindmarsh,,D.,J.,Gardner,,and,C.,J.,Balos.,SUNDIALS:,SUite,of,Nonlinear,and,DIfferential/ALgebraic,Equation,Solvers.,online,,2021.,URL,https://computing.,llnl.gov/projects/sundials.,12 :pdfembed:`src:_static/TN-03_ImplicitFactorizationPreconditionersNeptuneProgramme.pdf, height:1600, width:1100, align:middle`