Particle Loop#

Introduction#

In NESO-Particles (NP) a “Particle Loop” is a loop over all particles in a collection of particles. For each particle in the iteration set a user provided kernel is executed. This user provided kernel may access the data stored on each particle (ParticleDats) and access global data. For each data structure accessed by the kernel the user must provide an access descriptor. This access descriptor describes exactly how the data will be accessed, e.g. read-only. The provided kernel must be written such that the result of execution of the loop is independent of the execution order of the loop, i.e. parallel and unsequenced in C++ terminology. The particle loop abstraction follows the particle loop abstraction in [SAUNDERS2018].

Advection Example#

In the code listings below we present and describe an example ParticleLoop. This example loop assumes that a ParticleGroup has been created with “P” and “V” particle properties which are real valued and have at least two components.

Listing 1 Example of a ParticleLoop which performs a simple advection operation.#
inline void advection_example(
    // Input ParticleGroup - we will loop over all particles in this
    // ParticleGroup.
    ParticleGroupSharedPtr particle_group
) {
  
  // These constants are captured by value into the kernel lambda.
  const int ndim = 2;
  const REAL dt = 0.001;
  
  auto loop = particle_loop(
    "advection_example", // Optional name for the loop - for profiling.

    particle_group,      // Iteration set is defined as all particles in this
                         // ParticleGroup.

    // This lambda defines the kernel to be executed for
    // all particles in the iteration set.
    // The [=] captures the ndim and dt variables by value. A [&] would capture
    // these variables by reference.
    [=](auto P, auto V){ // These parameters have a one-to-one correspondence
                         // with the loop arguments that follow the kernel in
                         // the call to particle_loop.
      
      // Loop over the P particle property and update the values with values
      // read from the V property.
      for(int dimx=0 ; dimx<ndim ; dimx++){
        P[dimx] += dt * V[dimx];
      }
    },
    // The remaining arguments passed to the particle loop are the link between
    // the kernel parameters and the data structures which the loop accesses.
    // Each argument is passed as a combination of an access descriptor and a
    // data structure. 
    
    // In this example the data structure is a ParticleDat and it is referenced
    // by the symbol for the ParticleDat in the ParticleGroup. Here we indicate
    // that the "P" ParticleDat is accessed in a write mode.
    Access::write(Sym<REAL>("P")),
    // Here we pass the V ParticleDat to the particle loop and indicate that
    // these particle properties are accessed in a read-only mode in the
    // particle loop.
    Access::read(Sym<REAL>("V"))
  );
  
  // Execute the ParticleLoop. A ParticleLoop can be executed multiple times
  // without reconstruction.
  loop->execute();

  return; 
}

The particle loop can be executed asynchronously via calls to “submit” and “wait”.

Listing 2 Duplicate of the advection ParticleLoop example with the comments removed and asynchronous execution.#
inline void advection_example_no_comments(
    ParticleGroupSharedPtr particle_group
) {
  
  const int ndim = 2;
  const REAL dt = 0.001;
  
  auto loop = particle_loop(
    "advection_example",
    particle_group,
    [=](auto P, auto V){
      for(int dimx=0 ; dimx<ndim ; dimx++){
        // .at is an alternative access method
        P.at(dimx) += dt * V.at(dimx);
      }
    },
    Access::write(Sym<REAL>("P")),
    Access::read(Sym<REAL>("V"))
  );

  // Launch the particle loop.
  loop->submit();

  // Wait for execution of the particle loop to complete.
  loop->wait();
  return; 
}
Table 1 ParticleDat<T> Access#

Access Descriptors

Kernel Types

Notes

Read

Access::ParticleDat::Read<T>

Components accessed for the particle via .at or subscript which returns a const reference.

Write

Access::ParticleDat::Write<T>

Components accessed for the particle via .at or subscript which returns a modifiable reference.

Additional Data Structures#

In the advection example we described a particle loop that only accessed particle data. In addition to particle data particle loops may access additional data structures which are not tied to a particular particle.

The “LocalArray” data structure is an array type which is local to the MPI rank on which it is defined, i.e. no MPI communication occurs. The local array type can be considered similar to the std::vector type. The “GlobalArray” type is an array type where a copy of the array is stored on each MPI rank and access to the array is collective over the MPI communicator on which the global array is defined. The “CellDatConst” type is a matrix type defined with fixed size (the “Const” part of the name) and data type for each cell in a mesh. A matrix with a fixed number of rows and a variable number of columns per cell can be created per cell with the “CellDat” data structure. Due to the SYCL language restrictions the number of kernel arguments must be fixed at compilation time, the “SymVector” data structure applies indirection to allow a “vector” of particle properties to be passed at runtime. The “ParticleLoopIndex” provides access to indexing information for the particle within the particle loop kernel.

These additional data structures must be passed to a particle loop with an access mode which is commutative. In practice only commutative access modes are defined for these data structures and attempting to pass them to a particle loop with an inappropriate access descriptor will result in a compile time error.

LocalArray#

The local array type is local to each MPI rank. No communication between MPI ranks is performed by the particle loop. The local array can be accessed in particle loops with “read” and “add” access modes.

Listing 3 Particle loop example where a local array is incremented by each particle and another local array is read by each particle.#
inline void local_array_example(
    ParticleGroupSharedPtr particle_group
) {
  
  // Create a new LocalArray on the same SYCLTarget as the particle group.
  auto local_array_add = std::make_shared<LocalArray<REAL>>(
    particle_group->sycl_target, // Compute target to use.
    2,                           // Number of elements in the array.
    0                            // Initial value for elements.
  );
  
  // Create a local array using initial values from a std::vector.
  std::vector<REAL> d0 = {1.0, 2.0, 3.0};
  auto local_array_read = std::make_shared<LocalArray<REAL>>(
    particle_group->sycl_target, // Compute device.
    d0                           // Initial values and size definition.
  );
  
  auto loop = particle_loop(
    "local_array_example",
    particle_group,
    // LA_ADD is the parameter for the LocalArray with "add" access and LA_READ
    // is the parameter for the LocalArray with read-only access.
    [=](auto ID, auto V, auto LA_ADD, auto LA_READ){
      
      // Increment the first component of LA_ADD by 1.
      LA_ADD.fetch_add(0, 1);
      // Increment the second component of LA_ADD with the entry in ID[0].
      LA_ADD.fetch_add(1, ID[0]);
      
      // Read from LA_READ and assign the values to the V particle component.
      V[0] = LA_READ.at(0);
      V[1] = LA_READ.at(1);
      V[2] = LA_READ.at(2);
    },
    // Particle property access descriptors.
    Access::read(Sym<INT>("ID")),
    Access::write(Sym<REAL>("V")),
    // LocalArray access descriptors.
    Access::add(local_array_add),
    Access::read(local_array_read)
  );

  // Execute the loop.
  loop->execute();
  
  // Get the contents of the local array in a std::vector.
  auto vec0 = local_array_add->get();

  return; 
}
Table 2 LocalArray<T> Access#

Access Descriptors

Kernel Types

Notes

Read

Access::LocalArray::Read<T>

Components accessed for each array element via .at or subscript which returns a const reference.

Write

Access::LocalArray::Write<T>

Components accessed for each array element via .at or subscript which returns a modifiable reference. It is the responsiblity of the user to avoid write conflicts and race conditions between loop iterations.

Add

Access::LocalArray::Add<T>

fetch_add(index, value) atomically increments the element referenced by the index with the passed value. Returns the previous value stored at the index. Users may assume that the memory region accessed is the same for all invocations of the kernel. Hence adding 1 for each particle gives a total ordering for each participating calls.

GlobalArray#

Unlike the local array, a global array is intended to have identical values across MPI ranks. The global array can be accessed in “read” and “add” access modes. When accessed with “add” access modes the particle loop must be executed collectively across all MPI ranks. On completion of the loop the local entries of each global array are globally combined (Allreduce).

Listing 4 Particle loop example where a global array is incremented by each particle.#
inline void global_array_example(
    ParticleGroupSharedPtr particle_group
) {
  
  // Create a new GlobalArray on the same SYCLTarget as the particle group.
  auto global_array = std::make_shared<GlobalArray<REAL>>(
    particle_group->sycl_target, // Compute target to use.
    1,                           // Number of elements in the array.
    0                            // Initial value for elements.
  );
  
  auto loop = particle_loop(
    "global_array_example",
    particle_group,
    [=](auto V, auto GA){
      // Kinetic energy of this particle.
      const REAL kinetic_energy = 
        0.5 * (V[0] * V[0] + V[1] * V[1]);
      // Increment the first component of the global array with the
      // contribution from this particle.
      GA.add(0, kinetic_energy);
    },
    // Particle property access descriptor.
    Access::read(Sym<REAL>("V")),
    // GlobalArray access descriptor.
    Access::add(global_array)
  );
  
  // Execute the loop. This must be called collectively on the MPI communicator
  // of the SYCLTarget as the add operation is collective.
  loop->execute();

  // Get the contents of the global array in a std::vector. The first element
  // would be the kinetic energy of all particles in the ParticleGroup.
  auto vec0 = global_array->get();

  return; 
}
Table 3 GlobalArray<T> Access#

Access Descriptors

Kernel Types

Notes

Read

Access::GlobalArray::Read<T>

Components accessed for each array element via .at or subscript which returns a const reference.

Add

Access::GlobalArray::Add<T>

add(index, value) atomically increments the element referenced by the index with the passed value. On loop execution completion values are all-reduced across all MPI ranks.

CellDatConst and CellDat#

The CellDatConst data structure stores a constant sized matrix per mesh cell. When accessed from a particle loop both the read and add access descriptors expose the matrix which corresponds to the cell in which the particle resides.

Listing 5 Particle loop example where a CellDatConst is accessed by the ParticleLoop. CellDat is accessed in a ParticleLoop in an identical manner.#
inline void cell_dat_const_example(
    ParticleGroupSharedPtr particle_group
) {
  
  // Get the number of cells on this MPI rank.
  const int cell_count = particle_group->domain->mesh->get_cell_count();

  // Create a 3x1 matrix for cell_count cells.
  auto g1 = std::make_shared<CellDatConst<int>>(
      particle_group->sycl_target, cell_count, 3, 1);
  
  // For each cell get the current matrix and zero the values then write back
  // to the data structure.
  for (int cx = 0; cx < cell_count; cx++) {
    auto cell_data = g1->get_cell(cx);
    cell_data->at(0, 0) = 0.0;
    cell_data->at(1, 0) = 0.0;
    cell_data->at(2, 0) = 0.0;
    g1->set_cell(cx, cell_data);
  }

  // Alternatively all entries in all cells of the CellDatConst may be filled
  // with a value.
  g1->fill(0.0);
  
  auto loop = particle_loop(
    "cell_dat_const_example",
    particle_group,
    [=](auto V, auto GA){
      // Increment the matrix in each cell with the velocities of particles in
      // that cell.
      GA.fetch_add(0, 0, V[0]);
      GA.fetch_add(1, 0, V[1]);
      GA.fetch_add(2, 0, V[2]);
    },
    // Particle property access descriptor.
    Access::read(Sym<REAL>("V")),
    // CellDatConst access descriptor.
    Access::add(g1)
  );
  
  // Execute the loop.
  loop->execute();

  // After loop execution the 3x1 matrix in each cell will contain the sum of
  // the particle velocities in each cell.
  for (int cx = 0; cx < cell_count; cx++) {
    auto cell_data = g1->get_cell(cx);
    // use cell data
  }

  return; 
}
Table 4 CellDatConst<T> Access#

Access Descriptors

Kernel Types

Notes

Read

Access::CellDatConst::Read<T>

Components accessed for each array element via .at(row, col) or subscript (linearised index) which returns a const reference.

Add

Access::CellDatConst::Add<T>

fetch_add(row, col, value) atomically increments the element referenced by the index with the passed value. Returns the previous value stored at the index. Users may assume that the memory region accessed is the same for all invocations of the kernel. Hence adding 1 for each particle gives a total ordering for each participating calls.

Min

Access::CellDatConst::Min<T>

fetch_min(row, col, value) atomically computes the minimum of element referenced by the index and the passed value. Returns the previous value stored at the index.

Max

Access::CellDatConst::Max<T>

fetch_max(row, col, value) atomically computes the maximum of element referenced by the index and the passed value. Returns the previous value stored at the index.

Table 5 CellDat<T> Access#

Access Descriptors

Kernel Types

Notes

Read

Access::CellDat::Read<T>

Components accessed for each array element via .at(row, col) which returns a const reference.

Add

Access::CellDat::Add<T>

fetch_add(row, col, value) atomically increments the element referenced by the index with the passed value. Returns the previous value stored at the index. Users may assume that the memory region accessed is the same for all invocations of the kernel. Hence adding 1 for each particle gives a total ordering for each participating calls.

Write

Access::CellDat::Write<T>

Write using a modifiable reference provided by .at(row, col). It is the users responsibility that no write conflicts or race conditions occur.

SymVector#

SymVector enables passing a set of ParticleDats to the ParticleGroup where the number of ParticleDats is only known at runtime. The indexing of the data structure in the kernel is via the positional index of the ParticleDat relative to the construction of the SymVector.

Listing 6 Particle loop example where a SymVector is accessed by the ParticleLoop.#
inline void sym_vector_example(
    ParticleGroupSharedPtr particle_group
) {
  
  // These constants are captured by value into the kernel lambda.
  const int ndim = 2;
  const REAL dt = 0.001;
  
  auto loop = particle_loop(
    "sym_vector_example",
    particle_group,
    [=](auto dats_vector){
      for(int dimx=0 ; dimx<ndim ; dimx++){
        // P has index 0 in dats_vector as it is first in the sym_vector.
        // V has index 1 in dats_vector as it is second.
        dats_vector.at(0, dimx) += dt * dats_vector.at(1, dimx);
      }     
    },
    // We state that all ParticleDats in the SymVector are write access.
    Access::write(
      // Helper function to create a SymVector.
      sym_vector<REAL>(
        particle_group,
        // This argument may also be a std::vector of Syms.
        {Sym<REAL>("P"), Sym<REAL>("V")}
      )
    )
  );

  // Execute the loop.
  loop->execute();
  return; 
}
Table 6 SymVector<T> Access#

Access Descriptors

Kernel Types

Notes

Read

Access::SymVector::Read<T>

Components accessed for each array element via .at or subscript which returns a const reference.

Write

Access::SymVector::Write<T>

Components accessed for each array element via .at or subscript which returns a reference.

ParticleLoopIndex#

Some data structures are indexed by the cell and layer of the particle. The ParticleLoopIndex is a data structure that can be read in a ParticleLoop to provide this information. This data structure is read-only.

Listing 7 Particle loop example where a ParticleLoopIndex is accessed by the ParticleLoop.#
inline void advection_example_loop_index(
    ParticleGroupSharedPtr particle_group
) {
  
  auto loop = particle_loop(
    "loop_index_example",
    particle_group,
    [=](auto index){
      // Dummy output variable we store the indices in for this example.
      [[maybe_unused]] INT tmp;
      // The cell containing the particle.
      tmp = index.cell; 
      // The row (layer) containing the particle.
      tmp = index.layer; 
      // The linear index of the particle on the calling MPI rank. 
      // This index is in [0, A->get_npart_local()).
      tmp = index.get_local_linear_index(); 
      // The linear index of the particle in the current ParticleLoop. 
      // This index is in [0, <size of ParticleLoop iteration set>).
      tmp = index.get_loop_linear_index(); 
    },
    // Note the extra {} that creates an instance of the type.
    Access::read(ParticleLoopIndex{})
  );

  loop->execute();
  return; 
}
Table 7 ParticleLoopIndex Access#

Access Descriptors

Kernel Types

Notes

Read

Access::ParticleLoopIndex::Read

Indices accessed via .cell, .layer, .get_local_linear_index and .get_loop_linear_index.

[SAUNDERS2018]

A domain specific language for performance portable molecular dynamics algorithms. CPC , arXiv.