Load balancing 
Load balancing in PPM comprises two main components: domain decomposition and assignment of subdomains onto processors. While the former has to ensure sufficient granularity and partitioning of the computational cost, the latter has to ensure even distribution of computational load among processors, accounting for possible differences in processor speeds. Domain decompositionIn order to account for a large spectrum of possible applications, PPM provides a number of different adaptive domain decomposition techniques for particles, meshes, and volumes, the latter defining geometric subdomains with neither meshes nor particles present. These decompositions currently include:
Recursive orthogonal bisection is based on an adaptive binary tree, where subdivisions are allowed in all spatial directions. Special decompositionsThe userdefined decomposition allows the client program to explicitly specify the subdomains. The null decomposition does not perform any domain decomposition. It creates only one subdomain which is the computational domain itself. This trivial decomposition is used to evenly distribute the particles among processors, irrespective of their spatial location. The resulting special topology is called the ring topology and the subdomain is assigned to every processor. Processor assignmentThe computational cost for each subdomain (e.g. given by the number of particles, number of mesh points, or the true computational cost) is known from the domain decomposition step. The individual processor speeds are measured using PPM routines. This involves performing a molecular dynamics test simulation of a small LennardJones system using an increasing number of particles until all processors yield sufficient timing statistics. Such speed measurements are important in inhomogeneous machine clusters or under concurrent use of shared processors and typically take less than 1 second of wallclock time. Using the computational costs of the subdomains and the relative processor speeds, PPM provides several methods of assigning the subdomains to the processors. The PPMinternal method assigns contiguous blocks of subdomains to processors until the accumulated cost of a processor is greater or equal to the theoretical average cost under uniform load distribution. The average is weighted with the relative processor speeds. In addition, four different Metisbased and a userdefined assignment are available. In conjunction with a userdefined domain decomposition, the userdefined assignment scheme allows the client program to enforce a specific processor affiliation for each subdomain. For a Metisassignment, the subdomain partitioning problem is first translated to the equivalent graph partitioning problem. Two different conversions are supported by \metis: in the primal scheme each subdomain is represented by a node in the graph and the neighborhood relations by the edges of the graph; the dual scheme represents subdomains by graph edges. Graph partitioning is then performed such as to minimize either edge cut or communication volume. The relative processor speeds and computational costs of the subdomains are accounted for by means of weights assigned to the nodes and edges of the graph.
