This module contains all routines needed for dynamic load balancing.

Currently only the SAR (Stop-At-Rise) method is provided as the heuristic to estimate the time when the domain needs to be redecomposed.

Subroutines contained in this interface:

- a routine to estimate the relative speeds of the processors
- a routine to calculate the computational cost of each subdomain
- a routine to inquire about the load balance status and advise whether redecomposition is recommended
- a routine to set/update the internal estimate for the computational cost of redecomposition
- a routine for the user to set the relative speeds of the processors

This routine can be used to estimate the relative speeds of the processors. It is also used in subs2proc for load balancing. The estimation is done by computing a number of Lennard-Jones PP interactions.

type | dimension | intent | optional | description |
---|---|---|---|---|
real array | `(:)` | | | Relative speeds of all processors from 0 to ppm_nproc-1. The numbers sum up to 1. |
integer | | (OUT) | | Return status, 0 on success. |
real | | (IN) | OPTIONAL | Minimum time for which the benchmark is required to run on each processor. Default is 0.1 seconds. |
real | | (IN) | OPTIONAL | Benchmark stops as soon as the slowest processor has run for this time (provided that the minimum-time requirement is met). Default is 5 seconds. |
integer | | (IN) | OPTIONAL | Initial number of LJ particles to use. Default is 1000. Set this to a smaller value when running on a slow processor; the number is increased automatically to meet the time requirements. |

ppm_module_data, ppm_module_error, ppm_module_alloc, ppm_module_substop, ppm_module_util_time, ppm_module_write, ppm_module_substart
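The benchmarking idea behind this routine can be sketched in a language-agnostic way: each processor times a batch of Lennard-Jones pair interactions, growing the batch until the minimum-time requirement is met, and the measured times are inverted and normalized so that the relative speeds sum to 1. The following Python sketch is an illustration of that scheme, not the library's Fortran implementation; `lj_force`, `benchmark`, and `relative_speeds` are hypothetical names.

```python
import time

def lj_force(r2, eps=1.0, sigma=1.0):
    # Magnitude of a Lennard-Jones pair interaction for squared distance r2.
    sr6 = (sigma * sigma / r2) ** 3
    return 24.0 * eps * (2.0 * sr6 * sr6 - sr6) / r2

def benchmark(npart=1000, mintime=0.1):
    """Time LJ interactions, doubling the workload until mintime is met
    (mirrors the routine's automatic increase of the particle number)."""
    n = npart
    while True:
        t0 = time.perf_counter()
        acc = 0.0
        for i in range(n):
            acc += lj_force(1.0 + (i % 7) * 0.1)  # synthetic pair distances
        elapsed = time.perf_counter() - t0
        if elapsed >= mintime:
            return elapsed, n
        n *= 2

def relative_speeds(times):
    """Convert per-processor benchmark times into speeds that sum to 1:
    faster processors (smaller times) get proportionally larger speeds."""
    inv = [1.0 / t for t in times]
    total = sum(inv)
    return [x / total for x in inv]
```

In an MPI setting the per-processor times would be gathered globally before normalization; here the normalization step is shown on a plain list.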


Inquires about the load balance status and returns advice on whether redecomposing the problem is recommended (based on a decision heuristic chosen by the user).

Tip |
---|
The user should time (using `ppm_time`) the computations for the topology/topologies considered for dynamic remapping. The elapsed time is given to this routine. |

Note |
---|
This estimate is currently a scalar, so only one topology/set of topologies can be monitored. By introducing a topology set ID and making the internal estimates a vector, this can later be extended to multiple topology sets if needed. The topology set ID for which to evaluate the load balance will then be an additional argument to this routine. This routine performs two global MPI communication operations. |

type | dimension | intent | optional | description |
---|---|---|---|---|
real | | (IN) | | Elapsed time (as measured by `ppm_time`) for all computation in one time step on the local processor. |
integer | | (IN) | | Number of time steps since the last redecomposition (> 0). If this routine is not called every time step, linear interpolation of the load imbalance is used to reconstruct the missing data points. |
integer | | (IN) | | Decision heuristic for the redecomposition advice. One of: ppm_param_loadbal_sar (Stop-At-Rise heuristic). |
logical | | (IN) | | `TRUE` to flush the internal statistics (e.g. the first time this routine is called after actually redecomposing the problem), `FALSE` to continue gathering statistics. |
real | | (OUT) | | Load imbalance, defined as the ratio of the computation time of the bottleneck processor to the average computation time of all processors. |
logical | | (OUT) | | `TRUE` if the chosen heuristic recommends problem redecomposition, otherwise `FALSE`. Redecomposition means: do the ppm_topo_mktopo again. |
integer | | (OUT) | | Estimated (by linear extrapolation) number of time steps until the next advised redecomposition. -1 is returned if the chosen heuristic does not support this kind of information. Be careful with this value! |
integer | | (OUT) | | Returns status, 0 upon success. |

ppm_module_data, ppm_module_error, ppm_module_data_loadbal, ppm_module_substop, ppm_module_write, ppm_module_substart
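The Stop-At-Rise heuristic selected by ppm_param_loadbal_sar can be sketched as follows: accumulate the per-step degradation (bottleneck time minus average time) and advise redecomposition as soon as the mean cost per step, W(n) = (C + accumulated degradation) / n with C the redecomposition cost, stops falling and starts to rise. The Python sketch below illustrates only this decision rule; it is not the library's implementation, and all names are hypothetical.

```python
def sar_inquire(step_times_max, step_times_avg, redecomp_cost):
    """Stop-At-Rise heuristic: advise redecomposition at the first step n
    where W(n) = (C + sum of per-step degradations) / n rises above W(n-1).
    Returns (advice, step index at which the decision was made)."""
    degradation = 0.0
    prev_w = float("inf")
    n = 0
    for n, (tmax, tavg) in enumerate(zip(step_times_max, step_times_avg), start=1):
        degradation += tmax - tavg        # time lost to imbalance this step
        w = (redecomp_cost + degradation) / n
        if w > prev_w:
            return True, n                # W(n) rose: redecompose now
        prev_w = w
    return False, n
```

With a steadily growing imbalance, W(n) first amortizes the fixed redecomposition cost and then rises, which is exactly when the heuristic fires; with a perfectly balanced run, W(n) = C/n decreases forever and no redecomposition is advised.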


This routine calculates the computational cost of each subdomain and each processor.

Note |
---|
If Np > 0, the cost is computed based on particles. If mesh_id > -1, the cost based on mesh points is computed and added to the particle cost. If neither particles nor mesh are given, the cost of a subdomain is equal to its volume. |

type | dimension | intent | optional | description |
---|---|---|---|---|
integer | | (IN) | | Topology ID for which to compute the cost. If `ppm_param_topo_undefined`, all particle positions on the processor are considered to compute the cost. |
integer | | (IN) | | Mesh ID for which to compute the cost. If -1 is passed, only particles are considered and the mesh is ignored. If there are no particles and mesh_id is -1, the costs are computed based on the geometry: cost(sub) = volume(sub). |
real array | `(:,:)` | (IN) | | Particle positions. |
integer | | (IN) | | Number of particles. Set to <= 0 if mesh-based costs are desired. |
real array | `(:)` | | | Aggregate cost for each subdomain. |
real array | `(:)` | | | Aggregate cost for each processor. |
integer | | (OUT) | | Returns status, 0 upon success. |
real array | `(:)` | (IN) | OPTIONAL | Per-particle costs. If not present, a cost of 1.0 per particle is assumed. |

ppm_module_data, ppm_module_error, ppm_module_typedef, ppm_module_alloc, ppm_module_check_id, ppm_module_substop, ppm_module_topo_cost, ppm_module_write, ppm_module_substart
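The cost rules in the note above (particle-based cost with a default of 1.0 per particle, and a geometric fallback to the subdomain volume) can be illustrated with a small sketch. This is hedged Python restricted to 2D boxes with hypothetical names; the actual routine operates on PPM topologies and can additionally fold in mesh-point costs.

```python
def subdomain_costs(subs, particles, pcost=None):
    """Aggregate cost per subdomain: sum of per-particle costs for the
    particles inside each sub; with no particles, fall back to volume.
    `subs` is a list of ((xmin, ymin), (xmax, ymax)) half-open boxes."""
    if not particles:
        # Geometric fallback: cost(sub) = volume(sub)
        return [(xmax - xmin) * (ymax - ymin)
                for (xmin, ymin), (xmax, ymax) in subs]
    if pcost is None:
        pcost = [1.0] * len(particles)  # default cost of 1.0 per particle
    costs = [0.0] * len(subs)
    for (x, y), c in zip(particles, pcost):
        for k, ((xmin, ymin), (xmax, ymax)) in enumerate(subs):
            if xmin <= x < xmax and ymin <= y < ymax:
                costs[k] += c
                break
    return costs
```

The per-processor aggregate would then simply sum the costs of the subdomains assigned to each processor.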


This routine can be used by the user to set the relative speeds of the processors (used in subs2proc for load balancing).

type | dimension | intent | optional | description |
---|---|---|---|---|
real array | `(0:)` | (IN) | | Relative speeds of all processors from 0 to ppm_nproc-1. The numbers must sum up to 1. |
integer | | (OUT) | | Returns status, 0 upon success. |

ppm_module_data, ppm_module_error, ppm_module_substop, ppm_module_substart
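The contract this routine imposes on its input (speeds indexed 0 to ppm_nproc-1, summing to 1) can be made concrete with a small validation sketch. This is illustrative Python only; the function name and the specific nonzero error codes are assumptions, not documented PPM behavior.

```python
def set_proc_speed(speeds, tol=1e-6):
    """Validate user-supplied relative processor speeds.
    Returns (accepted speeds, status); status 0 means success."""
    if any(s < 0.0 for s in speeds):
        return None, 1  # hypothetical error code: negative speed
    if abs(sum(speeds) - 1.0) > tol:
        return None, 2  # hypothetical error code: speeds do not sum to 1
    return list(speeds), 0
```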


Sets/updates the internal estimate for the computational cost associated with redecomposing the problem. The user can choose the update method.

Tip |
---|
The user should time (using `ppm_time`) the definition of the topology/topologies of interest. The elapsed time is given to this routine. |

Note |
---|
This estimate is currently a scalar, so only one topology/set of topologies can be monitored. By introducing a topology set ID and making the internal estimates a vector, this can later be extended to multiple topology sets if needed. The topology set ID (in external numbering) for which to update the cost estimate will then be an additional argument to this routine. |

Note |
---|
This routine performs a global MPI operation. |

type | dimension | intent | optional | description |
---|---|---|---|---|
real | | (IN) | | Elapsed time (as measured by `ppm_time`) for defining the topology/ies of interest on the local processor. |
integer | | (IN) | | How to update the internal estimate. One of: ppm_param_update_replace (overwrite the old value with the new one); ppm_param_update_average (compute a running average); ppm_param_update_expfavg (running average with exponential forgetting). |
integer | | (OUT) | | Returns status, 0 upon success. |

ppm_module_data, ppm_module_error, ppm_module_data_loadbal, ppm_module_substop, ppm_module_write, ppm_module_substart
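The three update methods listed above can be written down compactly. The Python sketch below is illustrative only: the forgetting factor `alpha` and the sample count `n` used for the running average are assumptions about how such updates are typically implemented, not documented PPM internals.

```python
def update_decomp_cost(old, new, method, alpha=0.5, n=1):
    """Update the internal redecomposition-cost estimate using one of the
    three methods named in the docs (here identified by plain strings)."""
    if method == "replace":
        return new                              # overwrite old value with new one
    if method == "average":
        return old + (new - old) / (n + 1)      # running average, old over n samples
    if method == "expfavg":
        return alpha * new + (1.0 - alpha) * old  # exponential forgetting
    raise ValueError("unknown update method: %r" % method)
```

With exponential forgetting, recent measurements dominate the estimate, which is useful when the redecomposition cost drifts over a long run; the plain running average weights all past measurements equally.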
