Module ppm_module_loadbal

This module contains all routines needed for dynamic load balancing.

Currently only the SAR (Stop-At-Rise) method is provided as the heuristic to estimate when the domain needs to be redecomposed.
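The Stop-At-Rise idea can be sketched as follows: track the per-step load-imbalance slack (bottleneck time minus average time) and redecompose as soon as the average cost per step since the last redecomposition, including the fixed redecomposition cost, starts to rise. The Python model below is purely illustrative (the library itself is Fortran) and the function name and inputs are assumptions, not the library's API:

```python
# Illustrative Stop-At-Rise (SAR) model: recommend redecomposition when the
# average cost per step since the last redecomposition starts to rise.
def sar_should_redecompose(slacks, redecomp_cost):
    """slacks[i] = T_max - T_avg in step i since the last redecomposition."""
    prev = float("inf")
    total = 0.0
    for n, w in enumerate(slacks, start=1):
        total += w
        avg = (redecomp_cost + total) / n   # average cost per step so far
        if avg > prev:                      # the average started to rise
            return True, n
        prev = avg
    return False, len(slacks)

# Growing imbalance eventually makes the per-step average rise:
print(sar_should_redecompose([1.0, 1.0, 2.0, 4.0, 8.0], redecomp_cost=10.0))
```

Early on, the fixed redecomposition cost is amortized over more steps, so the average falls; once the imbalance grows fast enough, the average turns upward and the heuristic fires.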

Defined Types

name description

no types

Defined Module Interfaces

Defined Module Subroutines

name description

no subroutines

Interface ppm_estimate_procspeed

Subroutines contained in this interface:

name description

est_procspeed_s

This routine can be used to estimate the relative speeds of the processors.

est_procspeed_d

This routine can be used to estimate the relative speeds of the processors.

Interface ppm_get_cost

Subroutines contained in this interface:

name description

ppm_get_cost_s

This routine calculates the computational cost of each subdomain and each processor.

ppm_get_cost_d

This routine calculates the computational cost of each subdomain and each processor.

Interface ppm_loadbal_inquire

Subroutines contained in this interface:

name description

loadbal_inq_s

Inquires about the load balance status and returns advice on whether redecomposing the problem is recommended.

loadbal_inq_d

Inquires about the load balance status and returns advice on whether redecomposing the problem is recommended.

Interface ppm_set_decomp_cost

Subroutines contained in this interface:

name description

set_dcost_s

Sets/updates the internal estimate for the computational cost of redecomposing the problem.

set_dcost_d

Sets/updates the internal estimate for the computational cost of redecomposing the problem.

Interface ppm_set_proc_speed

Subroutines contained in this interface:

name description

ppm_set_proc_speed_s

This routine can be used by the user to set the relative speeds of the processors.

ppm_set_proc_speed_d

This routine can be used by the user to set the relative speeds of the processors.

Subroutine est_procspeed_d

This routine can be used to estimate the relative speeds of the processors. It is also used in subs2proc for load balancing. The estimate is obtained by computing a number of Lennard-Jones PP interactions.
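Conceptually, each processor times the same amount of benchmark work and its relative speed is the inverse of the elapsed time, normalized so the speeds sum to 1 (as the procspeed argument documents). The Python sketch below models only that normalization step; the function name is illustrative and the actual routine benchmarks Lennard-Jones interactions, growing npart until the mintime requirement is met:

```python
# Hypothetical model of the benchmark-based speed estimate: each processor
# times the same work; relative speed is the inverse of the elapsed time,
# normalized so that the speeds sum to 1.
def relative_speeds(elapsed):
    inv = [1.0 / t for t in elapsed]
    total = sum(inv)
    return [v / total for v in inv]

# A processor that took half the time gets twice the relative speed:
print(relative_speeds([1.0, 2.0, 2.0]))  # [0.5, 0.25, 0.25]
```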

Arguments

procspeed

real array, (:), no intent declared

Relative speeds of all processors from 0 to ppm_nproc-1. The numbers sum up to 1.

info

integer, (OUT)

Return status, 0 on success.

(Optional) mintime

real, (IN)

Minimum time for which the benchmark is required to run on each processor. Default is 0.1 seconds.

(Optional) maxtime

real, (IN)

The benchmark stops as soon as the slowest processor has run for this time (provided the mintime requirement is met). Default is 5 seconds.

(Optional) npart

integer, (IN)

Initial number of LJ particles to use. Default is 1000. Set this to a smaller value when running on a slow processor; the number is increased automatically to meet the time requirements.

Used Modules

ppm_module_data, ppm_module_error, ppm_module_alloc, ppm_module_substop, ppm_module_util_time, ppm_module_write, ppm_module_substart

Subroutine est_procspeed_s

This routine can be used to estimate the relative speeds of the processors. It is also used in subs2proc for load balancing. The estimate is obtained by computing a number of Lennard-Jones PP interactions.

Arguments

procspeed

real array, (:), no intent declared

Relative speeds of all processors from 0 to ppm_nproc-1. The numbers sum up to 1.

info

integer, (OUT)

Return status, 0 on success.

(Optional) mintime

real, (IN)

Minimum time for which the benchmark is required to run on each processor. Default is 0.1 seconds.

(Optional) maxtime

real, (IN)

The benchmark stops as soon as the slowest processor has run for this time (provided the mintime requirement is met). Default is 5 seconds.

(Optional) npart

integer, (IN)

Initial number of LJ particles to use. Default is 1000. Set this to a smaller value when running on a slow processor; the number is increased automatically to meet the time requirements.

Used Modules

ppm_module_data, ppm_module_error, ppm_module_alloc, ppm_module_substop, ppm_module_util_time, ppm_module_write, ppm_module_substart

Subroutine loadbal_inq_d

Inquires about the load balance status and returns advice on whether redecomposing the problem is recommended (based on a decision heuristic chosen by the user).

[Tip]Tip

The user should time (using ppm_time) the computations for the topology/topologies considered for dynamic remapping. The elapsed time is given to this routine.

[Note]Note

This estimate is currently a scalar, so only one topology/set of topologies can be monitored. By introducing a topology set ID and making the internal estimates a vector, this can later be extended to multiple topology sets if needed. The topology set ID for which to evaluate the load balance would then be an additional argument to this routine. This routine performs two global MPI communication operations (MPI_Allreduce).
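The imbal output is documented as the ratio of the bottleneck processor's computation time to the average over all processors, which can be modeled directly (Python is used here only for illustration; the function name is an assumption):

```python
# Load imbalance as documented for imbal: bottleneck time over the mean time.
def load_imbalance(ctimes):
    return max(ctimes) / (sum(ctimes) / len(ctimes))

# Perfectly balanced -> 1.0; one straggler raises the ratio:
print(load_imbalance([2.0, 2.0, 2.0]))  # 1.0
print(load_imbalance([2.0, 2.0, 4.0]))
```

A value of 1.0 means perfect balance; larger values mean the bottleneck processor is that factor slower than the average.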

Arguments

ctime

real, (IN)

Elapsed time (as measured by ppm_time) for all computation in one time step on the local processor.

nstep

integer, (IN)

Number of time steps since the last redecomposition (> 0). If this routine is not called every time step, linear interpolation of the load imbalance is used to reconstruct the missing data points.

heuristic

integer, (IN)

Decision heuristic for the redecomposition advice. One of:

  • ppm_param_loadbal_sar (Stop-At-Rise heuristic)

lflush

logical, (IN)

TRUE to flush the internal statistics (e.g. the first time this routine is called after actually redecomposing the problem), FALSE to continue gathering statistics.

imbal

real, (OUT)

Load imbalance, defined as the ratio of the computation time of the bottleneck processor to the average computation time of all processors.

lredecomp

logical, (OUT)

TRUE if the chosen heuristic recommends problem redecomposition, otherwise FALSE. Redecomposition means: call ppm_topo_mktopo again.

nredest

integer, (OUT)

Estimated (by linear extrapolation) number of time steps until the next advised redecomposition. -1 is returned if the chosen heuristic does not support this kind of information. Be careful with this value!

info

integer, (OUT)

Returns status, 0 upon success.

Used Modules

ppm_module_data, ppm_module_error, ppm_module_data_loadbal, ppm_module_substop, ppm_module_write, ppm_module_substart

Subroutine loadbal_inq_s

Inquires about the load balance status and returns advice on whether redecomposing the problem is recommended (based on a decision heuristic chosen by the user).

[Tip]Tip

The user should time (using ppm_time) the computations for the topology/topologies considered for dynamic remapping. The elapsed time is given to this routine.

[Note]Note

This estimate is currently a scalar, so only one topology/set of topologies can be monitored. By introducing a topology set ID and making the internal estimates a vector, this can later be extended to multiple topology sets if needed. The topology set ID for which to evaluate the load balance would then be an additional argument to this routine. This routine performs two global MPI communication operations (MPI_Allreduce).

Arguments

ctime

real, (IN)

Elapsed time (as measured by ppm_time) for all computation in one time step on the local processor.

nstep

integer, (IN)

Number of time steps since the last redecomposition (> 0). If this routine is not called every time step, linear interpolation of the load imbalance is used to reconstruct the missing data points.

heuristic

integer, (IN)

Decision heuristic for the redecomposition advice. One of:

  • ppm_param_loadbal_sar (Stop-At-Rise heuristic)

lflush

logical, (IN)

TRUE to flush the internal statistics (e.g. the first time this routine is called after actually redecomposing the problem), FALSE to continue gathering statistics.

imbal

real, (OUT)

Load imbalance, defined as the ratio of the computation time of the bottleneck processor to the average computation time of all processors.

lredecomp

logical, (OUT)

TRUE if the chosen heuristic recommends problem redecomposition, otherwise FALSE. Redecomposition means: call ppm_topo_mktopo again.

nredest

integer, (OUT)

Estimated (by linear extrapolation) number of time steps until the next advised redecomposition. -1 is returned if the chosen heuristic does not support this kind of information. Be careful with this value!

info

integer, (OUT)

Returns status, 0 upon success.

Used Modules

ppm_module_data, ppm_module_error, ppm_module_data_loadbal, ppm_module_substop, ppm_module_write, ppm_module_substart

Subroutine ppm_get_cost_d

This routine calculates the computational cost of each subdomain and each processor.

[Note]Note

If np > 0, the cost is computed based on particles. If meshid > -1, a cost based on mesh points is computed and added to the particle cost. If neither particles nor a mesh are given, the cost of a subdomain equals its volume.
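The cost rules in the note can be sketched for a single subdomain as follows. This is an illustrative Python model, not the library's implementation, and the function name and inputs are assumptions:

```python
# Sketch of the documented cost rules for one subdomain: particle cost
# (weighted by the optional per-particle costs, default 1.0 each) plus
# mesh-point cost, with the subdomain volume as the fallback.
def sub_cost(n_particles, n_mesh_points, volume, pcost=None):
    cost = 0.0
    if n_particles > 0:
        cost += sum(pcost) if pcost is not None else float(n_particles)
    if n_mesh_points > 0:
        cost += float(n_mesh_points)
    if cost == 0.0:
        cost = volume  # neither particles nor mesh: cost equals the volume
    return cost

print(sub_cost(0, 0, 3.5))                              # 3.5 (volume fallback)
print(sub_cost(4, 0, 3.5))                              # 4.0 (one per particle)
print(sub_cost(4, 10, 3.5, pcost=[2.0, 2.0, 2.0, 2.0])) # 18.0 (weighted + mesh)
```

The per-processor cost reported in proc_cost would then be the sum of the costs of the subdomains assigned to that processor.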

Arguments

topoid

integer, (IN)

Topology ID for which to compute the cost. If ppm_param_topo_undefined is passed, all particle positions on the processor are considered when computing the cost.

meshid

integer, (IN)

Mesh ID for which to compute the cost. If -1 is passed, only particles are considered and the mesh is ignored. If there are no particles and meshid is -1, the costs are computed from the geometry: cost(sub) = volume(sub).

xp

real array, (:,:), (IN)

Particle positions.

np

integer, (IN)

Number of particles. Set to <= 0 if mesh-based costs are desired.

cost

real array, (:), no intent declared

Aggregate cost for each subdomain.

proc_cost

real array, (:), no intent declared

Aggregate cost for each processor.

info

integer, (OUT)

Returns status, 0 upon success.

(Optional) pcost

real array, (:), (IN)

Per-particle costs. If not present, a cost of 1.0 per particle is assumed.

Used Modules

ppm_module_data, ppm_module_error, ppm_module_typedef, ppm_module_alloc, ppm_module_check_id, ppm_module_substop, ppm_module_topo_cost, ppm_module_write, ppm_module_substart

Subroutine ppm_get_cost_s

This routine calculates the computational cost of each subdomain and each processor.

[Note]Note

If np > 0, the cost is computed based on particles. If meshid > -1, a cost based on mesh points is computed and added to the particle cost. If neither particles nor a mesh are given, the cost of a subdomain equals its volume.

Arguments

topoid

integer, (IN)

Topology ID for which to compute the cost. If ppm_param_topo_undefined is passed, all particle positions on the processor are considered when computing the cost.

meshid

integer, (IN)

Mesh ID for which to compute the cost. If -1 is passed, only particles are considered and the mesh is ignored. If there are no particles and meshid is -1, the costs are computed from the geometry: cost(sub) = volume(sub).

xp

real array, (:,:), (IN)

Particle positions.

np

integer, (IN)

Number of particles. Set to <= 0 if mesh-based costs are desired.

cost

real array, (:), no intent declared

Aggregate cost for each subdomain.

proc_cost

real array, (:), no intent declared

Aggregate cost for each processor.

info

integer, (OUT)

Returns status, 0 upon success.

(Optional) pcost

real array, (:), (IN)

Per-particle costs. If not present, a cost of 1.0 per particle is assumed.

Used Modules

ppm_module_data, ppm_module_error, ppm_module_typedef, ppm_module_alloc, ppm_module_check_id, ppm_module_substop, ppm_module_topo_cost, ppm_module_write, ppm_module_substart

Subroutine ppm_set_proc_speed_d

This routine can be used by the user to set the relative speeds of the processors (it is used in subs2proc for load balancing).
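Since the routine requires the proc_speed values to sum to 1, raw per-processor speed measurements should be normalized first. A minimal sketch, with an illustrative function name (Python here stands in for the Fortran caller):

```python
# Normalize raw speed measurements so they sum to 1, as proc_speed requires.
def normalize_speeds(raw):
    total = sum(raw)
    return [r / total for r in raw]

speeds = normalize_speeds([3.0, 1.0])
print(speeds)  # [0.75, 0.25]
```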

Arguments

proc_speed

real array, (0:), (IN)

Relative speeds of all processors from 0 to ppm_nproc-1. The numbers must sum up to 1.

info

integer, (OUT)

Returns status, 0 upon success.

Used Modules

ppm_module_data, ppm_module_error, ppm_module_substop, ppm_module_substart

Subroutine ppm_set_proc_speed_s

This routine can be used by the user to set the relative speeds of the processors (it is used in subs2proc for load balancing).

Arguments

proc_speed

real array, (0:), (IN)

Relative speeds of all processors from 0 to ppm_nproc-1. The numbers must sum up to 1.

info

integer, (OUT)

Returns status, 0 upon success.

Used Modules

ppm_module_data, ppm_module_error, ppm_module_substop, ppm_module_substart

Subroutine set_dcost_d

Sets/updates the internal estimate for the computational cost associated with redecomposing the problem. The user can choose the update method.

[Tip]Tip

The user should time (using ppm_time) topo_mktopo for the topology/topologies considered for dynamic remapping (ppm does not know which ones they are). The elapsed time is given to this routine to update the internal estimate.

[Note]Note

This estimate is currently a scalar, so only one topology/set of topologies can be monitored. By introducing a topology set ID and making the internal estimates a vector, this can later be extended to multiple topology sets if needed. The topology set ID (in external numbering) for which to update the cost estimate would then be an additional argument to this routine.

[Note]Note

This routine does a global MPI operation (Allreduce).
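The three documented update methods can be modeled as follows. This is an illustrative Python sketch; the function names and the forgetting factor alpha are assumptions, not values taken from the library:

```python
# Illustrative versions of the three documented update methods for the
# internal redecomposition-cost estimate.
def update_replace(old, new, n):
    # ppm_param_update_replace: overwrite the old value with the new one
    return new

def update_average(old, new, n):
    # ppm_param_update_average: running average over the n-th sample
    return old + (new - old) / n

def update_expfavg(old, new, alpha=0.5):
    # ppm_param_update_expfavg: exponential forgetting, recent samples
    # weigh more (alpha is an assumed forgetting factor)
    return alpha * new + (1.0 - alpha) * old

print(update_replace(10.0, 4.0, n=3))  # 4.0
print(update_average(10.0, 4.0, n=3))  # 8.0
print(update_expfavg(10.0, 4.0))       # 7.0
```

Exponential forgetting adapts faster to a changing redecomposition cost, at the price of a noisier estimate.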

Arguments

dcost

real, (IN)

Elapsed time (as measured by ppm_time) for defining the topology/topologies of interest on the local processor.

method

integer, (IN)

How to update the internal estimate. One of:

  • ppm_param_update_replace (overwrite the old value with the new one)
  • ppm_param_update_average (compute the running average)
  • ppm_param_update_expfavg (running average with exponential forgetting)

info

integer, (OUT)

Returns status, 0 upon success.

Used Modules

ppm_module_data, ppm_module_error, ppm_module_data_loadbal, ppm_module_substop, ppm_module_write, ppm_module_substart

Subroutine set_dcost_s

Sets/updates the internal estimate for the computational cost associated with redecomposing the problem. The user can choose the update method.

[Tip]Tip

The user should time (using ppm_time) topo_mktopo for the topology/topologies considered for dynamic remapping (ppm does not know which ones they are). The elapsed time is given to this routine to update the internal estimate.

[Note]Note

This estimate is currently a scalar, so only one topology/set of topologies can be monitored. By introducing a topology set ID and making the internal estimates a vector, this can later be extended to multiple topology sets if needed. The topology set ID (in external numbering) for which to update the cost estimate would then be an additional argument to this routine.

[Note]Note

This routine does a global MPI operation (Allreduce).

Arguments

dcost

real, (IN)

Elapsed time (as measured by ppm_time) for defining the topology/topologies of interest on the local processor.

method

integer, (IN)

How to update the internal estimate. One of:

  • ppm_param_update_replace (overwrite the old value with the new one)
  • ppm_param_update_average (compute the running average)
  • ppm_param_update_expfavg (running average with exponential forgetting)

info

integer, (OUT)

Returns status, 0 upon success.

Used Modules

ppm_module_data, ppm_module_error, ppm_module_data_loadbal, ppm_module_substop, ppm_module_write, ppm_module_substart

Defined Module Variables

name type dimension description

no variables

Used Modules

has no uses