Details
-
Task
-
Status: Resolved
-
Minor
-
Resolution: Won't Fix
Description
Recent profiling of simulations points to random number generation (RNG) taking up to half the execution time. The instructions the RNG issues to the CPU (int64 ops) and its writes to cache are quite different from those of the rest of the simulation. It would be worth checking whether the RNG can be run in parallel on a sibling hyperthread. Some useful info found on SO, of course:
https://stackoverflow.com/a/7275270
https://stackoverflow.com/a/28788061
https://pypi.org/project/python3-hwloc/
This task should use recent Montbrio kernels and issue RNG calls on a thread running in parallel to the main simulation. The hwloc library should pin the sim thread and the RNG thread to sibling logical processors of the same physical core. This scenario can then be compared to the sequential case on a single core. Lastly, scaling this benchmark up and down, in terms of the size of the connectivity, would help assess how useful this is, if at all.
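A rough sketch of the setup described above, to make the comparison concrete. It is not the intended implementation and rests on several assumptions. It does not use hwloc: pinning goes through os.sched_setaffinity and the sibling lookup reads Linux sysfs, both as stand-ins. The dynamics in step() are a placeholder, not the Montbrio kernels. All function names are hypothetical. It also relies on NumPy's Generator releasing the GIL while filling arrays, which should be verified before trusting any timings.

```python
import os
import queue
import threading

import numpy as np


def sibling_cpus(cpu=0):
    """Logical CPUs sharing a physical core with `cpu` (Linux sysfs); [] if unknown."""
    path = f"/sys/devices/system/cpu/cpu{cpu}/topology/thread_siblings_list"
    try:
        with open(path) as f:
            text = f.read().strip()
        cpus = []
        for part in text.split(","):  # entries look like "0,4" or "0-1"
            lo, _, hi = part.partition("-")
            cpus.extend(range(int(lo), int(hi or lo) + 1))
        return cpus
    except (OSError, ValueError):
        return []


def pin_current_thread(cpu):
    """Best-effort pin of the calling thread; returns False where unsupported."""
    if cpu is None or not hasattr(os, "sched_setaffinity"):
        return False
    try:
        os.sched_setaffinity(0, {cpu})  # pid 0 means the calling thread on Linux
        return True
    except OSError:
        return False


def step(state, z, dt=0.1):
    """Placeholder noisy linear dynamics (assumption, NOT a Montbrio kernel)."""
    state += dt * (-state) + 0.01 * z


def simulate_sequential(n_blocks, shape, seed=42):
    """Baseline: RNG and simulation interleaved on one thread."""
    rng = np.random.default_rng(seed)
    state = np.zeros(shape[1:])
    for _ in range(n_blocks):
        for z in rng.standard_normal(shape):
            step(state, z)
    return state


def _rng_worker(out_q, n_blocks, shape, seed, cpu):
    pin_current_thread(cpu)
    rng = np.random.default_rng(seed)
    for _ in range(n_blocks):
        out_q.put(rng.standard_normal(shape))  # blocks when the queue is full


def simulate_threaded(n_blocks, shape, seed=42, sim_cpu=None, rng_cpu=None):
    """RNG runs ahead on a second thread; the sim thread consumes noise blocks."""
    pin_current_thread(sim_cpu)
    q = queue.Queue(maxsize=2)  # small bound acts as a double buffer
    worker = threading.Thread(
        target=_rng_worker, args=(q, n_blocks, shape, seed, rng_cpu), daemon=True
    )
    worker.start()
    state = np.zeros(shape[1:])
    for _ in range(n_blocks):
        for z in q.get():
            step(state, z)
    worker.join()
    return state
```

To try it on sibling hyperthreads, something like `sibs = sibling_cpus(0)` followed by `simulate_threaded(100, (256, n_nodes), sim_cpu=sibs[0], rng_cpu=sibs[1])` when two siblings are reported, timed against `simulate_sequential` for several values of `n_nodes` (hypothetical name for the connectivity size). Both functions draw the same stream from the same seed, so results should match and only the wall time differs.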