Disclaimer: I am far from a computer scientist and may mix up or misuse terminology. Also, I am not talking about going deep into the implementation as in viewtopic.php?t=4.
I have to calculate many independent ground states for different physical parameters, and since the runs are independent of each other, I was hoping to parallelize them. Say run_dmrg(g) computes the ground state for a given physical parameter g; I tried to use Python's multiprocessing package as follows:
Code: Select all
import multiprocessing as mp

pool = mp.Pool(mp.cpu_count())
for g in gs:
    # args must be a tuple, hence the trailing comma in (g,)
    pool.apply_async(run_dmrg, args=(g,), callback=collect_result)
pool.close()
pool.join()
Now in principle this works, but as soon as I turn to bigger system sizes and bond dimensions, the execution suddenly takes forever. As I found out here (https://github.com/numpy/numpy/issues/10145), fast matrix multiplication in e.g. numpy is delegated to a BLAS library that is typically multithreaded itself and tuned to keep its working set in the CPU caches. Starting one process per core on top of that oversubscribes the cores, and the concurrent large multiplications compete for the shared caches and memory bandwidth, basically breaking down the calculation.
Now my question is whether that is already the end of the road, or whether someone has a better idea.
At the moment I just create multiple virtual machines on the cluster and let each of them compute a different parameter, but I guess this is not exactly good practice.
As always, answers and suggestions are greatly appreciated.
Best regards,
Korbinian