Disclaimer: I am far from a computer scientist and probably mix and use wrong terminology Also, I am not talking about going deep into the implementation like here viewtopic.php?t=4

I have to calculate many independent ground states for different physical parameters. I was hoping I could parallelize those executions due to their independent nature. Say run_dmrg(g) computes the ground state for a given physical parameter g, I was trying to implement Python's multiprocessing package as follows

Code: Select all

```
import multiprocessing as mp
pool = mp.Pool(mp.cpu_count())
for g in enumerate(gs):
pool.apply_async(run_dmrg, args=(g), callback=collect_result)
```

Now in principle this works but as soon as I turn to bigger system sizes and bond dimensions, the execution suddenly takes forever. As I found out here (https://github.com/numpy/numpy/issues/10145), fast matrix multiplication in e.g. numpy is based on fast caching, so even though you may have individual processes (threads?), they still rely on the same memory and trying do several big computations in parallel "blocks" the cache, basically breaking down the calculation.

**Now my question is, whether that is already the end of the road or if someone has a better idea?**

At the moment I just create multiple virtual machines on the cluster and let them individually compute different parameters, but I guess this is not exactly good practice.

As always, answers and suggestions are greatly appreciated.

Best regards,

Korbinian