Strategies to run large-memory calculations?
Posted: 15 Jul 2024, 20:31
Hi,
I am trying to run some DMRG simulations on a cluster. However, I eventually run out of memory. In the slurm file, I get the error:
Due to unfamiliarity with parallelization, the only thing I am doing right now is having the following line before loading python and running the python script in my bash script:
As for the nodes I used, I tried both of the following but get similar errors:
and
My script code is quite standard, with the simulation actually running using:
Do you have any suggestions on how I could make this job run? Is there a way to perhaps write arrays to disk during the process instead of doing everything in memory? Or some other strategy?
Thanks!
I am trying to run some DMRG simulations on a cluster. However, I eventually run out of memory. In the slurm file, I get the error:
Python: Select all
Traceback (most recent call last):
File "/mmfs1/gscratch/.../conda_env/lib/python3.7/site-packages/tenpy/algorithms/dmrg.py", line 1831, in mix_rho_L
LHeff = engine.LHeff
AttributeError: 'TwoSiteDMRGEngine' object has no attribute 'LHeff'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "run.py", line 191, in <module>
info = eng.run() # the main work; modifies psi in place
....
....
File "tenpy/linalg/_npc_helper.pyx", line 684, in tenpy.linalg._npc_helper.Array_itranspose_fast
numpy.core._exceptions.MemoryError: Unable to allocate 22.5 GiB for an array with shape (19431, 2, 1, 19431, 2) and data type complex128
Python: Select all
export OMP_NUM_THREADS=192
As for the nodes I used, I tried both of the following but get similar errors:
Python: Select all
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=96
#SBATCH --ntasks=192
#SBATCH --mem=1495GB
Python: Select all
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=96
#SBATCH --ntasks=1
#SBATCH --mem=1495GB
Python: Select all
# added below, changes psi
eng = dmrg.TwoSiteDMRGEngine(psi, M, dmrg_params)
info = eng.run() # the main work; modifies psi in place
E = info[0]
psi = info[1]
Thanks!