Hi All,
I am running some DMRG calculations for the Fermi-Hubbard model using FermiHubbardModel(). The code runs on a cluster with 8 OMP threads, and I have been seeing some strange trends in the run time as I change the maximum bond dimension. In particular, certain values of chi_max take wildly longer than others for no apparent reason. The data below are from a 4x4 square lattice without PBCs, with U=4, t=1, mu=0, svd_min=1e-10, max_E_err=1e-10, and the Neel state as the starting state:
chi_max, time (s)
100, 75
200, 2007
300, 2572
400, 156
500, 182
600, 229
700, 292
800, 360
900, 428
1000, 510
1250, 610
1500, 2276
1750, 962
2000, 1151
2500, 1242
I have run these multiple times, even with the full node to myself, and I see the same trend. I have also tried with just 1 and 2 OMP threads for the spurious points and observe the same behavior. I installed TeNPy from source and compiled it (without MKL). Does anyone know what could cause e.g. chi_max = 300 to take longer than values as large as chi_max = 2500?
Thank you
Strange timing data for DMRG applied to Fermi-Hubbard model
Re: Strange timing data for DMRG applied to Fermi-Hubbard model
To benchmark DMRG reliably, I recommend pinning the number of sweeps by setting min_sweeps=max_sweeps to some fixed number: the total run time is then simply proportional to the number of sweeps, whereas the number of sweeps DMRG actually performs can otherwise depend on subtle differences in convergence thresholds.
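As a minimal sketch, the pinned-sweep setup could look like the following options dictionary. The keys follow TeNPy's usual dmrg_params naming (trunc_params, min_sweeps, max_sweeps); the concrete values (N_SWEEPS=30, chi_max=300) are arbitrary choices for illustration, not recommendations.

```python
# Sketch of DMRG options with a pinned sweep count for benchmarking.
# Key names follow TeNPy's dmrg_params conventions; the numbers are
# illustrative placeholders.
N_SWEEPS = 30  # fixed number of sweeps chosen for the benchmark

dmrg_params = {
    'trunc_params': {
        'chi_max': 300,    # the bond dimension being benchmarked
        'svd_min': 1e-10,
    },
    'max_E_err': 1e-10,
    # Pinning min_sweeps == max_sweeps forces DMRG to run exactly
    # N_SWEEPS sweeps, so run time reflects only the cost per sweep:
    'min_sweeps': N_SWEEPS,
    'max_sweeps': N_SWEEPS,
}
```

These options would then be passed to the DMRG engine in place of the original parameters; since every chi_max now runs the same number of sweeps, timings become directly comparable across bond dimensions.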
Indeed, your timing differences likely come from massively different numbers of sweeps until DMRG "converges" - you can check this easily in the log files, if you still have them around. I could not reproduce your results exactly, but in my case I had 79 sweeps at chi=100 taking 90 seconds, versus 275 sweeps at chi=200 taking 447 seconds.
By setting max_E_err=1.e-10, you force DMRG to keep sweeping until it converges far better than the truncation error actually allows - for chi=300, I get a truncation error of 3.e-6 and E_trunc=5.e-5. That is way larger than the max_E_err you try to converge to, so DMRG only "converges" when successive energy values happen to lie close together by chance.
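The mismatch can be made explicit with a trivial sanity check: if the requested energy-convergence threshold lies far below the truncation-induced energy error, the stopping criterion can only be met accidentally. The E_trunc value below is the one quoted for chi=300; substitute the numbers from your own log files.

```python
# Sanity check: is the convergence threshold achievable at this chi?
# E_trunc is the truncation energy error quoted above for chi=300.
max_E_err = 1e-10   # requested per-sweep energy convergence threshold
E_trunc = 5e-5      # observed truncation energy error at chi=300

achievable = max_E_err >= E_trunc
if not achievable:
    # The threshold is ~5 orders of magnitude below what truncation
    # allows, so "convergence" only happens by chance, and the sweep
    # count (and hence run time) fluctuates wildly between runs.
    print("max_E_err=%.0e is far below E_trunc=%.0e: "
          "convergence at this threshold is essentially accidental."
          % (max_E_err, E_trunc))
```

A reasonable rule of thumb is to pick max_E_err no smaller than the truncation error your chi_max actually supports, or to pin the sweep count as described above.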