From studying the TeNPy documentation, I think that setting `'sort_mpo_legs': True` can speed up the code when computing complex models.
The MPO tensors W are sparse. Without `'sort_mpo_legs': True`, we try to exploit that sparsity, which results in more, smaller blocks in tensordots involving `H.get_W(i)`, rather than fewer, bigger blocks (as dictated by charge conservation).
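To make the block picture concrete, here is a toy sketch in plain NumPy; it is not TeNPy's actual implementation. It assumes a hypothetical 5x5 matrix `W` standing in for the virtual legs of one MPO tensor, with hypothetical charges `q` on both legs, and it ignores the physical legs, so "charge conservation" here just means an entry may be nonzero only if its row and column charges agree. The helpers `runs` and `stored_blocks` are made up for illustration: they split each leg into contiguous runs of equal charge and keep only the charge-allowed blocks that are not identically zero.

```python
import numpy as np
from itertools import groupby

def runs(charges):
    """Split a leg into contiguous runs of equal charge: list of (charge, start, stop)."""
    out, start = [], 0
    for q, grp in groupby(charges):
        n = len(list(grp))
        out.append((q, start, start + n))
        start += n
    return out

def stored_blocks(W, row_q, col_q):
    """Shapes of the blocks a block-sparse format would keep (toy rule:
    row charge == column charge, and the block is not all zeros)."""
    kept = []
    for qr, r0, r1 in runs(row_q):
        for qc, c0, c1 in runs(col_q):
            block = W[r0:r1, c0:c1]
            if qr == qc and np.any(block):
                kept.append(block.shape)
    return kept

# Hypothetical charges on the virtual leg, in the order the MPO was built (unsorted).
q = np.array([0, 1, 0, -1, 0])
W = np.zeros((5, 5))
W[0, 0] = W[4, 4] = 1.0  # charge 0 -> 0
W[2, 0] = 0.5            # charge 0 -> 0
W[4, 2] = -0.5           # charge 0 -> 0
W[1, 1] = 2.0            # charge 1 -> 1
W[3, 3] = 2.0            # charge -1 -> -1

# Unsorted leg: every run has length 1, so each nonzero entry is its own 1x1 block.
unsorted_blocks = stored_blocks(W, q, q)

# Sorted leg (conceptually what sorting the MPO legs does): equal charges become
# contiguous, giving one block per charge sector that also stores the zeros inside it.
perm = np.argsort(q, kind="stable")
q_sorted = q[perm]
W_sorted = W[np.ix_(perm, perm)]
sorted_blocks = stored_blocks(W_sorted, q_sorted, q_sorted)

print(len(unsorted_blocks), unsorted_blocks)  # 6 blocks of shape (1, 1)
print(len(sorted_blocks), sorted_blocks)      # 3 [(1, 1), (3, 3), (1, 1)]
```

In this toy, the sorted version keeps 11 numbers (5 of them explicit zeros) in 3 blocks, while the unsorted version keeps only the 6 nonzero numbers, but in 6 separate blocks.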
For this benchmark comparison, we wanted to compare "apples to apples" and tried to set up everything the same way for ITensor and TeNPy. Hence we fixed `sort_mpo_legs: True`, because ITensor didn't have that sparse-H feature at that time (I'm not sure whether they have implemented it by now).
Whether it actually speeds things up depends very much on the details of the H you're using, and on how much overhead you have from multiplying many small blocks. For large blocks (i.e. high MPS bond dimensions) and sparse W in H, `sort_mpo_legs: False` should in principle be a bit faster than `sort_mpo_legs: True`. But whether it really helps, or whether the overhead is more significant for what you want to do, you need to check yourself for your specific use case. In practice, I don't think it makes a huge difference.
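To get a rough feeling for that trade-off, a micro-benchmark like the following can help; it is a NumPy sketch, not TeNPy code. It assumes a hypothetical block-diagonal operator built from `n_blocks` dense blocks, applied to a matrix `X` whose second dimension `chi` stands in for the MPS bond dimension. The made-up function `apply_blocks` does many small matmuls (no flops wasted on zeros, but one call per block), while `apply_dense` does one big matmul that also multiplies the zeros. Which one wins depends on the sizes, so try varying `size`, `n_blocks` and `chi`.

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)

def make_blocks(n_blocks, size):
    """Hypothetical operator: n_blocks dense (size x size) blocks on the diagonal."""
    return [rng.standard_normal((size, size)) for _ in range(n_blocks)]

def as_dense(blocks):
    """One big block holding the same data, with explicit zeros between the blocks."""
    n = sum(b.shape[0] for b in blocks)
    dense = np.zeros((n, n))
    i = 0
    for b in blocks:
        s = b.shape[0]
        dense[i:i + s, i:i + s] = b
        i += s
    return dense

def apply_blocks(blocks, X):
    """Many small matmuls: no wasted flops on zeros, but one call per block."""
    out = np.empty((sum(b.shape[0] for b in blocks), X.shape[1]))
    i = 0
    for b in blocks:
        s = b.shape[0]
        out[i:i + s] = b @ X[i:i + s]
        i += s
    return out

def apply_dense(dense, X):
    """A single large matmul: multiplies the zeros too, but has minimal call overhead."""
    return dense @ X

blocks = make_blocks(n_blocks=20, size=4)
dense = as_dense(blocks)
chi = 500  # plays the role of a (large) MPS bond dimension
X = rng.standard_normal((dense.shape[0], chi))

# Both strategies compute the same result; only the cost differs.
assert np.allclose(apply_blocks(blocks, X), apply_dense(dense, X))

t_small = timeit.timeit(lambda: apply_blocks(blocks, X), number=50)
t_big = timeit.timeit(lambda: apply_dense(dense, X), number=50)
print(f"many small blocks: {t_small:.4f} s, one big block: {t_big:.4f} s")
```

Block-diagonal is only the simplest sparsity pattern, and a real MPO looks different within each charge sector, so treat such timings as a qualitative hint; the reliable check is still timing your actual simulation with both settings.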
The memory usage should be roughly the same, independent of this option.