Perfect poem, thank you. I shall print it out and stick it to the wall of the church in Vallico next to the song you composed when you were last here..!
It seems that I am therefore now obliged to fix the T-move memory hogging for you - I should be able to have a look next week.
For some reason or the other, I still do not get it: I would have thought that there is one array of dimension 50 x 3 x nelectrons x nions...
You might think that - however, if you look at the source you find there are five relevant T-move arrays.
Some of these depend on the number of points in the non-local pseudopotential spherical grid, where the relevant quantities are defined as:
! Maximum number of points in spherical grid
INTEGER,PARAMETER :: nl_maxnrefgrid=50
! Actual number of points in current spherical grid
INTEGER,ALLOCATABLE :: nl_nrefgrid(:)
and let's say that
maxnlang=maxval(nlang) ! Highest l required for any pseudopotential.
Assume 1000 electrons, 1000 nuclei (i.e. a thousand H atoms in a box), maxnlang=2
Integer takes 4 bytes
Double precision takes 8 bytes
Originally the arrays were allocated as follows:
! Set up arrays for T move if necessary.
if(use_tmove)then
allocate(
i tmove_no_points(nitot,netot) : 1000000 * 4 = 4000000
dp tmove_T(nl_maxnrefgrid,nitot,netot) : 50000000 * 8 = 400000000
dp tmove_T_moved(nl_maxnrefgrid,nitot,netot) : 50000000 * 8 = 400000000
dp tmove_T_full(nl_maxnrefgrid,0:maxnlang,nitot) : 150000 * 8 = 1200000
dp tmove_points(3,nl_maxnrefgrid,nitot,netot) : 150000000 * 8 = 1200000000
)
endif ! use_tmove
Total memory requirement: 4000000+400000000+400000000+1200000+1200000000
= 2,005,200,000 bytes, or roughly 2 GB
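If you want to check that arithmetic yourself, here is a minimal Python sketch. It is not CASINO code: the names simply mirror the Fortran ones above, and the sizes (1000 electrons, 1000 ions, maxnlang=2) are the assumed example values from this post.

```python
# Sizes from the worked example in this post (assumptions, not read from CASINO).
netot = 1000          # number of electrons
nitot = 1000          # number of ions
maxnlang = 2          # highest l, so the l index runs over 0:maxnlang
nl_maxnrefgrid = 50   # maximum number of points in the spherical grid

INT_BYTES, DP_BYTES = 4, 8


def tmove_bytes(npts):
    """Bytes used by each T-move array when the grid dimension is npts."""
    return {
        "tmove_no_points": nitot * netot * INT_BYTES,
        "tmove_T": npts * nitot * netot * DP_BYTES,
        "tmove_T_moved": npts * nitot * netot * DP_BYTES,
        "tmove_T_full": npts * (maxnlang + 1) * nitot * DP_BYTES,
        "tmove_points": 3 * npts * nitot * netot * DP_BYTES,
    }


original = tmove_bytes(nl_maxnrefgrid)
for name, nbytes in original.items():
    print(f"{name:16s} {nbytes:>13,d} bytes")

total_original = sum(original.values())
print(f"{'total':16s} {total_original:>13,d} bytes")
```

The first dimension of the last four arrays is the only thing that changes if a different grid size is passed to tmove_bytes.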
At the time I was looking at this, I was using a Blue Gene/Q machine with 64 cores per node. Since it seems we can't put these things in shared memory, every core holds its own copy, so a total of about 128 GB per node (64 x 2 GB) was required just for these five arrays.
The first, obvious improvement I made (in the DIARY entry mentioned above) was to notice that nl_maxnrefgrid = 50 was being used to allocate the last four arrays, but this much space is never required unless you use the highest-accuracy spherical grid, which no-one ever does because it is unnecessary and wasteful.
Remember:
+---------------------------------------------------------+
| NON_LOCAL_GRID    Exactly integrates l=...   No. points |
+---------------------------------------------------------+
|        1                      0                   1     |
|        2                      2                   4     |
|        3                      3                   6     |
|        4                      5                  12     |
|        5                      5                  18     |
|        6                      7                  26     |
|        7                     11                  50     |
+---------------------------------------------------------+
If you use the actual size of the grid instead (call it nl_nrefgrid; it is 12 in this H-atom case, i.e. NON_LOCAL_GRID 4 in the table above) then you get:
i tmove_no_points(nitot,netot) : 1000000 * 4 = 4000000
dp tmove_T(nl_nrefgrid,nitot,netot) : 12000000 * 8 = 96000000
dp tmove_T_moved(nl_nrefgrid,nitot,netot) : 12000000 * 8 = 96000000
dp tmove_T_full(nl_nrefgrid,0:maxnlang,nitot) : 36000 * 8 = 288000
dp tmove_points(3,nl_nrefgrid,nitot,netot) : 36000000 * 8 = 288000000
Total : 4000000+96000000+96000000+288000+288000000
= 484,288,000 bytes, or about 484 MB (* 64 = 31.0 GB per node, a saving of about 97.3 GB)
which is a huge saving of course, but these vectors are still ridiculously large.
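To see how the total scales with the grid choice, here is a rough Python sketch that repeats the sum for every NON_LOCAL_GRID setting in the table. Again, this is not CASINO code: the point counts are copied from the table, the system sizes are the assumed 1000-electron, 1000-ion, maxnlang=2 example, and cores_per_node=64 is the figure for the machine I mentioned.

```python
# Points per NON_LOCAL_GRID setting, copied from the table in this post.
GRID_POINTS = {1: 1, 2: 4, 3: 6, 4: 12, 5: 18, 6: 26, 7: 50}

# Assumed example sizes (not read from CASINO).
netot, nitot, maxnlang = 1000, 1000, 2
cores_per_node = 64   # one private copy of the arrays per core


def total_bytes(npts):
    """Total bytes of the five T-move arrays for a grid of npts points."""
    ints = nitot * netot * 4                      # tmove_no_points
    doubles = (
        2 * npts * nitot * netot                  # tmove_T and tmove_T_moved
        + npts * (maxnlang + 1) * nitot           # tmove_T_full
        + 3 * npts * nitot * netot                # tmove_points
    )
    return ints + doubles * 8


per_core = {grid: total_bytes(npts) for grid, npts in GRID_POINTS.items()}

print("grid  points   MB per core   GB per node")
for grid, nbytes in per_core.items():
    print(f"{grid:4d}  {GRID_POINTS[grid]:6d}  {nbytes / 1e6:12.1f}"
          f"  {nbytes * cores_per_node / 1e9:12.2f}")

# Saving per node from sizing for the 12-point grid instead of the 50-point maximum.
saving_per_node = (per_core[7] - per_core[4]) * cores_per_node
print(f"saving per node: {saving_per_node / 1e9:.2f} GB")
```

Even the smaller grids leave the netot x nitot factor untouched, which is why the arrays stay large whatever grid you pick.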
Now it's pretty clear that the original author didn't think about memory requirements, or he would have done the above grid thing in the first place (the change was very easy..). Therefore if we ask ourselves whether it is necessary to store all this information in this way, I bet that we find that it isn't and that we could do this in a more efficient way. I will have a look next week!
Cheers,
Mike