I encountered the following error after about 5000 DMC steps.
ERROR : COMPUTE_MULTIPLICITIES
A computed multiplicity is bigger than the largest representable integer on
this machine. Likely you have a really bad trial function.
Is it really that bad? For smaller timesteps I don't get such problems (although those calculations are still in progress). If not, what could be the reason for such an error?
You may want to take a look at the input and out files.
input
out
correlation.data
gwfn.data
Error: compute_multiplicities
Re: Error: compute_multiplicities
Hi Blazej,
Your links don't point to the right files..? (input --> out, out --> correlation.data, etc., so that we end up missing the input file)
Mike
Re: Error: compute_multiplicities
Ok, here is the input. Btw, I tried to attach it to the post, but the forum does not like the extensions.
Re: Error: compute_multiplicities
Hi Blazej,
Blazej wrote: the forum does not like the extensions.

It does, but you have to gzip them, as it believes that plain text files could be malware (see the 'How to use these forums' post in 'General Announcements').
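(For example, running 'gzip input' at the shell turns the file into input.gz, which the board should then be happy to accept as an attachment.)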
Blazej wrote: Is it really that bad? For smaller timesteps I don't get such problems (although those calculations are still in progress). If not, what could be the reason for such an error?

Your run is far too long for me to run on 16 cores, but it looks like a perfectly ordinary population catastrophe to me (ordinary catastrophes! You don't hear that very often..).
Can you post the graphs obtained from running 'graphdmc' on the dmc.hist file so we can verify?
Although population catastrophes shouldn't occur under normal circumstances, they can be made more likely by various things - such as certain non-local pseudopotentials, a non-cusp-corrected Gaussian basis, or too-severely truncated localized orbitals - but none of these apply to you.
Another possibility is an inadequate trial wave function resulting from an optimization procedure that wasn't done properly. But even if it was done properly, note that the likelihood of population explosions can be reduced if you use unreweighted variance minimization rather than energy minimization to optimize the wave function. The reason is that energy minimization doesn't care much about what happens near the nodes, since those regions do not contribute much to the energy expectation value. However, the divergent local energies there make a big difference to the stability of the DMC algorithm.
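To spell out why those local energies diverge (the standard argument, nothing CASINO-specific): the local energy of the trial wave function is

   E_L(\mathbf{R}) = \frac{\hat{H}\Psi_T(\mathbf{R})}{\Psi_T(\mathbf{R})}

and the denominator vanishes on the trial nodal surface while the numerator generally does not, so a walker that wanders close to an (inexact) node sees an arbitrarily large |E_L|, and hence an arbitrarily large branching factor.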
Also, even if none of the above were true and you had a very low probability (let's say one move in five million) of encountering a catastrophe, the fact that you're running a ten-million-move simulation (!) means it's extremely likely to happen sooner or later (cf. Infinite Improbability Drive..). Also, catastrophes are much more likely to happen for large timesteps (and you say it doesn't happen for small timesteps). Ergo.. use smaller ones. But that said, your current value of 0.003 doesn't seem excessive..
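To put a number on 'extremely likely': with those figures, the chance of getting through the whole run without a single catastrophe is

   \left(1 - \frac{1}{5\times 10^{6}}\right)^{10^{7}} \approx e^{-2} \approx 0.14

so you'd expect to be bitten roughly 86% of the time.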
This ought to be manageable if you turn on automatic block resetting (which detects when a catastrophe occurs and rewinds to an earlier point in order to try again with a different random number sequence), but you don't appear to have done this. See the section in the manual about 'Automatic block resetting' or type 'casinohelp dmc_trip_weight' and have a go.
The compute_multiplicities error message could possibly be a bit more clear - I'll try to tidy it up..
Some minor notes:
(1) Can you really only afford 16 cores? That's not normally considered enough to do a 144-electron system... No wonder you want to do 10000000 moves!
(2) Blocks are more for your convenience than the computer's (except, of course, for block resetting), and having too many of them will slow down the code considerably, as it has to write to disk at the end of every block.. You might in fact like to switch over to my new 'BLOCK TIME' system that I implemented last week, where you specify the approximate time interval that should separate blocks rather than saying how many moves constitute a block as currently.. I may even remove vmc_nblock, dmc_equil_nblock, and dmc_stats_nblock from the example input files in favour of block_time.. Year Zero, or what?
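For concreteness, something like the following line in input is what I have in mind (the value, units, and exact syntax here are illustrative, so check 'casinohelp block_time' before relying on it):

   block_time : 600.0  # aim for a block roughly every 600 seconds, instead of
                       # setting vmc_nblock / dmc_equil_nblock / dmc_stats_nblock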
M.
Re: Error: compute_multiplicities
Ok, I attach the graphs.
Mike Towler wrote: it looks like a perfectly ordinary population catastrophe to me (ordinary catastrophes! You don't hear that very often..)

Why didn't it throw the standard "population explosion" error, as it usually does? (I am just curious: did something else happen because of the population explosion?)
Mike Towler wrote: (1) Can you really only afford 16 cores? That's not normally considered enough to do a 144-electron system... No wonder you want to do 10000000 moves!

On this machine, yes, but usually I run on the Supernova cluster with many more; I just decided to run a trial calculation for this timestep while Supernova was busy. In fact I didn't want to do 10 000 000 moves: the value remained from another input which I copied and changed the parameters of. I'll just kill the run when necessary; 10 000 000 moves would last for years.
Mike Towler wrote: (2) Blocks are more for your convenience than the computer's (except, of course, for block resetting), and having too many of them will slow down the code considerably, as it has to write to disk at the end of every block.
Ok, I should use fewer of them. I made that many to get frequent updates on the calculation, and also so that, if I got a population explosion, I could restart from the furthest possible point rather than having to repeat much of it.
- Attachments
- graphdmc.png (20.76 KiB)
Re: Error: compute_multiplicities
Hi Blazej,
Blazej wrote: Why didn't it throw the standard "population explosion" error, as it usually does?

OK - for a given configuration, it moves the electrons and works out the branching factor from the local energies, reference energy, and effective time step. It then computes the multiplicity (the number of copies of this config that will continue to the next iteration) as INT(random number + branching factor). Before it does the real-to-integer conversion represented by INT, it checks that the result will not be bigger than the largest representable integer on the machine (which for regular 32-bit integers is 2147483647 or something like that). In your case, it is bigger, and that's why it's moaning.
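In rough pseudo-Fortran, the logic is something like this (a minimal sketch, not CASINO's actual source; the variable names, the example numbers, and the precise branching expression are assumptions):

   program branching_sketch
     implicit none
     real(kind=8) :: eloc_old, eloc_new, eref, tau_eff, branch, ranx
     integer :: mult
     ! Illustrative numbers; in a real run these come from the propagated config.
     eloc_old = -75.2d0 ; eloc_new = -75.6d0
     eref = -75.4d0 ; tau_eff = 0.003d0
     ! Branching factor from the local energies, reference energy and
     ! effective timestep (symmetrized form assumed here).
     branch = exp(-tau_eff*(0.5d0*(eloc_new+eloc_old)-eref))
     call random_number(ranx)
     ! Check representability BEFORE the real-to-integer conversion; this
     ! is the test whose failure produces the COMPUTE_MULTIPLICITIES error.
     if (ranx+branch >= real(huge(mult),kind=8)) then
       stop 'COMPUTE_MULTIPLICITIES: multiplicity too large to represent'
     end if
     mult = int(ranx+branch) ! copies of this config carried to the next iteration
     write(*,*) 'branch =', branch, ' multiplicity =', mult
   end program branching_sketch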
So in some sense, it's catching the population explosion before it starts, because obviously something must have gone wrong if a config wants to make 2 billion copies of itself, and you won't want to let it even begin to try doing that.. (this is why you can't see it in the graphdmc plot). Normal population explosions are detected by imposing a hard limit -- 5 times the target weight for CASINO, if I remember correctly -- on the iteration weight (total population), but it does need to be able to represent the number in memory before it can do that check! You can manually define the factor 5 as dmc_trip_weight in input (usually we recommend setting it to 2-3 times the target population dmc_target_weight), and if you do this, automatic block resetting is turned on so that it can attempt recovery from the explosion.
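In input that would look something like the following (values illustrative, assuming a target population of 1000; check 'casinohelp dmc_trip_weight' for the exact syntax):

   dmc_target_weight : 1000.0  # target config population
   dmc_trip_weight   : 3000.0  # trip at 3 x target; also enables automatic block resetting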
I have to say I'm not quite sure why it's allowing the multiplicity to get that big, given there are automatic limiters on the local energy and so on. It might be worth repeating exactly the same calculation with some if(block==45) write statements at the appropriate point..
Blazej wrote: Ok, I should use fewer of them. I made that many to get frequent updates on the calculation, and also so that, if I got a population explosion, I could restart from the furthest possible point rather than having to repeat much of it.

OK - so why are you not setting dmc_trip_weight > 0 in input to turn on the automatic protection?
Use block_time! I need someone to test it for me anyway..
Blazej wrote: 10 000 000 would last for years.

In CASINO's defence, at the rate it was going, 10 million moves would have completed in two weeks or thereabouts.
M.