Hi Dmitry,
Congratulations! You've written the 400th post to these forums..
First some terminology (to some extent you seem to be mixing up basis functions, orbitals, and shells) : 
The input to CASINO from a Gaussian quantum chemistry code is normally given by a set of Hartree-Fock or DFT orbitals expanded in a Gaussian basis set. A Slater determinant of these gives us our first guess at the many-body wave function. The orbitals are given by an expression of the form:  Sum (orbital coefficients * basis functions). In some contexts e.g. solving the Hartree-Fock-Roothaan equations by iterative diagonalization, the orbital coefficients are sometimes referred to as 'eigenvector coefficients'. The conventions in the original specification for the CASINO gwfn.data were the conventions used by the CRYSTAL program. These are not necessarily the best conventions as I was still very young in 1997.. 
 
 
In turn  the basis functions are linear combinations ('contractions') of simple harmonic Gaussian-type functions (GTFs); the coefficients in the linear combination are the 'contraction coefficients', and the (polynomial * exp(-alpha*r^2) functions being combined are referred to as 'primitive Gaussians' or 'primitives'. Don't refer to basis functions as 'basis orbitals' as this just leads to confusion. They're not orbitals, they're just a bunch of Gaussians times polynomials constitutung a basis set. I know chemists often still refer to this sort of thing as the 'Linear Combination of Atomic Orbitals' method - but they don't really mean that, at least since about 1956. 
The information about the basis passed through gwfn.data consists of exponents and contraction coefficients which specify the radial dependence of each GTF. The angular dependence -- effectively what polynomial to use -- is passed through via a label specifying the angular momentum of each shell as 
s, 
p, 
d, 
f.. corresponding to 
l=0,1,2,3.. etc. The definition of 'shell' just means 'the set of basis functions of different m for a given 
l'. i.e. for an 
f function there is a 'shell' of 7 different basis functions representing the 7 different possible 
m values when 
l=4. Here's the reference expressions for the more difficult cases from 
d onwards:
Code: Select all
Expressions for the solid harmonics:
The CASINO orbital evaluators were originally written to take input from CRYSTAL, and we therefore use the conventions of that program.
For reference, the second column of the following tables shows the forms of the real solid harmonics as assumed by CASINO. These are taken from p. 170 of the 'orange book' of the CRYSTAL program 'HF ab initio treatment of crystalline systems' by C. Pisani, R. Dovesi, and C. Roetti 1988, which is freely downloadable from Springer; the relevant appendix is available from my website at :
http://www.tcm.phy.cam.ac.uk/~mdt26/crystal_solid_harmonics.pdf
The third column below shows these formulae as printed in the current CRYSTAL manual, which randomly include or omit the numerical factors (the conversion factor between the two columns is given as O-->C below.  At least one entry (the d 2,2) seems to be incorrect in the CRYSTAL manual, in the sense that it does not differ from the Orange book version by a simple multiplicative factor.
d functions
---------------
l,m     Orange book (and CASINO)       CRYSTAL manual                O-->C
2,0     0.5*(3z^2 - r^2)               3z^2 - r^2                    * 2
2,1     3xz                            3xz
2,-1    3yz                            3yz
2,2     3(x^2-y^2)                     3(x^2+y^2)                    INCORRECT?
2,-2    6xy                            3xy                           * 0.5
f functions
-----------
l,m     Orange book (and CASINO)       CRYSTAL manual                O-->C
3,0     z^3 - 3/2 * (x^2z + y^2z)      2z^3 - 3 * (x^2z + y^2z)      * 2
3,1     6 * xz^2 - 3/2 * (x^3 + xy^2)  4xz^2 - x^3 - xy^2            * 2/3
3,-1    6 * yz^2 - 3/2 * (x^2y + y^3)  4yz^2 - x^2y - y^3            * 2/3
3,2     15 * (x^2z - y^2z)             x^2z - y^2z                   * 1/15
3,-2    30 * xyz                       xyz                           * 1/30
3,3     15 * x^3 - 45 * xy^2           x^3 - 3xy^2                   * 1/15
3,-3    45 * x^2y - 15 * y^3           3x^2y - y^3                   * 1/15
g functions (not actually implemented in CRYSTAL)
-----------
l,m     Orange book (and CASINO)
4,4     105 * x^4 - 630 * x^2y^2 + 105 * y^4
4,3     105 * x^3z - 315 * xy^2z
4,2     -15/2 * x^4 + 45 * x^2z^2 + 15/2 * y^4 - 45 * y^2z^2
4,1     -15/2 * x^3z - 15/2  * xy^2z + 10 * xz^3
4,0     3/8 * x^4 + 3/4 * x^2y^2 - 3 * x^2z^2 + 3/8 * y^4 - 3 * y^2z^2 + z^4
4,-1    -15/2 * x^2yz - 15/2 * y^3z + 10 * yz^3
4,-2    -15 * x^3y - 15 * xy^3 + 90 xyz^2
4,-3    315 * x^2yz - 105 * y^3z
4,-4    420 * x^3y - 420 * xy^3
We do not need to write down here expressions for the first and second derivatives since CASINO will do a numerical check that its analytic expressions are the correct derivatives of the formulae above (irrespective of whether or not the above formulae are correct). To do this check, set the 'wfcheck' keyword in input to be 'T'.
One historical CASINO inconsistency which may be easily overlooked:
Constant numerical factors in the real solid harmonics e.g. the '3' in the 3xy d function, or '15' in the (15x^3-45^xy2) f function, may be premultiplied into the orbital coefficients so that CASINO doesn't have to e.g. multiply by 3 every time it evaluates that particular d function. In practice the CASINO orbital evaluators do this only for d functions, but *not for f and g* (this may or may not be changed in the future if it can be done in a backwards-consistent way)
Your first question: "
I am trying to understand why the number of items in EIGENVECTOR COEFFICIENTS section grows as number of basis orbitals *squared*?"
I describe the conventions of the CRYSTAL HF program. The Hartree-Fock equations are integro-differential equations. You make them solvable a la Roothaan by introducing a finite basis set of 
M functions and converting the HF equations into matrix equations involving MxM square matrices i.e. 
FC=
eSC . Here 
F is the Fock operator, 
C the matrix of eigenvectors, 
S the overlap matrix (which appears because the Gaussians aren't orthogonal), and 
e the diagonal matrix of orbital energies. Note this procedure gives you 
M eigenvectors (defining 
M orbitals) from 
M basis functions, and that 
M may be considerably more than the number of electrons 
N that occupy the lowest energy orbitals. Having got rid of the overlap matrix by a suitable transformation, once could then apparently solve the equations just by diagonalizing the Fock matrix, except that the definition of the Fock operator requires knowledge of the 
N lowest energy orbitals i.e. the things that you're trying to calculate. Hence the need for an iterative self-consistent field procedure involving repeated diagonalization.
So the answer to your question is that having 
M (=NUM_AO in your example) basis functions gets you 
M=NUM_AO orbitals, the lowest 
N of which are 'occupied orbitals' and the other 
M-N of which are 'virtual orbitals'. If you only want to calculate the ground state, then yes - you could get CRYSTAL to write only the coefficients for the occupied orbitals into gwfn.data. However, a priori you don't know whether the user wants to calculate an excited state, or to form a wave function from a linear combination of a small number of determinants involving different virtual orbitals (e.g. in cases where 'static correlation' is important). Hence, the best convention is to write out all of the available orbital information into the gwfn.data file, so the user never has to do the HF calculation more than once.
I note finally that the ancient file that you've found buried in some dark recess of the distribution seems to not be correct (the 'shell' and 'band' indices should be reversed.. I will have it destroyed). Note that in periodic systems, the definition of an 'orbital' involves a specification of the band and the k point. In a molecule, there is only 1 k point, and 'band' corresponds to 'orbital sequence number'. The modern, correct definition for CK is given in the manual in the specification for gwfn.data as folllows (for periodic systems it distinguishes between k points where the orbital coefficients must be real, and those where they are necessarily complex):
Code: Select all
  ORBITAL COEFFICIENTS
  --------------------
  & CK(NUM_REAL_K*NUM_AO*NUM_AO + NUM_COMPLEX_K*NUM_AO*NUM_AO)
    block as follows
     ----spin (spin-polarized calcs only)---------------------------------
    /                                                                     \
     ----k (for solids)-------------------------------
    /                                                 \
     ----bands-(solids)_or MOs (molecules)------
    /                                           \
     -AO-basis functions grouped by shell-
    /                                     \
    Complex coefficients are given as 2 adjacent real numbers (real,imaginary).
    Ordering of d orbital coefficients;
    z2, xz, yz, x2-y2, xy
    Ordering of f,g,..: m=0,1,-1,2,-2,3,-3,4,-4.....
Next question: "
2. Is it important to provide the normalized wave function for casino? If it will not be properly normalized (by my mistake or intentionally), which simulation results may be wrong?"
You don't need to normalize the actual trial many-body wave function (which is lucky when you start e.g. multiplying the determinant by Jastrow factors..). The normalization factor will cancel out in working out the ratio of the old and new wavefunctions, as you correctly observe.
Now the orbitals need to be correctly normalized internally to CRYSTAL of course, not least so that overlap integrals between them have the correct meaning, and that means that as written out they should be normalized. But as far as CASINO is concerned, if some single column of the determinant is multiplied by a factor, then that doesn't matter as that factor can be taken outside of the determinant and get cancelled..
That said, one needs to very aware of the conventions for doing this used by other codes, since the way things are arranged and stored are often very peculiar (in the sense that things written to their output files or stored internally are often not what they seem). CRYSTAL itself provides a very good example of this, and this should answer your last question about normalizing contractions:
Code: Select all
Recall that e.g. an f contraction in fact represents a 'shell' of 7 different basis functions representing the 7 different possible m values when l=4. The 'm-dependence' comes from the multiplication of each primitive Gaussian by the solid harmonic for that m (see the polynomials in the tables given above).
Note that this means that stuff which is stored in an array indexed by the shell (such as the contraction coefficients) cannot contain any m-dependence.
One must normalize both the individual primitives, and the whole contraction so that overlap integrals come out right.
The normalization constant N_cont for the whole contraction (in which the contraction coefficient for the ith primitive is d_i, and the exponent is a_i) is given by:
                                        1
N_cont = ---------------------------------------------------------
                                 2 * root (a_i * a_j )
         sqrt [ sum_ij d_i d_j ( --------------------- )^(l+3/2) ]
                                       a_i + a_j
This is clearly independent of m.
Next the normalization constant N_prim for the individual primitives (split into two parts, with all the m-dependence in the second one):
         root[2^(l+3/2) * alpha^(l+3/2)]            2^l
N_prim = -------------------------------  *  root --------  (m-independent)
                pi^(3/4)                          (2l-1)!!
          (2 - delta_m,0) * (l - |m|)!
*  root  ------------------------------                     (m-dependent)
                (l + |m|)!
where delta_m,0 is the Kronecker delta.
In the API for the CRYSTAL program (activating a keyword will 'write out the wave function etc. in a standard format) we find that N_cont and the first part of N_prim are multiplied into the Gaussian contraction coefficients, but the m-dependent part - which could in principle be multiplied into the orbital coefficients, is simply omitted. (Our CASINO conversion utilities have to manually add the m-dependent normalization factor before writing gwfn.data). In addition all the contraction coefficients are multiplied by pi^(5/8) * 2^(1/4), as this is useful in computing overlap integrals; as far as I can tell without looking at the source code, the orbital coefficients must be multiplied by the reciprocal of this. You can see how complicated things can get..
The m-dependent factors for the different shells (which must be added manually by the CRYSTAL-->CASINO converter) are:
d functions
-----------
l,m
---
2,0   root 1*2/2  = 1
2,1   root 2*1/6  = 1/root(3)
2,-1              = 1/root(3)
2,2   root 2*1/24 = 1/root(12) = 1/(2*root3)
2,-2              = 1/root(12) = 1/(2*root3)
f_functions
-----------
l,m
---
3,0   root 1*6/6   = 1
3,1   root 2*2/24  = 1/root(6)
3,-1               = 1/root(6)
3,2   root 2*1/120 = 1/root(60)
3,-2               = 1/root(60)
3,3   root 2*1/720 = 1/root(360)
3,-3               = 1/root(360)
g functions
-----------
4,0   root 1*24/24   = 1
4,1   root 2*6/120   = 1/root(10)
4,-1                 = 1/root(10)
4,2   root 2*2/720   = 1/root(180)
4,-2                 = 1/root(180)
4,3   root 2*1/5040  = 1/root(2520)
4,-3                 = 1/root(2520)
4,4   root 2*1/40320 = 1/root(20160)
4,-4                 = 1/root(20160)
Hmmm..
Hope that helps.
M.