$AUXBAS, regarding the auxiliary basis set, whose choice also
affects the accuracy of the calculation.

The program is enabled for parallel calculation, and is tuned
to today's SMP nodes.  It is limited to energy calculations
only, without any solvent effects, for RHF or UHF references.

IAUXBF = 0 uses Cartesian Gaussians
       = 1 uses spherical harmonics
         for the auxiliary basis set used to expand the
         MP2 energy expression into products of 3-index
         matrices.  The default is inherited from ISPHER.

The next two control computer resources, trading memory for
disk storage.

GOSMP  = flag requesting shared memory use.
         The default is .TRUE. in multi-core nodes, but
         .FALSE. in a uniprocessor.  This option means only
         one copy of certain large matrices is stored per
         node.

USEDM  = a flag to store two and three center repulsion
         integrals in distributed memory (.TRUE.), or in
         disk files (.FALSE.).  Selection of this flag
         requires MEMDDI in $SYSTEM.  The default is .TRUE.

The RI approximation reduces CPU time, memory requirements,
and total disk storage requirements compared to exact
calculation.  Experimentation with these two keywords will
let you tune the program to your hardware situation.  For
example, choosing GOSMP=.TRUE. and USEDM=.TRUE. will run
without any extra disk files, while setting GOSMP=.TRUE. and
USEDM=.FALSE. will minimize memory usage (and network usage)
at the expense of doing disk I/O.

Total memory usage per node can be obtained by running
EXETYP=CHECK.  Note the largest replicated memory printed in
the RI-MP2 output, and divide by 1000000 to get the correct
input for MWORDS (round up a bit).  Note the largest shared
memory requirement printed, also dividing by 1000000, and
rounding up a bit.  Note the distributed memory requirement,
which is already in megawords, and is the correct input for
MEMDDI.
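The conversion from the word counts printed by a CHECK run to input values can be sketched in a few lines of Python. This is only an illustration: the helper name words_to_mwords, its margin argument, and the two word counts are hypothetical and are not taken from any real output file.

```python
import math

WORDS_PER_MWORD = 1_000_000  # one megaword is 10**6 words


def words_to_mwords(words, margin=0):
    """Convert a word count to megawords, rounding up.

    `margin` is an optional number of extra megawords, for the
    "round up a bit" advice in the text (hypothetical parameter).
    """
    return math.ceil(words / WORDS_PER_MWORD) + margin


# Hypothetical figures standing in for what a CHECK run might print.
replicated_words = 49_300_000  # largest replicated memory, in words
shared_words = 94_200_000      # largest shared memory, in words

mwords = words_to_mwords(replicated_words)
shared_mw = words_to_mwords(shared_words)
print(f"MWORDS={mwords}  shared={shared_mw} megawords")
```

The distributed memory requirement needs no such conversion, since it is already printed in megawords.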
Then, assuming you use p total compute processes on multiple
n-way nodes, the memory per node is
   GBytes/node = 8(n*MWORDS + shared + n*MEMDDI/p)/1024
Turning off GOSMP reduces the shared memory to 0 but
increases MWORDS, which is multiplied by the number of cores
per node!  Turning off USEDM leads to MEMDDI=0 by using disk
storage instead.

If additional memory is available, increasing MWORDS can
lead to a reduction in the level of the occupied orbital
batch, or "LV".  Larger MWORDS permits a smaller LV, which
will in turn reduce the required computational time, and the
required network traffic or disk I/O.  The value of LV used
is the last line appearing after
"CHECKING SIZE OF OCCUPIED ORBITAL BATCH".

The next four control numerical accuracy, but see $AUXBAS,
which is even more influential on the accuracy!

OTHAUX = flag to orthogonalize the RI basis set by
         diagonalization of the overlap matrix.  If there
         is reason to suspect linear dependence may exist
         in the RI basis, select this option to have a more
         numerically stable result.  Larger RI basis sets
         such as CCT and ACCT, in particular, may benefit
         from selecting this.  (default=.FALSE.)

STOL   = threshold at which to remove small overlap matrix
         eigenvectors, ignored if OTHAUX=.FALSE.
         This keyword is analogous to QMTTOL in $CONTRL
         for the true AO basis.  (default= 1.0d-6)

IVMTD  = selects the procedure for removing redundancies
         when inverting the two-center, two-e- matrix.
       = 0 use Cholesky decomposition (default)
       = 2 use diagonalization

VTOL   = threshold at which to remove redundancies.
         This is ignored unless IVMTD=2 (default= 1.0d-6)

Don't forget to see also the $AUXBAS input group!

An example of this program follows.  The molecule is taxol,
with 1032 AOs and MOs in the 6-31G(d) basis, correlating 164
valence orbitals.  The RI basis set used is SVP, which
matches the true basis set in quality.  There are 4175 AOs in
the RI basis.
The job was run on a single 8-way node (n=8, p=1,2,4,8),
using MWORDS=50 (leading to LV=6), MEMDDI=580, and the
largest shared memory needed is 95 million words.  The total
node memory is thus
  (8 bytes/word)*(8*50 + 95 + 8*580/ 8)/1024 = 8.4 GBytes
easily fitting into a modern 16 GByte node.  It reduces to
  (8 bytes/word)*(8*50 + 95 + 8*580/16)/1024 = 6.1 GB/node
if two 8-way nodes are used.  Scaling is
      p     SCF   RI-MP2   job total
      1    7391    7919     15366
      2    3718    4131      7860
      4    1857    2290      4174
      8     952    1488      2479
     16     486     758      1276   using two 8-way nodes.
Numerical results are
      E(RI-MP2)= -2920.607512
versus the exact
      E(MP2)   = -2920.606231
The 0.0013 error should be measured against the total 2nd
order correlation energy, which is -8.7855, while noting the
time for the 2nd order E is similar to the SCF time.

===========================================================
===========================================================
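The arithmetic of the taxol example in the $RIMP2 group can be checked with a short Python sketch. It assumes only the memory formula and the figures quoted in the text; the function and variable names are hypothetical, and the script is not part of the program.

```python
def gbytes_per_node(n, p, mwords, shared_mw, memddi_mw):
    """GBytes/node = 8*(n*MWORDS + shared + n*MEMDDI/p)/1024.

    n = cores per node, p = total compute processes,
    all memory arguments in megawords (8 bytes per word).
    """
    return 8.0 * (n * mwords + shared_mw + n * memddi_mw / p) / 1024.0


# Taxol example: MWORDS=50, shared=95 megawords, MEMDDI=580.
one_node = gbytes_per_node(n=8, p=8, mwords=50, shared_mw=95, memddi_mw=580)
two_nodes = gbytes_per_node(n=8, p=16, mwords=50, shared_mw=95, memddi_mw=580)
print(f"{one_node:.1f} GB on one node, {two_nodes:.1f} GB/node on two")

# Job totals from the scaling table, keyed by process count p.
job_total = {1: 15366, 2: 7860, 4: 4174, 8: 2479, 16: 1276}
speedup = {p: job_total[1] / t for p, t in job_total.items()}
efficiency = {p: s / p for p, s in speedup.items()}
for p in sorted(job_total):
    print(f"p={p:2d}  speedup={speedup[p]:6.2f}  efficiency={efficiency[p]:.2f}")

# RI error measured against the 2nd order correlation energy.
e_rimp2 = -2920.607512
e_mp2 = -2920.606231
e_corr = -8.7855
ri_error = e_rimp2 - e_mp2
fraction = abs(ri_error) / abs(e_corr)
print(f"RI error = {ri_error:.6f}, fraction of correlation energy = {fraction:.2e}")
```

Efficiency here is simply speedup divided by p, one common convention; other definitions would give different numbers.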
generated on 7/7/2017