$AUXBAS, regarding the auxiliary basis set, whose choice                        
also affects the accuracy of the calculation.                                   
   The program is enabled for parallel calculation and is
tuned for today's SMP nodes.  It is limited to energy
calculations, without solvent effects, for RHF or UHF
references.
IAUXBF = 0 uses Cartesian Gaussians                                             
       = 1 uses spherical harmonics                                             
           for the auxiliary basis set used to expand the                       
           MP2 energy expression into products of 3-index                       
           matrices.  The default is inherited from ISPHER.                     
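For orientation, the textbook RI (density-fitting) form of the closed-shell (RHF) MP2 energy is sketched below. Here i,j label occupied MOs, a,b virtual MOs, P,Q auxiliary (RI) basis functions, and V is the two-center repulsion matrix. This is the standard form of the technique, not necessarily GAMESS's exact working equations; any factor M with V^{-1} = M M^T (a symmetric V^{-1/2}, or an inverse Cholesky factor) gives the same 3-index product.

```latex
V_{PQ} = (P|Q), \qquad
B^{Q}_{ia} = \sum_{P} (ia|P)\,\bigl[V^{-1/2}\bigr]_{PQ}

(ia|jb) \approx \sum_{P,Q} (ia|P)\,\bigl[V^{-1}\bigr]_{PQ}\,(Q|jb)
        = \sum_{Q} B^{Q}_{ia}\, B^{Q}_{jb}

E^{(2)}_{\mathrm{RHF}} = \sum_{i,j}\sum_{a,b}
  \frac{(ia|jb)\,\bigl[\,2\,(ia|jb) - (ib|ja)\,\bigr]}
       {\varepsilon_i + \varepsilon_j - \varepsilon_a - \varepsilon_b}
```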
The next two control computer resources, trading memory for                     
disk storage.                                                                   
GOSMP  = flag requesting shared memory use.  The default
         is .TRUE. on multi-core nodes, but .FALSE. on a
         uniprocessor.  This option means only one copy of
         certain large matrices is stored per node.
USEDM  = a flag to store two- and three-center repulsion
         integrals in distributed memory (.TRUE.), or in
         disk files (.FALSE.).  Selecting .TRUE. requires
         MEMDDI in $SYSTEM.  The default is .TRUE.
The RI approximation reduces CPU time, memory requirements,
and total disk storage requirements compared to an exact
MP2 calculation.  Experimenting with these two keywords
lets you tune the program to your hardware.  For
example, choosing GOSMP=.TRUE. and USEDM=.TRUE. will run
without any extra disk files, while setting GOSMP=.TRUE.
and USEDM=.FALSE. will minimize memory usage (and network
usage) at the expense of doing disk I/O.
Total memory usage per node can be obtained by running
EXETYP=CHECK.  Note the largest replicated memory printed
in the RI-MP2 output, and divide it by 1000000 to get the
correct input for MWORDS (round up a bit).  Note the
largest shared memory requirement printed, also dividing by
1000000 and rounding up a bit; this is "shared" below.
Note the distributed memory requirement, which is already
in megawords and is the correct input for MEMDDI.  Then,
assuming you use p total compute processes on multiple
n-way nodes, the memory per node is
   GBytes/node = 8*(n*MWORDS + shared + n*MEMDDI/p)/1024
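The memory recipe above can be turned into a small calculator. This is a minimal sketch: the function names are hypothetical, it does not parse the EXETYP=CHECK output (you supply the printed word counts yourself), and rounding up to the next whole megaword is the least you should do; extra headroom may be wise.

```python
import math


def to_megawords(words: float) -> int:
    """Convert a raw word count (as printed by an EXETYP=CHECK run)
    to megawords, rounding up to the next whole megaword."""
    return math.ceil(words / 1_000_000)


def gbytes_per_node(n: int, p: int, mwords: float,
                    shared: float, memddi: float) -> float:
    """GBytes/node = 8*(n*MWORDS + shared + n*MEMDDI/p)/1024.

    n      -- compute processes per node (an n-way node)
    p      -- total compute processes in the job
    mwords -- replicated memory per process, in megawords (MWORDS)
    shared -- largest shared-memory requirement, in megawords
    memddi -- distributed memory for the whole job, in megawords (MEMDDI)
    """
    # 8 bytes per word; each of the n processes on the node holds
    # MWORDS plus its 1/p share of MEMDDI, shared memory counts once.
    return 8.0 * (n * mwords + shared + n * memddi / p) / 1024.0


if __name__ == "__main__":
    # Taxol example values quoted in this section: n=8, MWORDS=50,
    # shared=95, MEMDDI=580, on one (p=8) or two (p=16) 8-way nodes.
    for p in (8, 16):
        print(p, round(gbytes_per_node(8, p, 50, 95, 580), 1))
```

With the taxol numbers this reproduces the 8.4 GB (p=8) and 6.1 GB (p=16) figures; setting shared=0 or memddi=0 mimics turning off GOSMP or USEDM, although MWORDS must then be re-measured.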
Turning off GOSMP reduces the shared memory to 0 but                            
increases MWORDS, which is multiplied by the number of                          
cores per node!  Turning off USEDM leads to MEMDDI=0 by                         
using disk storage instead.                                                     
If additional memory is available, increasing MWORDS can                        
lead to a reduction in the level of the occupied orbital                        
batch, or "LV".  Larger MWORDS permits a smaller LV, which                      
will in turn reduce the required computational time, and                        
the required network traffic or disk I/O.  The value of LV                      
used is the last line appearing after "CHECKING SIZE OF                         
OCCUPIED ORBITAL BATCH".                                                        
The next four control numerical accuracy, but see $AUXBAS,
which is even more influential in regard to accuracy!
OTHAUX = flag to orthogonalize the RI basis set by                              
         diagonalization of the overlap matrix.  If there                       
         is reason to suspect linear dependence may exist                       
         in the RI basis, select this option to have a                          
         more numerically stable result.  Larger RI basis                       
         sets such as CCT and ACCT, in particular, may                          
         benefit from selecting this.  (default=.FALSE.)                        
STOL   = threshold below which overlap matrix eigenvectors
         (those with small eigenvalues) are removed; this is
         ignored if OTHAUX=.FALSE.  This keyword is
         analogous to QMTTOL in $CONTRL for the true AO
         basis.  (default= 1.0d-6)
IVMTD  = selects the procedure for removing redundancies                        
         when inverting the two-center, two-e- matrix.                          
       = 0 use Cholesky decomposition (default)                                 
       = 2 use diagonalization                                                  
VTOL   = threshold at which to remove redundancies.  This                       
         is ignored unless IVMTD=2  (default= 1.0d-6)                           
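OTHAUX/STOL (and, for the two-center matrix, IVMTD=2/VTOL) describe removal of near-redundant directions by diagonalization. The general technique is usually called canonical orthogonalization: diagonalize the symmetric matrix, discard eigenvectors whose eigenvalue falls below the threshold, and scale the rest by 1/sqrt(eigenvalue). The pure-Python sketch below illustrates that technique only; it is not the GAMESS code, the function names are hypothetical, and a small Jacobi eigensolver stands in for a library routine such as LAPACK's.

```python
import math


def jacobi_eigh(a, tol=1.0e-20, max_sweeps=50):
    """Eigen-decomposition of a small real symmetric matrix by cyclic
    Jacobi rotations.  Returns (eigenvalues, v), eigenvectors as columns."""
    n = len(a)
    a = [list(map(float, row)) for row in a]          # work on a copy
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen so the rotated a[p][q] vanishes.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):                     # columns: A <- A J
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):                     # rows: A <- J^T A
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):                     # eigenvectors: V <- V J
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v


def canonical_orthogonalization(s, stol=1.0e-6):
    """Return X (n x m) with X^T S X = 1, keeping only the m eigenvectors
    of S whose eigenvalue exceeds stol (an STOL-like threshold)."""
    evals, u = jacobi_eigh(s)
    keep = [k for k, lam in enumerate(evals) if lam > stol]
    return [[u[i][k] / math.sqrt(evals[k]) for k in keep] for i in range(len(s))]
```

For example, two nearly identical functions with overlap 1 - 1.0d-8 give eigenvalues near 2 and 1.0d-8, so with the 1.0d-6 default only one combination survives.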
Don't forget to see also the $AUXBAS input group!                               
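For the IVMTD=0 (Cholesky) route, a common way to use the factor V = L L^T of the two-center matrix is to form 3-index quantities B = L^{-1}(P|ia) by forward substitution, so that (ia|P)[V^{-1}]_PQ(Q|jb) becomes a simple dot product of B columns. The pure-Python sketch below shows that general technique under stated assumptions: the function names and the list-of-rows layout are hypothetical, and it makes no attempt at the redundancy removal that the program performs; it simply stops on a non-positive pivot.

```python
import math


def cholesky(v):
    """Lower-triangular L with V = L L^T (V symmetric positive definite)."""
    n = len(v)
    lower = [[0.0] * n for _ in range(n)]
    for j in range(n):
        d = v[j][j] - sum(lower[j][k] ** 2 for k in range(j))
        if d <= 0.0:
            raise ValueError("metric not positive definite (near linear dependence?)")
        lower[j][j] = math.sqrt(d)
        for i in range(j + 1, n):
            lower[i][j] = (v[i][j] - sum(lower[i][k] * lower[j][k] for k in range(j))) / lower[j][j]
    return lower


def ri_factors(v, three_center):
    """three_center[P][ia] holds (P|ia).  Returns B[Q][ia] = (L^-1 (P|ia))_Q,
    so that sum_Q B[Q][ia]*B[Q][jb] = sum_PQ (ia|P) [V^-1]_PQ (Q|jb)."""
    lower = cholesky(v)
    naux, npair = len(v), len(three_center[0])
    b = [[0.0] * npair for _ in range(naux)]
    for col in range(npair):
        # Forward substitution solves L y = (P|ia) for this orbital pair.
        for i in range(naux):
            rest = three_center[i][col] - sum(lower[i][k] * b[k][col] for k in range(i))
            b[i][col] = rest / lower[i][i]
    return b


def ri_integral(b, ia, jb):
    """Approximate (ia|jb) as a dot product over the auxiliary index."""
    return sum(b[q][ia] * b[q][jb] for q in range(len(b)))
```

Factoring V once and reusing it for every orbital pair is what keeps this step cheap compared with building V^{-1} explicitly.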
An example of this program follows.  The molecule is taxol,                     
with 1032 AOs and MOs in the 6-31G(d) basis, correlating                        
164 valence orbitals.  The RI basis set used is SVP, which                      
matches the true basis set in quality.  There are 4175 AOs                      
in the RI basis.  The job was run on a single 8-way node
(n=8, p=1,2,4,8) using MWORDS=50 (leading to LV=6) and
MEMDDI=580; the largest shared memory needed is 95 million
words.  For p=8, the total node memory is thus
  (8 bytes/word)*(8*50 + 95 + 8*580/8)/1024 = 8.4 GBytes
which fits easily into a modern 16 GByte node.  It reduces to
  (8 bytes/word)*(8*50 + 95 + 8*580/16)/1024 = 6.1 GB/node
if two 8-way nodes are used (p=16).  Scaling is
   p   SCF   RI-MP2  job total                                                  
   1   7391    7919   15366                                                     
   2   3718    4131    7860                                                     
   4   1857    2290    4174                                                     
   8    952    1488    2479                                                     
  16    486     758    1276 using two 8-way nodes.                              
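Parallel speedup and efficiency follow directly from the timing table, relative to the p=1 run. The sketch below only re-derives those ratios from the numbers transcribed above (in whatever time unit the table uses; the ratios are unitless); the helper name is hypothetical.

```python
# Timings transcribed from the scaling table: p -> (SCF, RI-MP2, job total).
TIMINGS = {
    1: (7391, 7919, 15366),
    2: (3718, 4131, 7860),
    4: (1857, 2290, 4174),
    8: (952, 1488, 2479),
    16: (486, 758, 1276),
}
COLUMNS = ("SCF", "RI-MP2", "job total")


def speedup_and_efficiency(column):
    """Return {p: (speedup, efficiency)} for one column, relative to p=1."""
    t1 = TIMINGS[1][column]
    return {p: (t1 / t[column], t1 / (p * t[column])) for p, t in TIMINGS.items()}


if __name__ == "__main__":
    for col, name in enumerate(COLUMNS):
        for p, (s, e) in speedup_and_efficiency(col).items():
            print(f"{name:9s} p={p:2d} speedup={s:5.2f} efficiency={e:4.0%}")
```

At p=16 this gives speedups of about 15.2 for SCF, 10.4 for RI-MP2, and 12.0 for the whole job.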
Numerical results are E(RI-MP2)= -2920.607512
        versus the exact E(MP2)= -2920.606231
The 0.0013 hartree error should be measured against the
total 2nd order correlation energy, which is -8.7855, while
noting that the time for the 2nd order energy is similar to
the SCF time.
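The comparison against the correlation energy is a one-line ratio. This sketch (hypothetical helper name) simply repeats that arithmetic with the energies quoted above, in hartree.

```python
def ri_error_summary(e_ri, e_exact, e_corr):
    """Return (absolute RI error, error as a fraction of the correlation energy)."""
    err = e_ri - e_exact
    return err, abs(err / e_corr)


err, frac = ri_error_summary(-2920.607512, -2920.606231, -8.7855)
print(f"RI error = {err:.6f} hartree = {frac:.3%} of the correlation energy")
```

The RI error is thus roughly 0.015% of the second-order correlation energy.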

generated on 7/7/2017