14 Parallel SAPT: psapt2k2

The parallel implementation of SAPT has been described in detail in Ref. 9. The parallel psapt2k2 codes have essentially the same functionality as their sequential counterpart sapt2002. Slight differences exist in user interface-related areas, such as compilation (more libraries are needed in psapt2k2), input options (sapt2002 is more user-friendly when it comes to input defaults), or output presentation. Since psapt2k2 parallel runs are more prone to various types of problems, the output of psapt2k2 contains a lot of debugging and profiling information. sapt2012 has been tested with a large number of (sequential) SCF front-end programs, while in the case of psapt2k2 the parallel gamess(us) is effectively the only SCF code which can be used. psapt2k2 should also work with the sequential SCF codes, like atmol1024, although quite often the time cost of such a calculation would be dominated by the SCF step. As stated above, psapt2k2 is essentially the parallel version of sapt2002. Thus, the extensions introduced in later versions of SAPT codes are not available for parallel execution.

All modules of the psapt2k2 code have been parallelized using only the MPI library; thus, the suite is very portable and should run on any parallel architecture. It is also recommended that the program be linked with the scalapack/blacs libraries, as these are needed for the CHF routines to run in parallel. If these libraries are not available, the CHF routines will run in sequential mode, which may substantially affect the timings, especially for larger numbers of occupied orbitals. The scalapack/blacs libraries are also necessary to build the code generating the static and dynamic CHF propagators, also referred to as susceptibility functions. This code will not be built if the libraries are missing.

To date, the program has been tested on the SGI Origin2000 and Origin3800 shared-memory machines, the IBM SP3 and SP4 platforms (shared/distributed memory), as well as on Beowulf clusters running Linux. For the latter type of machine, it is important that psapt2k2 is able to distribute its temporary files over file-systems local to the nodes, so that the scratch file-systems do not have to be NFS-mounted across the whole cluster.

Novel algorithms have also been developed for efficient calculation of the electrostatic, induction, and dispersion energies, see Sec. 14.7. The idea behind these algorithms is that all these interaction components are expressible through monomer charge densities and dynamic susceptibility functions. These monomer properties can be calculated just once for each monomer at a high level of correlation, then simplified, stored, and reused many times for different dimer geometries. Scalable beta versions of these new algorithms are also included in this distribution. As mentioned before, the induction/dispersion module requires the SCALAPACK/BLACS libraries to be installed.

14.1 Structure of ./psapt2K2 directory

After unpacking, the ./psapt2K2 main directory will contain the following files and subdirectories:

14.2 Installing psapt2k2

Installation of psapt2k2 is controlled by the ksh script compall, a small portion of which has to be customized by the user. The compall script sets the compilation options appropriate for a given hardware platform. It is assumed that the integral/SCF code used is gamess(us). No other SCF programs are currently supported by the compall script. The use of other SCF packages is still possible although it would require some hand-tuning of the compilation process. After the compilation options have been set, the compall script compiles and links all the pieces of the code. Finally, the scripts used to run psapt2k2 are updated by inserting the current paths to the executables.

14.2.1 compall installation script

The following environment variables must be adjusted by the user in the top section of the compall script:

  1. Execution shell: Change the first line of the script to one of the following:
    • #!/bin/bash : The Bash shell on a LINUX box, or
    • #!/bin/ksh : The Korn shell on all other platforms
  2. TARGET: the machine the program is being compiled on. Choose one from o2k (will work for O3K also), sp (will work for SP3 and SP4), or mpich (on Beowulf clusters).
  3. BLAS: if TARGET=mpich, specify how you want the BLAS library to be linked (usually this is just -lblas, but Beowulf configurations differ, so beware!). For other TARGETs – ignore this option.
  4. GAMESS: path containing the gamess exec you will be using.
  5. VERNO: gamess “version number” (compall assumes that the name of gamess executable is gamess.$VERNO.x).
  6. TRGT_GMS: if TARGET=o2k, specify how gamess should be run (pick from sockets and sgi-mpi, consistently with your gamess installation). For other TARGETs this will be set automatically.
  7. POE_COM: if TARGET=sp, specify which poe command (poe, grd_poe, or sge_mpirun) will be used to run parallel codes. Usually an MPI code on an SP machine is launched by the Load Leveler queuing system using the poe command. At ARL, a wrapper sge_mpirun is used instead, which works with the GRD queuing system (or grid engine). On IBM SP3 brainerd at ARL, the command grd_poe also works. For TARGETs other than SP – ignore this variable.
  8. SCLPCK: specify how the scalapack and blacs libraries are to be linked. If this is set to ’NO’, then the sequential CHF procedure will be used (will spoil scaling of psapt for large systems), and the pcksdisp code (for susceptibility functions) will not be built at all.
  9. LPCKLIB: specify how the Lapack library is to be linked. This library is needed to build the program tdenfit, fitting the densities and propagators in terms of auxiliary bases. If LPCKLIB is set to ’NO’, tdenfit will not be built. On SP machines, lapack is a part of ESSL library and hence LPCKLIB does not have to be given at all (just set it to YES).
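As an illustration, the customized top section of compall for a Beowulf cluster might look like the following sketch. All values shown, including the paths and the BLAS link line, are hypothetical examples, not defaults shipped with the distribution:

```shell
#!/bin/bash
# --- user-customized section of compall (hypothetical example values) ---
TARGET=mpich                    # o2k, sp, or mpich
BLAS="-lblas"                   # BLAS link line; only used when TARGET=mpich
GAMESS=/home/user/gamess        # directory containing the gamess executable
VERNO=00                        # executable is assumed to be gamess.$VERNO.x
SCLPCK="-lscalapack -lblacs"    # set to NO for sequential CHF, no pcksdisp
LPCKLIB="-llapack"              # set to NO to skip building tdenfit
```

(TRGT_GMS and POE_COM are omitted here because, for TARGET=mpich, the former is set automatically and the latter is ignored.)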

Once all the variables mentioned above are set, simply type:

./compall

and the compilation should begin. Check compall.log to see if all went well. The compall script works by invoking the make command with platform-specific makefiles; thus, any subsequent invocation of compall will detect changes made to the sources and rebuild only those parts of the code affected by these changes. Running the script ./psapt2K2/cleandirs will restore the ./psapt2K2 directory to its “distribution” state, i.e., all object files and executables (except shell scripts) will be deleted and a subsequent invocation of compall will start the compilation from the beginning.

14.2.2 Testing psapt2k2 installation

Once the compilation has completed successfully (if unsure, just grep the compall.log file for the word error), we recommend that you perform as many tests as possible before starting to use psapt2k2 for production runs. A suite of test jobs of varying size and for different parallel platforms has been provided for this purpose in the subdirectory examples/. Sample outputs from different machines (*out_ref.*) and queuing-system submission scripts can also be found there. Examples of using the new, monomer-property-based algorithms for electrostatics, induction, and dispersion can be found in the directories ./psapt2K2/examples/PLATFORM/EDI, where PLATFORM is one of O3K, SP3, SP4, and BEOWULF.
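Energies in outputs produced on different machines may differ from the reference files in the last digits, so a tolerance check on the energies is often more robust than a literal diff. A minimal sketch of such a check (the two energy values below are made up for illustration, not taken from any test job):

```shell
# Compare a computed energy against a reference value within a tolerance.
# e_ref and e_new are illustrative numbers only.
e_ref=-102.9384756
e_new=-102.9384758
awk -v a=$e_ref -v b=$e_new \
    'BEGIN { d = a - b; if (d < 0) d = -d; exit !(d < 1e-6) }' \
  && echo "agree within 1e-6" || echo "MISMATCH"
```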

14.3 Using psapt2k2 with gamess as a front-end

gamess(us) is currently the only parallel SCF package psapt2k2 is interfaced with. See Sec. 9 for information about setting up the gamess input files for psapt2k2 runs and about the structure of the gamess driver script runGAMESS and of the GAMESS-ptran interface program gamsintf.

There are a few minor differences between the runGAMESS script used with the sequential program sapt2012 (described in Sec. 9) and the one used with psapt2k2. Although these differences are completely transparent to the user, we list them here for the sake of completeness:

14.4 How to run psapt2k2

To perform a psapt2k2 calculation for one dimer geometry, one has to run a dozen or so programs: integral/SCF calculations for the monomers and possibly for the dimer, interface programs (in most cases) rewriting integral/SCF files into different forms, 1- and 2-electron integral transformations, MBPT/CCSD calculations for the monomers, and finally the “proper” SAPT calculations. All of this is performed automatically by the script pSAPT.X from the ./psapt2K2/bin directory, where X denotes the platform and is one of o2k, sp, mpich, mpich.maui, and mpich.maui.sep. The last two cases pertain to the huinalu cluster at the Maui center. While pSAPT.mpich.maui controls a calculation utilizing the common scratch file-system /scratch/huinalu, the other script, pSAPT.mpich.maui.sep, makes use of the local scratch disk (/scratch/local) on every node. The pSAPT.X script is executed on one “master” processor, which in turn calls other executables and scripts, also found in the ./psapt2K2/bin directory. The MPI executables (such as ptran, pcc, psapt) are launched by the “master” using the mpirun command (on SP systems, poe, grd_poe, or sge_mpirun is used for this purpose). The details of the calculation “flowchart”, the input files, and the use of various types of basis sets are presented in Sec. 10. Here we only describe the features characteristic of the parallel version of the code.

The executables invoked by the pSAPT.X script communicate with one another through various (usually large) temporary files. Most of these files are written and read by only one process and do not need to be accessed directly by other processes. This means that such files can be stored locally in some scratch directory accessible to only one process. This is indeed the case on a Beowulf cluster, where scratch file-systems are usually local to the nodes. Another class of (smaller) temporary and input files have to be accessed by all processes. On a Beowulf cluster such files have to be replicated in each process’s local scratch area. This problem does not exist on most other multiprocessor platforms (like the SGIs and SPs in the DoD centers), where all temporary files are stored in a common large file-system accessible to all processes. All the details of how the files are handled are taken care of by the pSAPT.X scripts and the MPI executables themselves and are completely transparent to the user (with the exception of the input files; vide infra).

During the compilation, the compall script adapts the appropriate pSAPT.X script by inserting the proper global paths to the executables, so that it should run on any installation without changes. Currently, this feature is not functional for the scripts used on huinalu, namely pSAPT.mpich.maui and pSAPT.mpich.maui.sep. The paths in these scripts need to be changed manually so that the variables MAIN_SAPT_DIR and SCF_HOME_DIRECTORIES reflect the user’s directory structure (if gamess is used, this also concerns additional directories relevant for that program). These variables are set about 500 lines down in the pSAPT.X script. The pSAPT.X script is written in ksh, although on Linux platforms, some of which are not equipped with ksh, it is actually executed under bash.

14.4.1 Running psapt2k2 on SGI

We first consider a situation where there is no queuing system installed and a psapt2k2 job can be submitted interactively from the command line.

Before launching a psapt2k2 run, the scratch directory should be created and all the input files should be copied to this directory. The psapt2k2 calculation is then launched in this scratch directory by typing pSAPT.o2k with appropriate options. If ksh is used as the login shell and an SGI machine is the platform, one would type

pSAPT.o2k jobname opt1 nproc opt2 >output.file 2>&1 &

(If psapt2K2/bin is not in your $PATH, the full path to pSAPT.o2k will have to be given.) The keyword jobname has to be the same as the beginning of the name of the input files for the system considered (we use a naming scheme to reference the needed input files). For example, let the keyword be name. Then, as the script is running, it will look for the nameP.data file, which contains the input data for the programs ptran, pcc, and psapt. Similarly, the pSAPT.o2k script will look for gamess input files for the integral/SCF parts of a calculation starting with name. The number and full names of such files depend on the type of basis set used in the calculations. See Sec. 10 for details on how to prepare gamess input files for different types of runs. Using the string scfcp for opt1 will request, in addition to the standard SAPT calculation, the CP-corrected supermolecular SCF interaction energy in the dimer-centered basis set. If such a calculation is not required, use noscfcp instead. Using the keyword gototran as opt1 will restart a pSAPT.o2k run from the transformation step, i.e., from the program ptran. The parameter nproc is the number of processors on which to launch the job. The parameter opt2 must be the full path of the scratch directory in which the whole calculation is taking place.
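The pieces of the command line can be summarized as follows; the job name, processor count, and scratch path below are hypothetical examples, not defaults:

```shell
# General form of the command line:
#   pSAPT.o2k jobname {scfcp|noscfcp|gototran} nproc /full/path/to/scratch
JOB=ArHF                     # input files: ArHF*.inp and ArHFP.data
OPT1=scfcp                   # or noscfcp; gototran restarts at ptran
NPROC=8                      # number of processors
SCRATCH=/scratch/user/$JOB   # must be the full path of the scratch dir
CMD="pSAPT.o2k $JOB $OPT1 $NPROC $SCRATCH"
echo "$CMD >output.file 2>&1 &"
```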

The output from all programs executed by pSAPT.o2k is written to a file output.file located in the scratch directory (this name can be set by the user to an arbitrary string, usually connected with the system being considered and its geometry). The last two elements in the command line given above are Unix ksh constructs which indicate that standard error messages will be written to the same file as standard output and that the processes will be run in the background.

A more common situation occurs when jobs have to be submitted through some sort of a queuing system, such as GRD (or, more recently, the grid engine). In this case, all the operations described above should be performed by a script submitted to the queue. A sample script, ARL_script, could look like

#$ -S /bin/ksh  
#$ -N ArHF_Tc  
#$ -l o3k  
#$ -pe pe_4hr 8  
#        Set these parameters:  
NODT=8                          # number of processors to run everything  
CURDIR=/home/bukowski/ArHF_Tc   # Starting directory  
WRKROOT=/usr/var/tmp/bukowski   # root of the scratch directory  
SAPTSCRIPT=/home/bukowski/psapt2K2/bin/pSAPT.o2k # driver script  
JOB=ArHF                        # Core of the job name  
# No need to modify anything below this line  
WRKDIR=$WRKROOT/$JOB   # scratch directory where  
                       # the calculation will take place  
cd $CURDIR  
# Create the job-specific scratch directory  
if [ ! -d $WRKDIR ]; then  
  mkdir $WRKDIR  
fi  
# Copy the input files from HOME to scratch  
cp $JOB'.inp' $WRKDIR  
cp $JOB'A.inp' $WRKDIR  
cp $JOB'B.inp' $WRKDIR  
cp $JOB'P.data' $WRKDIR  
# Run the job in scratch  
cd $WRKDIR  
$SAPTSCRIPT $JOB scfcp $NODT $WRKDIR >>$JOB'.out_test.'$NODT 2>&1  
# Copy output file back to HOME  
cp $JOB'.out_test.'$NODT $CURDIR

The first few lines starting with #$ are options passed on to the queuing system (in this case: GRD), specifying the name of the job, what machine it should run on, in what queue, and on how many processors. The syntax of these options depends on the queuing system at hand, and so these few lines must be adjusted accordingly. The script is typically submitted using the command

qsub ARL_script

After the script enters execution, it first creates a scratch directory named after the name of the job, then copies the input files from the home subdirectory (CURDIR) to the scratch directory, and runs the pSAPT.o2k script (SAPTSCRIPT). Upon completion of the psapt2k2 calculation, the output file is copied back to the home subdirectory CURDIR. Note that after the job is finished, the scratch directory will contain all kinds of temporary files which are kept there in case a restart is needed. These files should be manually cleaned up.

14.4.2 Running psapt2k2 on an SP3/SP4

The job submission technique here is very similar to the one used on an SGI platform, except that interactive submissions are generally not possible and everything must be handled by a queuing system. Consult the ARL_script and NAVO_script in the  psapt2K2/examples/SP? directories for examples of the GRD and Load Leveler submission scripts.

14.4.3 Running psapt2k2 on a Beowulf cluster

Configurations of Linux-based Beowulf clusters may vary between installations. Presented below are techniques for running psapt2k2 on our local cluster samson at the University of Delaware, and on the huinalu cluster at the Maui supercomputer center.

samson cluster

The machine consists of 132 compute nodes equipped with 1 GHz AMD Athlon processors and connected by Ethernet. Each node is equipped with a local scratch file-system not accessible from other nodes. On each node, this file-system is mounted as /tmp. In addition, there is a login node which hosts home directories of the users and some shared resources, like the MPI implementation mpich. The home and shared directories are NFS-mounted on all compute nodes. Commands and programs can be executed on compute nodes from the login node through the rsh utility. It is also possible to rlogin to the compute nodes, although it is not necessary for running psapt2k2. The consecutive modules of psapt2k2 are launched from within the script pSAPT.mpich using the mpirun command, while gamess is started using the ddikick.x utility of the ddi package.

It is a good rule on samson that jobs are to be run on compute nodes only and not on the login node. For any MPI application, this may be accomplished by executing the mpirun command on the login node with the option -nolocal, supported by mpich. Unfortunately, this option (or a similar one) is not available in the ddi package, which is the gamess parallelization tool. This creates a problem if an MPI application depends on files generated by gamess and both are to be run from the same driver script, such as pSAPT.mpich. This means that pSAPT.mpich has to be submitted directly on one of the compute nodes rather than on the login node, and that the MPI programs should be launched without the -nolocal option.

The process of submitting a psapt2k2 run on samson consists of the following steps:

  1. In your $HOME, create a subdirectory for the job and copy the input files job*.inp and jobP.data to this subdirectory.
  2. Create file calcnodes listing the nodes about to be used in the calculation, for example,

    Make sure there are no blank lines in this file.

  3. Make sure that the appropriate scratch directory (we will use /tmp/bukowski as an example) is present on each of the compute nodes listed in the file calcnodes.
  4. Submit the job using a command ./submit JOB, where JOB is the job name (the core of the input file name) and submit is a script similar to the following:
########## Customize scratch directory on compute nodes  
########## and the directory where the driver script resides.  
SCRDIR=/tmp/bukowski      # scratch directory  
                        # SAPT driver script  
####### Paths to the SAPT executable directory  
########## End of customized part ######################################  
#### Create all the needed PROC files based on calcnodes  
FRSTNODE=`head -1 calcnodes`  
echo $FRSTNODE 0 > PROC  
echo $FRSTNODE 0 > PROCc  
echo $FRSTNODE 0 > PROCe  
echo $FRSTNODE 0 > PROCi  
echo $FRSTNODE 0 > PROCs  
echo $FRSTNODE 0 > PROCso  
for i in `tail -n +2 calcnodes`  
do  
 echo $i 1 $PTRAN >> PROC  
 echo $i 1 $PCC   >> PROCc  
 echo $i 1 $EVEN  >> PROCe  
 echo $i 1 $INT   >> PROCi  
 echo $i 1 $PSAPT >> PROCs  
 echo $i 1 $SORT  >> PROCso  
done  
########### PROCs file ready - run the job now ############################  
JOB=$1                  # job name  
# copy all SAPT input files and the PROC file onto the scratch dir  
# on every node specified in PROC file  
for i in `awk '{print $1}' PROC`  
do  
 echo Copying input to $i  
 rcp *inp $i:$SCRDIR  
 rcp $JOB'P.data' $i:$SCRDIR  
 rcp  PROC $i:$SCRDIR  
 rcp  PROCs $i:$SCRDIR  
 rcp  PROCi $i:$SCRDIR  
 rcp  PROCc $i:$SCRDIR  
 rcp  PROCso $i:$SCRDIR  
 rcp  PROCe $i:$SCRDIR  
done  
# Set the number of nodes  
NPROC=`wc PROC | awk '{print $1}'`  
# Determine the master node (the first one in PROC)  
MASTER=`head -1 PROC | awk '{print $1}'`  
echo Submitting job on $MASTER  
# Submit the job on master node. Make sure that master knows what  
# LOGNAME is (needed for psapt).  
rsh $MASTER "LOGNAME=bukowski; export LOGNAME; echo $LOGNAME; cd $SCRDIR;  
          $SAPT_SCRIPT $JOB scfcp $NPROC $SCRDIR > OUT 2>&1 " &  

The variables SCRDIR (the scratch directory, its name assumed to be the same on each node), SAPT_SCRIPT (full path to the driver script), and SAPTDIR (location of psapt2k2 executables) have to be customized by the user. Using the node information from calcnodes, the script first creates the files PROC, PROCc, PROCe, PROCi, PROCs, and PROCso which will be used by mpich to run the consecutive modules of psapt2k2. The input files and the PROC* files are then copied from the home subdirectory to the scratch directory on each node. Based on analyzing the PROC file, the script detects the number of processors and the “master” node on which pSAPT.mpich will be submitted (this is the first node listed in calcnodes). Finally, having set up some environment parameters, it launches pSAPT.mpich on the “master” node. The temporary files produced by gamess and psapt2k2 will be created and stored in (local) directories /tmp/bukowski. The output from the run will be written to the file OUT in the scratch directory (/tmp/bukowski) of the “master” node (in this case: sam2) only. It will have to be copied (by rcp, for example) back to the home subdirectory.
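The layout of the PROC files is easy to see in isolation: the first node of calcnodes gets a "0" entry (the master), and each remaining node gets "1" followed by the path of the executable to be started there. The following self-contained sketch (node names and the ptran path are made-up examples) reproduces the construction of the PROC file from a three-node calcnodes:

```shell
# Build a PROC file from a sample calcnodes, as done in the submit script.
# The node names sam2..sam4 and the ptran path are hypothetical.
PTRAN=/home/user/psapt2K2/bin/ptran
printf 'sam2\nsam3\nsam4\n' > calcnodes
FRSTNODE=`head -1 calcnodes`
echo $FRSTNODE 0 > PROC          # master node: process count 0
for i in `tail -n +2 calcnodes`
do
  echo $i 1 $PTRAN >> PROC       # one ptran process per remaining node
done
cat PROC
```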

huinalu cluster

The machine consists of 256 batch nodes and 4 interactive (or login) nodes, each equipped with two 933 MHz Pentium III processors and 1 GB of memory. The nodes are connected via a 200 MB/s Myrinet Switch and also via 100 Mbit/s Ethernet. Each node is equipped with a local scratch file-system not accessible from other nodes. On each node, this filesystem is mounted as /scratch/local. In addition to the local scratch, all nodes have access to a shared scratch area, mounted as /scratch/huinalu, with the capacity of 239 GB. All nodes have also access to users’ home directories and other shared resources. The machine features the mpich implementation of MPI and a variety of Fortran compilers: g77, Intel, and Portland, all capable of using both the Myrinet and the Ethernet connectivity. Special care must be taken at the time of compilation and execution so that the PATH variable points to the proper libraries and the mpirun command appropriate for a given connection/compiler combination. Commands and programs can be executed on compute nodes from the login node through the ssh utility. It is also possible to ssh-login to the compute nodes, although it is not necessary for running psapt2k2. The consecutive modules of psapt2k2 are launched from within the script pSAPT.mpich.maui or pSAPT.mpich.maui.sep (described in the following) using the appropriate mpirun command, while gamess is started using the ddikick.x utility of the ddi package. The default source of ddikick.x, ddikick.c, has to be modified to use ssh instead of rsh.

In the debugging stage, psapt2k2 jobs may be submitted on the four interactive nodes hnfe01-hnfe04 (i.e., on up to 8 processors) using a technique similar to that described above for samson. The only practical difference is that the invocation of the mpirun command utilizes the -machinefile option, which for some reason does not work on samson. Consequently, the PROC file is only needed by gamess, while the psapt2k2 modules use a file called calcnodes in a format similar to


where :2 means that two processes will be run on each of the three interactive (2-processor) nodes, a total of 6 processes. In this case, a PROC file consistent with calcnodes, needed to inform gamess where to run, will have the form


The driver script to be called by submit is now pSAPT.mpich.maui, which differs from samson’s pSAPT.mpich in the syntax of the mpirun command as well as in some file-management portions (it is assumed here that the scratch directory is set to the common file-system /scratch/huinalu). In order to run gamess, pSAPT.mpich.maui calls a custom script runGMS.mpich.maui, which differs from runGMS.mpich in the details of how ddi’s HOSTLIST is constructed and in the use of ssh instead of rsh for internode communication. Jobs on interactive nodes can only be run using the Ethernet network interface.

In the production stage, psapt2k2 jobs have to be run on compute nodes. The legal way to accomplish this is to use huinalu’s queuing system, called “Maui Scheduler”. The scheduler requires the user to prepare two scripts. The first is the scheduler command script; it calls the second script, which performs the file-management tasks and invokes the psapt2k2 driver script. The format of the scheduler command script MAUI_script is as follows:

# Initial working directory and wallclock time  
IWD == "/u/bukowski/tests/ArHF_Tc"  
# Job wall-time limit in seconds  
WCLimit == 1800  
# Task geometry  
Tasks == 8  
Nodes == 8  
TaskPerNode == 1  
# Feature requests  
Arch == x86  
OS == Linux  
# Account  
Account == "AFPRD-0102-001"  
# MPI Ethernet job; for Myrinet, use gm instead of p4  
JobType == "mpi.ch_p4"  
# Execute a script file managing files and submitting the SAPT job  
Exec == "/u/bukowski/tests/ArHF_Tc/sub_s_sep"  
# Optional arguments to the "Exec" script  
Args == ""  
# Where scheduler output will end up  
Output == "/u/bukowski/LOGS/$(MAUI_JOB_ID).out"  
Error == "/u/bukowski/LOGS/$(MAUI_JOB_ID).err"  
Log == "/u/bukowski/LOGS/$(MAUI_JOB_ID).log"  
# Input  
Input == "/dev/null"

The launch script sub_s_sep should be similar to

JOB=ArHF                # job name  
SCRDIR=/scratch/local/bukowski      # local scratch directory  
                        # SAPT driver script  
# Set the number of nodes as assigned by the scheduler  
NPROC=$MAUI_TASK_COUNT  
# copy all SAPT input files onto the scratch dir  
# on every node assigned by scheduler. With common file-system -  
# use just a simple copy  
cd $CURDIR  
# Create the list of nodes  
rm -f calcnodes  
for node in `echo $MAUI_JOB_NODES | sed -e 's/:/ /g'` ; do  
        echo $node >> calcnodes  
done  
cp calcnodes PROC  
# Copy input files and lists of nodes to each local scratch directory  
for i in `awk '{print $1}' calcnodes`  
do  
 echo Copying input to $i  
 ssh $i mkdir $SCRDIR  
 scp *inp $i:$SCRDIR  
 scp $JOB'P.data' $i:$SCRDIR  
 scp  PROC $i:$SCRDIR  
 scp calcnodes $i:$SCRDIR  
done  
# Submit the job in scratch directory  
cd $SCRDIR  
# ... (the pSAPT.mpich.maui.sep driver script is invoked here) ...  
# Clean up local scratch on all nodes when done  
for i in `awk '{print $1}' calcnodes`  
do  
 echo Cleaning up on $i  
 ssh $i rm $SCRDIR/*  
done  

The script sub_s_sep launches the psapt2k2 calculation using the local scratch file-systems /scratch/local, where all the necessary data files (including input) are copied using the scp command. The number and the list of nodes are retrieved from the environment variables MAUI_TASK_COUNT and MAUI_JOB_NODES supplied from within the scheduler environment. Due to some file management issues which could not be resolved, it is necessary that only one process is launched on each compute node (i.e., the parameter TaskPerNode in MAUI_script must be set to 1). The actual psapt2k2 run is started using the driver pSAPT.mpich.maui.sep, adapted to make use of the local scratch environment. Replacing this driver with pSAPT.mpich.maui, changing the scratch directory to the global one, say,
/scratch/huinalu/bukowski, and replacing the loop structure with scp by a set of simple cp commands would result in a psapt2k2 calculation utilizing the common scratch directory instead of the local ones. Since psapt2k2 jobs depend crucially on the efficiency of I/O operations, the local scratch option should be preferred, as it avoids competition for bandwidth between the processes.

14.5 Input files

The input files for gamess and the SAPT suite of codes are constructed in essentially the same way as in the case of the sequential version of the program, sapt2012, as described in detail in Sec. 10. The only difference is that the options SAPT0, SAPT2, SAPT, and others of this kind introduced in later editions of the SAPT codes are currently not supported in the namelist INPUTCOR. The theory levels corresponding to these options may, of course, be recovered by requesting the appropriate SAPT corrections individually. Note also that no SAPT corrections beyond the standard level, like, e.g., the components of E_SAPT^(30), Eq. (9), are available in psapt2k2, and frozen core is not implemented in the parallel SAPT program.

14.6 Memory and disk requirements

Memory requirements of psapt2k2 can be estimated based on the total number of basis functions, the numbers of occupied and virtual orbitals, and the number of processors involved. The transformation code ptran requires slightly more than on³/P 8-byte words on each of the P processors, where n is the dimension of the basis set and o is the number of occupied orbitals of the larger monomer. For the pcc and psapt parts, memory requirements can be roughly estimated as 3o²v² words on each processor, where v is the number of virtual orbitals for the larger monomer.
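These estimates are easy to evaluate for a concrete case. The sketch below uses made-up values of n, o, v, and P, not program defaults:

```shell
# Rough per-processor memory estimates from the formulas above; n, o, v,
# and P are hypothetical example values.
n=200    # basis set dimension
o=10     # occupied orbitals of the larger monomer
v=190    # virtual orbitals of the larger monomer
P=8      # number of processors
# ptran: ~ o*n^3/P 8-byte words per processor, converted to megabytes
PTRAN_MB=$(( o * n * n * n / P * 8 / 1024 / 1024 ))
# pcc and psapt: ~ 3*o^2*v^2 8-byte words per processor
CC_MB=$(( 3 * o * o * v * v * 8 / 1024 / 1024 ))
echo "ptran:     ~${PTRAN_MB} MB per processor"
echo "pcc/psapt: ~${CC_MB} MB per processor"
```

For this example the script prints roughly 76 MB per processor for ptran and 82 MB for pcc/psapt; for an accurate figure, use the memcalc program described below.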

A more precise calculation of the memory required for a psapt2k2 run can be done using the program memcalc (built automatically during installation). In order to use this program, prepare the regular nameP.data file with the first (title) line containing, in this order: the total number of DC+BS basis functions, the total number of functions for monomer A (which will be different from the previous number in an MC+BS-type run), the number of occupied orbitals for monomer A, and then the same information for monomer B, followed by the number of processors. Then run the program by typing

memcalc < nameP.data > memcalc.out

The file memcalc.out will contain detailed information about memory requirements of different parts of the code. The variables MEMTRAN from namelist TRN and MEMSAPT from namelist INPUTCOR should then be adjusted accordingly. The memory needed to run the pcc program is calculated there explicitly, so that no separate namelist variable is needed.

An estimate of the total disk space requirement can be obtained as presented in Sec. 10.7 for a sequential program. It should be noted here that most of the large scratch files are distributed between the processes. The amount of disk space needed on a single processor is thus only a fraction 1∕P of the total disk requirement. This feature becomes important on Beowulf clusters, where typically only a relatively small scratch area is available on each node.
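For example, under the assumptions below (illustrative numbers only, not measurements), distributing the scratch files reduces the per-node requirement accordingly:

```shell
# Example of the 1/P disk-space distribution (illustrative numbers only):
TOTAL_GB=40   # total scratch requirement of a hypothetical run
P=16          # number of nodes the files are distributed over
PER_NODE_MB=$(( TOTAL_GB * 1024 / P ))
echo "${PER_NODE_MB} MB of local scratch needed per node"
```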

14.7 Electrostatic, dispersion, and induction (EDI) energies from monomer properties

It is well known that the electrostatic component of the interaction energy can be calculated from monomer charge densities, which are purely monomer properties. Similarly, the (second-order) induction energy may be expressed in terms of monomer charge densities and the static susceptibility functions. The same holds also for the dispersion energy, which is given by an integral of the product of two monomer dynamic susceptibility functions over (imaginary) frequency, the so-called Casimir-Polder integral. Thus, with the exception of exchange energies, all the components of the interaction are in fact determined exclusively by monomer properties. These properties can be calculated only once for each monomer at a high level of (intramonomer) correlation, and then used, after rotating and translating to the new monomer positions, to obtain the three interaction components for any dimer configuration. The point is that the expensive parts of the calculation – those of the correlated charge density and susceptibility functions – have to be performed only once for each monomer, and not at each and every dimer geometry. In particular, costly four-index transformations usually needed in highly correlated methods (such as the four-virtual transformations) are done only for a single-point monomer calculation, and avoided during the actual interaction energy calculations.

The electron density of a monomer can be expressed as a combination of n(n+1)/2 products of atomic orbitals (AOs), where n denotes the number of these orbitals in the basis set used. Thus, a calculation of the electrostatic energy from two precomputed monomer densities requires on the order of n^4 2-electron integrals (for similar-size monomers, so that n is approximately the same for both of them). The monomer susceptibility functions (both static and dynamic) are given as combinations of ov(ov+1)/2 terms (products of four molecular orbitals, two depending on the coordinates of electron 1 and the other two on the coordinates of electron 2), where o and v denote the numbers of occupied and virtual orbitals, respectively. The time cost of obtaining the induction and dispersion components is again dominated by the need to compute n^4 2-electron integrals. This unfavorable scaling can be greatly reduced if both the electron densities and the susceptibility functions are fitted in terms of a suitably chosen auxiliary basis. In most cases, the size of this basis, m, can be assumed to be proportional to the size of the original AO basis (but several times larger). The electron density can now be expressed as a combination of m terms, while the susceptibility functions consist of m(m+1)/2 terms (products of the auxiliary basis functions). The cost of the calculation of electrostatics, induction, and dispersion, dominated by 2-electron integrals between the auxiliary functions, is now proportional to just m^2 instead of n^4.
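The operation counts above can be made concrete with a small sketch; the basis size n = 200 and the factor of 3 relating m to n are arbitrary assumptions chosen for illustration:

```python
# Illustrative integral counts for the EDI evaluation: ~n^4 two-electron
# integrals in the original AO basis versus ~m^2 integrals between
# auxiliary functions after density fitting.
def naive_cost(n: int) -> int:
    """Integrals between two AO-product expansions, each of length n(n+1)/2."""
    pairs = n * (n + 1) // 2
    return pairs * pairs

def fitted_cost(n: int, aux_factor: int = 3) -> int:
    """~m^2 integrals, assuming an auxiliary basis of size m = aux_factor * n."""
    m = aux_factor * n
    return m * m

n = 200                              # a hypothetical monomer basis size
print(naive_cost(n))                 # -> 404010000 (~4e8)
print(fitted_cost(n))                # -> 360000
```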

The electron densities and susceptibility functions are fitted by minimizing functionals of the type

 Δ = ∫ [ρ(r1) − ρ̄(r1)] (1∕r12) [ρ(r2) − ρ̄(r2)] dr1 dr2

subject to the normalization condition

∫ ρ̄(r1) dr1 = N or zero.

The quantity ρ denotes here either one of the MBPT contributions to the electron density, which is normalized to the number of electrons N, or a transition density (a product of an occupied and a virtual orbital), which is normalized to zero. The quantity ρ̄ is an approximation to ρ in the form of a linear expansion in terms of auxiliary basis functions. The choice of 1∕r12 as the “weight” in the functional Δ has been suggested by Dunlap [60]. Minimization of the functional Δ then reduces to solving a system of linear equations for the expansion coefficients. Only one such system needs to be solved for each of the MBPT components of the density, whereas fitting of the susceptibility function for each imaginary frequency requires solving ov such systems, one for each pair of occupied and virtual orbitals.
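Schematically, minimizing Δ leads to a linear system in the Coulomb metric: with J_kl = (χ_k|1∕r12|χ_l), b_k = (χ_k|1∕r12|ρ), and s_k = ∫χ_k, one solves J c = b subject to s·c = N. The NumPy sketch below only assumes this structure (random matrices stand in for the actual integrals) and imposes the constraint with a Lagrange multiplier:

```python
import numpy as np

# Constrained Coulomb-metric fit: minimize
#   Delta = (rho - rho_bar | 1/r12 | rho - rho_bar),  rho_bar = sum_k c_k chi_k,
# subject to  s . c = N.  J, b, s below are random stand-ins for the
# actual integrals over auxiliary functions.
rng = np.random.default_rng(0)
m = 10                               # size of the auxiliary basis
A = rng.standard_normal((m, m))
J = A @ A.T + m * np.eye(m)          # symmetric positive definite "metric"
b = rng.standard_normal(m)
s = rng.standard_normal(m)
N = 10.0                             # number of electrons (0 for a
                                     # transition density)

# Lagrange-multiplier augmented system  [[J, s], [s^T, 0]] [c; lam] = [b; N]
aug = np.block([[J, s[:, None]], [s[None, :], np.zeros((1, 1))]])
sol = np.linalg.solve(aug, np.append(b, N))
c, lam = sol[:m], sol[m]

print(bool(np.isclose(s @ c, N)))    # constraint satisfied -> True
```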

Presented here is the beta version of the suite of codes implementing the ideas discussed above. The codes allow the generation of monomer charge densities (currently at the SCF, MBPT2 and MBPT3 levels, with and without orbital relaxation) and susceptibility functions (currently at the CHF level), generation of an auxiliary basis, and the fitting of the monomer properties in terms of this basis. The charge densities, susceptibility functions, and SCF vectors are then used as input to a program which calculates the electrostatic, induction, and dispersion energies for an arbitrary set of dimer geometries. Two versions of this code exist, one utilizing the exact representation of monomer properties (i.e., in terms of the original AO basis), called caldisp_gms, and the other – working with the density-fitted monomer properties, called caldisp_fit.

14.7.1 The pEDI scripts

Most of the tasks mentioned above are accomplished with the help of the script pEDI.X, where X stands for the platform at hand. This script works very much like the script pSAPT.X used in “regular” psapt2k2 calculations. Thus, the SCF runs for the two monomers are performed first, followed by the integral transformation, the CCSD/MBPT, and psapt runs. A new element is that the psapt module now calculates the MBPT2 and MBPT3 densities and stores them on disk, and the pcksdisp program is called to produce the susceptibility functions, currently at the CHF (or RPA) level. In addition to generating the monomer properties, the regular psapt2k2 calculation is also performed for one dimer geometry, obtained by shifting the position of monomer B by 10 bohr along the positive z axis with respect to the position specified in the nameB file. The CHF dispersion energy is also computed for this geometry by the program pcksdisp (the same one that generates the susceptibility functions). The results for this one geometry, reported in the output from the pEDI.X run, may then be compared to their counterparts obtained using the transformationless codes, to assess the correctness and accuracy of the latter. The transformationless code itself, caldisp_gms, is invoked at the end of pEDI.X to compute the electrostatic, induction, and dispersion energies for the specified dimer geometries.

The detailed instructions for running pEDI.X are as follows:

  1. Assuming that the name of your job is name, prepare the files nameA and nameB containing the specifications of the geometries and basis sets of monomers A and B in gamess(us) format. Remember to put a blank line after the basis set of each atom, including a blank line at the end of the file. Out of these files, the pEDI.X script will construct the gamess input files needed to calculate the SCF vectors and generate integrals. A spherical basis set will be assumed (i.e., ISPHER=1), and the name of the job will be used as the comment line. The geometries of the monomers specified in nameA and nameB should correspond to whatever you consider to be the initial configuration of the given monomer, i.e., the COM (or another characteristic point around which rotations will be performed) should coincide with the origin of the coordinate system, and all Euler angles describing the orientation of the monomer will be assumed zero at this geometry.
  2. Prepare the file nameP.data, as for the regular (pure) MCBS SAPT calculation (note that in namelist TRN the variables BASIS and TAGS have to be specified for such a calculation). In the namelist INPUTCOR, the variables CHFDISP=T, CHFIND=T, E12=T, E12R=T, E13PL=T, E13PLR=T have to be specified in order for the MBPT2 and MBPT3 densities to be dumped to disk. Failure to specify any of these corrections will result in the corresponding density missing from the dump files denaMO.data and denbMO.data (this does not hurt the subsequent calculations, except that the quantities requiring the missing densities will be reported as zero). Also, some other corrections may be specified, so that the results from the Summary Table may later be compared to what comes out of the transformationless algorithms for the geometry considered in the pEDI.X run. These may include E1TOT=T, E2IND=T, E2INDR=T, and E2DSP=T. The namelist INPUT controlling the behavior of the propagator code pcksdisp is generated automatically by the pEDI.X script (and appended to file5.dat), so the user does not have to worry about it. For the sake of completeness we state here that this namelist currently contains the following:
       IQUADTYP=1, NQUAD=16,  

    The meaning of these parameters is described in Appendix E. In particular, the user may want to alter the length NQUAD of the quadrature employed in the calculation of the Casimir-Polder integral.

  3. Prepare the file input.edi, specifying the dimer geometries for which the interaction energies are to be calculated by caldisp_gms out of the precomputed monomer properties. input.edi consists of lines (one for each dimer geometry) containing, in this order, the COM-COM separation (in Å), the β and γ Euler angles (in degrees) for monomer A (α is assumed to be zero), followed by the α, β, and γ Euler angles of monomer B. It is assumed that the COM of monomer A coincides with the origin of the coordinate system and that monomer B is shifted in the positive direction of the z axis. For example,
    5.29177249 0.          0.            0.          0.           0.  
    2.38521611 86.80692336 90.00000000 180.00000000 93.37890335 360.00000000  
    2.63658901 87.13779711 90.00000000 180.00000000 81.94039222 360.00000000  
    2.89422970 87.42491802 90.00000000 180.00000000 75.80434134 359.99999915

    would be a valid input.edi file containing 4 geometries. The first of these geometries is exactly the same as the one for which the interaction energies are computed during the pEDI.X run.

  4. Run the EDI job the same way a “regular” psapt2k2 job would be run. For example, on an O2K/O3K platform without a queuing system, to run in /scratch/mydir on 8 processors one would type
    pEDI.o2k name noscfcp 8 /scratch/mydir > name.out 2>&1 &

    Note that the noscfcp keyword has been used since the supermolecular SCF interaction energy is of no interest here.
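A valid input.edi file, such as the example in step 3 above, can also be generated with a short script; a minimal sketch (the second geometry is taken from that example, the column width is an arbitrary choice):

```python
# Write an input.edi file: one line per dimer geometry, containing
# R (COM-COM separation, in Angstrom), beta_A, gamma_A, and
# alpha_B, beta_B, gamma_B (Euler angles, in degrees).
geometries = [
    (5.29177249, 0.0, 0.0, 0.0, 0.0, 0.0),   # 10 bohr shift along +z
    (2.38521611, 86.80692336, 90.0, 180.0, 93.37890335, 360.0),
]

with open("input.edi", "w") as fh:
    for geom in geometries:
        fh.write(" ".join(f"{x:14.8f}" for x in geom) + "\n")
```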

A result of running the pEDI.X script will be the file name.out containing the standard output from the whole run. In this file, the SAPT corrections requested in the nameP.data file will be reported (in the “Summary Table” section), together with the dispersion and induction energies calculated by the propagator code pcksdisp for the one specific dimer configuration described earlier. The last part of name.out will be the output from the caldisp_gms code, i.e., the electrostatic, induction, and dispersion energies for all geometries specified in input.edi. Specifically, the following corrections will be calculated: E_elst^(10), E_elst^(12), E_elst,resp^(12), E_elst^(13), E_elst,resp^(13), E_ind^(20), E_ind,resp^(20), E_disp^(20), and E_disp^(2)(RPA). The last of these corrections is calculated from the dynamic susceptibility functions at the CHF level (equivalent to RPA) and currently has no counterpart among the corrections calculated in a regular psapt2k2 run; in terms of theory level, it is more accurate than E_disp^(20).
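The quadrature mentioned in step 2 (NQUAD) discretizes the imaginary-frequency integral behind the RPA dispersion energy. A common choice, shown here purely as an illustration, is a Gauss-Legendre rule mapped from [−1, 1] to [0, ∞); the particular mapping and the scale parameter omega0 are assumptions, not necessarily what pcksdisp implements:

```python
import numpy as np

# Gauss-Legendre quadrature on [-1, 1] mapped to [0, inf) via
# omega = omega0 * (1 + t) / (1 - t), a standard choice for
# Casimir-Polder frequency integrals.
def cp_quadrature(nquad: int = 16, omega0: float = 0.5):
    t, w = np.polynomial.legendre.leggauss(nquad)
    omega = omega0 * (1.0 + t) / (1.0 - t)
    weights = w * 2.0 * omega0 / (1.0 - t) ** 2   # Jacobian of the map
    return omega, weights

# Sanity check: integrate f(w) = 1/(1+w^2) over [0, inf); exact value pi/2.
omega, weights = cp_quadrature(16)
approx = np.sum(weights / (1.0 + omega**2))
print(abs(approx - np.pi / 2) < 1e-6)   # -> True
```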

The calculated monomer properties will be packed for further use (e.g., for a standalone invocation of caldisp_gms) into a file name.prop.tar.gz, which, after running

   gzip -d name.prop.tar.gz  
   tar -xvf name.prop.tar

will decompress into the following files:

  1. vecta.data and vectb.data – unformatted sequential files with orbital energies and SCF vectors
  2. denaMO.data and denbMO.data – unformatted sequential files containing the MBPT2 and MBPT3 densities (relaxed and non-relaxed)
  3. propa.data and propb.data – dynamic susceptibility functions, currently at the CHF level (unformatted sequential files)
  4. prop0a.data and prop0b.data – static susceptibility functions, currently at the CHF level (unformatted sequential files)
  5. infoa.data and infob.data – formatted files recording the geometry and basis set information of the monomers, in the format read by the subsequent transformationless codes.
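Fortran "unformatted sequential" files such as these consist, with most compilers, of records framed by 4-byte length markers (the framing is compiler-dependent). The sketch below is pure illustration; the actual record contents of the psapt2k2 files are not specified here, so it simply round-trips one hypothetical record of orbital energies:

```python
import io
import struct

# Write and read one Fortran-style unformatted sequential record:
# payload wrapped in leading and trailing 4-byte length markers.
def write_record(fh, payload: bytes) -> None:
    marker = struct.pack("<i", len(payload))
    fh.write(marker + payload + marker)

def read_record(fh) -> bytes:
    (n,) = struct.unpack("<i", fh.read(4))
    payload = fh.read(n)
    fh.read(4)                      # skip the trailing length marker
    return payload

energies = (-20.5684, -1.3192, -0.6788)   # hypothetical orbital energies
buf = io.BytesIO()
write_record(buf, struct.pack("<3d", *energies))
buf.seek(0)
back = struct.unpack("<3d", read_record(buf))
print(back == energies)             # -> True
```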

After collecting these files and input.edi in one scratch directory, one can run the caldisp_gms program independently of the pEDI.X script by typing (or submitting through the queuing system) a command similar to

mpirun -np 8 caldisp_gms > dimers.out 2>&1

modifying, of course, the number of processing nodes, paths, and output file name appropriately. As usual, the executable caldisp_gms can be found in the psapt2K2/bin directory.

14.7.2 Calculating electrostatic, induction, and dispersion energies from fitted monomer electron densities and susceptibility functions

A promising alternative route of EDI calculations utilizes approximate representations of the monomer properties in terms of auxiliary basis sets. The accuracy of this method strongly depends on the quality of these sets. The auxiliary basis sets can be obtained using the utility program make_aux, implementing the method described in Appendix F, and the property fitting can be accomplished with the help of the program tdenfit (the relevant lines in the compall script must be uncommented if these two programs are to be built during installation). Later, in the work on SAPT(DFT), it was found [14, 15] that optimized auxiliary basis sets like those from Ref. 61 perform better. Thus, we recommend using these sets in EDI calculations.

The driver scripts pEDI.X can be easily adapted to perform the property fitting by uncommenting the “Generate auxiliary basis” and “Fit propagators and densities using auxiliary basis” sections. The invocation of the tar command should also be changed so that the fitted representations of the monomer properties are included in the name.prop.tar file. The pEDI.X script modified in this way will ask for two additional input files, input.aux.A and input.aux.B, specifying the parameters needed for the construction of the auxiliary bases. These comprise the number of pruning cycles and the ϵ parameter for each of these cycles, for each atom. For example, in the case of molecule A consisting of 2 atoms with 3 pruning cycles required for each of them, the file input.aux.A could look like


After the run finishes, the auxiliary bases generated by make_aux will be placed in the files auxa.data and auxb.data (similar in format to infoa.data and infob.data), whereas the fitted properties will be placed in the (unformatted sequential) files auxdena.data, auxdenb.data, auxprop0a.data, auxprop0b.data, auxpropa.data, and auxpropb.data. Note that the automatic generation of auxiliary basis sets results in rather large sets if high accuracy of the fits is required.

To perform the interaction energy calculation using density-fitted monomer properties, collect all these files, along with input.edi, in a scratch directory (e.g., by uncompressing the name.prop.tar.gz file resulting from the modified pEDI.X script) and then run the caldisp_fit executable by typing a command similar to

mpirun -np 8 caldisp_fit > dimers.out 2>&1

possibly modifying the number of processing nodes, paths, and output file name. The output file, dimers.out in this case, will contain, for each dimer geometry, the following corrections: E_elst^(10), E_elst^(12), E_elst,resp^(12), E_elst^(13), E_elst,resp^(13), E_ind,resp^(20), and E_disp^(2)(RPA).

Important note: The number of processors used to run caldisp_gms and caldisp_fit is independent of the number of processors on which the pEDI.X script was run. It is, however, important that all the files used (with the exception of input.edi) are obtained in a single pEDI.X run. For example, using vecta.data and vectb.data obtained on a different number of processors (or a different machine) than that used to get denaMO.data, denbMO.data or propa.data, propb.data may produce nonsensical results if orbital degeneracies are present for one (or both) monomers, as is the case, e.g., for atoms and linear molecules. In such cases, gamess can perform arbitrary rotations within the degenerate subspaces. These rotations depend on the number of processors and even on the platform where the calculation is run, so it is imperative that all the files are obtained consistently. It is therefore recommended that the monomer properties always be moved around in the form of the compressed file name.prop.tar.gz rather than as separate files.