PBS Pro conditional execution

PBS jobs with conditional execution

from LINK

It is possible to start a job on the condition that another one completes beforehand; this may be necessary, for instance, if the input to one job is generated by another job. Job dependency is defined in PBS using the -W flag. To illustrate with an example, suppose you need to start a job using the script second_job.sh after another job has finished successfully, and that the first job is started using the script first_job.sh.

To do this, you must use an extra command-line option as follows:

qsub -W depend=<dependency_list> [Options] [script_file]
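
For reference, the two job scripts in this scenario might look like the following minimal sketches; the resource requests, walltime, and the generate_input/process_input commands are placeholders to adapt to your own workflow:

first_job.sh:

#!/bin/bash
#PBS -N first_job
#PBS -l select=1:ncpus=1
#PBS -l walltime=01:00:00

cd $PBS_O_WORKDIR
# Placeholder command that produces the input for the second job
./generate_input > input.dat

second_job.sh:

#!/bin/bash
#PBS -N second_job
#PBS -l select=1:ncpus=1
#PBS -l walltime=01:00:00

cd $PBS_O_WORKDIR
# Placeholder command that consumes the output of the first job
./process_input input.dat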

Your options for dependency lists are:

  • afterany:<job_id> — This will hold the job until job_id finishes, whether it fails or succeeds.
  • afterok:<job_id> — This will hold the job until job_id finishes. If job_id finishes successfully, the submitted job is released for execution immediately; otherwise, it is deleted.
  • afternotok:<job_id> — This will hold the job until job_id finishes. If job_id fails, the submitted job is released for execution immediately; otherwise, it is deleted.

For example, the first job is submitted with

qsub amber_first.pbs

and it returns the job ID 61112.sched01. Then, the command to start the second job is

qsub -W depend=afterok:61112.sched01 amber_second.pbs
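
The other dependency types work the same way. For instance, a clean-up or notification job (here a hypothetical script named amber_cleanup.pbs) could be released only if the first job fails:

qsub -W depend=afternotok:61112.sched01 amber_cleanup.pbs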

Such job dependencies can be automated further, for example inside a bash script, by capturing the returned job ID in a shell variable:

JOB_ID_1=`qsub amber_first.pbs`
JOB_ID_2=`qsub -W depend=afterok:$JOB_ID_1 amber_second.pbs`
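
As a sketch, the whole chain can be wrapped in a small submission script; the script names are the same placeholders as above, and qsub is assumed to print the full job ID (e.g. 61112.sched01), which is passed straight to -W depend:

#!/bin/bash
# Submit the first job and capture its job ID
JOB_ID_1=$(qsub amber_first.pbs)
echo "Submitted first job: $JOB_ID_1"

# Submit the second job, held until the first finishes successfully
JOB_ID_2=$(qsub -W depend=afterok:$JOB_ID_1 amber_second.pbs)
echo "Submitted second job: $JOB_ID_2"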

Running parallel code with the PBS scheduler

When you are running an MPI program, you often launch that program with syntax like:

mpirun -np 4 <program> [args]

to launch <program> with 4 processes on the same machine. This typically only makes sense if you have at least 4 processors (or processing cores) available. When using PBS, you typically want to run on every processor you requested. This is where PBS_NODEFILE (mentioned in the previous section) comes in handy. PBS_NODEFILE is a temporary file created by PBS when your job starts that tells your job where it can run, with a separate line for each processor (core) you were allocated (if you requested 8 cores on a single node, that node is listed 8 times).
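
As an illustration, for a hypothetical request of two nodes with four MPI processes each (e.g. #PBS -l select=2:ncpus=4:mpiprocs=4 in PBS Pro syntax), the contents might look like this, with node01 and node02 standing in for your cluster's hostnames:

cat $PBS_NODEFILE
node01
node01
node01
node01
node02
node02
node02
node02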

Therefore, you can determine the number of processors you have available via

nprocs=`cat $PBS_NODEFILE | wc -l`

which counts the number of lines in PBS_NODEFILE.

You can then use the mpirun command via:

mpirun -np $nprocs <program> [options]

As a better option, though, most mpirun or mpiexec implementations will accept an arbitrary machine file that tells them where to run (instead of -np 4). Therefore, you should use a command like:

mpirun -machinefile $PBS_NODEFILE <program> [options]

or

mpirun -hostfile $PBS_NODEFILE <program> [options]
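
Putting the pieces together, a complete MPI job script might look like the following sketch; the resource request, walltime, and program name (./my_program) are placeholders to adapt to your cluster and application:

#!/bin/bash
#PBS -N mpi_job
#PBS -l select=2:ncpus=4:mpiprocs=4
#PBS -l walltime=02:00:00

cd $PBS_O_WORKDIR

# One line per allocated processor, so this is the total number of MPI processes
nprocs=$(cat $PBS_NODEFILE | wc -l)
echo "Running on $nprocs processors"

mpirun -machinefile $PBS_NODEFILE -np $nprocs ./my_program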

Some MPI implementations are also compiled with Torque/PBS support (ask your sysadmin about this). If this is the case, then mpiexec will already know to look in PBS_NODEFILE, and all you have to type is:

mpiexec <program> [options]

and it will already run ‘correctly’.