PBS jobs with conditional execution
It is possible to start a job on the condition that another one completes beforehand; this may be necessary, for instance, if the input to one job is generated by another job. Job dependency is defined in PBS using the -W flag of qsub. To illustrate with an example, suppose you need to start a job using the script second_job.sh only after another job, started using the script first_job.sh, has finished successfully.
To do this, you must use an extra command-line option as follows:
qsub -W depend=<dependency_list> [Options] [script_file]
Your options for dependency lists are:
- afterany:<job_id> — This will hold the job until after job_id finishes, either in error or successfully
- afterok:<job_id> — This will hold the job until after job_id finishes. If job_id finishes successfully, the submitted job will be instantly released for execution. Otherwise, it is deleted.
- afternotok:<job_id> — This will hold the job until after job_id finishes. If job_id fails, the submitted job will be instantly released for execution. Otherwise, it is deleted.
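As a small illustration of the three dependency types listed above, the sketch below builds the value passed to -W from a type and a job ID. The function name depend_arg and the script name cleanup.pbs are hypothetical and chosen only for this example; only the three types described above are accepted.

```shell
# Build the value for "qsub -W" from a dependency type and a job ID.
# depend_arg is a hypothetical helper, not part of PBS.
depend_arg() {
    case "$1" in
        afterany|afterok|afternotok)
            printf 'depend=%s:%s\n' "$1" "$2"
            ;;
        *)
            echo "unknown dependency type: $1" >&2
            return 1
            ;;
    esac
}

# Possible use (cleanup.pbs is a made-up script that runs only if the first job fails):
#   qsub -W "$(depend_arg afternotok 61112.sched01)" cleanup.pbs
```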
For example, suppose the first job is submitted with
qsub amber_first.pbs
and it returns the job ID 61112.sched01. Then, the command to start the second job is
qsub -W depend=afterok:61112.sched01 amber_second.pbs
This job dependency can be further automated (for example, inside a bash script) by capturing the job ID in a shell variable:
JOB_ID_1=$(qsub amber_first.pbs)
JOB_ID_2=$(qsub -W depend=afterok:$JOB_ID_1 amber_second.pbs)
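The same idea extends to longer pipelines. The following is a minimal sketch, assuming each job should start only if the previous one succeeded (afterok). The function name submit_chain and the QSUB override variable are hypothetical; QSUB exists only so the function can be tried out without a scheduler.

```shell
# Submit the given PBS scripts as a chain: each job waits for the
# previous one to finish successfully (afterok).
# QSUB is a hypothetical override for the submit command (defaults to qsub).
submit_chain() {
    local qsub_cmd="${QSUB:-qsub}"
    local prev_id="" job_id script
    for script in "$@"; do
        if [ -z "$prev_id" ]; then
            # First job: no dependency.
            job_id=$("$qsub_cmd" "$script") || return 1
        else
            # Later jobs: hold until the previous job succeeds.
            job_id=$("$qsub_cmd" -W "depend=afterok:$prev_id" "$script") || return 1
        fi
        echo "$script -> $job_id"
        prev_id="$job_id"
    done
}

# Possible use:
#   submit_chain amber_first.pbs amber_second.pbs
```

The function stops at the first failed submission, so later jobs are never queued against a job ID that does not exist.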
Running parallel code with PBS scheduler
When you are running an MPI program, you often launch that program with syntax like:
mpirun -np 4 <program> [args]
to launch <program> with 4 processes on the same machine. This typically only makes sense if you have at least 4 processors (or processing cores) available. When using PBS, you typically want to run on every processor you requested. This is where PBS_NODEFILE (mentioned in the previous section) comes in handy. PBS_NODEFILE is an environment variable that holds the path to a temporary file created by PBS as soon as your job starts. The file tells your job where it can run, with a separate line for each processor you were allocated (if you requested 8 cores on a single node, that node is listed 8 times).
Therefore, you can determine the number of processors you have available via
nprocs=$(wc -l < $PBS_NODEFILE)
which counts the number of lines in the file that PBS_NODEFILE points to.
You can then use the mpirun command via:
mpirun -np $nprocs <program> [options]
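Because the node file has one line per processor, the line count and the number of distinct host names answer two different questions. The sketch below shows both; the function names count_slots and count_nodes are hypothetical, and the file is assumed to contain one host name per line.

```shell
# Count allocated processor slots (one line each) in a PBS-style node file.
count_slots() { wc -l < "$1" | tr -d ' '; }

# Count distinct nodes (unique host names) in the same file.
count_nodes() { sort -u "$1" | wc -l | tr -d ' '; }

# Inside a running job one might then write:
#   nprocs=$(count_slots "$PBS_NODEFILE")
#   mpirun -np "$nprocs" <program> [options]
```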
As a better option, though, most mpirun or mpiexec programs will take an arbitrary machine file that tells them where to run (instead of -np 4). Therefore, you should use a command like:
mpirun -machinefile $PBS_NODEFILE <program> [options]
or
mpirun -hostfile $PBS_NODEFILE <program> [options]
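A job script can guard against being run outside a PBS job, where PBS_NODEFILE is not set. The following is a minimal sketch under that assumption. The function name launch_mpi and the MPIRUN override variable are hypothetical, and the -machinefile flag should be swapped for -hostfile if that is what your MPI expects.

```shell
# launch_mpi PROGRAM [ARGS...]: run PROGRAM on every slot listed in the
# file that PBS_NODEFILE points to.
# MPIRUN is a hypothetical override for the launcher name (defaults to mpirun).
launch_mpi() {
    local launcher="${MPIRUN:-mpirun}"
    local nodefile="${PBS_NODEFILE:-}"
    if [ -z "$nodefile" ] || [ ! -r "$nodefile" ]; then
        echo "PBS_NODEFILE is not set or not readable; run this inside a PBS job" >&2
        return 1
    fi
    "$launcher" -machinefile "$nodefile" "$@"
}

# Possible use inside a job script:
#   launch_mpi <program> [options]
```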
Some MPI implementations are also built with Torque/PBS support (ask your sysadmin about this). If this is the case, then mpiexec will already know to look in PBS_NODEFILE, and all you will have to type is:
mpiexec <program> [options]
and it will already run ‘correctly’.