To create a batch job, first create a script that starts the job. It is a plain shell script file containing the commands to set up and start your job. The following environment variables will be available to the script (and the job):
Although it is possible to use command line options and script wrappers to submit jobs, it is usually more convenient to use just a single script to include all options for the job.
An example script will look something like:
#!/bin/bash
#$ -S /bin/bash
#$ -N Si                       ## Job Name
#$ -pe mpi_4cpu 8              ## nodes=2,ppn=4 thus, total MPI task is 8
#$ -q q4                       ## queue name (At the present, q_1, q4 and q4long available)
#$ -R yes                      ## Resource Reservation
#$ -wd [user's working directory]
#$ -l h_rt=50:00:00            ## (hh:mm:ss) (wall clock time)
#$ -l normal=TRUE              ## For the good priority. Jobs are killed without this.
#$ -M [user's email address]
#$ -m e                        ## Send email when jobs are finished.
#$ -e err.$JOB_ID
#$ -o out.$JOB_ID

# Run the parallel MPI executable "namu.exe"
# Here namu.exe is the executable and params.00286 is the argument of it.
/packages/openmpi-intel/bin/mpirun -machinefile $TMPDIR/machines -np $NSLOTS ./namu.exe < params.00286
The following basic options may be used to submit the job.
-A [account name]      -- Specify the account under which to run the job
-N [name]              -- The name of the job
-l h_rt=hr:min:sec     -- Maximum walltime for this job
-r [y,n]               -- Should this job be re-runnable (default y)
-pe [type] [num]       -- Request [num] amount of [type] nodes.
                          ex. total MPI task 4 : #$ -pe mpi_4cpu 4
                              total MPI task 8 : #$ -pe mpi_4cpu 8
                              total MPI task 16: #$ -pe mpi_4cpu 16
-wd [directory path]   -- Place the output files (.e,.o) in the [directory path] working directory.
-S [shell path]        -- Specify the shell to use when running the job script
-l normal=TRUE         -- add this for the good priority
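The h_rt limit is written as hours:minutes:seconds. As a small illustration, the sketch below converts a run time given in seconds into that form. The function name to_h_rt is hypothetical and is not part of SGE.

```shell
#!/bin/bash
# Hypothetical helper (not an SGE command): format a duration in seconds
# as the hr:min:sec string expected by "-l h_rt=...".
to_h_rt() {
    local secs=$1
    printf '%d:%02d:%02d\n' $((secs / 3600)) $((secs % 3600 / 60)) $((secs % 60))
}

# 50 hours, as in the "-l h_rt=50:00:00" example
to_h_rt 180000
```

The result can be pasted into the "#$ -l h_rt=" line of the job script.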
The present SGE QUEUE configuration on bluesky is shown below.
[SGE QUEUE CONFIGURATION]
=====================================================================================================
Queue     Time_Limit   Running_Nodes   CPUs/job   Priority   Available pe type
=====================================================================================================
q_1       168hrs       n001-n040       1          normal     mpi_1cpu (only for serial jobs)
-----------------------------------------------------------------------------------------------------
q4        168hrs       n001-n040       4-32       normal     mpi_4cpu,mpi_fu
-----------------------------------------------------------------------------------------------------
q4long    504hrs       n001-n040       4-32       normal     mpi_4cpu,mpi_fu
=====================================================================================================
where (pe type)
  mpi_4cpu has 4 allocation rule per node (nodes=1 and ppn=4),
  mpi_1cpu has 1 allocation rule per node (nodes=1 and ppn=1),
  mpi_fu has $fill_up allocation rule.
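Because mpi_4cpu places 4 MPI tasks on each node, the number of nodes a job occupies follows from the requested slot count. The sketch below shows that arithmetic. The function name is hypothetical, and rejecting slot counts that are not a multiple of 4 is an assumption based on the fixed allocation rule, not something this page states.

```shell
#!/bin/bash
# Hypothetical helper: number of nodes used by "-pe mpi_4cpu <slots>",
# assuming the fixed allocation rule of 4 slots per node.
nodes_for_mpi_4cpu() {
    local slots=$1
    local per_node=4
    # Assumption: the slot count must be a multiple of the allocation rule.
    if [ $((slots % per_node)) -ne 0 ]; then
        echo "slot count $slots is not a multiple of $per_node" >&2
        return 1
    fi
    echo $((slots / per_node))
}

# "-pe mpi_4cpu 8" corresponds to nodes=2, ppn=4
nodes_for_mpi_4cpu 8
```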
After creating the script in (say) myscript.sge, use the following command to submit the job to SGE:
qsub myscript.sge
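When submitting from another script, it can be handy to keep the job ID that qsub reports. The sketch below extracts it from the confirmation message. The function name is hypothetical, and the message wording ('Your job N ("name") has been submitted') is assumed from common SGE installations; check it against what qsub prints on bluesky.

```shell
#!/bin/bash
# Hypothetical helper: read qsub's confirmation message on stdin
# and print the numeric job ID.
# Assumption: the message looks like: Your job 304 ("Si") has been submitted
job_id_from_qsub() {
    sed -n 's/^Your job \([0-9][0-9]*\) .*/\1/p'
}

# Demonstration on a sample message; on the cluster one would run:
#   qsub myscript.sge | job_id_from_qsub
echo 'Your job 304 ("Si") has been submitted' | job_id_from_qsub
```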
Node or queue status can be obtained by using the "qhost" command.
An example listing is shown below. (Refer to "qhost -help")
Useful command options,
"qhost -j" for displaying jobs hosted by host.
"qhost -q" for displaying queues hosted by host.
bluesky:/packages/sge> qhost -j
HOSTNAME                ARCH         NCPU  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
-------------------------------------------------------------------------------
global                  -               -     -       -       -       -       -
bluesky                 -               -     -       -       -       -       -
n001                    lx24-amd64      8  4.01   11.8G    1.2G    2.0G  102.0M
n002                    lx24-amd64      8  4.10   11.8G  682.3M    2.0G   77.1M
   job-ID  prior   name       user         state submit/start at     queue      master ja-task-ID
   ----------------------------------------------------------------------------------------------
       119 0.55500 Si         userd        r     11/19/2010 15:53:34 q4@n002    MASTER
                                                                     q4@n002    SLAVE
                                                                     q4@n002    SLAVE
                                                                     q4@n002    SLAVE
                                                                     q4@n002    SLAVE
n003                    lx24-amd64      8  4.00   11.8G    1.3G    2.0G     0.0
n004                    lx24-amd64      8  3.97   11.8G    1.5G    2.0G  120.0M
n005                    lx24-amd64      8  4.03   11.8G  448.4M    2.0G  115.6M
n006                    lx24-amd64      8  4.01   11.8G    1.2G    2.0G     0.0
...
bluesky:/packages/sge> qhost -q
HOSTNAME                ARCH         NCPU  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
-------------------------------------------------------------------------------
global                  -               -     -       -       -       -       -
bluesky                 -               -     -       -       -       -       -
n001                    lx24-amd64      8  0.00   11.8G  480.0M    2.0G  101.0M
   q4                   BP    0/0/4
   q4long               BP    0/0/4
   q_1                  BP    0/0/4
n002                    lx24-amd64      8  0.15   11.8G  761.1M    2.0G   77.2M
   q4                   BP    0/0/4
   q4long               BP    0/0/4
   q_1                  BP    0/0/4
n003                    lx24-amd64      8  3.97   11.8G  667.2M    2.0G     0.0
   q4                   BP    0/4/4
   q4long               BP    0/0/4         S
   q_1                  BP    0/0/4         S
n004                    lx24-amd64      8  4.00   11.8G    1.2G    2.0G  120.2M
   q4                   BP    0/0/4         S
   q4long               BP    0/4/4
   q_1                  BP    0/0/4         S
n005                    lx24-amd64      8  4.01   11.8G  825.2M    2.0G  113.4M
   q4                   BP    0/4/4
   q4long               BP    0/0/4         S
   q_1                  BP    0/0/4         S
...
Here, 'BP' is qtype and 0/0/4 represents reserved core/used core/total core numbers.
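To make the reserved/used/total triple concrete, the sketch below splits a value such as 0/4/4 and prints the number of cores still free. The function name is hypothetical, and counting reserved cores as unavailable is an assumption.

```shell
#!/bin/bash
# Hypothetical helper: given "reserved/used/total" as printed by "qhost -q",
# print how many cores are still free.
# Assumption: reserved cores are treated as unavailable.
free_cores() {
    local triple=$1
    local reserved=${triple%%/*}     # text before the first "/"
    local rest=${triple#*/}          # "used/total"
    local used=${rest%%/*}
    local total=${rest#*/}
    echo $((total - used - reserved))
}

free_cores 0/4/4    # all four cores in use
free_cores 0/0/4    # queue instance idle
```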
You can use "qstat" to check the status of your jobs. (refer to "qstat -help")
A call to "qstat" without any arguments gives you something like:
job-ID  prior   name       user         state submit/start at     queue      slots ja-task-ID
---------------------------------------------------------------------------------------------
    304 0.60500 Si         userid       r     11/19/2010 17:42:36 q4@n002        4
    307 0.60500 Sleeper4   userid       qw    11/19/2010 17:42:37                4
    310 0.60500 Sleeper4   userid       qw    11/19/2010 17:42:29                4
    313 0.60500 Sleeper4   userid       qw    11/19/2010 17:42:29                4
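With many jobs in the list, a per-state count is often easier to read than the raw output. The sketch below counts jobs by the state column (r for running, qw for queued and waiting). The function name is hypothetical, and the code assumes the column layout of the qstat listing in this section, with the state in the fifth field.

```shell
#!/bin/bash
# Hypothetical helper: count jobs per state from "qstat" output on stdin.
# Assumes two header lines and the state in column 5.
count_job_states() {
    awk 'NR > 2 && NF > 0 { count[$5]++ }
         END { for (s in count) print s, count[s] }' | sort
}

# Demonstration on the sample listing; on the cluster one would run:
#   qstat | count_job_states
count_job_states <<'EOF'
job-ID  prior   name       user         state submit/start at     queue      slots ja-task-ID
---------------------------------------------------------------------------------------------
    304 0.60500 Si         userid       r     11/19/2010 17:42:36 q4@n002        4
    307 0.60500 Sleeper4   userid       qw    11/19/2010 17:42:37                4
    310 0.60500 Sleeper4   userid       qw    11/19/2010 17:42:29                4
    313 0.60500 Sleeper4   userid       qw    11/19/2010 17:42:29                4
EOF
```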
Useful qstat options:
"qstat -j [job_id]" shows scheduler job information.
"qstat -f" displays summary information on all queues, along with the queued job list.
"qstat -g c" displays a cluster queue summary.
bluesky:/packages/sge> qstat -g c
CLUSTER QUEUE                   CQLOAD   USED    RES  AVAIL  TOTAL aoACDS  cdsuE
--------------------------------------------------------------------------------
q4                                0.39    112      0     36    160     12      0
q4long                            0.39     12      0     36    160    112      0
q_1                               0.39      0      0     36    160    124      0
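Before choosing a queue, it can help to pull just the AVAIL column out of this summary. The sketch below prints each queue name with its available slot count. The function name is hypothetical, and the code assumes the column order shown in the "qstat -g c" listing here, with AVAIL as the fifth field.

```shell
#!/bin/bash
# Hypothetical helper: print "<queue> <available slots>" for each row of
# "qstat -g c" output read on stdin.
avail_slots() {
    # Skip the header and the dashed separator; column 5 is AVAIL.
    awk 'NR > 2 && NF >= 6 { print $1, $5 }'
}

# Demonstration on the sample rows; on the cluster one would run:
#   qstat -g c | avail_slots
avail_slots <<'EOF'
CLUSTER QUEUE                   CQLOAD   USED    RES  AVAIL  TOTAL aoACDS  cdsuE
--------------------------------------------------------------------------------
q4                                0.39    112      0     36    160     12      0
q4long                            0.39     12      0     36    160    112      0
q_1                               0.39      0      0     36    160    124      0
EOF
```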
Another way to check the status of your jobs: "showq"
Nodes n041 to n050 are reserved for special use.
Users are strongly discouraged from logging in to the compute nodes,
especially the ten reserved nodes above.
In preparation, now
Deleting a job is simple. First find the JOB_ID with qstat. Then do:
qdel [JOB_ID]
More useful information can be found at the web site "Using Sun Grid Engine".
Last modified: Tue Oct 05 2010 at 12:46 PM