User's guide to Sun Grid Engine (SGE) on bluesky.khu.ac.kr


SGE Commands




Creating a job

To create a batch job, first create a script that starts up the job: a plain shell script containing the commands that set up and start your work. SGE makes several environment variables available to the script (and the job), such as $JOB_ID (the job's numeric ID), $NSLOTS (the number of allocated slots), and $TMPDIR (the job's scratch directory), all of which appear in the example below.



Although it is possible to use command line options and script wrappers to submit jobs, it is usually more convenient to use just a single script to include all options for the job.

An example script looks something like this:

#!/bin/bash               
#$ -S /bin/bash
#$ -N Si                   ## Job name
#$ -pe mpi_4cpu 8          ## nodes=2, ppn=4, so 8 MPI tasks in total
#$ -q q4                   ## Queue name (currently q_1, q4, and q4long are available)
#$ -R yes                  ## Resource reservation
#$ -wd [user's working directory] 
#$ -l h_rt=50:00:00        ## Maximum wall-clock time (hh:mm:ss)
#$ -l normal=TRUE          ## Required for normal priority; jobs without it are killed.
#$ -M [user's email address]
#$ -m e                    ## Send email when the job finishes.
#$ -e err.$JOB_ID
#$ -o out.$JOB_ID
# Run the parallel MPI executable "namu.exe".
# Here namu.exe is the executable and params.00286 is its input argument.
/packages/openmpi-intel/bin/mpirun -machinefile $TMPDIR/machines -np $NSLOTS ./namu.exe < params.00286

The following basic options may be used to submit the job.

 
-A [account name] -- Specify the account under which to run the job 
-N [name] -- The name of the job 
-l h_rt=hr:min:sec -- Maximum walltime for this job 
-r [y,n] -- Whether the job should be re-runnable (default y) 
-pe [type] [num] -- Request [num] slots using the parallel environment [type]. 
    ex. 4 MPI tasks in total : #$ -pe mpi_4cpu 4
        8 MPI tasks in total : #$ -pe mpi_4cpu 8
        16 MPI tasks in total: #$ -pe mpi_4cpu 16
-wd [directory path] -- Place the output files (.e, .o) in the working directory [directory path] 
-S [shell path] -- Specify the shell to use when running the job script 
-l normal=TRUE -- Add this to get normal priority

The present SGE QUEUE configuration on bluesky is shown below.

                                      [SGE QUEUE CONFIGURATION] 
  =====================================================================================================
    Queue    Time_Limit    Running_Nodes   CPUs/job    Priority   Available pe type            
  =====================================================================================================
      q_1      168hrs         n001-n040        1         normal     mpi_1cpu (only for serial jobs)
  -----------------------------------------------------------------------------------------------------
      q4       168hrs         n001-n040       4-32       normal     mpi_4cpu,mpi_fu
  -----------------------------------------------------------------------------------------------------
      q4long   504hrs         n001-n040       4-32       normal     mpi_4cpu,mpi_fu       
  =====================================================================================================
  where (pe type) mpi_4cpu has an allocation rule of 4 slots per node (nodes=1 and ppn=4),
                  mpi_1cpu has an allocation rule of 1 slot per node (nodes=1 and ppn=1; see the serial-job sketch below),
                  mpi_fu uses the $fill_up allocation rule.
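
For comparison with the parallel example above, a minimal serial-job script for the q_1 queue might look like the sketch below. The job name, walltime, and executable name (serial.exe) are placeholders; replace them with your own values.

#!/bin/bash
#$ -S /bin/bash
#$ -N SerialTest           ## Job name (placeholder)
#$ -pe mpi_1cpu 1          ## Serial job: one slot on one node
#$ -q q_1                  ## Serial queue
#$ -wd [user's working directory]
#$ -l h_rt=24:00:00        ## Wall-clock time (placeholder)
#$ -l normal=TRUE          ## Required for normal priority
#$ -e err.$JOB_ID
#$ -o out.$JOB_ID
# A serial job does not use mpirun; run the executable directly.
./serial.exe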

After creating the script in (say) myscript.sge, use the following command to submit the job to SGE:

qsub myscript.sge
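
As mentioned above, the same options can instead be given on the qsub command line. As a sketch, using the values from the parallel example script:

qsub -N Si -q q4 -pe mpi_4cpu 8 -l h_rt=50:00:00,normal=TRUE myscript.sge

Options given on the command line take precedence over the corresponding #$ lines in the script.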

Node or queue status can be obtained with the "qhost" command.
An example listing is shown below (see "qhost -help" for details).
Useful options:
"qhost -j" displays the jobs running on each host.
"qhost -q" displays the queues configured on each host.

bluesky:/packages/sge> qhost -j
HOSTNAME                ARCH         NCPU  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
-------------------------------------------------------------------------------
global                  -               -     -       -       -       -       -
bluesky                 -               -     -       -       -       -       -
n001                    lx24-amd64      8  4.01   11.8G    1.2G    2.0G  102.0M
n002                    lx24-amd64      8  4.10   11.8G  682.3M    2.0G   77.1M
   job-ID  prior   name       user         state submit/start at     queue      master ja-task-ID 
   ----------------------------------------------------------------------------------------------
       119 0.55500 Si       userd          r     11/19/2010 15:53:34 q4@n002    MASTER        
                                                                     q4@n002    SLAVE         
                                                                     q4@n002    SLAVE         
                                                                     q4@n002    SLAVE         
                                                                     q4@n002    SLAVE     
n003                    lx24-amd64      8  4.00   11.8G    1.3G    2.0G     0.0
n004                    lx24-amd64      8  3.97   11.8G    1.5G    2.0G  120.0M
n005                    lx24-amd64      8  4.03   11.8G  448.4M    2.0G  115.6M
n006                    lx24-amd64      8  4.01   11.8G    1.2G    2.0G     0.0
...
bluesky:/packages/sge> qhost -q 
HOSTNAME                ARCH         NCPU  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
-------------------------------------------------------------------------------
global                  -               -     -       -       -       -       -
bluesky                 -               -     -       -       -       -       -
n001                    lx24-amd64      8  0.00   11.8G  480.0M    2.0G  101.0M
   q4                   BP    0/0/4         
   q4long               BP    0/0/4         
   q_1                  BP    0/0/4         
n002                    lx24-amd64      8  0.15   11.8G  761.1M    2.0G   77.2M
   q4                   BP    0/0/4         
   q4long               BP    0/0/4         
   q_1                  BP    0/0/4         
n003                    lx24-amd64      8  3.97   11.8G  667.2M    2.0G     0.0
   q4                   BP    0/4/4         
   q4long               BP    0/0/4         S
   q_1                  BP    0/0/4         S
n004                    lx24-amd64      8  4.00   11.8G    1.2G    2.0G  120.2M
   q4                   BP    0/0/4         S
   q4long               BP    0/4/4         
   q_1                  BP    0/0/4         S
n005                    lx24-amd64      8  4.01   11.8G  825.2M    2.0G  113.4M
   q4                   BP    0/4/4         
   q4long               BP    0/0/4         S
   q_1                  BP    0/0/4         S
...

Here, 'BP' is the queue type (qtype) and 0/0/4 represents the reserved/used/total slot (core) counts.



Checking the status of a job/queue

You can use "qstat" to check the status of your jobs (see "qstat -help").

A call to "qstat" without any arguments gives you something like:

 
job-ID  prior   name       user   state submit/start at   queue  slots ja-task-ID
---------------------------------------------------------------------------------
 304 0.60500 Si         userid      r     11/19/2010 17:42:36 q4@n002          4 
 307 0.60500 Sleeper4   userid      qw    11/19/2010 17:42:37                  4 
 310 0.60500 Sleeper4   userid      qw    11/19/2010 17:42:29                  4 
 313 0.60500 Sleeper4   userid      qw    11/19/2010 17:42:29                  4 

Useful qstat options:
"qstat -j [job_id]" shows scheduler information for the given job (an example follows the listing below).
"qstat -f" displays summary information on all queues, along with the list of queued jobs.
"qstat -g c" displays a cluster queue summary.

bluesky:/packages/sge> qstat -g c
CLUSTER QUEUE                   CQLOAD   USED    RES  AVAIL  TOTAL aoACDS  cdsuE  
--------------------------------------------------------------------------------
q4                                0.39    112      0     36    160     12      0 
q4long                            0.39     12      0     36    160    112      0 
q_1                               0.39      0      0     36    160    124      0 
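
For example, to see why a pending (qw) job has not started yet, ask the scheduler about it directly; job 307 here is taken from the qstat listing above:

qstat -j 307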

Another way to check the status of your jobs is the "showq" command.

Nodes n041 through n050 are reserved for special use. Users are not allowed to log in to the compute nodes directly, especially the ten reserved nodes listed above.



Allocated CPU resources

This section is in preparation.



Deleting a job

Deleting a job is simple. First find the JOB_ID with qstat. Then do:


qdel JOB_ID
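
For example, to delete the queued job 307 from the qstat listing above, or all of your own jobs at once ([userid] is your login name):

qstat -u [userid]        ## list your jobs and their JOB_IDs
qdel 307                 ## delete a single job by JOB_ID
qdel -u [userid]         ## delete all jobs belonging to [userid]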




More useful information can be found on the web site Using Sun Grid Engine.

Last modified: Tue Oct 05 2010 at 12:46 PM