
Resources

Which resources are defined in our cluster and what they mean

GridEngine allows resources to be defined which can then be used to manage cluster jobs. We use them for several purposes: license management, management of multithreaded programs, and running jobs only on nodes with specific characteristics.

To get an overview of the available/defined resources, run qhost -F.

Example:

# qhost -F -h bignode01
HOSTNAME                ARCH         NCPU  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
-------------------------------------------------------------------------------
global                  -               -     -       -       -       -       -
   gc:cplex_lic=10.000000
bignode01               lx24-amd64     16  0.42  252.0G    2.1G  144.7G  152.0K
   gc:cplex_lic=10.000000
   hl:arch=lx24-amd64
   hl:num_proc=16.000000
   hl:mem_total=252.048G
   hl:swap_total=144.728G
   hl:virtual_total=396.775G
   hl:load_avg=0.420000
   hl:load_short=1.290000
   hl:load_medium=0.420000
   hl:load_long=0.320000
   hl:mem_free=249.954G
   hl:swap_free=144.728G
   hl:virtual_free=394.682G
   hl:mem_used=2.094G
   hl:swap_used=152.001K
   hl:virtual_used=2.094G
   hl:cpu=11.600000
   hl:np_load_avg=0.026250
   hl:np_load_short=0.080625
   hl:np_load_medium=0.026250
   hl:np_load_long=0.020000
   hf:scratch=1.000000

scratch

The "scratch" resource is a boolean non-consumable which shows if a /scratch disk is attached to the node. These /scratch disk are for jobs that need more local disk space then the normal /tmp directory. At the moment we have /scratch on all bignodes (each a few TB in size). Please use a job-specific subdirectory in /scratch and delete it at the end of your job. To request it, use

# qsub -l scratch
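
A minimal sketch of a job script following this convention (the program name is a placeholder; $JOB_ID is set by GridEngine for every job):

#!/bin/bash
#$ -S /bin/bash                  # use bash instead of the queue default /bin/csh
#$ -l scratch                    # run only on nodes that have a /scratch disk
WORKDIR=/scratch/$JOB_ID         # job-specific subdirectory
mkdir -p "$WORKDIR"
trap 'rm -rf "$WORKDIR"' EXIT    # delete the subdirectory even if the job fails
cd "$WORKDIR"
./my_analysis                    # placeholder for your actual program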

longrun

This is a boolean resource indicating that your job will run for a very long time. It is only available in the longrun queue, and you must request it for anything you want to run in that queue.

# qsub -l longrun
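
Resource requests can be combined in a single -l list, so a long-running job that also declares its memory usage (see h_vmem below; the script name is a placeholder) could be submitted as:

# qsub -l longrun,h_vmem=4G myjob.sh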

h_vmem

This is used to manage the available memory on the nodes. It takes into account the memory requested by all jobs on a node, not the memory actually used. Each job is limited to the memory it requested (enforced via ulimit). If you don't specify your usage, a default of 2 GB is assumed.

# qsub -l h_vmem=XG

NOTICE: The limit applies per slot, so if you run parallel multi-slot jobs ("-pe smp <number of slots>") you have to divide the overall memory needed by the number of slots requested.
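
For example, a hypothetical job needing 16 GB in total across 4 slots would request 16 GB / 4 = 4 GB per slot:

# qsub -pe smp 4 -l h_vmem=4G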

huge

This is a boolean resource indicating that you want your job to run on the hugenode. Before you can use this machine, you need to be granted access for a limited amount of time to use this unique resource for a dedicated scientific task. Please contact BIMSB_itsupport for details.

# qsub -l huge

Limits defined for our Queues

Characteristic        | standard               | standard on bignodes   | longrun                                   | interactive            | high              | high on bignodes  | gmc
----------------------+------------------------+------------------------+-------------------------------------------+------------------------+-------------------+-------------------+------------------
Member nodes          | 97 nodes (N001...N097) | 5 bignodes (B01...B05) | all (node001...097, bignode01...05)       | all nodes and bignodes | 97 Nxxx           | 5 Bxx             | 97 Nxxx
Slots per node        | 8                      | 15                     | 1                                         | 1                      | 2                 | 4                 | 1
Max. runtime          | 96 h (4 days)          | 96 h (4 days)          | 6000 h (~8 months)                        | 48 h (2 days)          | 504 h (3 weeks)   | 504 h (3 weeks)   | 24 h (1 day)
Max. memory           | 62 GB                  | 250 GB                 | 7 GB (bignodes: 50 GB)                    | 8 GB (bignodes: 64 GB) | 62 GB             | 256 GB            | 1 GB
Max. stack            | 256 MB                 | 512 MB                 | unlimited (!)                             | 256 MB                 | 1 GB              | 1 GB              | 500 MB
Queue type            | batch-only             | batch-only             | batch-only                                | interactive            | batch+interactive | batch+interactive | batch+interactive
Run priority          | 0                      | 0                      | 0                                         | 0                      | -20               | -20               | 0
Run parallel          | all                    | all                    | make, smp (not: orte)                     | none                   | all               | all               | make
Re-run (if failed)    | yes                    | yes                    | no                                        | no                     | yes               | yes               | no
Permitted user groups | all                    | all                    | all but guest, technical, and schmidt-ott | all                    | project (have to pay) | project (have to pay) | gmc-course
Selection preference  | 20                     | 25                     | 50                                        | 20 (bignodes: 50)      | 10                | 15                |
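
To verify these limits on the cluster itself, you can query GridEngine directly, for instance:

# qconf -sq longrun     (prints the full configuration of the longrun queue, including runtime and memory limits)
# qstat -g c            (summary of all cluster queues, their slots and load)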

 

Common Properties:

np_load_avg = 1.75 (load threshold)
Suspend jobs: never
Processors: undefined (none for longrun)
Shell: /bin/csh; shell_start_method: unix (posix for gmc)
