Introduction

A short introduction to cluster concepts and cluster commands

The BIMSB Cluster is scheduled for decommissioning in September 2015.

All BIMSB-Cluster users MUST migrate to MaxCluster.

Please review the Max Manuals and the Max Intro presentation (PDF, 3 MB).

Concepts

The concept of our cluster is that you have a pool of computers available for computations. You can then send a job to the cluster management software (in our case Univa GridEngine).

GridEngine then starts your job on the next computer that becomes idle. There are also ways to manage parallel execution across several nodes (see Parallel Execution).

To get an account, please contact helpdesk@mdc-berlin.net (helpdesk@mdc-berlin.de from outside of MDC). Note that you MUST be registered in the MDC personnel database before a cluster account can be created.

Rules

  • Do not run any computations on the login nodes.
  • If you run programs needing more than one CPU core (multitask or multithreaded), request the parallel environment "smp" with the right number of slots.
  • Do not run computations outside the GridEngine.
  • Monitor your jobs to see if any problem arises.
  • If you have questions or observations: communicate!
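
The "smp" rule above could look like the following job script. This is a minimal sketch, not an official template for this cluster: the slot count of 4 and the commented-out program call are hypothetical, and the fallback to one thread exists only so the script can be tried outside GridEngine. NSLOTS is the variable GridEngine sets to the number of slots granted to the job.

```shell
#!/bin/bash
# Lines starting with "#$" are read by qsub as command-line options.
#$ -cwd
# Request 4 slots (hypothetical number) in the "smp" parallel environment.
#$ -pe smp 4

# GridEngine sets NSLOTS to the number of slots granted to the job.
# Fall back to 1 when run outside the cluster (assumption for local testing).
threads="${NSLOTS:-1}"

# Tell OpenMP programs to use exactly the granted number of threads.
export OMP_NUM_THREADS="$threads"

echo "Using $threads thread(s)"
# Hypothetical program call, shown for illustration only:
# my_program --threads "$threads" input.dat
```

Passing the granted slot count on to the program keeps the number of threads in line with what was requested from the scheduler.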

Hardware

  • 100 Sun Fire X2200 M2 (smallnodes)
    • 2 AMD Opteron 2356 (4 cores, 2.3 GHz)
    • 64 GB RAM (12 nodes with only 16 GB, 21 nodes with only 8 GB)
    • 250 GB HD
  • 5 Sun Fire X4600 M2 (bignodes)
    • 8 AMD Opteron 8220 (2 cores, 2.8 GHz)
    • 256 GB RAM
    • 2 x 146 GB HD
  • 13 Sun Fire X4500 and X4540
    • 2 AMD Opteron 290 (2 cores, 2.8 GHz)
    • 16 GB RAM
    • 36 TB HD (3 servers with a second 36 TB)
  • 1 One Microsystems (hugenode)
    • 8 Intel E7-8870 (10 cores, 2.4 GHz)
    • 1 TB RAM
  • 2 StorageTek 2540 + 4 StorageTek 2530 Extensions (scratch for bignodes and volume for dbserver)
    • 1 x 20 TB, 1 x 40 TB

Basic commands

qsub

qsub lets you run arbitrary Linux commands or scripts. Some important options are:

  • -e <path> : path to standard error output (default is <job_name>.e<job_id>)
  • -o <path> : path to standard output (default is <job_name>.o<job_id>)
  • -v <variable_name>=<value> : set an environment variable to the specified value. Without a value, as in -v PATH, the variable is passed on with its current value, so -v PATH gives the job your current search path for executables.
  • -wd <path> : run with <path> as working directory
  • -cwd : run in current directory
  • -l resource=<value> : request a resource
  • -pe <parallel environment> <number of slots> : request a parallel environment with the given number of slots
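
The options above can also be embedded in the job script itself, on lines starting with "#$". The following is a sketch under assumptions: the script name, job name, output file names, and the choice of /bin/bash via -S are hypothetical and not defaults of this cluster.

```shell
#!/bin/bash
# Hypothetical job script, e.g. saved as hello_job.sh.
# Use bash to interpret the script.
#$ -S /bin/bash
# Job name; it appears in qstat and in the default output file names.
#$ -N hello
# Run in the directory from which the job was submitted.
#$ -cwd
# Files for standard output and standard error (hypothetical names).
#$ -o hello.out
#$ -e hello.err

# The actual work: report where the job runs.
node="${HOSTNAME:-$(uname -n)}"
message="Job running on $node in $(pwd)"
echo "$message"
```

Such a script would be submitted with qsub hello_job.sh. Options given on the qsub command line take precedence over the embedded ones.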

Important environment variables known during the job:

  • HOSTNAME : name of the host where the job is running
  • JOB_ID : numerical id of the job
  • JOB_NAME : name given to the job
  • TMPDIR : path of the job-specific tmp directory. It is created automatically for each job and deleted after the job finishes. Use this instead of the normal /tmp.
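
A job might use these variables as sketched below: intermediate files go to TMPDIR, and the result is copied out before the job ends. This is an illustration only. The file names and the stand-in workload are hypothetical, and the fallback values are assumptions added so the script can also be tried outside GridEngine.

```shell
#!/bin/bash
#$ -cwd
# Fallbacks (assumptions for local testing) let the script run outside GridEngine.
job_id="${JOB_ID:-local}"
job_name="${JOB_NAME:-test}"
scratch="${TMPDIR:-$(mktemp -d)}"

# Intermediate files go to the job-specific scratch directory.
work_file="$scratch/$job_name.$job_id.partial"
# The final result goes to the working directory (hypothetical file name).
result_file="$job_name.$job_id.result"

# Stand-in for real work: write three lines, then sort them in reverse.
printf 'line %s\n' 1 2 3 > "$work_file"
sort -r "$work_file" > "$work_file.sorted"

# Copy the result out before the job ends, because TMPDIR is deleted afterwards.
cp "$work_file.sorted" "$result_file"
echo "[$job_name/$job_id] on ${HOSTNAME:-unknown host}: wrote $result_file"
```

Including JOB_ID in the result file name keeps repeated runs of the same script from overwriting each other's output.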

qmon

qmon gives you a graphical interface for submitting jobs and for viewing and editing cluster settings (editing only if you have admin rights).

qrsh

qrsh logs you into one of the cluster nodes. You can control which node you get with the -l and -q options, as with qsub. With qrsh you can also run X11 programs on cluster nodes. For this you need to have the line

ForwardX11 yes

in your .ssh/config. Make sure .ssh is readable only by your user. Then you can run qrsh <program> to start the program.
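
The setup described above could be scripted roughly as follows. This is a hedged sketch, not a procedure prescribed by the cluster admins: the function name ensure_forward_x11 is hypothetical, and placing the option under "Host *" (so that it applies to all hosts) is an assumption. If the config already contains any ForwardX11 line, the function leaves the file untouched.

```shell
# ensure_forward_x11 [SSH_DIR]
# Hypothetical helper: make SSH_DIR (default ~/.ssh) private and
# add "ForwardX11 yes" to its config file if no ForwardX11 line exists.
ensure_forward_x11() {
    ssh_dir="${1:-$HOME/.ssh}"
    config="$ssh_dir/config"

    # The directory must be accessible only by its owner.
    mkdir -p "$ssh_dir"
    chmod 700 "$ssh_dir"

    # Create the config file if needed and keep it private as well.
    touch "$config"
    chmod 600 "$config"

    # Append the option only once; "Host *" is an assumption (all hosts).
    if ! grep -qi '^[[:space:]]*ForwardX11[[:space:]]' "$config"; then
        printf '\nHost *\n    ForwardX11 yes\n' >> "$config"
    fi
}
```

Calling ensure_forward_x11 without arguments would act on ~/.ssh. Passing a different directory first is a safe way to see what it writes.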

qstat

qstat shows you the status of waiting and running jobs.

qstat -j <job_id>

for a pending job, shows you the reason why the job could not be run yet (queues full, unmet resource requirements, etc.).
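
To follow the rule about monitoring your jobs, a small polling helper along these lines may be useful. It is a sketch under one stated assumption: that qstat -j <job_id> exits with a non-zero status once the job is no longer pending or running. The function name wait_for_job and the default interval of 60 seconds are hypothetical.

```shell
# wait_for_job JOB_ID [INTERVAL_SECONDS]
# Hypothetical helper: poll qstat until the job is no longer listed.
# Assumes "qstat -j JOB_ID" returns non-zero for jobs that have left the system.
wait_for_job() {
    job_id="$1"
    interval="${2:-60}"

    # Keep polling while qstat still knows the job.
    while qstat -j "$job_id" >/dev/null 2>&1; do
        sleep "$interval"
    done

    echo "job $job_id is no longer known to qstat"
}
```

For example, wait_for_job 12345 300 would check every five minutes. A long interval keeps the load on the GridEngine master low.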

qhost

qhost gives you information about available hosts.

qhost -F

shows you the available resources.

qhost -j

also shows you jobs running on the nodes.

qacct

qacct gives you accounting information about finished jobs.

Defined queues

  • standard : normal queue, maximum job run time is 96 hours.
  • high : high priority queue, restricted access
  • longrun : queue for long-running jobs, limited to 1 slot per host. This queue is not for interactive jobs and will be suspended when other queues are full. You need to request the "longrun" resource (see Resources).
  • interactive : for interactive use. It offers only 3 slots per user (only 1 bignode slot).
  • gmc : separate queue for the "Gene Mapping Course"

Further reading

  • man pages for qsub, qlogin, qhost, qstat, qacct
  • Gridwiki

 
