
Compute-Cluster Upgrade

Planned and achieved tasks to upgrade the BIMSB HPC-Cluster: in 2010 to more CPU-Cores and Memory, and in 2011 to a more recent Operating System and Libraries

Latest status (June 10th, 2011):

  • Step 5 - Preparations: Auto-Network-Install new OS (Ubuntu). (see Step 5)

  • Step 6 - Auto-Network-Install OS and Configuration (few Nodes)

  • Step 9 - Hardware-Upgrade of the Compute Nodes (CPU and RAM) finished

Outline

This document will be updated according to the progress of the cluster-upgrade. The latest changes/status can be read in the orange lines at the top. Below you will find the detailed planning, together with some technical details and the timeline.

Please have a look at Task #4, which unfortunately must be done by each user individually.

At the end of the hardware upgrade (around the end of 2010), our cluster will have twice the compute-power. The software upgrade will provide a state-of-the-art operating system, libraries and grid-engine, as well as better fail-over and maintenance mechanisms than before.

To keep things simple, we will split the cluster into an "old" (blue) and a "new" (green) part, as sketched in the figure below; details can be found in the timeline below.

Planning

Affected Nodes 
  • Login1, Login2
  • Node001, Node002, ..., Node097
  • Bignode01, Bignode02, ..., Bignode05

Upgrade-Tasks #1 (each Node)

  1. Change CPU: 2x Dualcore --> 2x Quadcore (a quick per-node verification sketch for all four tasks follows this list).
  2. Change RAM: Upgrade Memory.
    • on (only) four Compute-Nodes for Illumina from 4GB to 8GB
    • on Headnodes (Login1, and Login2) to 8GB
    • on 60 Compute-Nodes from 4GB to 64GB (maximum)
  3. Change OS: Install Ubuntu 10.04-LTS Server "Lucid Lynx".
  4. Change SGE: Upgrade Sun-Grid-Engine v6.2-update-0 
    • to SGE 6.2-update-5 (or update-6, if available). [PLANNED]
    • to UGE-8 (Univa-Grid-Engine) [UPDATE, as SGE was unofficially retired by Oracle after buying Sun]
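
A quick way to check tasks 1-4 on a single node after its upgrade is sketched below; the expected values in the comments are the targets listed above, and the exact grid-engine banner will differ between SGE and UGE.

    # run on the upgraded node itself
    grep -c ^processor /proc/cpuinfo   # expect 8 (2x Quadcore)
    grep MemTotal /proc/meminfo        # expect the upgraded RAM size
    lsb_release -d                     # expect "Ubuntu 10.04 ... LTS"
    qstat -help | head -n 1            # first line shows the grid-engine version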

Upgrade-Tasks #2 (Clusterwide)

  1. SGE-Master on Headnode(s), matching the SGE-Client on the Compute-Nodes.
    • Compute-Nodes should contact SGE-Shadow-Master during fail-over (see the configuration sketch after this list).
  2. Applications and Libraries in /usr/local, matching the OS.
  3. Keep User-Accounts valid.
    • Make NIS-yppasswd and Ubuntu-PAM use blowfish in /etc/shadow.
    • Compute-Nodes should contact NIS-Shadow-Master during fail-over.
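
The sketch below illustrates the intended fail-over wiring for items 1 and 3. It is only a sketch, run as root on the respective hosts; the SGE_ROOT of /opt/sge, the default cell and the lower-case hostnames login1/login2 are assumptions, not the final setup.

    # SGE: list the primary master first, then the shadow host(s); sge_shadowd
    # on the shadow host takes over when the master's heartbeat stops, and the
    # currently active master is recorded in act_qmaster.
    export SGE_ROOT=/opt/sge SGE_CELL=default
    printf 'login2\nlogin1\n' > $SGE_ROOT/$SGE_CELL/common/shadow_masters
    $SGE_ROOT/bin/$($SGE_ROOT/util/arch)/sge_shadowd     # start on the shadow host
    cat $SGE_ROOT/$SGE_CELL/common/act_qmaster           # shows the active master

    # NIS: tell ypbind on the compute-nodes about both NIS servers,
    # so they can fall back to the second one during fail-over.
    printf 'ypserver login2\nypserver login1\n' > /etc/yp.conf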

Upgrade-Tasks #3 (Maintenance)

  1. Automated (Network-)Installation of OS and Updates.
    • Setup local mirror of Ubuntu-Repository (apt-proxy cache).
    • Prepare Scripts (PXE-linux, Debian-installer with preseed) for OS-Installation via Network (boot-entry sketch after this list).
    • NOT: Image-Installation (might fail for changed hardware, would cause downtime to install updates, and would require setting up and testing two mechanisms).
  2. Update all Nodes on-the-fly without service-interruption
    (currently this is possible only for the Headnodes).
  3. Keep Reinstall-Mechanism for broken+repaired Nodes in sync with "live"-Situation on the Nodes. (Puppet framework)
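
The following boot entry is only a sketch of such a scripted network installation; the TFTP root, the netboot image paths and the preseed URL on Manager1 are assumptions, not the final configuration.

    # write the pxelinux boot entry for an unattended Lucid install (on the install server)
    {
      echo 'DEFAULT lucid-auto'
      echo 'LABEL lucid-auto'
      echo '  KERNEL ubuntu-installer/amd64/linux'
      echo '  APPEND initrd=ubuntu-installer/amd64/initrd.gz auto=true priority=critical url=http://manager1/preseed/lucid.cfg'
    } | sudo tee /var/lib/tftpboot/pxelinux.cfg/default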

Upgrade-Tasks #4 (each User)

Unfortunately, these tasks cannot be done by the Administrators in a sensible way.

  1. If you have compiled your own tools to be used on the cluster, it is somewhat likely that they won't execute on the changed OS (Ubuntu instead of SuSE).
    => You might have to recompile such tools or applications.
  2. If you have prepared sophisticated scripts to ease your personal workflow of using the cluster-power, some hardcoded paths might change.
    => You might have to adjust such hardcoded paths and filenames (a small search sketch follows below).

Whenever we learn of required changes, we will forward that information to you.
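
A purely illustrative way to search your own scripts for such hardcoded paths; the directories and the pattern are just examples, adjust them to wherever your scripts live.

    # list lines in your scripts that mention paths likely to change
    grep -rnE '/usr/local|/opt/|/usr/lib' ~/bin ~/scripts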

Timeline

The changes should be finished by the end of November 2011, if no big, unforeseen problems arise.

  1. √ Plan and Prepare the Upgrade.
  2. √ Inform the users about the coming changes and this page.
    (see Info 1 - only visible if you have an account here on BBC)
  3. √ Login1: setup as secondary SGE-master,
    to permit downtime in step 4 without interruption of grid-compute service.
    • √ setup SGE for fail-over scenario, where Login1 becomes SGE-Master while Login2 is down.
    • √ inform users about coming tests. (see Info 2)
    • √ TEST: Shutdown SGE-service on Login2 for some seconds / few minutes. (Results: see Step 3)
      • √ Check for running jobs to survive (kept known by the shadow-master)
      • √ Check for new jobs to be submittable via shadow-master
      • X Check for nodes to talk to shadow-master
        • FAILED (known SGE-Issue #2700: execd ignores changed act_master; fixed in SGE v6.2-update1)
      • X Check for waiting jobs to be started by the shadow-master
        • FAILED (see above: execd doesn't report current load to changed master)
      • X Check for new jobs (qsub, qlogin) to be started by the shadow-master
        • FAILED (see above)
      • √ Check for statistics to keep collecting data
      • √ Check for hand-back to primary master
  4. √ Login2: short downtime [~2 hours]
    to enable Login2 to take the whole cluster-access load of the old cluster during steps 5-12.
    • √ inform users about 1st downtime on August 3rd. (see Info 3)
    • √ sudo wall /tmp/nologin.txt
    • √ create /etc/nologin
    • √ wait for all users to logoff; shutdown machine
    • (X) Clone the internal HDD to have a drop-in replacement in case of emergency
      • Takes ~3.5hr ("dd sda to sdb" for 250GB at 20MB/s, faster in "outer" areas of the HDDs)
      • FAILED: The copy on the target-HDD got corrupted by swap-usage of the cloning-OS.
    • (X) upgrade CPU
      • FAILED: Got only unprogrammed PROMs
    • √ upgrade RAM
    • √ Inform users to re-use Login2
    • √ prepare 2nd downtime on August 19th, to also upgrade CPUs. (see Info 4)
    • √ FIXED: redo steps above to do CPU/PROM/BIOS-upgrade (cf. Step 9)
    • √ FIXED: hdd-clone, June 6th 2011
      • Changed single-hdd operation to a fail-safe software-raid mirror. Took 30 minutes of downtime, without the need to offline-copy the source-partition.
  5. Login1: long downtime [~1 week],
    to upgrade and setup as 1st member of the new cluster.
    • √ Preparations: Auto-Network-Install new OS (Ubuntu)
    • inform users that Login1 will leave the old cluster (see Info 5)
    • set nologin; wait for all users to logoff; shutdown machine
    • √ upgrade CPU and RAM
    • change OS (Network-Install from Manager1; setup Manager1 to do so)
    • setup as Ubuntu-repository-mirror with packages stored on fileserver
    • install latest SGE and tune performance (db instead of flat-files)
    • install tools, libs and applications in /usr/local_new (on FSxx)
    • become secondary NIS-master
  6. Few compute-nodes: upgrade operating system, and join new cluster
    • √ select ~10 nodes + 1 bignode
    • √ disable nodes (qmod -d *@node###) and wait for jobs to finish
    • √ set nologin; wait for all users to logoff; leave old-cluster-SGE, shutdown (the generic drain/nologin sequence is sketched after the timeline)
    • √ install new OS automatically via network (scripted via manager1), install from mirror at login1
    • link to new SGE on Login1
    • inform users to try out Cluster-2.0 via Login1 and report problems and missing features.
  7. Test Cluster-2.0
    • access via Login1
    • run computations that worked well on the old cluster
    • check for errors/warnings, missing tools/libs
    • compare results with those on the old setup
    • all: please report errors and problems
    • admins query power-users directly for outcome
    • adjust sys-install script and reinstall nodes of stage 6 
  8. Login1+Login2: Migrate Admin-Tools,
    [parallel to step 7]
    • identify all services on Login2 and copy them to the new Cluster, i.e. Login1
    • Take special care of the CPLEX License-Manager on Login2 (bound to 192.168.73.232)
    • migrate (i.e. setup to be script-installed on Login1)
      • nagios (apache)
      • ganglia (gmeta)
      • loghost/logwatch
      • CPlex License Manager (IBM ILM)
      • more
  9. √ All compute-nodes: upgrade CPUs when idle,
    [parallel to steps 5 thru 10]
    whenever an idle compute-node and an admin-person are available ...
    • √ disable node (qmod -d *@node###), wait for jobs to finish
    • √ shutdown and upgrade to QuadCore-CPUs [~1 hour] (Details: see Step 9)
    • √ for most nodes: also upgrade RAM
    • √ enable Node in SGE with new number of cores reflected by SGE-Slots
  10. Most compute-nodes: upgrade operating system, and join new cluster,
    • inform users to leave the old cluster and work only on cluster-2.0
    • Select a few nodes to stay in the old cluster (as a backup for problems that did not show up during the test in step 7)
    • Upgrade all other compute-nodes automatically as outlined above in step 6
  11. Wait for unexpected problems in cluster-2.0 during full work conditions.
  12. Upgrade last compute-nodes (see step 10)
  13. Login2: Shutdown old Cluster
    • automatic install with all head-node functionalities (as in step 6+9)
    • become fail-over headnode with Login1.
  14. Done (hopefully)
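
For reference, the generic drain/nologin sequence used when taking a node or headnode down (cf. steps 4, 6 and 9): the node name and the message file are placeholders, and the qmod call assumes SGE operator/manager rights.

    qmod -d '*@node042'            # disable all queue instances on the node
    qstat -q '*@node042' -u '*'    # repeat until no jobs are listed any more
    sudo wall /tmp/nologin.txt     # announce the downtime to logged-in users
    sudo touch /etc/nologin        # block further interactive logins
    who                            # wait for the remaining users to log off
    sudo shutdown -h now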

