
Self-Restore

How to restore files from the backup that you have just deleted or modified by mistake.

Self-Restore Files from Backup

The storage capacity (over 30 terabytes each) of the 13 fileservers in the BIMSB HPC cluster is far too large for a useful (i.e. daily) backup on tape.

(Our tape streamers can write about 1 TB/day, so at best we could keep a roughly "monthly" backup, which would be of little use if you just deleted a file that was created yesterday. In addition, we do not have 13 tape streamers, one per fileserver, so the initial "full" backup of all fileservers would take months (!) before any "incremental" backups could even start.)

Because of that, we use a special feature of the fileservers (of the underlying Solaris ZFS filesystem) to create daily snapshots of the whole filesystems, which keep files available that were deleted or modified during the last 7 days. Users can access (read) these files themselves, without any help from the administrators.

Such snapshots are currently created early every morning (at 3:45) for the following locations:

  • User-home directories: /home*
  • Workgroup-dirs: /data/BIO2, /data/bioinformatics, /data/chen, /data/genetics, /data/huebner, /data/huebner2*, /data/landthaler, /data/loewer, /data/ohler*, /data/pombo*, and /data/proteomics
  • Database-dirs: /data/circrna/data/databases, /data/deep_seq2+9, and /data/galaxy

Notice-1: There are NO snapshots created for /data/deep_seq and for deep_seq3 through deep_seq8, where the current sequencing runs create many terabytes at once. The occupied space would otherwise remain blocked for one more week after a run is deleted in the "live" filesystem, so to easily free space for new runs, no snapshots are kept there.

Notice-2: The locations marked with a * reside on more recent servers, which allows more frequent snapshots (about 60 in total: hourly for the last 24 hours, daily for the last week, and weekly for the last half year), but restoring from them is slightly more involved (see below).
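To get a quick overview of the snapshots that currently exist for a location, you can simply list its snapshot directory (a small sketch; /data/mygroup is a placeholder for your own group directory). For the daily-snapshot locations the entries are named after the weekdays (Mon, Tue, ...):

me@login1:~$ ls /data/mygroup/.zfs/snapshot/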

How to restore a file from the snapshot backup of a workgroup directory

Let's assume you have deleted or modified /data/mygroup/work-dir/somefile.dat

  • Log in to the cluster's headnode (login1 or login2):
    ssh me@login1
  • Change directory to the root of the respective filesystem:
    cd /data/mygroup/
  • Enter the special directory ".zfs" (this directory is not shown by "ls" or by GUI directory browsers, so you have to type its name manually):
    cd .zfs

    Notice: this special directory only exists in the filesystem "root" directories (see locations list above)

  • Proceed to the snapshot directory and choose the day (e.g. Monday) on which you expect the desired file still existed:
    cd snapshot; ls -l; cd Mon
    For /home and for /data/huebner2, /data/loewer, /data/ohler, and /data/pombo, please use
    cd snapshot; ls -la
    and then cd into the desired ".auto-<date>T<time>UTC" directory, see below.
  • Proceed to the original directory that your file was in and check whether it is still there:
    cd work-dir; ls -l some*
  • Copy the lost file back to wherever you want it:
    cp -p somefile.dat /data/mygroup/work-dir/somefile_restored.dat

In short, you could also have done all of this in one line:

ssh me@login1 \
    cp -p /data/mygroup/.zfs/snapshot/Mon/work-dir/somefile.dat \
          /data/mygroup/work-dir/somefile_restored.dat
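The same pattern also works when a whole directory was lost; just copy it back recursively (a sketch, with work-dir/results as a hypothetical example path):

ssh me@login1 \
    cp -rp /data/mygroup/.zfs/snapshot/Mon/work-dir/results \
           /data/mygroup/work-dir/results_restored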

Now you can start over from the last known-good state of the file, compare the changes between the current and the restored version, or (especially when it's Friday, 9 pm) bring the sysadmin a cold beer or a delicious cake next Monday morning ;-)

Special restore procedure for the more frequently snapshotted locations (like /home and fileservers 15 through 18)

For those locations that keep snapshots more frequently than daily, the snapshots unfortunately are not named "Mon", "Tue", "Wed", ..., but carry automatic names containing the date and time, such as ".auto-2015-01-13T02:00:00UTC". These snapshots are not shown by a normal "ls" (unless you remember to use "ls -a").
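To see which of these snapshots currently exist, list the snapshot directory with the -a flag (shown here for /home; every entry has the ".auto-<date>T<time>UTC" form):

me@login1:~$ ls -a /home/.zfs/snapshot/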

So the restore command needs an adjustment:

ssh me@login1 \
    cp -p /home/.zfs/snapshot/.auto-2015-01-13T14:00:00UTC/me/some-dir/somefile.dat \
          /home/me/some-dir/somefile_restored.dat
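Since the snapshot names contain an ISO-style date and time, their alphabetical order is also their chronological order, so the most recent snapshot can be found quickly (a small sketch):

me@login1:~$ ls -d /home/.zfs/snapshot/.auto-* | tail -1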

 

Tips and Tricks

You do not have to copy files back from the (read-only) snapshot directory at all. You can also diff your current file against the snapshot version, grep inside it, or compute checksums:

me@login1:~$ diff /data/mygroup/.zfs/snapshot/Mon/pipeline.py /data/mygroup/pipeline.py
< def StartAnalyzze()
---
> def StartAnalyze()

me@node053:~$ sha1sum /data/mygroup/.zfs/snapshot/*/db/contaminants.fasta
4c30306054ba0361b18594568e9660b8a28f77d6  /data/mygroup/.zfs/snapshot/Mon/db/contaminants.fasta
4c30306054ba0361b18594568e9660b8a28f77d6  /data/mygroup/.zfs/snapshot/Tue/db/contaminants.fasta
c1dbbcf5ad75b6786b7dc37e3b0330ea7478f1cc  /data/mygroup/.zfs/snapshot/Wed/db/contaminants.fasta
c1dbbcf5ad75b6786b7dc37e3b0330ea7478f1cc  /data/mygroup/.zfs/snapshot/Thu/db/contaminants.fasta
me@node053:~$
BUT: If your actions are CPU-, I/O-, or memory-intensive (i.e. they touch large files, like in the 2nd example), please use qlogin to do this on a compute node!
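For example (a sketch; qlogin gives you an interactive shell on one of the compute nodes, and the node you land on will differ):

me@login1:~$ qlogin
me@node053:~$ sha1sum /data/mygroup/.zfs/snapshot/*/db/contaminants.fasta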

 

If you want to see all available previous versions (file sizes and dates) of a snapshotted file, use the following command line:

me@login1:~$ ls -l /data/mygroup/.zfs/snapshot/*/workdir/somefile.dat

For the hourly-snapshot locations (including /home) the simple "*" needs to be replaced by ".auto-*":

me@login1:~$ ls -l /data/mygroup/.zfs/snapshot/.auto-*/workdir/otherfile.dat
me@login1:~$ ls -l /home/.zfs/snapshot/.auto-*/me/some-dir/myfile.dat
 