Archival data handling

Archival data is data that is valuable but rarely accessed. Such data is NOT suitable for standard disk arrays, where it consumes both valuable array space and electricity. For such data, CESNET provides an infrastructure of hierarchical storages suited to archiving (details and important properties of the devices are available at http://du.cesnet.cz), which is directly integrated into the MetaCentrum infrastructure.

This how-to describes how to transfer your archival data to the hierarchical storages.

Terminology

Source storage -- a disk array on which you have your archival data.

  • for the sake of simplicity, let the archival data reside in a folder named "my-archive"

Target storage -- an archival hierarchical system.

  • archival storages are mounted under the /storage directory (the respective subfolders contain "-archive" or "-hsm" in their names).

List of Available Storages

Back-up classes are described at Politika_zálohování (Back-up policy). Summary:

  • class 2 - backup (only) in the form of time slices
  • class 3 - data with a backup copy
NFS4 server                               Directory                    Capacity              Back-up class  Alternative server names in Perun / note
storage-brno1-cerit.metacentrum.cz        /storage/brno1-cerit/        1.8 PB                2              nfs-ntc.ics.muni.cz
storage-brno2.metacentrum.cz              /storage/brno2/              110 TB                3              (nienna1|nienna2|nienna-home).ics.muni.cz
storage-brno3-cerit.metacentrum.cz        /storage/brno3-cerit/        932 TB                2              nfs-kat.cerit-sc.cz
storage-brno4-cerit-hsm.metacentrum.cz    /storage/brno4-cerit-hsm/    decommissioned        -              data archived in /storage/brno1-cerit/
storage-brno5-archive.metacentrum.cz      /storage/brno5-archive/      5 387 TiB             3              nfs.du3.cesnet.cz
storage-brno6.metacentrum.cz              /storage/brno6/              262 TB                2              -
storage-brno7-cerit.metacentrum.cz        /storage/brno7-cerit/        being decommissioned  2              data archived in /storage/brno1-cerit/
storage-brno8.metacentrum.cz              /storage/brno8/              88 TB                 3              formerly /storage/ostrava1/
storage-brno9-ceitec.metacentrum.cz       /storage/brno9-ceitec/       262 TB                3              storage-ceitec1.ncbr.muni.cz - for NCBR CEITEC
storage-brno10-ceitec-hsm.metacentrum.cz  /storage/brno10-ceitec-hsm/  -                     3              dedicated to NCBR CEITEC
storage-brno11-elixir.metacentrum.cz      /storage/brno11-elixir/      313 TB                2              dedicated to ELIXIR-CZ
storage-budejovice1.metacentrum.cz        /storage/budejovice1/        44 TB                 3              (storage-cb1|storage-cb2).metacentrum.cz
storage-jihlava1-cerit.metacentrum.cz     /storage/jihlava1-cerit/     decommissioned        -              data archived to /storage/brno4-cerit-hsm/fineus, storage-brno4-cerit-hsm.metacentrum.cz, symlink /storage/jihlava1-cerit/
storage-jihlava2-archive.metacentrum.cz   /storage/jihlava2-archive    2 050 TiB             3              -
storage-liberec3-tul.metacentrum.cz       /storage/liberec3-tul/       30 TiB                -              -
storage-plzen1.metacentrum.cz             /storage/plzen1/             352 TB                2              (storage-eiger1|storage-eiger2|storage-eiger3).zcu.cz
storage-plzen2-archive.metacentrum.cz     /storage/plzen2-archive/     decommissioned        -              nfs.du1.cesnet.cz
storage-plzen3-kky.metacentrum.cz         /storage/plzen3-kky/         73 TiB                3              -
storage-praha1.metacentrum.cz             /storage/praha1/             100 TB                3              storage-praha1(a|b).metacentrum.cz
storage-praha4-fzu.metacentrum.cz         /storage/praha4-fzu/         15 TB                 -              -
storage-praha5-elixir.metacentrum.cz      /storage/praha5-elixir/      157 TB                3              -
Note 1: To find out your quota on a particular storage, please visit the MetaCentrum portal.
Note 2: No user quota is applied on the HSM storages (those with the suffix -archive or -hsm); only a technical limit of 5 TB applies, which keeps a single one-time data copy from overloading the HSM.

General Guidelines, Data Formats

In general, the fewer files in the archive, the better: it speeds up operations and puts less load on the storage subsystems. On the other hand, packing the files makes searching less convenient.

General guidelines:
  • if most of your files are large (hundreds of MBs, GBs, ...), don't bother packing them; make a one-to-one copy to the archive,
  • if your files are smaller and you don't plan to search for individual files, pack them into tar or zip files,
  • if packing the files is too inconvenient for you, make a one-to-one copy to the archive; in this case, avoid using the Pilsen storage (plzen2-archive).
Archival destination:
  • for smaller files, avoid using the Pilsen storage (plzen2-archive) -- manipulating large numbers of small files is very slow there. For this reason, plzen2-archive has a limit of 50000 files per user.
  • for large files and file packages (tar, zip), choose any archive you like as your destination.
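
To judge which of the guidelines above applies, it helps to know how many files the folder holds and how large they are in total. The following is a minimal sketch, assuming GNU find and awk are available; the function name archive_stats is illustrative and not part of any MetaCentrum tooling.

```shell
# Print the number of regular files under a directory and their total size
# in bytes. Relies on the GNU find -printf extension.
archive_stats() {
    local dir=$1
    # One line per file, containing its size in bytes; awk counts and sums.
    find "$dir" -type f -printf '%s\n' |
        awk '{ n++; s += $1 } END { printf "files=%d bytes=%.0f\n", n, s }'
}
```

Running `archive_stats my-archive` on the source storage gives a rough basis for the decision: many files with a small total size suggest packing, a few large files suggest a one-to-one copy.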

Transferring files to/from the archive

The basic rule: DON'T use front-end servers for anything other than moving a few small files! Submit a regular job or start an interactive job to handle the archival data instead.
Note: The data transfer can also be performed as an ordinary job running tar/rsync commands (see the tips below). In that case, you are advised to delete the primary data only after you have made sure the archival copy is OK.

In both cases, we recommend submitting the job to a cluster close to your source storage, e.g. (if your primary data is in Brno):

qsub -I -l select=1:ncpus=1:mem=2gb:scratch_local=2gb:brno=True -l walltime=48:00:00

WARNING: The master HOME directory of each HSM storage (e.g. /storage/plzen2-archive/home/$USER/) is dedicated just to initialization scripts and thus has a limited quota of only 50 MB. To archive your data, use the VO_metacentrum-tape_tape subdirectory (e.g. /storage/plzen2-archive/home/$USER/VO_metacentrum-tape_tape), where this limitation doesn't apply.
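
To avoid writing into the quota-limited HOME directory by mistake, the target path can be built in one place. This is only a sketch: it assumes that every HSM storage follows the /storage/NAME/home/USER/VO_metacentrum-tape_tape layout shown in the warning above, and the helper name archive_target is hypothetical.

```shell
# Build the archival target path for a given HSM storage name.
# Names that do not end in -archive or -hsm are rejected.
# Usage: archive_target STORAGE_NAME [USER]
archive_target() {
    local storage=$1 user=${2:-$USER}
    case $storage in
        *-archive|*-hsm) ;;
        *) echo "not an HSM storage: $storage" >&2; return 1 ;;
    esac
    printf '/storage/%s/home/%s/VO_metacentrum-tape_tape\n' "$storage" "$user"
}
```

For example, `cd "$(archive_target brno5-archive)"` changes into the archival subdirectory of the current user on that storage, provided it exists.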

Packed archives

  • To create a packed archive, use:
tar czvf /storage/DESTINATION-archive/home/USER/VO_metacentrum-tape_tape/my-archive.tgz my-archive
  • To list the content of the archive, use:
tar tzf /storage/DESTINATION-archive/home/USER/VO_metacentrum-tape_tape/my-archive.tgz
  • If the archive creation is successful, you may delete the folder on your primary storage.
  • To recover ALL the data from the packed archive:
    • change the current working directory to the place where you want the data to be saved and use:
tar xzvf /storage/DESTINATION-archive/home/USER/VO_metacentrum-tape_tape/my-archive.tgz
  • To recover a PART of the data from the packed archive:
    • change the current working directory to the place where you want the data to be saved and determine which part of the archive you want to recover (i.e., list the content of the archive):
    tar tzf /storage/DESTINATION-archive/home/USER/VO_metacentrum-tape_tape/my-archive.tgz
    • then use:
      tar xzvf /storage/DESTINATION-archive/home/USER/VO_metacentrum-tape_tape/my-archive.tgz "PATH1/file1" "PATH2/dir2"
    OR (if you want to use wildcard patterns):
    tar xzvf /storage/DESTINATION-archive/home/USER/VO_metacentrum-tape_tape/my-archive.tgz --wildcards "PATH1/files*" "PATH2/dirs*"
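
Before deleting the folder on the primary storage, the packed archive can be compared with the original data. The sketch below assumes GNU tar, whose --compare option reports archived files that differ from their counterparts on disk; the function name verify_tgz is illustrative. Note that the comparison reads the whole archive back from the storage, and that it does not notice files that exist on disk but are missing from the archive.

```shell
# Compare a packed archive with the directory tree it was created from.
# Usage: verify_tgz ARCHIVE PARENT_DIR
# PARENT_DIR is the directory from which "tar czvf ... my-archive" was run.
verify_tgz() {
    local archive=$1 parent=$2
    # --compare prints each difference (size, mode, modification time,
    # contents) and makes tar exit with a non-zero status if any is found.
    tar --compare --gzip --file="$archive" --directory="$parent"
}
```

A possible use is `verify_tgz /storage/DESTINATION-archive/home/USER/VO_metacentrum-tape_tape/my-archive.tgz . && echo "archive matches"`, run from the directory that contains my-archive.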

One-to-one copies

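  • To transfer the data into the archive as a one-to-one copy, use:
rsync -avHS --no-g my-archive/ /storage/DESTINATION-archive/home/USER/VO_metacentrum-tape_tape/my-archive/
  • The trailing slashes make rsync copy the content of my-archive into the target folder of the same name instead of nesting a second my-archive folder inside it.
  • If the command fails, just run it again (it continues from the point where it stopped).
  • Similarly, if you want to update the archive, simply run the command again.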
  • To recover ALL the data from the archive, use:
    rsync -avHS --no-g /storage/DESTINATION-archive/home/USER/VO_metacentrum-tape_tape/my-archive/ /storage/DESTINATION/home/USER/my_data_recovery/my-archive/
  • To recover a PART of the data from the archive:
    • determine which part of the archive you want to recover (e.g., by listing the files in the archival storage with the ls command):
    ls /storage/DESTINATION-archive/home/USER/VO_metacentrum-tape_tape/my-archive
    • and use:
      rsync -avHS --no-g /storage/DESTINATION-archive/home/USER/VO_metacentrum-tape_tape/my-archive/DIR1/ /storage/DESTINATION/home/USER/my_data_recovery/DIR1/
    OR (if only a few files are to be recovered):
    cp -r /storage/DESTINATION-archive/home/USER/VO_metacentrum-tape_tape/my-archive/PATH1/dirs* /storage/DESTINATION/home/USER/my_data_recovery
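
The advice to rerun rsync after a failure can be automated, and a dry run shows whether anything is still left to transfer. The following is a sketch only: the function names, the default of 5 attempts, and the 60-second pause are assumptions chosen for illustration, while the rsync options are the ones used above (without -v, to keep the output short).

```shell
# Re-run rsync until it succeeds or the attempt limit is reached.
# Usage: sync_with_retry SRC_DIR DST_DIR [MAX_ATTEMPTS]
sync_with_retry() {
    local src=$1 dst=$2 max=${3:-5} attempt=1
    # Trailing slashes: copy the content of SRC_DIR into DST_DIR.
    until rsync -aHS --no-g "$src/" "$dst/"; do
        if [ "$attempt" -ge "$max" ]; then
            echo "giving up after $attempt attempts" >&2
            return 1
        fi
        attempt=$((attempt + 1))
        sleep 60   # assumed pause before the next attempt
    done
}

# List what rsync would still transfer, without copying anything.
# Empty output means the copy is up to date by size and modification time.
pending_changes() {
    rsync -aHS --no-g --dry-run --itemize-changes "$1/" "$2/"
}
```

The dry run compares only file sizes and modification times, so it does not need to read the archived data back; it is a quick consistency check rather than a content verification.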

Tips

1. Non-interactive data transfer

For larger data transfers, you can work non-interactively by modifying the qsub parameters slightly:

qsub -m abe -l nodes=1:brno

You are prompted for a command -- enter the tar/rsync commands described above, press Enter, and finish the submission with Ctrl+D. The PBS job is then scheduled, and you will be informed by email about the start, completion, or failure of the transfer. The standard and error outputs of the job are stored in STDIN.* files.
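
Instead of typing the commands at the prompt, the transfer can also be kept in a job script file. The sketch below reuses the resource request from the interactive example and the -m abe mail option shown above; which resource syntax your PBS installation accepts is an assumption to verify. The file name transfer.sh, the variables SRC and DEST, and the function pack_dir are hypothetical.

```shell
#!/bin/bash
#PBS -N archive-transfer
#PBS -m abe
#PBS -l select=1:ncpus=1:mem=2gb:scratch_local=2gb:brno=True
#PBS -l walltime=48:00:00

# Pack SRC_DIR into DEST_TGZ, then list the archive to confirm it is readable.
# Usage: pack_dir SRC_DIR DEST_TGZ
pack_dir() {
    local src_dir=$1 dest_tgz=$2
    # -C makes the archive contain only the last path component of SRC_DIR.
    tar czf "$dest_tgz" -C "$(dirname "$src_dir")" "$(basename "$src_dir")" &&
        tar tzf "$dest_tgz" > /dev/null
}

# Run only when both variables are passed in, e.g. with
#   qsub -v SRC=...,DEST=... transfer.sh
if [ -n "${SRC:-}" ] && [ -n "${DEST:-}" ]; then
    pack_dir "$SRC" "$DEST"
fi
```

When the job is submitted from a script file, the output files are typically named after the job rather than STDIN.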

2. Migration of data between any two storages

The commands above can also be used to migrate data between any two storages. Just change the directories to point to the respective storages. E.g., to move your data from the old and slow /storage/brno2 to the new and empty /storage/ostrava1, simply use:

rsync -avHS --no-g /storage/brno2/home/USER/my-data/ /storage/ostrava1/home/USER/my-data/