The Migration Procedure
Data is migrated to dCache from selected work group servers (see Infobox below)
with a tool called migrate. It recursively migrates a given source directory,
or single larger files, into dCache. In general, calling the script creates a
tarball in the dCache tape instance (see footnote 1.4) and thus generates an archive
copy of the data. It can be invoked in the following ways:
    /opt/gamma/migrate --beamtime=<AppID> --pack or --copy [--stage] [--parent] [--split] /path/to/dir
or
    /opt/gamma/migrate --beamtime=<AppID> --copy --format=tar [--stage] /path/to/file.tar
(see footnote 1.5)
or (most likely for future Nexus format)
    /opt/gamma/migrate --beamtime=<AppID> --copy [--stage] /path/to/file
where
- <AppID> denotes the Beamtime Application ID (NOT the proposal ID) for the dataset
taken,
- --pack option is used to create a tarball from the source directory,
- --copy option can be used to copy a source file directly
(! ONLY USE THIS FOR SINGLE LARGE FILES [100 MB <= file size <= 300 GB] such as tarballs
or Nexus files! Smaller or larger files will cause a tremendous increase in file access
times [tape rewinds, tape changes etc.]). Typically this option is used together with the
format option described below.
- --format option can be used when copying a single large file. It tells the script
that the large file is a tarball and triggers a tarball extraction when the data is
staged to the dCache disc instance. Currently, both uncompressed and compressed tarballs
can be passed to the script. For a compressed archive, the bzip2 compression scheme
(tar option 'j') is recommended (see footnote 1.6).
- --stage option is optional and means that the data should be staged to the dCache disc instance in addition to being written to tape.
- --parent option is optional and means that the data should be migrated while preserving the name of the upper directory (e.g. if a migration partially failed).
- --split option is optional and means that data migration should
start at the first subdirectory level, i.e. the script does not create a single
tarball even for directories smaller than 350 GB. Instead, it treats the given
directory in the same manner as a directory larger than 350 GB.
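
To illustrate the --copy and --format guidance above, the following shell sketch
classifies a file size against the recommended window before a single large file is
handed to migrate. The function name copy_size_class is hypothetical and not part of
the migrate tool, and reading MB/GB as binary units is an assumption. The tar call
shown in the comment only uses the standard 'j' (bzip2) option mentioned above.

```shell
#!/bin/sh
# Sketch only: classify a file size (in bytes) against the window
# recommended for --copy (100 MB <= file size <= 300 GB).
# Assumption: the limits are interpreted as binary units (MiB/GiB).
copy_size_class() {
    size=$1
    min=$((100 * 1024 * 1024))            # 100 MiB in bytes
    max=$((300 * 1024 * 1024 * 1024))     # 300 GiB in bytes
    if [ "$size" -lt "$min" ]; then
        echo too-small
    elif [ "$size" -gt "$max" ]; then
        echo too-large
    else
        echo ok
    fi
}

# Hypothetical usage (paths are placeholders):
#   tar -cjf /path/to/file.tar.bz2 -C /path/to dir   # bzip2-compressed tarball
#   copy_size_class "$(wc -c < /path/to/file.tar.bz2)"
```

A result other than "ok" suggests that the --pack option on a directory may be the
better fit for that data.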
Mandatory for data migration are the beamtime application ID (AppID), the pack
(or copy) option and a data source. Usually, the data source is the name of the
1st level of a directory tree, and thus the pack option is required as well.
The most common call for data migration is:
    /opt/gamma/migrate --beamtime=<AppID> --pack --split /path/to/dir
Current limitations / points to take into account:
- The migration is executed in the context of a special user who needs to have read
access to the data. Thus, Unix rights need to be set such that 'others' have read (and
execute for directories) rights to the data on the OnlineFS for the migration (see footnote 1.7).
- When migrating data using the --pack option, either the total size of the source
directory or every directory on the second directory level has to be smaller than 350 GiB.
I.e., the script first checks the total size of the source directory. If it is smaller than
350 GiB, a single tarball will be created in 'dCache tape'. If the total source directory size
exceeds the given limit, the script checks for files, directories and sizes on the next lower
level. If a subdirectory stays below the size limit, it will be migrated into a dedicated
corresponding tarball; otherwise, migration will be refused by the script. Any existing files on
the 1st subdirectory level will be combined into a single tarball.
- When using the copy option to migrate a single large file, the maximum file size is
800 GiB. Reasonable file sizes should stay between 100 MiB and 300 GiB.
- During data migration, status information on the progress is printed in the terminal window.
- The data migration script DOES NOT delete the source data. The beamline staff have to
verify that the script ran successfully and have to delete the migrated data from the Online
Fileserver themselves (see footnote 1.8).
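
The first two limitations above can be checked before calling migrate. The following
shell sketch is an illustration only: the function names not_readable_by_others and
plan_pack are hypothetical, the sizes come from du rather than from the migrate script
itself, and treating an oversized subdirectory as a refusal of the whole run is an
assumption about how the script behaves.

```shell
#!/bin/sh
# Pre-flight sketch (not part of the migrate tool).

# Print entries that 'others' cannot read (files) or cannot read and
# enter (directories). No output means the permissions look fine.
not_readable_by_others() {
    find "$1" \( -type d ! -perm -005 \) -o \( ! -type d ! -perm -004 \)
}

# Mimic the size check described for --pack.
# Usage: plan_pack DIR [LIMIT_KIB]   (default limit: 350 GiB, in KiB)
# Note: the '*' glob skips hidden entries on the first level.
plan_pack() {
    dir=$1
    limit_kib=${2:-$((350 * 1024 * 1024))}
    total=$(du -sk "$dir" | cut -f1)
    if [ "$total" -lt "$limit_kib" ]; then
        echo "single tarball: $dir"
        return 0
    fi
    status=0
    has_files=no
    for entry in "$dir"/*; do
        [ -e "$entry" ] || continue
        if [ -d "$entry" ]; then
            size=$(du -sk "$entry" | cut -f1)
            if [ "$size" -lt "$limit_kib" ]; then
                echo "tarball: $entry"
            else
                echo "refused: $entry"   # subdirectory exceeds the limit
                status=1
            fi
        else
            has_files=yes
        fi
    done
    if [ "$has_files" = yes ]; then
        echo "combined tarball: files in $dir"
    fi
    return "$status"
}

# Hypothetical usage (path is a placeholder):
#   not_readable_by_others /path/to/dir
#   chmod -R o+rX /path/to/dir     # grant read, plus execute on directories
#   plan_pack /path/to/dir
```

A non-zero return value from plan_pack indicates that at least one subdirectory would
have to be split further before a --pack migration is attempted.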
Andre Rothkirch
2013-07-17