[gpfsug-discuss] Question about Policies

Sanchez, Paul Paul.Sanchez at deshaw.com
Sat Dec 28 17:07:15 GMT 2019


If you need to preserve the "wackiness" of the original file and path names (and I'm assuming you do, both to avoid collisions between migrated files from different directories that share the same basename and to allow the files to be found or recovered again later), then you can use Marc's `mmfind` suggestion with the -print0 argument to produce a null-delimited file list. That list can then be fed to an "xargs -0" pipeline or to "rsync -0" to do most of the work.
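The null-delimited approach can be sketched in Python as a stand-in for the xargs -0 or rsync -0 consumer. It assumes the list is raw bytes with each path terminated by a NUL (as -print0 output is) and that every path sits under a single source root. `read_null_delimited` and `target_path` are hypothetical helper names, not part of any GPFS tooling, and the /gpfs paths are made up.

```python
import os


def read_null_delimited(data: bytes) -> list[bytes]:
    """Split a NUL-delimited file list (such as -print0 output) into paths.

    Paths stay as bytes so names that are not valid UTF-8, or that contain
    spaces or newlines, pass through unchanged.
    """
    # The final entry is NUL-terminated too, so discard the empty last field.
    return [p for p in data.split(b"\0") if p]


def target_path(src: bytes, src_root: bytes, dst_root: bytes) -> bytes:
    """Map src to the same relative location under dst_root.

    Keeping the relative path (not just the basename) avoids collisions between
    same-named files from different directories and keeps files findable later.
    """
    rel = os.path.relpath(src, src_root)
    if rel == b".." or rel.startswith(b"../"):
        raise ValueError(f"{src!r} is not under {src_root!r}")
    return os.path.join(dst_root, rel)
```

For example, `target_path(b"/gpfs/proj/c\nd/x.dat", b"/gpfs/proj", b"/gpfs/archive")` returns `b"/gpfs/archive/c\nd/x.dat"`: the embedded newline survives because nothing is ever split on line boundaries.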

Test everything first with a "dry-run" mode that reports what it would do without doing it, and then with a mode that copies without deleting, to help expose bugs in the process before you destroy any data. If the migration doesn't cross between independent filesets, the files could be moved with "mv" without any actual data copying. (For that matter, it could also be done in two stages: hard-link, then unlink.)
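A minimal sketch of the dry-run, copy-only and real-move progression, assuming plain files and string paths; `migrate` and its mode names are hypothetical. It also assumes that a rename crossing the boundary fails with EXDEV, as rename(2) does across filesystems (which is when mv itself falls back to copying); whether a given fileset boundary reports itself that way is worth checking on your own system.

```python
import errno
import os
import shutil

MODES = ("dry-run", "copy", "move")


def migrate(src: str, dst: str, mode: str = "dry-run") -> str:
    """Migrate one file, creating missing parent directories under dst.

    "dry-run": report what would happen and change nothing.
    "copy":    copy data and metadata, leave the source in place.
    "move":    rename if possible, else copy and then unlink the source.
    """
    if mode not in MODES:
        raise ValueError(f"unknown mode {mode!r}")
    # Never overwrite: what to do with an existing target is a policy question.
    if os.path.lexists(dst):
        raise FileExistsError(dst)
    if mode == "dry-run":
        return f"dry-run: {src} -> {dst}"
    parent = os.path.dirname(dst)
    if parent:
        os.makedirs(parent, exist_ok=True)
    if mode == "copy":
        shutil.copy2(src, dst)
        return f"copied: {src} -> {dst}"
    try:
        # Same filesystem: a metadata-only rename, no file data is copied.
        os.rename(src, dst)
        return f"renamed: {src} -> {dst}"
    except OSError as exc:
        if exc.errno != errno.EXDEV:
            raise
    # Cross-device: copy first, delete the source only after the copy succeeded.
    shutil.copy2(src, dst)
    os.unlink(src)
    return f"copied+unlinked: {src} -> {dst}"
```

Running the whole list in "dry-run" first also surfaces target collisions, since the existence check happens before the mode is considered.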

But I think that there are other potential problems involved, even before considering things like path escaping or fileset boundaries...

If everything is predicated on the age of a file, you will need to create the missing directory hierarchy in the target structure for files that need to be "migrated". If files in a directory vary in age, you may move some files but leave others alone (until they become old enough to migrate), creating incomplete and probably unusable versions at both the source and the target. What if a user recreates the missing files as they disappear? As those later age, do you overwrite the files on the target? What if a path that was a directory later becomes a file, or vice versa? Will you ever need to "restore" these structures? If so, will you merge them back into the original source when both non-empty source and target directories exist? Should we wait for an entire directory hierarchy to age out and then archive it atomically? (We would want a way to know where project directory boundaries are.)
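One way to act on "wait for an entire hierarchy to age out" is to test a directory tree as a unit before touching any of it. The sketch below assumes mtime is the age that matters (atime or ctime might suit a different policy) and that project boundaries are already known, so `root` is one project directory; `tree_has_aged_out` is a hypothetical helper.

```python
import os
import time
from typing import Optional


def _raise(err: OSError) -> None:
    # Fail loudly on unreadable directories instead of silently skipping them,
    # which would otherwise make a tree look older than it is.
    raise err


def tree_has_aged_out(root: str, max_age_days: float,
                      now: Optional[float] = None) -> bool:
    """Return True only if every file under root is older than max_age_days.

    Uses lstat, so symlinks are judged by their own mtime and not followed.
    """
    if now is None:
        now = time.time()
    cutoff = now - max_age_days * 86400
    for dirpath, _dirnames, filenames in os.walk(root, onerror=_raise):
        for name in filenames:
            if os.lstat(os.path.join(dirpath, name)).st_mtime >= cutoff:
                return False  # one recent file keeps the whole tree in place
    return True
```

Checking the whole tree avoids the half-migrated directories described above, at the cost of leaving an entire project in place while any single file in it is still recent.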

I would urge you to think about how complex this might actually get before you start performing surgery within data sets. I would be inclined to challenge the original requirements to ensure that what you are able to accomplish matches up with the real goals without creating a raft of new operational problems or loss of work product. Depending on the original goal, it may be possible to do this (more safely) with snapshots or tarballs.
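If the goal allows it, archiving a whole aged-out directory as one unit sidesteps the partial-migration problems. Python's `tarfile` stands in here for whatever tar tooling you would actually use; `archive_tree` is a hypothetical helper, and it deliberately deletes nothing. Since Python 3.8 `tarfile` writes PAX format by default, which can record long and non-ASCII member names.

```python
import os
import tarfile


def archive_tree(root: str, tar_path: str) -> int:
    """Write the whole tree under root into a gzip-compressed tarball.

    Members are stored under root's own directory name, so the hierarchy is
    preserved. Returns the member count read back from the finished archive,
    which doubles as a basic check that it is readable. The source is left
    untouched; removing it should be a separate, deliberate step.
    """
    arcname = os.path.basename(os.path.normpath(root))
    with tarfile.open(tar_path, "w:gz") as tar:
        tar.add(root, arcname=arcname)  # recursive by default
    with tarfile.open(tar_path, "r:gz") as tar:
        return len(tar.getmembers())
```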

-Paul

-----Original Message-----
From: gpfsug-discuss-bounces at spectrumscale.org <gpfsug-discuss-bounces at spectrumscale.org> On Behalf Of Jonathan Buzzard
Sent: Saturday, December 28, 2019 10:17 AM
To: gpfsug-discuss at spectrumscale.org
Subject: Re: [gpfsug-discuss] Question about Policies



On 27/12/2019 14:20, david_johnson at brown.edu wrote:
> You would want to look for examples of external scripts that work on 
> the result of running the policy engine in listing mode.  The one 
> issue that might need some attention is the way that gpfs quotes 
> unprintable characters in the pathname. So the policy engine generates 
> the list and your external script does the moving.
>
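Handling the policy engine's quoting can be sketched as below. This is an assumption-heavy sketch: it assumes each list record has the form `<inode> <gen> <snapid> [show attributes] -- <path>`, and that the EXTERNAL LIST rule was written with an `ESCAPE '%'` clause, which I understand percent-encodes path names in the RFC 3986 style so that no raw newlines or spaces remain. Check both against the mmapplypolicy documentation for your release. `parse_list_line` is a hypothetical helper.

```python
from urllib.parse import unquote_to_bytes


def parse_list_line(line: bytes) -> tuple[int, bytes]:
    """Return (inode, decoded_path) from one policy-engine list record.

    ASSUMPTIONS: the record looks like
        <inode> <gen> <snapid> [show attributes] -- <path>
    and the path is percent-encoded (ESCAPE '%'), so the record itself
    contains no raw newline apart from the terminating one.
    """
    head, sep, path = line.rstrip(b"\n").partition(b" -- ")
    if not sep:
        raise ValueError(f"no ' -- ' separator in {line!r}")
    inode = int(head.split()[0])
    # Decode to bytes, not str, so names that are not valid UTF-8 survive.
    return inode, unquote_to_bytes(path)
```

For instance, a record ending in `-- /gpfs/fs/a%20b/c%0Ad.txt` decodes to the path `b"/gpfs/fs/a b/c\nd.txt"`.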

In my experience a good starting point is to scan the list of files from the policy engine and separate them into two groups: "normal" files, whose path names use only basic ASCII with no special characters, and the rest, also known as the "wacky pile".
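A minimal sketch of that split, assuming the list has already been decoded into raw byte paths. The definition of "normal" used here (ASCII letters, digits, '.', '_', '-' and '/') is an assumption to adjust to local policy; note it sends any name containing a space to the wacky pile. `split_normal_and_wacky` is a hypothetical name.

```python
import re

# ASSUMPTION: "normal" means every byte is an ASCII letter, a digit,
# '.', '_', '-' or the '/' separator. Anything else is "wacky".
SAFE_PATH = re.compile(rb"[A-Za-z0-9._/-]+")


def split_normal_and_wacky(paths):
    """Partition byte paths into (normal, wacky) lists, preserving order."""
    normal, wacky = [], []
    for p in paths:
        (normal if SAFE_PATH.fullmatch(p) else wacky).append(p)
    return normal, wacky
```

The normal list can then go through the simple tooling, while the wacky list is set aside for a decision about whether it is worth handling at all.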

Given that you are UK-based, it is not unreasonable to expect all path and file names to be in English. There might (and if not, probably should) be an institutional policy mandating it. It is not much use if a researcher saves everything in Greek, then gets knocked over by a bus, and the person picking up the work is Spanish, for example.

Hopefully the "wacky pile" is small, however expect to find all sorts of bizarre file and path names in it. We are talking wildcards, back ticks, even newline characters to name but a few.
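Triage of the wacky pile can be made more concrete by labelling why each name is awkward, which helps judge whether it is worth the scripting effort. The categories below are an illustrative assumption, not an exhaustive list, and `hazards` and `tally` are hypothetical helpers.

```python
from collections import Counter

GLOB_CHARS = (b"*", b"?", b"[")
SHELL_CHARS = (b"`", b"$")


def hazards(path: bytes) -> set[str]:
    """Label the kinds of trouble a raw (bytes) path name is likely to cause."""
    found = set()
    if b"\n" in path or b"\r" in path:
        found.add("newline")
    if any(c in path for c in GLOB_CHARS):
        found.add("glob")
    if any(c in path for c in SHELL_CHARS):
        found.add("shell-expansion")
    # Other control characters (tab, escape, DEL, ...), excluding CR/LF above.
    if any((b < 0x20 and b not in (0x0A, 0x0D)) or b == 0x7F for b in path):
        found.add("control")
    try:
        path.decode("utf-8")
    except UnicodeDecodeError:
        found.add("not-utf8")
    else:
        if any(b >= 0x80 for b in path):
            found.add("non-ascii")
    return found


def tally(paths) -> Counter:
    """Count how many paths in the wacky pile show each hazard."""
    counts = Counter()
    for p in paths:
        counts.update(hazards(p))
    return counts
```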

Depending on the amount of data in the "wacky" pile, you might just want to forget about moving those files. They are orders of magnitude more difficult to deal with than files with "sane" path and file names, and scripting around them can rapidly soak up large chunks of time.

JAB.

--
Jonathan A. Buzzard                         Tel: +44141-5483420
HPC System Administrator, ARCHIE-WeSt.
University of Strathclyde, John Anderson Building, Glasgow. G4 0NG
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss

