[gpfsug-discuss] Backing up GPFS with Rsync

Alec anacreo at gmail.com
Wed Mar 10 02:59:18 GMT 2021


You would definitely be able to search by inode creation date and find the
files you want... our 1.25-million-file filesystem takes about 47 seconds to
query. One thing I would worry about, though, is inode deletion and
inter-fileset file moves. The SQL-based policy engine wouldn't be able to
identify those changes, so you wouldn't be able to replicate deletes and
such.
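
Roughly the kind of policy run I mean (a minimal sketch; the mount point
/gpfs/fs0 and the /tmp paths are just placeholders, adjust for your
filesystem):

    # Sketch: list files created within the last day via the policy engine
    cat > /tmp/newfiles.pol <<'EOF'
    RULE 'extlist' EXTERNAL LIST 'newfiles' EXEC ''
    RULE 'new' LIST 'newfiles'
      WHERE (CURRENT_TIMESTAMP - CREATION_TIME) < INTERVAL '1' DAYS
    EOF
    mmapplypolicy /gpfs/fs0 -P /tmp/newfiles.pol -I defer -f /tmp/dailyscan
    # The candidate paths end up in /tmp/dailyscan.list.newfiles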

Alternatively....
I have a script that runs in about 4 minutes: it pulls all the data out of
the backup indexes, compares it against the pre-built hourly file index on
our system, and identifies files that don't exist in the backup, which gives
me a daily backup validation. I use ksh's printf date manipulation to filter
out files that are less than 2 days old, to reduce the noise. A modification
of this could simply compare a daily file index with the previous day's
index and send rsync a list of files (existing or deleted) based on just the
delta of the two indexes (sort | diff); then you could properly account for
all the changes. If you don't care about file modifications, produce both
lists based on creation time instead of modification time. The mmfind
command or the GPFS policy engine should be able to produce a full file
list/index very rapidly.
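
Something along these lines (a rough sketch of the idea, not our production
script; it assumes each index is one full path per line and that
backuphost:/backup is your target):

    sort today.idx     > today.sorted
    sort yesterday.idx > yesterday.sorted
    # Paths only in today's index: new (or moved-in) files to push
    comm -13 yesterday.sorted today.sorted > to_copy.lst
    # Paths only in yesterday's index: files that went away, i.e. deletes
    comm -23 yesterday.sorted today.sorted > to_delete.lst
    # rsync only the changed files (with --files-from, entries are taken
    # relative to the source dir "/" and leading slashes are stripped)
    rsync -a --files-from=to_copy.lst / backuphost:/backup/
    # Replay the deletes on the far side however you like, e.g.
    while read -r f; do ssh backuphost rm -f "/backup/$f"; done < to_delete.lst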

In another thread there was a conversation about ACLs... I don't think our
backup system backs up ACLs, so I just have GPFS produce a daily list of all
objects with ACLs applied, and I have a script that writes a null-delimited
backup file of every single ACL on our filesystem... plus a script to apply
the ACLs as a "restore". It's a pretty simple thing to write up, and keeping
a 90-day history of this lets me compare the ACL evolution of a file very
easily.
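
In outline it looks something like this (simplified here to one ACL text
file per object rather than the single null-delimited file I described;
acl_objects.lst and /backup/acls are placeholder names):

    day=$(date +%Y%m%d)
    mkdir -p /backup/acls/$day
    while IFS= read -r f; do
        # Flatten the path into a name we can store the ACL record under
        out=/backup/acls/$day/$(printf '%s' "$f" | tr '/' '%')
        printf '%s\n' "$f" >  "$out"     # first line: the original path
        mmgetacl "$f"      >> "$out"     # the ACL as mmgetacl prints it
    done < acl_objects.lst

    # "Restore" a saved ACL onto its object:
    #   tail -n +2 "$out" > /tmp/acl.$$ && mmputacl -i /tmp/acl.$$ "$(head -1 "$out")"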

Alec

MVH
Most Victorious Hunting
(Why should Scandinavians own this cool sign-off?)

On Tue, Mar 9, 2021 at 6:22 PM Ryan Novosielski <novosirj at rutgers.edu>
wrote:

> Yup, you want to use the policy engine:
>
>
> https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.0/com.ibm.spectrum.scale.v4r2.adv.doc/bl1adv_policyrules.htm
>
> Something in here ought to help. We do something like this (but I’m
> reluctant to provide examples as I’m actually suspicious that we don’t have
> it quite right and are passing far too much stuff to rsync).
>
> --
> #BlackLivesMatter
> ____
> || \\UTGERS,     |---------------------------*O*---------------------------
> ||_// the State  |         Ryan Novosielski - novosirj at rutgers.edu
> || \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus
> ||  \\    of NJ  | Office of Advanced Research Computing - MSB C630, Newark
>      `'
>
> > On Mar 9, 2021, at 9:19 PM, William Burke <bill.burke.860 at gmail.com> wrote:
> >
> > I would like to know what files were modified/created/deleted (only for
> > the current day) on the GPFS file system, so that I could rsync ONLY
> > those files to a predetermined external location. I am running GPFS
> > 4.2.3.9.
> >
> > Is there a way to access the GPFS metadata directly so that I do not
> > have to traverse the filesystem looking for these files? If I use the
> > rsync tool, it will scan the file system, which is 400+ million files.
> > Obviously it will be problematic to complete a scan in a day, if it
> > would ever complete single-threaded. There are tools and scripts that
> > run rsync multithreaded, but it's still a brute-force attempt, and it
> > would be nice to know the delta of files that have changed.
> >
> > I began looking at the Spectrum Scale Data Management (DM) API, but I am
> > not sure if this is the best approach to looking at the GPFS metadata -
> > inodes, modify times, creation times, etc.
> >
> >
> >
> > --
> >
> > Best Regards,
> >
> > William Burke (he/him)
> > Lead HPC Engineer
> > Advance Research Computing
> > 860.255.8832 m | LinkedIn
> > _______________________________________________
> > gpfsug-discuss mailing list
> > gpfsug-discuss at spectrumscale.org
> > http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>