[gpfsug-discuss] Capacity pool filling

Thu Jun 7 15:45:52 BST 2018

Hi again all,

I received a direct response and am not sure whether that means the sender did not want to be identified, but they asked good questions that I wanted to answer on list…

No, we do not use snapshots on this filesystem.

No, we’re not using HSM … our tape backup system is a traditional backup system not named TSM.  We’ve created a top level directory in the filesystem called “RESTORE” and are restoring everything under that … then doing our moves / deletes of what we’ve restored … so I *think* that means all of that should be written to the gpfs23data pool?!?

On the “plus” side, I may figure this out myself soon when someone / something starts getting I/O errors!  :-O

In the meantime, other ideas are much appreciated!

Kevin

Do you have a job that’s creating snapshots?  That’s an easy one to overlook.

Not sure if you are using an HSM. Any new file that gets generated should follow the default rule in ILM unless it meets a placement condition. Only if you’re using an HSM would files be placed in a pool other than the placement pool, and that is purely because the file location has already been updated to the capacity pool.

On Thu, Jun 7, 2018 at 8:17 AM -0600, "Buterbaugh, Kevin L" <Kevin.Buterbaugh at Vanderbilt.Edu> wrote:

Hi All,

First off, I’m on day 8 of dealing with two different mini-catastrophes at work and am therefore very sleep deprived and possibly missing something obvious … with that disclaimer out of the way…

We have a filesystem with 3 pools:  1) system (metadata only), 2) gpfs23data (the default pool if I run mmlspolicy), and 3) gpfs23capacity (where files with an atime - yes atime - of more than 90 days get migrated to by a script that runs out of cron each weekend).

However … this morning the free space in the gpfs23capacity pool is dropping … I’m down to 0.5 TB free in a 582 TB pool … and I cannot figure out why.  The migration script is NOT running … in fact, it’s currently disabled.  So I can only think of two possible explanations for this:

1.  There are one or more files already in the gpfs23capacity pool that someone has started updating.  Is there a way to check for that … i.e. a way to run something like “find /gpfs23 -mtime -7 -ls” but restricted to only files in the gpfs23capacity pool?  Marc Kaplan - can mmfind do that??  ;-)

2.  We are doing a large volume of restores right now because one of the mini-catastrophes I’m dealing with is one NSD (gpfs23data pool) down due to an issue with the storage array.  We’re working with the vendor to try to resolve that but are not optimistic so we have started doing restores in case they come back and tell us it’s not recoverable.  We did run “mmfileid” to identify the files that have one or more blocks on the down NSD, but there are so many that what we’re doing is actually restoring all the files to an alternate path (easier for our tape system), then replacing the corrupted files, then deleting any restores we don’t need.  But shouldn’t all of that be going to the gpfs23data pool?  I.e. even if we’re restoring files that are in the gpfs23capacity pool shouldn’t the fact that we’re restoring to an alternate path (i.e. not overwriting files with the tape restores) and the default pool is the gpfs23data pool mean that nothing is being restored to the gpfs23capacity pool???
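
For the first explanation above, the check being asked about (files in the gpfs23capacity pool with an mtime in the last 7 days) could be sketched roughly as follows. This is only an illustration in Python, not a GPFS tool: the helper names are hypothetical, and it assumes that “mmlsattr -L” prints a line of the form “storage pool name: <pool>”, which should be verified against the installed release.

```python
import re
import time
from typing import Optional

# Assumption: `mmlsattr -L <path>` output contains a line like
# "storage pool name:    gpfs23capacity". Verify on your release.
_POOL_RE = re.compile(r"^storage pool name:\s*(\S+)\s*$", re.MULTILINE)


def pool_from_mmlsattr(output: str) -> Optional[str]:
    """Return the storage pool named in mmlsattr -L output, or None."""
    match = _POOL_RE.search(output)
    return match.group(1) if match else None


def recently_modified(mtime: float, days: int = 7,
                      now: Optional[float] = None) -> bool:
    """True if mtime (seconds since epoch) is within the last `days` days."""
    now = time.time() if now is None else now
    return (now - mtime) <= days * 86400


def is_suspect(mmlsattr_output: str, mtime: float,
               pool: str = "gpfs23capacity", days: int = 7,
               now: Optional[float] = None) -> bool:
    """True if the file is in `pool` and was modified within `days` days."""
    return (pool_from_mmlsattr(mmlsattr_output) == pool
            and recently_modified(mtime, days, now))
```

Running mmlsattr once per file would be slow on a filesystem of this size, so this is mainly useful for spot checks; a LIST rule restricted to the pool and run through mmapplypolicy is probably the more scalable way to do the same filtering.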

Is there a third explanation I’m not thinking of?

Thanks...

—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633
