[gpfsug-discuss] Capacity pool filling

Buterbaugh, Kevin L Kevin.Buterbaugh at Vanderbilt.Edu
Thu Jun 7 16:56:34 BST 2018


Hi All,

So in trying to prove Jaime wrong I proved him half right … the cron job is stopped:

#13 22 * * 5 /root/bin/gpfs_migration.sh

However, I took a look in one of the restore directories under /gpfs23/RESTORE using mmlsattr and I see files in all 3 pools!  So that explains why the capacity pool is filling, but mmlspolicy says:

Policy for file system '/dev/gpfs23':
   Installed by root at gpfsmgr on Wed Jan 25 10:17:01 2017.
   First line of policy 'gpfs23.policy' is:
RULE 'DEFAULT' SET POOL 'gpfs23data'
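
For the record, the spot-check I did under the restore directory was basically the loop below … the directory name is just illustrative, but mmlsattr -L does print a "storage pool name" line per file, which is what gives it away:

for f in /gpfs23/RESTORE/some_restore_dir/*; do
    # show which pool each restored file landed in
    echo -n "$f: "
    mmlsattr -L "$f" | grep -i 'storage pool name'
done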

So … I don’t think GPFS is doing this but the next thing I am going to do is follow up with our tape software vendor … I bet they preserve the pool attribute on files and - like Jaime said - old stuff is therefore hitting the gpfs23capacity pool.
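
Also, to partially answer my own question #1 from the message below … I think a LIST rule run via mmapplypolicy in defer mode is the pool-restricted equivalent of "find -mtime -7".  The rule below is an untested sketch and the /tmp names are made up:

RULE EXTERNAL LIST 'recentcap' EXEC ''
RULE 'recent_cap' LIST 'recentcap'
     FROM POOL 'gpfs23capacity'
     WHERE (CURRENT_TIMESTAMP - MODIFICATION_TIME) < INTERVAL '7' DAYS

mmapplypolicy gpfs23 -P /tmp/recentcap.policy -I defer -f /tmp/recentcap

With -I defer nothing gets migrated or executed; the matching files should just get written out to /tmp/recentcap.list.recentcap.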

Thanks Jaime and everyone else who has responded so far…

Kevin

> On Jun 7, 2018, at 9:53 AM, Jaime Pinto <pinto at scinet.utoronto.ca> wrote:
> 
> I think the restore is bringing back a lot of material with atime > 90 days, so it is passing through gpfs23data and going directly to gpfs23capacity.
> 
> I also think you may not have stopped the crontab script as you believe you did.
> 
> Jaime
> 
> Quoting "Buterbaugh, Kevin L" <Kevin.Buterbaugh at Vanderbilt.Edu>:
> 
>> Hi All,
>> 
>> First off, I'm on day 8 of dealing with two different mini-catastrophes at work and am therefore very sleep deprived and possibly missing something obvious … with that disclaimer out of the way…
>> 
>> We have a filesystem with 3 pools:  1) system (metadata only), 2) gpfs23data (the default pool, per mmlspolicy), and 3) gpfs23capacity (where files with an atime - yes, atime - of more than 90 days get migrated by a script that runs out of cron each weekend).
>> 
>> However … this morning the free space in the gpfs23capacity pool is dropping … I'm down to 0.5 TB free in a 582 TB pool … and I cannot figure out why.  The migration script is NOT running … in fact, it's currently disabled.  So I can only think of two possible explanations for this:
>> 
>> 1.  There are one or more files already in the gpfs23capacity pool that someone has started updating.  Is there a way to check for that … i.e. a way to run something like "find /gpfs23 -mtime -7 -ls" but restricted to only files in the gpfs23capacity pool?  Marc Kaplan - can mmfind do that??  ;-)
>> 
>> 2.  We are doing a large volume of restores right now because one of the mini-catastrophes I'm dealing with is one NSD (gpfs23data pool) down due to an issue with the storage array.  We're working with the vendor to try to resolve that but are not optimistic, so we have started doing restores in case they come back and tell us it's not recoverable.  We did run "mmfileid" to identify the files that have one or more blocks on the down NSD, but there are so many that what we're doing is actually restoring all the files to an alternate path (easier for our tape system), then replacing the corrupted files, then deleting any restores we don't need.  But shouldn't all of that be going to the gpfs23data pool?  I.e. even if we're restoring files that are in the gpfs23capacity pool, shouldn't the fact that we're restoring to an alternate path (i.e. not overwriting files with the tape restores) and the default pool is the gpfs23data pool mean that nothing is being restored to the gpfs23capacity pool???
>> 
>> Is there a third explanation I'm not thinking of?
>> 
>> Thanks...
>> 
>> —
>> Kevin Buterbaugh - Senior System Administrator
>> Vanderbilt University - Advanced Computing Center for Research and Education
>> Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633
>> 
>> 
>> 
>> 
> 
> 
> 
> 
> 
> 
>         ************************************
>          TELL US ABOUT YOUR SUCCESS STORIES
>         http://www.scinethpc.ca/testimonials
>         ************************************
> ---
> Jaime Pinto - Storage Analyst
> SciNet HPC Consortium - Compute/Calcul Canada
> www.scinet.utoronto.ca - www.computecanada.ca
> University of Toronto
> 661 University Ave. (MaRS), Suite 1140
> Toronto, ON, M5G1M1
> P: 416-978-2755
> C: 416-505-1477
> 
> ----------------------------------------------------------------
> This message was sent using IMP at SciNet Consortium, University of Toronto.
> 


