[gpfsug-discuss] Capacity pool filling
Buterbaugh, Kevin L
Kevin.Buterbaugh at Vanderbilt.Edu
Thu Jun 7 16:56:34 BST 2018
Hi All,
So in trying to prove Jaime wrong I proved him half right … the cron job is stopped:
#13 22 * * 5 /root/bin/gpfs_migration.sh
However, I took a look in one of the restore directories under /gpfs23/RESTORE using mmlsattr and I see files in all 3 pools! So that explains why the capacity pool is filling, but mmlspolicy says:
Policy for file system '/dev/gpfs23':
Installed by root at gpfsmgr on Wed Jan 25 10:17:01 2017.
First line of policy 'gpfs23.policy' is:
RULE 'DEFAULT' SET POOL 'gpfs23data'
So … I don’t think GPFS is doing this but the next thing I am going to do is follow up with our tape software vendor … I bet they preserve the pool attribute on files and - like Jaime said - old stuff is therefore hitting the gpfs23capacity pool.
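For anyone wanting to repeat the mmlsattr check over a whole restore directory rather than file by file, here is a minimal sketch. It assumes that "mmlsattr -L" prints one line beginning with "storage pool name:" per file; the function name tally_pools and the directory variable in the usage comment are placeholders, not anything taken from this thread.

```shell
#!/bin/sh
# Tally files per storage pool from `mmlsattr -L` output read on stdin.
# Assumption: each file's report contains a line such as
#   storage pool name:    gpfs23data
# and pool names contain no whitespace.
tally_pools() {
    awk '/^storage pool name:/ { count[$NF]++ }
         END { for (p in count) printf "%s %d\n", p, count[p] }' | sort
}

# Example (not run here; needs a GPFS node, and "$dir" is a placeholder):
#   find "$dir" -type f -exec mmlsattr -L {} + | tally_pools
```

If the tally shows restored files landing in gpfs23capacity even though the placement rule names gpfs23data, that points at something other than the default policy deciding the pool.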
Thanks Jaime and everyone else who has responded so far…
Kevin
> On Jun 7, 2018, at 9:53 AM, Jaime Pinto <pinto at scinet.utoronto.ca> wrote:
>
> I think the restore is bringing back a lot of material with atime > 90 days, so it is passing through gpfs23data and going directly to gpfs23capacity.
>
> I also think you may not have stopped the crontab script as you believe you did.
>
> Jaime
>
> Quoting "Buterbaugh, Kevin L" <Kevin.Buterbaugh at Vanderbilt.Edu>:
>
>> Hi All,
>>
>> First off, I’m on day 8 of dealing with two different mini-catastrophes at work and am therefore very sleep deprived and possibly missing something obvious … with that disclaimer out of the way…
>>
>> We have a filesystem with 3 pools: 1) system (metadata only), 2) gpfs23data (the default pool if I run mmlspolicy), and 3) gpfs23capacity (where files with an atime - yes atime - of more than 90 days get migrated to by a script that runs out of cron each weekend).
>>
>> However … this morning the free space in the gpfs23capacity pool is dropping … I’m down to 0.5 TB free in a 582 TB pool … and I cannot figure out why. The migration script is NOT running … in fact, it’s currently disabled. So I can only think of two possible explanations for this:
>>
>> 1. There are one or more files already in the gpfs23capacity pool that someone has started updating. Is there a way to check for that … i.e. a way to run something like “find /gpfs23 -mtime -7 -ls” but restricted to only files in the gpfs23capacity pool? Marc Kaplan - can mmfind do that?? ;-)
>>
>> 2. We are doing a large volume of restores right now because one of the mini-catastrophes I’m dealing with is one NSD (gpfs23data pool) down due to an issue with the storage array. We’re working with the vendor to try to resolve that but are not optimistic, so we have started doing restores in case they come back and tell us it’s not recoverable. We did run “mmfileid” to identify the files that have one or more blocks on the down NSD, but there are so many that what we’re doing is actually restoring all the files to an alternate path (easier for our tape system), then replacing the corrupted files, then deleting any restores we don’t need. But shouldn’t all of that be going to the gpfs23data pool? I.e. even if we’re restoring files that are in the gpfs23capacity pool, shouldn’t the fact that we’re restoring to an alternate path (i.e. not overwriting files with the tape restores) and the default pool is the gpfs23data pool mean that nothing is being restored to the gpfs23capacity pool???
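
On question 1 above, one possible approach is a policy-engine LIST rule filtered on pool and modification time. The sketch below only generates the policy text; the list name 'recent', the rule name, the function name and the /tmp paths in the usage comment are all made up for illustration, and the day arithmetic only roughly matches "find -mtime -7". Please check the rule syntax against the documentation for your GPFS level before relying on it.

```shell
#!/bin/sh
# Print a policy file that lists files residing in one pool and
# modified within the last N days.
#   $1 = pool name, $2 = number of days
# 'recent' and 'recentInPool' are arbitrary, hypothetical names.
make_recent_policy() {
    pool=$1
    days=$2
    cat <<EOF
RULE EXTERNAL LIST 'recent' EXEC ''
RULE 'recentInPool' LIST 'recent'
  WHERE POOL_NAME = '$pool'
    AND (DAYS(CURRENT_TIMESTAMP) - DAYS(MODIFICATION_TIME)) < $days
EOF
}

# Example (not run here; needs a GPFS cluster, paths are placeholders):
#   make_recent_policy gpfs23capacity 7 > /tmp/recent.policy
#   mmapplypolicy /gpfs23 -P /tmp/recent.policy -I defer -f /tmp/recent
```

With "-I defer" the scan should only write the candidate list and not act on any file, which keeps the check read-only.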
>>
>> Is there a third explanation I’m not thinking of?
>>
>> Thanks...
>>
>> --
>> Kevin Buterbaugh - Senior System Administrator
>> Vanderbilt University - Advanced Computing Center for Research and Education
>> Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633
>>
> ************************************
> TELL US ABOUT YOUR SUCCESS STORIES
> http://www.scinethpc.ca/testimonials
> ************************************
> ---
> Jaime Pinto - Storage Analyst
> SciNet HPC Consortium - Compute/Calcul Canada
> www.scinet.utoronto.ca - www.computecanada.ca
> University of Toronto
> 661 University Ave. (MaRS), Suite 1140
> Toronto, ON, M5G1M1
> P: 416-978-2755
> C: 416-505-1477
>
> ----------------------------------------------------------------
> This message was sent using IMP at SciNet Consortium, University of Toronto.
>