[gpfsug-discuss] Capacity pool filling

Jaime Pinto pinto at scinet.utoronto.ca
Thu Jun 7 15:53:16 BST 2018


I think the restore is bringing back a lot of material with atime >
90 days, so it is passing through gpfs23data and going directly to
gpfs23capacity.
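
A quick way to confirm where the restored files are landing is to
spot-check one of them with mmlsattr, which reports the storage pool
assignment (the path below is only a placeholder for one of your
restore targets):

  mmlsattr -L /gpfs23/RESTORE_TARGET/some_restored_file | grep -i 'storage pool'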

I also think you may not have stopped the crontab script as you  
believe you did.
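
It is worth a quick sanity check on whichever node normally runs the
migration; the grep patterns below are only guesses at what your
script and cron entry are called:

  crontab -l | grep -i -e migrate -e gpfs23capacity
  ps -ef | grep -e mmapplypolicy -e migrate | grep -v grep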

Jaime





Quoting "Buterbaugh, Kevin L" <Kevin.Buterbaugh at Vanderbilt.Edu>:

> Hi All,
>
> First off, I'm on day 8 of dealing with two different
> mini-catastrophes at work and am therefore very sleep deprived and
> possibly missing something obvious ... with that disclaimer out of
> the way ...
>
> We have a filesystem with 3 pools:  1) system (metadata only), 2)
> gpfs23data (the default pool if I run mmlspolicy), and 3)
> gpfs23capacity (where files with an atime - yes atime - of more than
> 90 days get migrated to by a script that runs out of cron each
> weekend).
>
> However ... this morning the free space in the gpfs23capacity pool
> is dropping ... I'm down to 0.5 TB free in a 582 TB pool ... and I
> cannot figure out why.  The migration script is NOT running ... in
> fact, it's currently disabled.  So I can only think of two possible
> explanations for this:
>
> 1.  There are one or more files already in the gpfs23capacity pool
> that someone has started updating.  Is there a way to check for
> that, i.e. a way to run something like "find /gpfs23 -mtime -7 -ls"
> but restricted to only files in the gpfs23capacity pool?  Marc
> Kaplan - can mmfind do that?  ;-)
>
> 2.  We are doing a large volume of restores right now because one
> of the mini-catastrophes I'm dealing with is one NSD (gpfs23data
> pool) down due to an issue with the storage array.  We're working
> with the vendor to try to resolve that but are not optimistic, so
> we have started doing restores in case they come back and tell us
> it's not recoverable.  We did run "mmfileid" to identify the files
> that have one or more blocks on the down NSD, but there are so many
> that what we're doing is actually restoring all the files to an
> alternate path (easier for our tape system), then replacing the
> corrupted files, then deleting any restores we don't need.  But
> shouldn't all of that be going to the gpfs23data pool?  I.e. even
> if we're restoring files that are in the gpfs23capacity pool,
> shouldn't the fact that we're restoring to an alternate path (i.e.
> not overwriting files with the tape restores) and the default pool
> is the gpfs23data pool mean that nothing is being restored to the
> gpfs23capacity pool???
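
For the pool-restricted "find" in explanation 1, the policy engine
can produce exactly that list.  A minimal sketch, assuming the device
name is gpfs23 and that /tmp is acceptable for the rule and output
files.  Put the following in /tmp/recent_in_capacity.pol:

  /* write matching pathnames to a flat file, no external script needed */
  RULE EXTERNAL LIST 'recent' EXEC ''
  /* like "find -mtime -7 -ls", but only for files in gpfs23capacity */
  RULE 'recentInCapacity' LIST 'recent'
       FROM POOL 'gpfs23capacity'
       WHERE (DAYS(CURRENT_TIMESTAMP) - DAYS(MODIFICATION_TIME)) <= 7

and then run:

  mmapplypolicy gpfs23 -P /tmp/recent_in_capacity.pol -I defer \
       -f /tmp/recent_in_capacity

The matches end up in /tmp/recent_in_capacity.list.recent.  Since
restored copies usually keep their original mtime, a second rule
keyed on CREATION_TIME instead of MODIFICATION_TIME may be a better
way to catch files that the restores themselves are writing into the
pool.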
>
> Is there a third explanation I'm not thinking of?
>
> Thanks...
>
> --
> Kevin Buterbaugh - Senior System Administrator
> Vanderbilt University - Advanced Computing Center for Research and Education
> Kevin.Buterbaugh at vanderbilt.edu<mailto:Kevin.Buterbaugh at vanderbilt.edu> -   
> (615)875-9633
>
>
>
>
          ************************************
           TELL US ABOUT YOUR SUCCESS STORIES
          http://www.scinethpc.ca/testimonials
          ************************************
---
Jaime Pinto - Storage Analyst
SciNet HPC Consortium - Compute/Calcul Canada
www.scinet.utoronto.ca - www.computecanada.ca
University of Toronto
661 University Ave. (MaRS), Suite 1140
Toronto, ON, M5G1M1
P: 416-978-2755
C: 416-505-1477

----------------------------------------------------------------
This message was sent using IMP at SciNet Consortium, University of Toronto.



