[gpfsug-discuss] Rainy days and Mondays and GPFS lying to me always get me down...

Buterbaugh, Kevin L Kevin.Buterbaugh at Vanderbilt.Edu
Mon Oct 23 14:42:51 BST 2017


Hi All,

And I’m not really down, but it is a rainy Monday morning here and GPFS did give me a scare in the last hour, so I thought that was a funny subject line.

So I have a >1 PB filesystem with 3 pools: 1) the system pool, which contains metadata only, 2) the data pool, which is where all I/O goes by default, and 3) the capacity pool, which is where old crap gets migrated to.
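
For context, the pool layout and the placement policy behind it can be inspected with something like the following; the device name is just a placeholder, not my actual filesystem:

    # List the storage pools defined in the filesystem
    mmlspool <filesystem> all

    # Show the installed placement/migration policy rules
    mmlspolicy <filesystem> -L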

I logged on this morning to see an alert that my data pool was 100% full.  I ran an mmdf from the cluster manager and, sure enough:

(pool total)           509.3T                                     0 (  0%)             0 ( 0%)
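
(That pool-total line came from something like this; the device name is a placeholder, and the two trailing columns are free space in full blocks and in fragments:)

    # Per-pool usage; -P restricts the report to a single storage pool
    mmdf <filesystem> -P data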

I immediately tried copying a file there and it worked, so I figured GPFS must be failing writes over to the capacity pool, but an mmlsattr on the file I copied showed that it was in the data pool.  Hmmm.
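
(The quick test was along these lines, with placeholder paths; mmlsattr -L reports the pool in its "storage pool name" field:)

    # Copy in a test file, then check which pool it actually landed in
    cp /tmp/testfile /gpfs/<mountpoint>/tmp/
    mmlsattr -L /gpfs/<mountpoint>/tmp/testfile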

I also noticed that “df -h” said that the filesystem had 399 TB free, while mmdf said it only had 238 TB free.  Hmmm.
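
(In other words, roughly this, with placeholder names:)

    df -h /gpfs/<mountpoint>     # reported ~399 TB free
    mmdf <filesystem>            # reported ~238 TB free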

So after some fruitless poking around I decided that whatever was going to happen, I should kill the mmrestripefs I had running on the capacity pool … let me emphasize that … I had a restripe running on the capacity pool only (via the “-P” option to mmrestripefs), but it was the data pool that said it was 100% full.
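
(The restripe was invoked along these lines; the device name is a placeholder and I’m glossing over the exact restripe flags, but -P is the important bit:)

    # Restripe/rebalance restricted to the capacity pool only
    mmrestripefs <filesystem> -b -P capacity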

I’m sure many of you have already figured out where this is going … after killing the restripe I ran mmdf again and:

(pool total)           509.3T                                  159T ( 31%)        1.483T ( 0%)

I have never seen anything like this before … any ideas, anyone?  PMR time?

Thanks!

Kevin

