[gpfsug-discuss] Rainy days and Mondays and GPFS lying to me always get me down...
Buterbaugh, Kevin L
Kevin.Buterbaugh at Vanderbilt.Edu
Mon Oct 23 14:42:51 BST 2017
Hi All,
And I’m not really down, but it is a rainy Monday morning here and GPFS did give me a scare in the last hour, so I thought that was a funny subject line.
So I have a >1 PB filesystem with 3 pools: 1) the system pool, which contains metadata only, 2) the data pool, where all I/O goes by default, and 3) the capacity pool, where old crap gets migrated.
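For context, a default-placement-plus-migration setup like the one described above is typically driven by a GPFS file placement policy. A minimal sketch of what such rules can look like (the rule names, threshold values, and age criterion here are invented for illustration, not the actual policy on this cluster):

```
/* Hypothetical policy sketch -- rule names and numbers are made up. */
RULE 'default_placement' SET POOL 'data'
RULE 'age_out' MIGRATE FROM POOL 'data'
     THRESHOLD(90,70) TO POOL 'capacity'
     WHERE CURRENT_TIMESTAMP - ACCESS_TIME > INTERVAL '90' DAYS
```

Placement rules are installed with mmchpolicy; MIGRATE rules are evaluated by mmapplypolicy (often kicked off from a low-space callback).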
I logged on this morning to see an alert that my data pool was 100% full. I ran an mmdf from the cluster manager and, sure enough:
(pool total) 509.3T 0 ( 0%) 0 ( 0%)
I immediately tried copying a file there and it worked, so I figured GPFS must be failing writes over to the capacity pool, but an mmlsattr on the file I copied showed it was in the data pool. Hmmm.
I also noticed that “df -h” said that the filesystem had 399 TB free, while mmdf said it only had 238 TB free. Hmmm.
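For what it's worth, one quick cross-check in a situation like this is to sum the per-pool free space that mmdf reports and compare the total against df. A minimal sketch using made-up pool names and free-space numbers (the 161T figure for the capacity pool is invented purely so the sample sums to the 399 TB that df reported; it is not real output from this cluster):

```shell
# Save an mmdf-style per-pool summary (columns: pool, size, free) --
# sample data only, not real output from this cluster.
cat > /tmp/pool_free.txt <<'EOF'
data      509.3T 238.0T
capacity  512.0T 161.0T
EOF

# Sum the free column (stripping the trailing "T") across all pools;
# this total is what df's filesystem-wide free figure should resemble.
awk '{gsub(/T$/, "", $3); sum += $3}
     END {printf "total free: %.1fT\n", sum}' /tmp/pool_free.txt
```

If the per-pool sum and df agree but a single pool shows 0% free, the discrepancy is confined to that pool rather than the filesystem as a whole.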
So after some fruitless poking around I decided that whatever was going to happen, I should kill the mmrestripefs I had running on the capacity pool … let me emphasize that … I had a restripe running on the capacity pool only (via the “-P” option to mmrestripefs) but it was the data pool that said it was 100% full.
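For anyone following along, a sketch of the commands involved; the filesystem name "gpfs0" and the file path are placeholders (and the -r flag is illustrative), not taken from the cluster in question:

```shell
mmdf gpfs0                           # per-pool capacity report
mmlsattr -L /gpfs0/path/to/file      # -L shows which storage pool a file is in
mmrestripefs gpfs0 -r -P capacity    # restripe restricted to one storage pool
```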
I’m sure many of you have already figured out where this is going … after killing the restripe I ran mmdf again and:
(pool total) 509.3T 159T ( 31%) 1.483T ( 0%)
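Sanity-checking the arithmetic in that second mmdf line: 159T free out of a 509.3T pool does indeed round to the 31% shown.

```shell
# Verify that 159T free out of 509.3T total rounds to 31%.
awk 'BEGIN { printf "%.0f%%\n", 100 * 159 / 509.3 }'
# prints: 31%
```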
I have never seen anything like this before … any ideas, anyone? PMR time?
Thanks!
Kevin