[gpfsug-discuss] excessive lowDiskSpace events (how is threshold triggered?)
Jonathan Buzzard
jonathan at buzzard.me.uk
Tue Feb 25 21:29:43 GMT 2014
On 25/02/14 20:17, mark.bergman at uphs.upenn.edu wrote:
>
> I'm running GPFS 3.5.0.9 under Linux, and I'm seeing what seem to be an
> excessive number of lowDiskSpace events on the "system" pool.
>
> I've got an mmcallback set up, including a log report of which pool is
> triggering the lowDiskSpace callback.
Bear in mind that once you hit a lowDiskSpace event your callback will
helpfully be called every two minutes until the condition is cleared. So
your callback needs to have locking, otherwise overlapping mmapplypolicy
runs will go nuts if it takes more than two minutes to clear the
lowDiskSpace event.
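As a rough sketch of the sort of locking I mean (not a drop-in script):
the wrapper below takes a non-blocking exclusive lock and simply exits if
a previous invocation still holds it. The lock file path and the names
try_lock and run_once are made up for illustration, and the command to
run (for example your mmapplypolicy invocation) is passed in rather than
assumed.

```python
import fcntl
import subprocess
import sys

# Hypothetical lock file location; pick somewhere sensible on your nodes.
LOCK_PATH = "/var/run/lowdiskspace_callback.lock"


def try_lock(path):
    """Return an open file holding an exclusive lock, or None if it is taken."""
    handle = open(path, "w")
    try:
        # LOCK_NB makes flock fail immediately instead of waiting.
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        handle.close()
        return None
    return handle


def run_once(command, lock_path=LOCK_PATH):
    """Run command unless another invocation already holds the lock."""
    lock = try_lock(lock_path)
    if lock is None:
        return None  # an earlier callback is still working; do nothing
    try:
        return subprocess.call(command)
    finally:
        lock.close()  # closing the file releases the lock


if __name__ == "__main__" and len(sys.argv) > 1:
    # Everything after the script name is the command to run under the lock.
    sys.exit(run_once(sys.argv[1:]) or 0)
```

Registered as the callback command, the two-minute repeats then become
no-ops while a migration is still in progress.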
>
> The part that is confusing me is that the "system" pool doesn't seem to be
> above the policy thresholds.
>
> For example, 'mmdf' shows that there is about 26% free in the 'system' pool:
>
> -------------------------
> disk                disk size  failure holds    holds                 free                free
> name                             group metadata data        in full blocks        in fragments
> --------------- ------------- -------- -------- ----- -------------------- -------------------
> Disks in storage pool: system (Maximum disk size allowed is 33 TB)
> dx80_rg16_vol1           546G       -1 yes      yes          125.1G ( 23%)        23.96G ( 4%)
> dx80_rg4_vol1            546G        1 yes      yes          108.1G ( 20%)        33.84G ( 6%)
> dx80_rg13_vol1           546G        1 yes      yes            109G ( 20%)        32.78G ( 6%)
> dx80_rg6_vol1            546G        1 yes      yes          104.4G ( 19%)        35.61G ( 7%)
> dx80_rg3_vol1            546G        1 yes      yes          105.6G ( 19%)        35.29G ( 6%)
>                 -------------                         -------------------- -------------------
> (pool total)           2.666T                                552.1G ( 20%)        161.5G ( 6%)
> -------------------------
Bear in mind these are rounded numbers. You cannot add the two percentages
together and get a completely accurate picture: if each one is rounded to
the nearest whole percent, the sum can be off by up to a full percentage
point.
[SNIP]
>
> /* next threshold: some free space, move middle-aged files */
> RULE 'move files that have not been changed in 7 days from the system pool to dx80_medium' MIGRATE FROM POOL 'system'
> TO POOL 'dx80_medium'
> THRESHOLD(75,65)
> LIMIT(95)
> WEIGHT(KB_ALLOCATED)
> WHERE (DAYS(CURRENT_TIMESTAMP) - DAYS(CHANGE_TIME) > 7 )
> AND KB_ALLOCATED >= 1024
> -------------------------
>
>
> As I understand it, none of those rules should trigger a lowDiskSpace event
> when the pool is 74% full, as it is now.
I would say 74% and 75% are very close, and you are not taking into
account that the 20% and 6% are rounded values. Adding them together
gives a result that is just wrong enough to hide the fact that you have
crossed the threshold and triggered the lowDiskSpace event.
> Is the threshold in a file migration policy based on the %free (or used) in
> full blocks only, or in the sum of full blocks plus fragments?
What does mmdf without a --block-size option, or with --block-size 1K,
look like, and what does doing the accurate maths then reveal?
My guess is that you are that tiny bit fuller than you think due to
rounding errors, and you are then getting hit by the callback being
called every two minutes until it clears.
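By "the accurate maths" I mean something along these lines, fed with the
unrounded KB figures from mmdf. The numbers below are only stand-ins
converted from the rounded G values in your quoted output, and
percent_used and crosses are illustrative names. Whether the threshold
counts free fragments as free space is exactly the question you asked, so
the sketch works out both figures rather than assuming an answer.

```python
def percent_used(total_kb, free_kb):
    """Exact pool occupancy as a percentage, from unrounded KB figures."""
    return 100.0 * (total_kb - free_kb) / total_kb


def crosses(used_pct, high=75.0):
    """True if occupancy is above the high mark of THRESHOLD(75,65)."""
    return used_pct > high


KB_PER_G = 1024 * 1024

# Stand-in values taken from the quoted (already rounded) mmdf output;
# replace them with the real KB counts.
total = 5 * 546 * KB_PER_G        # five 546G disks in the system pool
free_blocks = 552.1 * KB_PER_G    # pool total, free in full blocks
free_frags = 161.5 * KB_PER_G     # pool total, free in fragments

with_frags = percent_used(total, free_blocks + free_frags)
blocks_only = percent_used(total, free_blocks)

print(f"fragments counted as free: {with_frags:.2f}% used, "
      f"over threshold: {crosses(with_frags)}")
print(f"full blocks only:          {blocks_only:.2f}% used, "
      f"over threshold: {crosses(blocks_only)}")
```

With the real KB counts plugged in, you can see which of the two sits
above or below 75 and by how much.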
JAB.
--
Jonathan A. Buzzard Email: jonathan (at) buzzard.me.uk
Fife, United Kingdom.