[gpfsug-discuss] Hanging file-systems

Sven Oehme oehmes at gmail.com
Tue Nov 27 20:44:26 GMT 2018


And I already talked about NUMA stuff at the CIUK usergroup meeting, so I won't
volunteer for a 2nd advanced topic  :-D


On Tue, Nov 27, 2018 at 12:43 PM Sven Oehme <oehmes at gmail.com> wrote:

> Was the node you rebooted (the one running kswapd at 100%) a client or a
> server?
>
> sven
>
>
> On Tue, Nov 27, 2018 at 12:09 PM Simon Thompson <S.J.Thompson at bham.ac.uk>
> wrote:
>
>> The NSD nodes were running 5.0.1-2 (though we're just now rolling out
>> 5.0.2-1, I think).
>>
>>
>>
>> So is this memory pressure on the NSD nodes then? I thought it was
>> documented somewhere that GPFS won’t use more than 50% of the host memory.
>>
>>
>>
>> And actually if you look at the values for maxStatCache and
>> maxFilesToCache, the memory footprint is quite small.
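>>
>> (As a rough sanity check, and assuming the commonly quoted rule-of-thumb
>> costs of about 10 KiB per maxFilesToCache entry and about 0.5 KiB per
>> maxStatCache entry, which vary by release, so treat this as a sketch rather
>> than an authoritative formula; example limits of 128000 and 256000 below:
>>
>>     # settings currently in effect on this node
>>     mmdiag --config | grep -Ei 'pagepool|maxFilesToCache|maxStatCache'
>>
>>     # very rough estimate of the file/stat cache footprint in MiB
>>     awk -v ftc=128000 -v msc=256000 \
>>         'BEGIN { printf "~%.0f MiB\n", (ftc*10 + msc*0.5)/1024 }'
>>
>> mmdiag --memory on the same node shows what the daemon has actually
>> allocated, which is the better number to trust.)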
>>
>>
>>
>> Sure, on these NSD servers we had a pretty big pagepool (which we’ve since
>> reduced somewhat), but there should still have been quite a lot of free
>> memory on the nodes …
>>
>>
>>
>> If only someone was going to do a talk in December at the CIUK SSUG on
>> memory usage …
>>
>>
>>
>> Simon
>>
>>
>>
>> From: <gpfsug-discuss-bounces at spectrumscale.org> on behalf of "oehmes at gmail.com" <oehmes at gmail.com>
>> Reply-To: "gpfsug-discuss at spectrumscale.org" <gpfsug-discuss at spectrumscale.org>
>> Date: Tuesday, 27 November 2018 at 18:19
>> To: "gpfsug-discuss at spectrumscale.org" <gpfsug-discuss at spectrumscale.org>
>> Subject: Re: [gpfsug-discuss] Hanging file-systems
>>
>>
>>
>> Hi,
>>
>>
>>
>> Now I need to swap back in a lot of information about GPFS that I tried to
>> swap out :-)
>>
>>
>>
>> I bet kswapd is not doing what the name suggests here, which is handling
>> swap space. I claim the kswapd thread is trying to throw dentries out of
>> the cache, and what it actually tries to get rid of are entries for
>> directories very high up in the tree which GPFS still holds a refcount on,
>> so it can't free them. When it does this there is a single thread
>> (unfortunately this was never implemented with multiple threads) walking
>> down the tree to find some entries to steal; if it can't find any it goes
>> to the next, and the next, and so on, and on a busy system it can take
>> forever to free anything up. There have been multiple fixes in this area
>> in 5.0.1.x and 5.0.2 which I pushed for in the weeks before I left IBM.
>> You never see this in a trace with default trace levels, which is why
>> nobody would have ever suspected this; you need to set special trace
>> levels to even see it.
>>
>> I don't know the exact version the changes went into, but it was somewhere
>> in the 5.0.1.x timeframe. The change was to separate the cache lists so
>> that files are preferred for stealing before directories, and to keep a
>> minimum percentage of directories in the cache (10% by default) before
>> ever trying to evict a directory. It also tries to keep a list of free
>> entries at all times (meaning it proactively cleans them), and it allows
>> going over the hard limit instead of just blocking as in previous
>> versions. So I assume you run a version prior to 5.0.1.x, and what you see
>> is kswapd desperately trying to get rid of entries but unable to find any;
>> the cache is already at its limit, so it blocks and doesn't allow a new
>> entry to be created or promoted from the stat cache.
>>
>>
>>
>> Again, all of this is without source code access and is speculation on my
>> part based on experience :-)
>>
>>
>>
>> What version are you running? Please also share mmdiag --stats output from
>> that node.
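>>
>> Something along these lines (a minimal sketch using standard Linux tools
>> plus mmdiag, collected while the hang is happening) would be useful:
>>
>>     # is kswapd actually spinning on this node?
>>     top -b -n 1 | grep kswapd
>>
>>     # size of the kernel dentry cache (needs root)
>>     grep dentry /proc/slabinfo
>>
>>     # GPFS's own view of its cache usage and memory pools
>>     mmdiag --stats
>>     mmdiag --memory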
>>
>>
>>
>> sven
>>
>>
>> On Tue, Nov 27, 2018 at 9:54 AM Simon Thompson <S.J.Thompson at bham.ac.uk>
>> wrote:
>>
>> Thanks Sven …
>>
>>
>>
>> We found a node with kswapd running at 100% (and swap was off) …
>>
>>
>>
>> Killing that node made access to the FS spring into life.
>>
>>
>>
>> Simon
>>
>>
>>
>> From: <gpfsug-discuss-bounces at spectrumscale.org> on behalf of "oehmes at gmail.com" <oehmes at gmail.com>
>> Reply-To: "gpfsug-discuss at spectrumscale.org" <gpfsug-discuss at spectrumscale.org>
>> Date: Tuesday, 27 November 2018 at 16:14
>> To: "gpfsug-discuss at spectrumscale.org" <gpfsug-discuss at spectrumscale.org>
>> Subject: Re: [gpfsug-discuss] Hanging file-systems
>>
>>
>>
>> 1. Are you under memory pressure, or, even worse, have you started swapping?
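>>
>> A quick way to check, with standard Linux tools only:
>>
>>     # current free memory and swap usage
>>     free -m
>>
>>     # the si/so columns show swap-in/swap-out activity per second;
>>     # anything persistently non-zero means the node is actively swapping
>>     vmstat 1 5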
>>
>> _______________________________________________
>> gpfsug-discuss mailing list
>> gpfsug-discuss at spectrumscale.org
>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>>
>>
>