[gpfsug-discuss] Hanging file-systems

Sven Oehme oehmes at gmail.com
Tue Nov 27 20:43:04 GMT 2018


Was the node you rebooted a client, or a server that was running kswapd at
100%?

sven


On Tue, Nov 27, 2018 at 12:09 PM Simon Thompson <S.J.Thompson at bham.ac.uk>
wrote:

> The NSD nodes were running 5.0.1-2 (though we are just now rolling out
> 5.0.2-1, I think).
>
>
>
> So is this memory pressure on the NSD nodes then? I thought it was
> documented somewhere that GPFS won’t use more than 50% of the host memory.
>
>
>
> And actually if you look at the values for maxStatCache and
> maxFilesToCache, the memory footprint is quite small.
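>
> A quick back-of-the-envelope (the per-entry sizes below are placeholder
> assumptions I’d want to check against the tuning docs for our version, not
> authoritative numbers):
>
>     # rough, hedged estimate of the metadata cache footprint from mmchconfig values
>     def cache_footprint_gib(max_files_to_cache, max_stat_cache,
>                             bytes_per_ftc=10 * 1024,  # assumed cost per cached file entry, check the docs
>                             bytes_per_stat=512):      # assumed cost per stat cache entry, check the docs
>         total = max_files_to_cache * bytes_per_ftc + max_stat_cache * bytes_per_stat
>         return total / 2**30
>
>     # e.g. with made-up values of 128k files to cache and 512k stat cache entries
>     print(round(cache_footprint_gib(128_000, 512_000), 2), "GiB")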
>
>
>
> Sure, on these NSD servers we had a pretty big pagepool (which we’ve since
> reduced somewhat), but there should still have been quite a lot of memory
> free on the nodes …
>
>
>
> If only someone was going to do a talk in December at the CIUK SSUG on
> memory usage …
>
>
>
> Simon
>
>
>
> From: <gpfsug-discuss-bounces at spectrumscale.org> on behalf of "oehmes at gmail.com" <oehmes at gmail.com>
> Reply-To: "gpfsug-discuss at spectrumscale.org" <gpfsug-discuss at spectrumscale.org>
> Date: Tuesday, 27 November 2018 at 18:19
> To: "gpfsug-discuss at spectrumscale.org" <gpfsug-discuss at spectrumscale.org>
> Subject: Re: [gpfsug-discuss] Hanging file-systems
>
>
>
> Hi,
>
>
>
> Now I need to swap back in a lot of information about GPFS that I tried to
> swap out :-)
>
>
>
> I bet kswapd is not doing what the name suggests here, which is handling
> swap space. I claim the kswapd thread is trying to throw dentries out of the
> cache, and what it actually tries to get rid of are entries for directories
> very high up in the tree which GPFS still holds a refcount on, so it can't
> free them. When it does this there is a single thread (unfortunately this
> was never implemented with multiple threads) walking down the tree to find
> some entries to steal; if it can't find any it goes to the next, and the
> next, and so on, and on a busy system it can take forever to free anything
> up. There have been multiple fixes in this area in 5.0.1.x and 5.0.2 which I
> pushed for in the weeks before I left IBM. You never see this in a trace
> with the default trace levels, which is why nobody would ever have suspected
> it; you need to set special trace levels to even see it.
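>
> To make that concrete, here is a rough sketch of how I picture that steal
> scan behaving (pure speculation, no source access, all names made up):
>
>     # Speculative sketch: one thread scanning the dentry cache for a victim to steal.
>     from dataclasses import dataclass
>
>     @dataclass
>     class Entry:
>         name: str
>         refcount: int  # > 0 means GPFS still holds a reference (e.g. a parent directory)
>
>     def steal_one_entry(cache):
>         scanned = 0
>         for i, entry in enumerate(cache):
>             scanned += 1
>             if entry.refcount > 0:          # pinned: can't free it, go to the next, and the next ...
>                 continue
>             return cache.pop(i), scanned    # found a victim to steal
>         return None, scanned                # walked the whole list and freed nothing
>
> One thread, one list, and if most of what it meets is pinned it just keeps
> walking, which matches a node sitting at 100% in kswapd.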
>
> I don't know the exact version the changes went into, but it was somewhere
> in the 5.0.1.x timeframe. The change separated the cache lists so that file
> entries are preferred for stealing before directories, and a minimum
> percentage of directories (10% by default) is kept in the cache before it
> will ever try to get rid of a directory. It also tries to keep a list of
> free entries available at all times (i.e. it cleans them proactively) and
> allows going over the hard limit instead of just blocking as in previous
> versions. So I assume you run a version prior to 5.0.1.x, and what you see
> is kswapd desperately trying to get rid of entries: it can't find one, it's
> already at the limit, so it blocks and doesn't allow a new entry to be
> created or promoted from the stat cache.
>
>
>
> Again, all of this is speculation on my part, without source code access,
> based on experience :-)
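>
> If it helps, this is roughly how I picture the new policy (again made-up
> names and pure speculation, not the real data structures):
>
>     # Speculative sketch of the post-5.0.1.x steal policy described above.
>     class CompactCache:
>         def __init__(self, hard_limit, min_dir_fraction=0.10, free_target=64):
>             self.files, self.dirs, self.free = [], [], []
>             self.hard_limit = hard_limit              # nominal limit; may be temporarily exceeded
>             self.min_dir_fraction = min_dir_fraction  # keep at least ~10% directories cached
>             self.free_target = free_target            # keep a pool of free entries around
>
>         def steal_one(self):
>             # prefer stealing file entries before directory entries
>             if self.files:
>                 return self.files.pop(0)
>             # only evict a directory if the minimum directory percentage is preserved
>             total = len(self.files) + len(self.dirs)
>             if self.dirs and (len(self.dirs) - 1) >= self.min_dir_fraction * total:
>                 return self.dirs.pop(0)
>             return None
>
>         def background_clean(self):
>             # proactively top up the free list instead of only reacting on demand
>             while len(self.free) < self.free_target:
>                 victim = self.steal_one()
>                 if victim is None:
>                     break
>                 self.free.append(victim)
>
>         def new_entry(self):
>             if self.free:
>                 return self.free.pop()
>             victim = self.steal_one()
>             if victim is not None:
>                 return victim
>             # in older versions this is where the caller would block; now allow a
>             # temporary overshoot of the hard limit and let background_clean catch up
>             return object()
>
> The point being: file entries get stolen first, a directory floor is kept, a
> free list is maintained in the background, and hitting the hard limit no
> longer means everything blocks.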
>
>
>
> What version are you running? Please also share mmdiag --stats output from
> that node.
>
>
>
> sven
>
>
>
> On Tue, Nov 27, 2018 at 9:54 AM Simon Thompson <S.J.Thompson at bham.ac.uk>
> wrote:
>
> Thanks Sven …
>
>
>
> We found a node with kswapd running at 100% (and swap was off) …
>
>
>
> Killing that node made access to the FS spring into life.
>
>
>
> Simon
>
>
>
> From: <gpfsug-discuss-bounces at spectrumscale.org> on behalf of "oehmes at gmail.com" <oehmes at gmail.com>
> Reply-To: "gpfsug-discuss at spectrumscale.org" <gpfsug-discuss at spectrumscale.org>
> Date: Tuesday, 27 November 2018 at 16:14
> To: "gpfsug-discuss at spectrumscale.org" <gpfsug-discuss at spectrumscale.org>
> Subject: Re: [gpfsug-discuss] Hanging file-systems
>
>
>
> 1. Are you under memory pressure, or even worse, have you started swapping?
>
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>
>

