[gpfsug-discuss] frequent OOM killer due to high memory usage of mmfsd
Achim Rehor
Achim.Rehor at de.ibm.com
Thu Sep 7 09:13:16 BST 2023
Thanks Stephan,
a valuable hint!
@Christoph: the memory footprint of mmfsd can get quite large on a CES
system, given the usual tuning recommendations for CES nodes.
mmfsd is using the pagepool (so 16GB in your case) plus the caches for
the management of maxFilesToCache and maxStatCache outside the
pagepool, which is ~10kB per FilesToCache entry and ~0.5kB per
StatCache entry.
That sums up to roughly 40GB for 4,000,000 maxFilesToCache entries and
2GB for the same number of maxStatCache entries.
In addition, if your node is a manager node, it will also use some
space for token memory (depending on the maxFilesToCache settings
across the full cluster, the number of manager nodes, etc.)
So 256GB should be largely sufficient for the named scenario.
If the mmfsd memory footprint is constantly growing, I'd recommend
opening a ticket and uploading a snap, so support can have a more
detailed look.
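The estimate above can be sketched as a quick back-of-the-envelope calculation. The per-entry sizes are the approximations quoted in this mail, not official figures, and the function name and example values are mine:

```python
# Rough estimate of the mmfsd memory footprint, following the numbers
# quoted above: pagepool plus ~10 kB per maxFilesToCache entry and
# ~0.5 kB per maxStatCache entry. These per-entry sizes are
# approximations from the mailing-list post, not official figures.

def estimate_mmfsd_gb(pagepool_gb, max_files_to_cache, max_stat_cache):
    """Return an approximate mmfsd memory footprint in GB."""
    ftc_gb = max_files_to_cache * 10 / 1_000_000   # ~10 kB per file cache entry
    stat_gb = max_stat_cache * 0.5 / 1_000_000     # ~0.5 kB per stat cache entry
    return pagepool_gb + ftc_gb + stat_gb

# Example from this thread: 16 GB pagepool, 4,000,000 entries each
print(estimate_mmfsd_gb(16, 4_000_000, 4_000_000))  # -> 58.0 (GB)
```

Token memory on manager nodes would come on top of this, so the real footprint on a CES/manager node is larger, but still well under 256GB.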
--
Mit freundlichen Grüßen / Kind regards
Achim Rehor
Technical Support Specialist Spectrum Scale and ESS (SME)
Advisory Product Services Professional
IBM Systems Storage Support - EMEA
Achim.Rehor at de.ibm.com +49-170-4521194
IBM Deutschland GmbH
Vorsitzender des Aufsichtsrats: Sebastian Krause
Geschäftsführung: Gregor Pillen (Vorsitzender), Nicole Reimer,
Gabriele Schwarenthorer, Christine Rupp, Frank Theisen
Sitz der Gesellschaft: Ehningen / Registergericht:
Amtsgericht Stuttgart, HRB 14562 / WEEE-Reg.-Nr. DE 99369940
-----Original Message-----
From: Stephan Graf <st.graf at fz-juelich.de>
Reply-To: gpfsug main discussion list <gpfsug-discuss at gpfsug.org>
To: gpfsug-discuss at gpfsug.org
Subject: [EXTERNAL] Re: [gpfsug-discuss] frequent OOM killer due to
high memory usage of mmfsd
Date: Thu, 07 Sep 2023 08:50:14 +0200
Hi,
in the past we had issues with the mmfsd heap memory. Due to a special
workload it increased and took GBs of memory, but after use it was not
freed again.
We had long discussions with IBM about it, which ended up in a
Development User Story (261213) that was realized in 5.1.2:
---
In this story, the InodeAllocSegment object will be allocated when
accessed. For commands that iterate over all InodeAllocSegments, we
will release the object immediately after use.
An undocumented configuration "!maxIAllocSegmentsToCache" is provided
to control the upper limit of the count of InodeAllocSegment objects.
When the count approaches the limit, a pre-stealing thread will be
started to steal and release some InodeAllocSegment objects. Its
default value is 1,000,000.
---
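For reference, undocumented options like this are normally set via mmchconfig; the invocation below is only a sketch (the value 500000 is an illustrative number, not a recommendation), and you should check with IBM support before changing an undocumented option:

```shell
# Hypothetical sketch -- consult IBM support before touching an
# undocumented option. The leading "!" marks it as undocumented;
# -i applies the change immediately.
mmchconfig '!maxIAllocSegmentsToCache=500000' -i

# Inspect the current effective configuration:
mmdiag --config | grep -i maxIAllocSegmentsToCache
```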
Since then we have been fine so far. But this was on plain GPFS
clients, with no CES node where services like NFS come into play.
You can monitor the heap memory usage by using "mmdiag --memory"
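Alongside "mmdiag --memory", a small sketch like the following can log mmfsd resident-set growth over time. It only uses standard Linux /proc parsing and pgrep; the helper names and the 10-minute interval are my own choices, not part of GPFS:

```python
# Log the resident set size (RSS) of mmfsd over time via /proc, as a
# complement to "mmdiag --memory". Standard Linux only; nothing here
# is GPFS-specific except the process name "mmfsd".
import re
import subprocess
import time

def parse_vmrss_kb(status_text):
    """Extract VmRSS in kB from the contents of /proc/<pid>/status."""
    m = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(m.group(1)) if m else None

def mmfsd_rss_kb():
    """Return the current mmfsd RSS in kB, or None if it is not running."""
    try:
        pid = subprocess.check_output(["pgrep", "-x", "mmfsd"],
                                      text=True).split()[0]
    except subprocess.CalledProcessError:
        return None
    with open(f"/proc/{pid}/status") as f:
        return parse_vmrss_kb(f.read())

def log_samples(count, interval_s=600):
    """Print a timestamped RSS sample every interval_s seconds."""
    for _ in range(count):
        print(time.strftime("%F %T"), mmfsd_rss_kb())
        time.sleep(interval_s)

# Usage (not run here): log_samples(144) for a day of 10-minute samples.
```

If the logged RSS grows without bound over days, that record is useful evidence to attach to a ticket together with a snap.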
@IBM colleagues: If there is something wrong in my explanation please
correct me.
Stephan
On 9/6/23 20:55, Christoph Martin wrote:
> Hi all,
>
> on a three node GPFS cluster with CES enabled and AFM-DR mirroring to
> a
> second cluster we see frequent OOM killer events due to a constantly
> growing mmfsd.
> The machines have 256G memory. The pagepool is configured to 16G.
> The GPFS version is 5.1.6-1.
> After a restart mmfsd rapidly grows to about 100G usage and grows
> over
> some days up to 250G virtual and 220G physical memory usage.
> The OOM killer tries to kill processes like pmcollector or others,
> and sometimes kills mmfsd.
>
> Does anybody see a similar behavior?
> Any guess what could help with this problem?
>
> Regards
> Christoph Martin
>
>
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at gpfsug.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss_gpfsug.org