[gpfsug-discuss] Gpfs Memory Usaage Keeps going up and we don't know why.

Adam Huffman adam.huffman at crick.ac.uk
Mon Jul 24 15:40:51 BST 2017


smem is recommended here
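For example, something like the following (flags as per the smem man page; run as root so it can see mmfsd) will report proportional and unique set sizes, which separate the shared segments from private growth:

    # per-process report for mmfsd only, with totals and human-readable sizes
    smem -t -k -P mmfsd

    # or sort everything by PSS to see what is really growing
    smem -k -s pss -r | head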

Cheers,
Adam

--

Adam Huffman
Senior HPC and Cloud Systems Engineer
The Francis Crick Institute
1 Midland Road
London NW1 1AT

T: 020 3796 1175
E: adam.huffman at crick.ac.uk
W: www.crick.ac.uk





On 24 Jul 2017, at 15:21, Peter Childs <p.childs at qmul.ac.uk> wrote:


top

but ps gives the same value.

[root at dn29 ~]# ps auww -q 4444
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root      4444  2.7 22.3 10537600 5472580 ?    S<Ll Jul12 466:13 /usr/lpp/mmfs/bin/mmfsd
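Worth noting that RSS there includes the shared memory segments, so the pagepool gets counted in that figure too. A rough way to see which mappings are actually resident (a sketch, assuming the same PID):

    # largest resident mappings for mmfsd; the pagepool typically shows up as one large segment
    pmap -x 4444 | sort -k3 -n | tail -15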

Thanks for the help

Peter.


On Mon, 2017-07-24 at 14:10 +0000, Jim Doherty wrote:
How are you identifying the high memory usage?


On Monday, July 24, 2017 9:30 AM, Peter Childs <p.childs at qmul.ac.uk> wrote:


I've had a look at mmfsadm dump malloc and it agrees with the output from mmdiag --memory, but neither seems to account for the excessive memory usage.

The new machines do have idleSocketTimeout set to 0; from what you're saying, it could be related to keeping that many connections between nodes open.

Thanks in advance

Peter.
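One thing that might be worth checking is how many node-to-node connections are actually being held open (a rough sketch; GPFS normally uses TCP port 1191, and the exact mmdiag output format varies by release):

    # established TCP connections owned by mmfsd
    ss -tnp | grep mmfsd | wc -l

    # or ask GPFS itself which nodes it thinks it is connected to
    mmdiag --network | grep -ci connected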




[root at dn29 ~]# mmdiag --memory

=== mmdiag: memory ===
mmfsd heap size: 2039808 bytes


Statistics for MemoryPool id 1 ("Shared Segment (EPHEMERAL)")
           128 bytes in use
   17500049370 hard limit on memory usage
       1048576 bytes committed to regions
             1 number of regions
           555 allocations
           555 frees
             0 allocation failures


Statistics for MemoryPool id 2 ("Shared Segment")
      42179592 bytes in use
   17500049370 hard limit on memory usage
      56623104 bytes committed to regions
             9 number of regions
        100027 allocations
         79624 frees
             0 allocation failures


Statistics for MemoryPool id 3 ("Token Manager")
       2099520 bytes in use
   17500049370 hard limit on memory usage
      16778240 bytes committed to regions
             1 number of regions
             4 allocations
             0 frees
             0 allocation failures
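If I'm adding those up right, the committed pool sizes come to roughly 1 MB + 54 MB + 16 MB, plus the ~2 MB heap, so well under 100 MB in total; even with the 1G pagepool on top, that still leaves around 4G of the ~5.2G RSS that ps reports unaccounted for by anything GPFS is tracking in these pools.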


On Mon, 2017-07-24 at 13:11 +0000, Jim Doherty wrote:
There are three places where the GPFS mmfsd uses memory: the pagepool plus two shared memory segments. To see the memory utilization of the shared memory segments, run the command mmfsadm dump malloc. The statistics for memory pool id 2 are where the maxFilesToCache/maxStatCache objects live, and the manager nodes use memory pool id 3 to track the MFTC/MSC objects.
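A quick way to line the pool usage up against the settings that size it (a rough sketch; the grep pattern is just the obvious suspects):

    # effective values for the settings that drive those pools
    mmdiag --config | grep -E 'pagepool|maxFilesToCache|maxStatCache|workerThreads'

    # the raw per-pool statistics that mmdiag --memory summarises
    mmfsadm dump malloc | head -40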

You might want to upgrade to a later PTF, as there was a fix for a memory leak in tscomm associated with network connection drops.
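To see whether connection drops are actually happening on the affected nodes (a sketch; the exact log wording varies between releases, and mmfsadm is an unsupported/undocumented tool, so the dump name may vary too):

    # dropped or re-established connections are normally logged here
    grep -iE 'connection reset|broken|reconnect' /var/log/mmfs.log.latest | tail

    # tscomm connection state, straight from the daemon
    mmfsadm dump tscomm | grep -ci connected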


On Monday, July 24, 2017 5:29 AM, Peter Childs <p.childs at qmul.ac.uk> wrote:


We have two GPFS clusters.

One is fairly old, running 4.2.1-2 without CCR, and its nodes run
fine, using about 1.5G of memory consistently (GPFS pagepool is
set to 1G, so that looks about right).

The other one is "newer", running 4.2.1-3 with CCR, and its nodes keep
increasing in their memory usage. They start at about 1.1G and are fine
for a few days, but after a while they grow to 4.2G, which means that
when a node needs to run real work, the work can't be done.

I'm losing track of what may be different other than CCR, and I'm trying
to find some more ideas of where to look.

I've checked all the standard things like pagepool and maxFilesToCache
(set to the default of 4000); workerThreads is set to 128 on the new
GPFS cluster (against the default of 48 on the old).
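For reference, listing the non-default settings on a node of each cluster side by side may make the relevant difference stand out; something like (mmlsconfig with no arguments prints the changed values):

    mmlsconfig | grep -E 'pagepool|maxFilesToCache|maxStatCache|workerThreads|idleSocketTimeout'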

I'm not sure what else to look at on this one, hence why I'm asking the
community.

Thanks in advance

Peter Childs
ITS Research Storage
Queen Mary University of London.





--

Peter Childs
ITS Research Storage
Queen Mary, University of London






--

Peter Childs
ITS Research Storage
Queen Mary, University of London

_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


The Francis Crick Institute Limited is a registered charity in England and Wales no. 1140062 and a company registered in England and Wales no. 06885462, with its registered office at 1 Midland Road London NW1 1AT

