[gpfsug-discuss] pagepool shrink doesn't release all memory

Aaron Knister aaron.s.knister at nasa.gov
Fri Feb 23 03:24:00 GMT 2018


This is also interesting (although I don't know what it really means). 
Looking at pmap output for mmfsd, I can see what happens after each step 
(a small capture-loop sketch follows the snapshots):

# baseline
00007fffe4639000  59164K      0K      0K      0K      0K ---p [anon]
00007fffd837e000  61960K      0K      0K      0K      0K ---p [anon]
0000020000000000 1048576K 1048576K 1048576K 1048576K      0K rwxp [anon]
Total:           1613580K 1191020K 1189650K 1171836K      0K

# tschpool 64G
00007fffe4639000  59164K      0K      0K      0K      0K ---p [anon]
00007fffd837e000  61960K      0K      0K      0K      0K ---p [anon]
0000020000000000 67108864K 67108864K 67108864K 67108864K      0K rwxp [anon]
Total:           67706636K 67284108K 67282625K 67264920K      0K

# tschpool 1G
00007fffe4639000  59164K      0K      0K      0K      0K ---p [anon]
00007fffd837e000  61960K      0K      0K      0K      0K ---p [anon]
0000020001400000 139264K 139264K 139264K 139264K      0K rwxp [anon]
0000020fc9400000 897024K 897024K 897024K 897024K      0K rwxp [anon]
0000020009c00000 66052096K      0K      0K      0K      0K rwxp [anon]
Total:           67706636K 1223820K 1222451K 1204632K      0K
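
For reference, a minimal sketch of how the snapshots above could be 
captured in one pass. This isn't what I actually ran; it assumes 
tschpool takes the size argument exactly as in the step labels above 
and that pgrep/pmap are available on the node:

# hypothetical capture loop, mirroring the manual steps above
for size in 64G 1G; do
    tschpool $size
    echo "== pagepool $size =="
    pmap -x $(pgrep -x mmfsd) | tail -4
done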

Even though mmfsd has that 64G chunk allocated, none of it is actually 
*used*. I wonder why Linux still accounts for it as allocated.
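
A quick way to sanity-check those pmap numbers straight from /proc 
(just a sketch, assuming mmfsd's PID is 24397 as in the ps output 
quoted below):

# kernel's own view of virtual size vs. resident set
grep -E '^Vm(Size|RSS):' /proc/24397/status

# sum Rss over every mapping; should roughly match pmap's RSS total
awk '/^Rss:/ {sum += $2} END {print sum " kB"}' /proc/24397/smaps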

-Aaron

On 2/22/18 10:17 PM, Aaron Knister wrote:
> I've been exploring the idea for a while of writing a SLURM SPANK plugin 
> to allow users to dynamically change the pagepool size on a node. Every 
> now and then we have users who would benefit significantly from a much 
> larger pagepool on compute nodes, but by default we keep it on the 
> smaller side to make as much physical memory as possible available to 
> batch work.
> 
> In testing, though, it seems that reducing the pagepool doesn't quite 
> release all of the memory. I don't really understand it, because I've 
> never before seen memory that was previously resident become 
> non-resident yet still keep its virtual memory allocation.
> 
> Here's what I mean. Let's take a node with 128G and a 1G pagepool.
> 
> If I do the following to simulate what might happen as various jobs 
> tweak the pagepool:
> 
> - tschpool 64G
> - tschpool 1G
> - tschpool 32G
> - tschpool 1G
> - tschpool 32G
> 
> I end up with this:
> 
> mmfsd thinks there's 32G resident but 64G virtual:
> # ps -o vsz,rss,comm -p 24397
>     VSZ   RSS COMMAND
> 67589400 33723236 mmfsd
> 
> However, Linux thinks there's ~100G used:
> 
> # free -g
>               total       used       free     shared    buffers     cached
> Mem:           125        100         25          0          0          0
> -/+ buffers/cache:         98         26
> Swap:            7          0          7
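> 
> A slightly finer-grained view of where that ~100G shows up (a sketch; 
> nothing GPFS-specific, just the usual /proc/meminfo fields):
> 
> grep -E '^(MemTotal|MemFree|Buffers|Cached|AnonPages|AnonHugePages):' /proc/meminfo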
> 
> I can jump back and forth between 1G and 32G *after* allocating the 64G 
> pagepool, and the overall amount of memory in use doesn't balloon, but I 
> can't seem to shed that original 64G.
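> 
> A crude loop to watch that from the shell while flipping sizes (again 
> a sketch, not something I ran verbatim; it just wraps the same 
> commands shown above):
> 
> # toggle the pagepool and print what ps and free see after each change
> for size in 1G 32G 1G 32G; do
>     tschpool $size
>     ps -o vsz,rss,comm -p $(pgrep -x mmfsd)
>     free -g | head -2
> done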
> 
> I don't understand what's going on... :) Any ideas? This is with Scale 
> 4.2.3.6.
> 
> -Aaron
> 

-- 
Aaron Knister
NASA Center for Climate Simulation (Code 606.2)
Goddard Space Flight Center
(301) 286-2776


