[gpfsug-discuss] gpfs performance monitoring

Fri Sep 5 22:17:47 BST 2014

On 9/5/14, 3:56 AM, Salvatore Di Nardo wrote:
> Little clarification:
> Our ls is plain ls, there is no alias.
...
> Last question about "maxFilesToCache": you say that it must be large on
> small clusters but small on large clusters. What do you consider 6
> servers and almost 700 clients?
>
> on clients we have:
>     maxFilesToCache 4000
>
> on servers we have
>    maxFilesToCache 12288
>
>

One thing to do is to try your 'ls', see that it is slow, then
immediately run it again.  If it is fast the second and subsequent
times, that's because the stat info is now coming out of the local cache.

e.g.
/usr/bin/time ls /path/to/some/dir && /usr/bin/time ls /path/to/some/dir

The second time is likely to be almost immediate, so long as your local
cache is big enough.
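
If you want the two timings side by side, something like the function
below might do.  It is only a rough sketch: the name time_ls_twice is
made up, it assumes GNU date (for %N) and awk are available, and it uses
'ls -l' rather than plain 'ls' on the assumption that you want every
entry stat'ed.

```shell
# List a directory twice in a row and print the wall-clock time of each run.
# The body runs in a subshell so the helper variables do not leak.
time_ls_twice() (
    dir=$1
    for run in 1 2; do
        start=$(date +%s.%N)           # GNU date; %N is nanoseconds
        ls -l -- "$dir" > /dev/null    # -l forces a stat of every entry
        end=$(date +%s.%N)
        elapsed=$(awk -v s="$start" -v e="$end" 'BEGIN { printf "%.3f", e - s }')
        printf 'run %d: %s s\n' "$run" "$elapsed"
    done
)
```

Call it as 'time_ls_twice /path/to/some/dir'.  A large gap between run 1
and run 2 suggests the second listing was served from the local cache.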

I see on one of our older clusters we have:
tokenMemLimit 2G
maxFilesToCache 40000
maxStatCache 80000

You can also interrogate the local cache to see how full it is.
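
One possible way to do that, as far as I know, is mmlsconfig for the
configured limits plus 'mmdiag --stats' for the daemon's current
statistics.  The wrapper below is a hypothetical helper, not anything
shipped with GPFS; it assumes the admin commands are on PATH (they
usually live in /usr/lpp/mmfs/bin) and that you have the privileges they
need.  The exact output layout varies between releases, so check it
against your version.

```shell
# Print the configured cache limits and the daemon's general statistics
# for the local node.  show_gpfs_cache is an illustrative name only.
show_gpfs_cache() {
    if ! command -v mmdiag >/dev/null 2>&1; then
        echo "mmdiag not found; is /usr/lpp/mmfs/bin on PATH?" >&2
        return 127
    fi
    # Configured limits, i.e. the settings discussed in this thread.
    for attr in maxFilesToCache maxStatCache; do
        mmlsconfig "$attr"
    done
    # General statistics; look for the file cache and stat cache figures.
    mmdiag --stats
}
```

Comparing the in-use figures against the configured limits gives a rough
idea of whether the cache is running full.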

Of course, if many nodes are writing to the same dirs, then the cache
will need to be invalidated often, which causes some overhead.  A big
local cache is good if clients are usually working in different
directories.

Regards,
-- 
chekh at stanford.edu