[gpfsug-discuss] Looking for a way to see which node is having an impact on server?

Vic Cornell viccornell at gmail.com
Tue Dec 10 10:13:20 GMT 2013


Have you looked at mmpmon? It's a bit much for 600 nodes, but if you run it with a reasonable interval specified, the output shouldn't be too hard to parse.

Quick recipe:

Create a file called mmpmon.conf that looks like this:


################# cut here #########################
nlist add node1 node2 node3 node4 node5
io_s
reset
################# cut here #########################

where node1, node2, etc. are your node names. It might be as well to do this in batches of 50 or so.
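The batching could be scripted along these lines. This is a sketch, not cluster-tested: it assumes you have your node names in a plain file (here called nodes.txt, one name per line), and the batch_ / mmpmon_*.conf filenames are made up for illustration.

```shell
# Sketch: build one mmpmon input file per batch of 50 nodes.
# "nodes.txt" is a stand-in for your real node list.
printf 'node1\nnode2\nnode3\n' > nodes.txt   # replace with your 600 names

split -l 50 nodes.txt batch_                 # batch_aa, batch_ab, ...
for f in batch_*; do
  {
    # nlist add wants all the node names on one line
    printf 'nlist add %s\n' "$(tr '\n' ' ' < "$f")"
    echo io_s
    echo reset
  } > "mmpmon_$f.conf"
done
```

Each mmpmon_*.conf can then be fed to its own mmpmon -i run, one batch at a time.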

Then run something like:

/usr/lpp/mmfs/bin/mmpmon -i mmpmon.conf -d 10000 -r 0 -p

That will give you a set of stats for all of your named nodes, aggregated over a 10-second period (-d 10000), repeating forever (-r 0).

Don't run more than one of these at a time, as each one will reset the stats for the other :-)


Parse out the stats with something like:

awk -F_ '{if ($2=="io"){print $8,$16/1024/1024,$18/1024/1024}}'

which will give you each node's name plus its read and write totals in MB for the interval.
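To see why those field numbers work, here is a hand-made io_s record in the style of mmpmon's -p output (illustrative, not captured from a real cluster) piped through that awk one-liner. Splitting on "_" puts the node name (after _nn_) in $8, bytes read (after _br_) in $16, and bytes written (after _bw_) in $18.

```shell
# Illustrative io_s record: 1 GiB read, 2 GiB written by node1.
echo '_io_s_ _n_ 192.168.1.10 _nn_ node1 _rc_ 0 _t_ 1386668384 _tu_ 503239 _br_ 1073741824 _bw_ 2147483648 _oc_ 5 _cc_ 4 _rdc_ 100 _wc_ 200 _dir_ 1 _iu_ 6' |
awk -F_ '{if ($2=="io"){print $8,$16/1024/1024,$18/1024/1024}}'
```

For this sample the 1 GiB read and 2 GiB written come out as 1024 and 2048 MB next to the node name.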

The docs (the GPFS Advanced Administration Guide) are reasonable.

Cheers,

Vic Cornell
viccornell at gmail.com


On 9 Dec 2013, at 19:52, Alex Chekholko <chekh at stanford.edu> wrote:

> Hi Richard,
> 
> I would just use something like 'iftop' to look at the traffic between the nodes.  Or 'collectl'.  Or 'dstat'.
> 
> e.g. dstat -N eth0 --gpfs --gpfs-ops --top-cpu-adv --top-io 2 10
> http://dag.wiee.rs/home-made/dstat/
> 
> For the NSD balance question, since GPFS stripes the blocks evenly across all the NSDs, they will end up balanced over time.  Or you can rebalance manually with 'mmrestripefs -b' or similar.
> 
> It is unlikely that particular files ended up on a single NSD, unless the other NSDs are totally full.
> 
> Regards,
> Alex
> 
> On 12/06/2013 04:31 PM, Richard Lefebvre wrote:
>> Hi,
>> 
>> I'm looking for a way to see which node (or nodes) is having an impact
>> on the GPFS server nodes and slowing down the whole file system. What
>> usually happens is that a user is doing I/O that doesn't fit the
>> configuration of the GPFS file system and the way they were told to
>> use it efficiently: typically a lot of unbuffered, byte-sized, very
>> random I/O on a file system built for large files and a large block
>> size.
>> 
>> My problem is finding out who is doing that. I haven't found a way to
>> pinpoint the node or nodes that could be the source of the problem, with
>> over 600 client nodes.
>> 
>> I tried to use "mmlsnodes -N waiters -L", but there is so much waiting
>> that I cannot pinpoint anything.
>> 
>> I must be missing something simple. Anyone got any help?
>> 
>> Note: there is another thing I'm trying to pinpoint. A temporary
>> imbalance was created by adding a new NSD. It seems that a group of
>> files was created on that same NSD, and a user keeps hitting that
>> NSD, causing a high load. I'm trying to pinpoint the origin of that
>> too, at least until everything is balanced again. But will rebalancing
>> spread those files out, given that they are already on the emptiest NSD?
>> 
>> Richard
>> _______________________________________________
>> gpfsug-discuss mailing list
>> gpfsug-discuss at gpfsug.org
>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>> 
> 
> -- 
> Alex Chekholko chekh at stanford.edu
