[gpfsug-discuss] Executing Callbacks on other Nodes

Oesterlin, Robert Robert.Oesterlin at nuance.com
Fri Apr 15 17:02:08 BST 2016


This command just uses ssh to reach all the nodes, dump their waiter information, and collect it. That means if a node is down or slow to respond, or if there are a large number of nodes, it can take a while to return. In my 400-500 node clusters this command usually takes less than 10 seconds. I do prefix the command with a timeout value in case a node is hung and ssh never returns (which sometimes happens, and that’s not the fault of GPFS). Something like this:

timeout 45s /usr/lpp/mmfs/bin/mmlsnode -N waiters -L

This means I can get incomplete information, but without a timeout you end up piling up a lot of hung commands. I would check over your cluster carefully to see if there are other issues that might cause ssh to hang, since that could also impact other GPFS commands that distribute work via ssh.
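If you want to know when the timeout actually fires, a minimal sketch (assuming GNU coreutils timeout, which exits with status 124 when the limit is hit) might look like:

#!/bin/bash
# Hypothetical wrapper: collect waiters, but log when the timeout fires
# so you know the snapshot is incomplete instead of finding out later.
OUT=$(timeout 45s /usr/lpp/mmfs/bin/mmlsnode -N waiters -L)
if [ $? -eq 124 ]; then
    logger -t gpfs-waiters "mmlsnode timed out; waiter snapshot is incomplete"
fi
echo "$OUT"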

Another approach would be to dump the waiters locally on each node, send the node-specific information to the database, and then sum it up in the graphing software.
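As a rough sketch of that approach (assuming mmdiag --waiters is available on each node and a Graphite-style metrics database is listening on port 2003; the metric path and host name here are made up), each node could run something like this from cron or a callback:

#!/bin/bash
# Per-node waiter count pushed to a metrics database. mmdiag --waiters
# prints one line per waiter containing the word "waiting", plus a header.
HOST=$(hostname -s)
COUNT=$(/usr/lpp/mmfs/bin/mmdiag --waiters | grep -c waiting)
# Graphite plaintext protocol: "metric value timestamp"
echo "gpfs.waiters.${HOST} ${COUNT} $(date +%s)" | nc -w 2 graphite.example.com 2003

The summing across nodes then happens in the graphing layer, and a down node simply shows up as a missing data point instead of stalling the collector.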

Bob Oesterlin
Sr Storage Engineer, Nuance HPC Grid

From: <gpfsug-discuss-bounces at spectrumscale.org> on behalf of Roland Pabel <dr.roland.pabel at gmail.com>
Organization: RRZK Uni Köln
Reply-To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
Date: Friday, April 15, 2016 at 10:50 AM
To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
Subject: Re: [gpfsug-discuss] Executing Callbacks on other Nodes

Hi,

In our cluster, mmlsnode -N waiters -L takes about 25 seconds to run, so
running it every 30 seconds is a bit close. I'll try running it once a minute
and then incorporate the results into our graphing.

Maybe the command is so slow for me because a few nodes are down?
Is there a parameter to mmlsnode to configure the timeout?

