[gpfsug-discuss] Executing Callbacks on other Nodes

Sven Oehme oehmes at gmail.com
Fri Apr 15 17:12:26 BST 2016


If you can wait a few more months, we will have stats for this in Zimon.

Sven
On Apr 15, 2016 12:02 PM, "Oesterlin, Robert" <Robert.Oesterlin at nuance.com>
wrote:

> This command just uses ssh to reach all the nodes, dump the waiter
> information, and collect it. That means that if a node is down or slow to
> respond, or there are a large number of nodes, it could take a while to
> return. In my 400-500 node clusters this command usually takes less than 10
> seconds. I do prefix the command with a timeout value in case a node is
> hung up and ssh never returns (which sometimes happens, and that’s not the
> fault of GPFS). Something like this:
>
> timeout 45s /usr/lpp/mmfs/bin/mmlsnode -N waiters -L
>
> The timeout means I can get incomplete information, but without it you end
> up piling up a lot of hung commands. I would check over your cluster
> carefully to see if there are other issues that might cause ssh to hang,
> which could impact other GPFS commands that distribute via ssh.
>
> Another approach would be to dump the waiters locally on each node, send
> the node-specific information to the database, and then sum it up using the
> graphing software.
>
> Bob Oesterlin
> Sr Storage Engineer, Nuance HPC Grid
>
> From: <gpfsug-discuss-bounces at spectrumscale.org> on behalf of Roland
> Pabel <dr.roland.pabel at gmail.com>
> Organization: RRZK Uni Köln
> Reply-To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
> Date: Friday, April 15, 2016 at 10:50 AM
> To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
> Subject: Re: [gpfsug-discuss] Executing Callbacks on other Nodes
>
> Hi,
>
> In our cluster, mmlsnode -N waiters -L takes about 25 seconds to run, so
> running it every 30 seconds is cutting it a bit close. I'll try running it
> once a minute and then incorporating this into our graphing.
>
> Maybe the command is so slow for me because a few nodes are down?
> Is there a parameter to mmlsnode to configure the timeout?
>
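On the timeout question: a minimal sketch of the wrapper approach described above, assuming GNU coreutils timeout is installed (it exits with status 124 when the limit is reached). The function name run_with_limit is made up for illustration; the mmlsnode path and options are the ones quoted above. This is a shell wrapper around the command, not an mmlsnode parameter.

```shell
#!/bin/bash
# Run a command under a time limit and warn when the limit was hit.
# Assumes GNU coreutils "timeout", which exits with status 124 on expiry.
# run_with_limit is a hypothetical helper name, not part of GPFS.
run_with_limit() {
    local limit="$1"
    shift
    timeout "$limit" "$@"
    local rc=$?
    if [ "$rc" -eq 124 ]; then
        # The command was killed by timeout, so whatever it printed is partial.
        echo "WARNING: '$*' exceeded $limit; output is incomplete" >&2
    fi
    return "$rc"
}

# Usage on a cluster node (path and options as quoted above):
# run_with_limit 45s /usr/lpp/mmfs/bin/mmlsnode -N waiters -L
```

Checking for status 124 lets a monitoring script tell a timed-out collection apart from a complete one, so a partial sample can be flagged or dropped instead of being graphed as if it were complete.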