[gpfsug-discuss] Executing Callbacks on other Nodes

Roland Pabel dr.roland.pabel at gmail.com
Mon Apr 18 16:10:02 BST 2016


Hi Bob,

I'll try the second approach, i.e., collecting "mmfsadm dump waiters" locally
and then summing the values up, since it can be done without the overhead of 
ssh.
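
A minimal sketch of the local collection step, under stated assumptions: it assumes each waiter line printed by "mmfsadm dump waiters" contains the text "waiting <seconds> seconds", and the function names summarize_waiters and collect_local_waiters are made up for illustration. It reduces the dump to a waiter count and the longest wait, so that each node can report two numbers to the graphing database.

```shell
#!/bin/sh
# Reduce waiter lines read from stdin to "<count> <longest_wait_in_seconds>".
# ASSUMPTION: every waiter line contains "waiting <seconds> seconds";
# lines without that pattern are ignored.
summarize_waiters() {
    awk '
        {
            for (i = 1; i < NF; i++) {
                if ($i == "waiting" && $(i + 1) ~ /^[0-9.]+$/) {
                    count++
                    if ($(i + 1) + 0 > max) max = $(i + 1) + 0
                    break
                }
            }
        }
        END { printf "%d %.3f\n", count, max }
    '
}

# Dump the waiters of the local node only, so no ssh is involved.
# The timeout guards against a hung daemon, as with the ssh-based command.
collect_local_waiters() {
    timeout 45s /usr/lpp/mmfs/bin/mmfsadm dump waiters 2>/dev/null | summarize_waiters
}
```

Running something like `echo "$(hostname) $(collect_local_waiters)"` from cron on each node would produce one line per node, which the graphing software could then sum.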

You mentioned mmlsnode starts all these ssh commands and that made me look 
into the file itself. I then noticed most of the mm commands are actually 
scripts. This helps a lot with regard to my original question. mmdsh seems to
do what I need.
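
A rough sketch of how mmdsh might be wrapped, under stated assumptions: mmdsh is an internal script rather than a documented command, so the "-N <nodes> <command>" form used here is an assumption taken from how the other mm commands select nodes, and the names run_on_nodes and MMDSH are hypothetical. The timeout is the same safeguard as for mmlsnode, since mmdsh also depends on the remote shell returning.

```shell
#!/bin/sh
# Path to mmdsh; overridable so the wrapper can be tried without a cluster.
MMDSH=${MMDSH:-/usr/lpp/mmfs/bin/mmdsh}

# run_on_nodes <nodes> <command...>
# ASSUMPTION: mmdsh takes "-N <nodes>" followed by the command to run remotely.
run_on_nodes() {
    nodes=$1
    shift
    # Give up after 45 seconds so a hung node cannot pile up stuck commands.
    timeout 45s "$MMDSH" -N "$nodes" "$@"
}
```

For example, `run_on_nodes all /usr/lpp/mmfs/bin/mmfsadm dump waiters` would request the waiters from every node in one call, at the cost of bringing the ssh overhead back.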

Thanks,

Roland


> This command is just using ssh to all the nodes and dumping the waiter
> information and collecting it. That means if the node is down, slow to
> respond, or there are a large number of nodes, it could take a while to
> return.  In my 400-500 node clusters this command usually takes less than 10
> seconds. I do prefix the command with a timeout value in case a node is
> hung up and ssh never returns (which it sometimes does, and that’s not the
> fault of GPFS). Something like this:
 
> timeout 45s /usr/lpp/mmfs/bin/mmlsnode -N waiters -L
> 
> This means I get incomplete information, but if you don’t, you end up piling
> up a lot of hung commands. I would check over your cluster carefully to
> see if there are other issues that might cause ssh to hang, which could
> impact other GPFS commands that distribute via ssh.
 
> Another approach would be to dump the waiters locally on each node, send
> node specific information to the database, and then sum it up using the
> graphing software.
 
> Bob Oesterlin
> Sr Storage Engineer, Nuance HPC Grid
> 
> From: <gpfsug-discuss-bounces at spectrumscale.org> on behalf of Roland Pabel
> <dr.roland.pabel at gmail.com>
> Organization: RRZK Uni Köln
> Reply-To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
> Date: Friday, April 15, 2016 at 10:50 AM
> To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
> Subject: Re: [gpfsug-discuss] Executing Callbacks on other Nodes
> 
> Hi,
> 
> In our cluster, mmlsnode -N waiters -L takes about 25 seconds to run. So
> running it every 30 seconds is a bit close. I'll try running it once a
> minute and then incorporating this into our graphing.
> 
> Maybe the command is so slow for me because a few nodes are down?
> Is there a parameter to mmlsnode to configure the timeout?
> 
> 

-- 
Dr. Roland Pabel
Regionales Rechenzentrum der Universität zu Köln (RRZK)
Weyertal 121, Raum 3.07
D-50931 Köln

Tel.: +49 (221) 470-89589
E-Mail: pabel at uni-koeln.de


