[gpfsug-discuss] mmfsadm test pit

Marc A Kaplan makaplan at us.ibm.com
Tue Aug 16 22:09:35 BST 2016


I was surprised to read that Ctrl-C did not really kill the restripe. It's
supposed to! If it doesn't, that's a bug.

I ran this by my expert within IBM and he wrote to me:

First of all, a "PIT job" such as restripe, deldisk, delsnapshot, and the
like should be easy to stop by ^C-ing the management program that started
it. The SG manager daemon holds open a socket to the client program for
sending command output, progress updates, error messages and the like. The
PIT code checks this socket periodically and aborts the PIT process cleanly
if the socket is closed. If this cleanup doesn't occur, it is a bug and
worth reporting. However, there's no exact guarantee on how quickly each
thread on the SG mgr will notice, or on how quickly the helper nodes can
then be stopped, and so forth. The interval between socket checks depends,
among other things, on how long it takes to process each file; if there are
a few very large files, the delay can be significant. In the limiting case,
where most of the FS storage is contained in a few files, this mechanism
doesn't work [elided] well. So it can be quite involved and slow sometimes
to wrap up a PIT operation.
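
As a rough illustration of the mechanism described above (this is not GPFS
code; the function names and the one-check-per-file loop are assumptions made
for the sketch), a worker that only looks at the client socket between files
could behave like this in Python:

```python
import select
import socket


def client_gone(sock: socket.socket) -> bool:
    """Return True if the peer has closed its end of the connection."""
    # Zero-timeout select: do not block if nothing is pending.
    readable, _, _ = select.select([sock], [], [], 0)
    if not readable:
        return False  # nothing to read, connection still open
    try:
        # Peek so that any real data stays in the buffer; an empty
        # result means the peer performed an orderly shutdown.
        return sock.recv(1, socket.MSG_PEEK) == b""
    except ConnectionError:
        return True


def run_job(files, sock, process):
    """Process files in order, checking the client socket between files.

    Returns ("aborted", n) or ("finished", n), where n is the number of
    files fully processed. The check happens only between files, so one
    very large file delays the abort by however long that file takes.
    """
    done = 0
    for f in files:
        if client_gone(sock):
            return ("aborted", done)
        process(f)
        done += 1
    return ("finished", done)
```

Because the check sits between files, the time to notice a closed socket is
bounded by the slowest single file, which matches the caveat about file
systems dominated by a few very large files.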

The simplest way to determine whether the command has really stopped is to
run mmdiag --commands on the SG manager node. This shows running commands
with the command line, start time, socket, flags, etc. After ^C-ing the
client program, the entry here should linger for a while, then go away.
When it exits, you'll see an entry in the GPFS log file where it fails with
err 50. If the command hasn't stopped after a while, it is worth looking
into.
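
A minimal sketch of automating that check, to be run on the SG manager node.
It assumes only that the command name (for example mmrestripefs) appears
somewhere in the mmdiag --commands output, since the command line is shown;
it does not parse the output format. The mmdiag path, the helper names, and
the timeout values are assumptions made for the sketch:

```python
import subprocess
import time

# Assumed install location, matching the mmfsadm path used on this list.
MMDIAG = "/usr/lpp/mmfs/bin/mmdiag"


def mmdiag_commands() -> str:
    """Return the raw text printed by `mmdiag --commands`."""
    result = subprocess.run(
        [MMDIAG, "--commands"], capture_output=True, text=True, check=True
    )
    return result.stdout


def wait_until_gone(name, timeout_s=600, interval_s=10,
                    run=mmdiag_commands, sleep=time.sleep):
    """Poll until `name` no longer appears in the command list.

    Returns True if the entry disappeared within the timeout, False if
    it was still listed on the last poll (worth looking into).
    """
    polls = max(1, timeout_s // interval_s)
    for _ in range(polls):
        if name not in run():
            return True
        sleep(interval_s)
    return False
```

For example, wait_until_gone("mmrestripefs") returns False if the entry is
still listed after roughly ten minutes, which is the point at which the
lingering command deserves a closer look.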

If the command wasn't issued on the SG mgr node and you can't find where
the client command is running, the socket is still a useful hint. While
tedious, it should be possible to trace this socket back to the node where
the command was originally run, using netstat or equivalent. Poking around
inside a GPFS internaldump will also provide clues; there should be an
outstanding sgmMsgSGClientCmd command listed in the dump tscomm section.
Once you find it, just 'kill `pidof mmrestripefs`' or similar.

I'd like to warn the OP away from mmfsadm test pit.  These commands are of 
course unsupported and unrecommended for any purpose (even internal test 
and development purposes, as far as I know).  You are definitely working 
without a net there.  When I was improving the integration between PIT and 
snapshot quiesce a few years ago, I looked into this and couldn't figure 
out how to (easily) make these stop and resume commands safe to use, so as 
far as I know they remain unsafe.  The list command, however, is probably 
fairly okay; but it would probably be better to use mmfsadm saferdump pit.





From:   Aaron Knister <aaron.s.knister at nasa.gov>
To:     <gpfsug-discuss at spectrumscale.org>
Date:   08/15/2016 10:49 PM
Subject:        [gpfsug-discuss] mmfsadm test pit
Sent by:        gpfsug-discuss-bounces at spectrumscale.org



I just discovered this interesting gem poking at mmfsadm:

  test pit fsname list|suspend|status|resume|stop [jobId]

There have been times where I've kicked off a restripe and either 
intentionally or accidentally ctrl-c'd it only to realize that many 
times it's disappeared into the ether and is still running. The only way 
I've known so far to stop it is with an mmchmgr.

A far more painful instance happened when I ran a rebalance on an fs 
w/more than 31 nsds using more than 31 pit workers and hit *that* fun 
APAR which locked up access for a single filesystem to all 3.5k nodes. 
We spent 48 hours round the clock rebooting nodes as jobs drained to 
clear it up. I would have killed in that instance for a way to cancel 
the PIT job (the mmchmgr trick didn't work). It looks like you might 
actually be able to do this with mmfsadm, although how wise this is, I 
do not know (kinda curious about that).

Here's an example. I kicked off a restripe and then ctrl-c'd it on a 
client node. Then ran these commands from the fs manager:

root@loremds19:~ # /usr/lpp/mmfs/bin/mmfsadm test pit tlocal list
JobId 785979015170 PitJobStatus PIT_JOB_RUNNING progress 0.00
debug: statusListP D40E2C70

root@loremds19:~ # /usr/lpp/mmfs/bin/mmfsadm test pit tlocal stop 785979015170
debug: statusListP 0

root@loremds19:~ # /usr/lpp/mmfs/bin/mmfsadm test pit tlocal list
JobId 785979015170 PitJobStatus PIT_JOB_STOPPING progress 4.01
debug: statusListP D4013E70

... some time passes ...

root@loremds19:~ # /usr/lpp/mmfs/bin/mmfsadm test pit tlocal list
debug: statusListP 0

Interesting.

-Aaron

-- 
Aaron Knister
NASA Center for Climate Simulation (Code 606.2)
Goddard Space Flight Center
(301) 286-2776