[gpfsug-discuss] forcibly panic stripegroup everywhere?

Sven Oehme oehmes at gmail.com
Mon Jan 23 05:27:53 GMT 2017


Aaron,

Hold off a bit on the upgrade. I just got word that while 4.2.1+ most likely
addresses the issues I mentioned, there was a defect in the initial release
of the parallel log recovery code. I will get the exact minimum version you
need to deploy and send another update to this thread.

sven

On Mon, Jan 23, 2017 at 5:03 AM Sven Oehme <oehmes at gmail.com> wrote:

> Then I would suggest moving up to at least 4.2.1.LATEST; there is a high
> chance your problem might already be fixed.
>
> I see two areas that got significant improvements, Token Manager recovery
> and Log Recovery, and both are enabled in the latest 4.2.1 code:
>
> Two significant improvements to Token Recovery in 4.2.1:
>
>  1. Extendible hashing for the token hash table. This speeds up token
> lookup and thereby reduces tcMutex hold times for configurations with a
> large ratio of clients to token servers.
>  2. Cleaning up tokens held by failed nodes was making multiple passes over
> the whole token table, one for each failed node. The loops are now
> inverted, so it makes a single pass over the table and, for each token
> found, does cleanup for all failed nodes (sketched below).
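>
> Roughly, the loop inversion looks like this (just a Python sketch to show
> the idea, not the actual GPFS code; token_table, holder and release are
> made-up names):
>
>   # Before 4.2.1: one full pass over the token table per failed node.
>   def cleanup_per_node(token_table, failed_nodes):
>       for node in failed_nodes:
>           for token in token_table:
>               if token.holder == node:
>                   token.release()
>
>   # 4.2.1: a single pass over the table; each token is checked against
>   # the whole set of failed nodes at once.
>   def cleanup_single_pass(token_table, failed_nodes):
>       failed = set(failed_nodes)
>       for token in token_table:
>           if token.holder in failed:
>               token.release()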
>
> There are multiple smaller enhancements beyond 4.2.1, but that is the
> minimum level you want to be at. I have seen token recovery times of tens
> of minutes, similar to what you described, go down to about a minute with
> this change.
>
> On Log Recovery: in case of an unclean unmount/shutdown of a node prior to
> 4.2.1, the filesystem manager would recover only one log file at a time,
> using a single thread. With 4.2.1 this is now done with multiple threads
> and multiple log files in parallel.
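>
> In rough terms (again only a sketch, not the real implementation;
> recover_log_file is a made-up placeholder for the actual replay work):
>
>   from concurrent.futures import ThreadPoolExecutor
>
>   def recover_log_file(log_file):
>       pass  # placeholder: replay one node's log file
>
>   # Pre-4.2.1: the filesystem manager replays one log file after another
>   # on a single thread.
>   def recover_sequential(log_files):
>       for lf in log_files:
>           recover_log_file(lf)
>
>   # 4.2.1+: several log files are replayed in parallel, each on its own
>   # worker thread.
>   def recover_parallel(log_files, workers=8):
>       with ThreadPoolExecutor(max_workers=workers) as pool:
>           list(pool.map(recover_log_file, log_files))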
>
> Sven
>
> On Mon, Jan 23, 2017 at 4:22 AM Aaron Knister <aaron.s.knister at nasa.gov>
> wrote:
>
> It's at 4.1.1.10.
>
> On 1/22/17 11:12 PM, Sven Oehme wrote:
> > What version of Scale/GPFS code is this cluster on?
> >
> > ------------------------------------------
> > Sven Oehme
> > Scalable Storage Research
> > email: oehmes at us.ibm.com
> > Phone: +1 (408) 824-8904
> > IBM Almaden Research Lab
> > ------------------------------------------
> >
> >
> > From: Aaron Knister <aaron.s.knister at nasa.gov>
> > To: <gpfsug-discuss at spectrumscale.org>
> > Date: 01/23/2017 01:31 AM
> > Subject: Re: [gpfsug-discuss] forcibly panic stripegroup everywhere?
> > Sent by: gpfsug-discuss-bounces at spectrumscale.org
> >
> > ------------------------------------------------------------------------
> >
> >
> >
> > I was afraid someone would ask :)
> >
> > One possible use would be testing how monitoring reacts to and/or
> > corrects stale filesystems.
> >
> > The use in my case is that there's an issue we see quite often where a
> > filesystem won't unmount when trying to shut down GPFS. Linux insists
> > it's still busy despite just about every process on the node except init
> > having been killed. It's a real pain because it complicates maintenance,
> > requiring a reboot of some nodes prior to patching, for example.
> >
> > I dug into it and it appears as though when this happens the
> > filesystem's mnt_count is ridiculously high (300,000+ in one case). I'm
> > trying to debug it further, but I need to be able to make the condition
> > happen a few more times first. A stripegroup panic isn't a surefire way,
> > but it's the only way I've found so far to trigger this behavior somewhat
> > on demand.
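> >
> > One rough way to double-check that no surviving process still references
> > the mount point is a quick /proc scan along these lines (just a throwaway
> > script, nothing GPFS-specific; the mount point path is made up):
> >
> >   import glob, os
> >
> >   MNT = "/gpfs/ttest"  # mount point to check, adjust as needed
> >
> >   for proc in glob.glob("/proc/[0-9]*"):
> >       links = [os.path.join(proc, n) for n in ("cwd", "root", "exe")]
> >       links += glob.glob(os.path.join(proc, "fd", "*"))
> >       for link in links:
> >           try:
> >               target = os.readlink(link)
> >           except OSError:
> >               continue
> >           if target.startswith(MNT):
> >               print(proc, link, "->", target)
> >
> > If that comes back empty, the busy state points at the inflated mnt_count
> > rather than a userspace holder.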
> >
> > One way I've found to trigger a mass stripegroup panic is to induce what
> > I call a "301 error":
> >
> > loremds07: Sun Jan 22 00:30:03.367 2017: [X] File System ttest unmounted
> > by the system with return code 301 reason code 0
> > loremds07: Sun Jan 22 00:30:03.368 2017: Invalid argument
> >
> > and tickle a known race condition between nodes being expelled from the
> > cluster and a manager node joining the cluster. When this happens it
> > seems to cause a mass stripegroup panic that's over in a few minutes.
> > The trick is that it doesn't happen every time I go through the exercise,
> > and when it does there's no guarantee the filesystem that panics is the
> > one in use. If it's not an fs in use, it doesn't help me reproduce the
> > error condition. I was trying to use the "mmfsadm test panic" command to
> > take a more direct approach.
> >
> > Hope that helps shed some light.
> >
> > -Aaron
> >
> > On 1/22/17 8:16 PM, Andrew Beattie wrote:
> >> Out of curiosity -- why would you want to?
> >> Andrew Beattie
> >> Software Defined Storage  - IT Specialist
> >> Phone: 614-2133-7927
> >> E-mail: abeattie at au1.ibm.com <mailto:abeattie at au1.ibm.com>
> >>
> >>
> >>
> >>     ----- Original message -----
> >>     From: Aaron Knister <aaron.s.knister at nasa.gov>
> >>     Sent by: gpfsug-discuss-bounces at spectrumscale.org
> >>     To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
> >>     Cc:
> >>     Subject: [gpfsug-discuss] forcibly panic stripegroup everywhere?
> >>     Date: Mon, Jan 23, 2017 11:11 AM
> >>
> >>     This is going to sound like a ridiculous request, but, is there a way
> >>     to cause a filesystem to panic everywhere in one "swell foop"? I'm
> >>     assuming the answer will come with an appropriate disclaimer of "don't
> >>     ever do this, we don't support it, it might eat your data, summon
> >>     cthulu, etc.". I swear I've seen the fs manager initiate this type of
> >>     operation before.
> >>
> >>     I can seem to do it on a per-node basis with "mmfsadm test panic <fs>
> >>     <error code>" but if I do that over all 1k nodes in my test cluster at
> >>     once it results in about 45 minutes of almost total deadlock while each
> >>     panic is processed by the fs manager.
> >>
> >>     -Aaron
> >>
> >>     --
> >>     Aaron Knister
> >>     NASA Center for Climate Simulation (Code 606.2)
> >>     Goddard Space Flight Center
> >>     (301) 286-2776
> >
> > --
> > Aaron Knister
> > NASA Center for Climate Simulation (Code 606.2)
> > Goddard Space Flight Center
> > (301) 286-2776
> >
>
> --
> Aaron Knister
> NASA Center for Climate Simulation (Code 606.2)
> Goddard Space Flight Center
> (301) 286-2776
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>
>