[gpfsug-discuss] Well, this is the pits...

Thu May 4 17:11:53 BST 2017

Kevin,

The math currently used in the code appears to be "more than 31 NSDs in
the filesystem" combined with "more than 31 PIT worker threads",
explicitly for a balancing restripe (we actually hit that combo on an older
version of 3.5.x before the safety check got written in there...it was a long
day).  At least, that's the apparent math used through 4.1.1.10, which
we're currently running.

If pitWorkerThreadsPerNode is set to 0 (default), GPFS should set the
active thread count equal to the number of cores in the node, up to a max of
16 threads I believe.  Keep in mind that for a restripe, the total will also
include the threads available on the fs manager.

So if your fs manager and at least one helper node are both set to "0", and
each contains at least 16 cores, the restripe "thread calculation" will
exceed 31 threads, so it won't run.  We've had to tune our helper nodes to
lower numbers (e.g. a single helper node to 15 threads).
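
To make the arithmetic above concrete, here is a rough Python sketch of the check as I understand it. It is not GPFS code: the function names are made up for illustration, the 16-thread default cap is my belief rather than a documented fact, and the "more than 31 NSDs" condition is only the apparent behavior through 4.1.1.10.

```python
# Sketch of the thread arithmetic described above; NOT taken from GPFS source.
MAX_DEFAULT_THREADS = 16  # assumed cap when pitWorkerThreadsPerNode is 0
PIT_THREAD_LIMIT = 31     # limit quoted in the mmrestripefs error message
NSD_THRESHOLD = 31        # check apparently only applies above this many NSDs


def effective_pit_threads(configured, cores):
    """Threads one node contributes: the configured value, or
    min(cores, 16) when pitWorkerThreadsPerNode is 0 (assumed default)."""
    if configured == 0:
        return min(cores, MAX_DEFAULT_THREADS)
    return configured


def balancing_restripe_refused(nodes, nsd_count):
    """nodes: list of (pitWorkerThreadsPerNode, cores) tuples for every
    participating node, including the fs manager.
    Returns (total_threads, refused)."""
    total = sum(effective_pit_threads(cfg, cores) for cfg, cores in nodes)
    refused = nsd_count > NSD_THRESHOLD and total > PIT_THREAD_LIMIT
    return total, refused


# fs manager plus one helper, both set to 0, 16 cores each, 32 NSDs
print(balancing_restripe_refused([(0, 16), (0, 16)], 32))   # (32, True)
# same pair with the helper tuned down to 15 threads
print(balancing_restripe_refused([(0, 16), (15, 16)], 32))  # (31, False)
```

This is also why a sum of configured zeros can still trip the limit: the 0 is a "use the default" marker, not the number of threads that actually get counted.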

Aaron please correct me if I'm braining that wrong anywhere.

-Jordan

On Thu, May 4, 2017 at 12:07 PM, Buterbaugh, Kevin L <
Kevin.Buterbaugh at vanderbilt.edu> wrote:

> Hi Olaf,
>
> I didn’t touch pitWorkerThreadsPerNode … it was already zero.
>
> I’m running 4.2.2.3 on my GPFS servers (some clients are on 4.2.1.1 or
> 4.2.0.3 and are gradually being upgraded).  What version of GPFS fixes
> this?  With what I’m doing I need the ability to run mmrestripefs.
>
> It seems to me that mmrestripefs could check whether QOS is enabled …
> granted, it would have no way of knowing whether the values used actually
> are reasonable or not … but if QOS is enabled then “trust” it to not
> overrun the system.
>
> PMR time?  Thanks..
>
> Kevin
>
> On May 4, 2017, at 10:54 AM, Olaf Weiser <olaf.weiser at de.ibm.com> wrote:
>
> Hi Kevin,
> the number of NSDs is more or less nonsense .. it is just that the number of
> nodes x PIT workers should not exceed the #mutex/FS block by too much
> did you adjust/tune the PIT workers? ...
>
> as far as I know.. the fact that the code checks the number of NSDs is already
> considered a defect and will be fixed / is already fixed (I stepped
> into it here as well)
>
> ps. QOS is the better approach to address this, but unfortunately.. not
> everyone is using it by default... that's why, I suspect, development
> decided to put in a check/limit here .. which in your case (with QOS)
> wouldn't be needed
>
>
>
>
>
> From:        "Buterbaugh, Kevin L" <Kevin.Buterbaugh at Vanderbilt.Edu>
> To:        gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
> Date:        05/04/2017 05:44 PM
> Subject:        Re: [gpfsug-discuss] Well, this is the pits...
> Sent by:        gpfsug-discuss-bounces at spectrumscale.org
> ------------------------------
>
>
>
> Hi Olaf,
>
> Your explanation mostly makes sense, but...
>
> Failed with 4 nodes … failed with 2 nodes … not gonna try with 1 node.
> And this filesystem only has 32 disks, which I would imagine is not an
> especially large number compared to what some people reading this e-mail
> have in their filesystems.
>
> I thought that QOS (which I’m using) was what would keep an mmrestripefs
> from overrunning the system … QOS has worked extremely well for us - it’s
> one of my favorite additions to GPFS.
>
> Kevin
>
> On May 4, 2017, at 10:34 AM, Olaf Weiser <olaf.weiser at de.ibm.com> wrote:
>
> no.. it is just in the code, because we have to avoid running out of mutexes
> / block
>
> reducing the number of nodes (-N) down to 4 (2 nodes is even safer) ...
> is the easiest way to solve it for now....
>
> I've been told the real root cause will be fixed in one of the next PTFs
> .. within this year ..
> this warning message itself should appear every time.. but unfortunately
> someone coded it so that it depends on the number of disks (NSDs).. that's why I
> suspect you didn't see it before
> but the fact that we have to make sure not to overrun the system with
> mmrestripe remains.. so please lower the -N number of nodes to 4 or better
> 2
>
> (even though we know.. that the mmrestripe will then take longer)
>
>
> From:        "Buterbaugh, Kevin L" <Kevin.Buterbaugh at Vanderbilt.Edu>
> To:        gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
> Date:        05/04/2017 05:26 PM
> Subject:        [gpfsug-discuss] Well, this is the pits...
> Sent by:        gpfsug-discuss-bounces at spectrumscale.org
> ------------------------------
>
>
>
> Hi All,
>
> Another one of those, “I can open a PMR if I need to” type questions…
>
> We are in the process of combining two large GPFS filesystems into one new
> filesystem (for various reasons I won’t get into here).  Therefore, I’m
> doing a lot of mmrestripe’s, mmdeldisk’s, and mmadddisk’s.
>
> Yesterday I did an “mmrestripefs <old fs> -r -N <my 8 NSD servers>” (after
> suspending a disk, of course).  Worked like it should.
>
> Today I did a “mmrestripefs <new fs> -b -P capacity -N <those same 8 NSD
> servers>” and got:
>
> mmrestripefs: The total number of PIT worker threads of all participating
> nodes has been exceeded to safely restripe the file system.  The total
> number of PIT worker threads, which is the sum of pitWorkerThreadsPerNode
> of the participating nodes, cannot exceed 31.  Reissue the command with a
> smaller set of participating nodes (-N option) and/or lower the
> pitWorkerThreadsPerNode configure setting.  By default the file system
> manager node is counted as a participating node.
> mmrestripefs: Command failed. Examine previous error messages to determine
> cause.
>
> So there must be some difference in how the “-r” and “-b” options
> calculate the number of PIT worker threads.  I did an “mmfsadm dump all |
> grep pitWorkerThreadsPerNode” on all 8 NSD servers and the filesystem
> manager node … they all say the same thing:
>
>   pitWorkerThreadsPerNode 0
>
> Hmmm, so 0 + 0 + 0 + 0 + 0 + 0 + 0 + 0 + 0 > 31?!?  I’m confused...
>
> —
> Kevin Buterbaugh - Senior System Administrator
> Vanderbilt University - Advanced Computing Center for Research and
> Education
> Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633
>
>
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss