[gpfsug-discuss] High I/O wait times

Alex Chekholko alex at calicolabs.com
Fri Jul 6 18:13:26 BST 2018


Hi Kevin,

This is a bit of a "cargo cult" suggestion, but one issue I have seen is
that if a disk starts misbehaving without actually failing, it slows down
the whole RAID group it is in, and the only way to detect it is to examine
the read/write latencies on the individual disks.  Does your SAN allow you
to do that?

That has happened to me at least twice, and replacing the offending
individual disk solved the issue.  This was on DDN, so the relevant command
was something like 'show pd * counters write_lat' or similar, which showed
the latency of the I/Os for each disk.  If one disk in the group is an
outlier (e.g. 1s write latencies), then the whole RAID array (LUN) is just
waiting for that one disk.
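
If you can export those per-disk latency counters from the SAN's CLI, even a
quick-and-dirty outlier check is enough to spot this.  Below is a minimal
sketch in Python; it assumes you have already parsed the counters into a
dict, and the disk names and numbers are made up for illustration:

#!/usr/bin/env python
# Hypothetical sketch: flag outlier disks in a RAID group by write latency.
# Assumes per-physical-disk average write latencies have already been parsed
# out of the SAN CLI output into {disk_id: latency_ms}.

def find_latency_outliers(latencies_ms, factor=5.0, floor_ms=50.0):
    """Return disks whose latency is 'factor' times the group median and
    above an absolute floor, i.e. likely dragging down the whole LUN."""
    values = sorted(latencies_ms.values())
    median = values[len(values) // 2]
    return {disk: lat for disk, lat in latencies_ms.items()
            if lat > max(median * factor, floor_ms)}

# Made-up example: one disk at 1000 ms write latency drags the whole array.
group = {"pd_0": 12.0, "pd_1": 9.5, "pd_2": 1000.0, "pd_3": 11.2}
print(find_latency_outliers(group))   # -> {'pd_2': 1000.0}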

Another possibility for troubleshooting, if you have sufficient free
resources: you can suspend the problematic LUNs in GPFS, which removes the
write load from them while still letting them service read requests, without
affecting users.
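
A hedged sketch of how that could be scripted (the filesystem and NSD names
are placeholders, and the mmchdisk syntax should be double-checked against
your GPFS level before running anything):

#!/usr/bin/env python
# Sketch only: suspend the suspect NSDs so no new blocks are allocated to
# them, then resume once the hardware has been checked out.  Assumes
# /usr/lpp/mmfs/bin is on PATH and this runs as root on a cluster node.
import subprocess

FS = "gpfs0"                               # placeholder filesystem name
SUSPECT_NSDS = ["eon34Cnsd", "eon34Ensd"]  # placeholder NSD names

def mmchdisk(action, nsds):
    # mmchdisk takes a semicolon-separated list of disk names after -d
    cmd = ["mmchdisk", FS, action, "-d", ";".join(nsds)]
    print("running: " + " ".join(cmd))
    subprocess.check_call(cmd)

mmchdisk("suspend", SUSPECT_NSDS)   # take allocation/write load off the LUNs
# ... investigate the array, then put them back:
# mmchdisk("resume", SUSPECT_NSDS)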

Regards,
Alex

On Fri, Jul 6, 2018 at 9:11 AM Buterbaugh, Kevin L <
Kevin.Buterbaugh at vanderbilt.edu> wrote:

> Hi Jim,
>
> Thank you for your response.  We are taking a two-pronged approach at this
> point:
>
> 1.  While I don’t see anything wrong with our storage arrays, I have
> opened a ticket with the vendor (not IBM) to get them to look at things
> from that angle.
>
> 2.  Since the problem moves around from time to time, we are enhancing our
> monitoring script so that we can go from “mmdiag --iohist” to “clients
> issuing those I/O requests” to “jobs running on those clients” and look for
> any commonality there (a rough sketch of the first step is below).
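>
> A rough, hedged sketch of that first step: parse “mmdiag --iohist” on an
> NSD server and tally which client nodes are behind the slowest I/Os.  The
> column positions vary by GPFS release, so the field indexes here are
> assumptions to be adjusted against the actual output:
>
> #!/usr/bin/env python
> # Sketch: tally client nodes behind slow I/Os seen in "mmdiag --iohist".
> # Assumes /usr/lpp/mmfs/bin is on PATH and this runs on an NSD server.
> # The field indexes are assumptions; adjust them to the real column layout.
> import subprocess
> from collections import Counter
>
> THRESHOLD_MS = 1000.0
> TIME_MS_FIELD = 5        # assumed position of the "time ms" column
> CLIENT_FIELD = -1        # assumed position of the client / NSD node column
>
> out = subprocess.check_output(["mmdiag", "--iohist"]).decode()
> slow_clients = Counter()
>
> for line in out.splitlines():
>     fields = line.split()
>     if len(fields) < 7:
>         continue
>     try:
>         ms = float(fields[TIME_MS_FIELD])
>     except ValueError:
>         continue                 # header or separator line
>     if ms >= THRESHOLD_MS:
>         slow_clients[fields[CLIENT_FIELD]] += 1
>
> for client, count in slow_clients.most_common():
>     print("%6d slow I/Os from %s" % (count, client))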
>
> Thanks again - much appreciated!
>
> —
> Kevin Buterbaugh - Senior System Administrator
> Vanderbilt University - Advanced Computing Center for Research and
> Education
> Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633
>
>
> On Jul 6, 2018, at 8:02 AM, Jim Doherty <jjdoherty at yahoo.com> wrote:
>
> You may want to get an mmtrace, but I suspect that the disk I/Os are
> slow.  The iohist shows the time from when the I/O was issued until it
> finished.  Of course, if you have disk I/Os taking 10x too long, then other
> I/Os are going to queue up behind them.  If there are more I/Os than there
> are NSD server threads, then some I/Os will be queued and waiting for a
> thread.
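>
> If it helps, the thread/queue tunables that control this on the NSD servers
> can be pulled with mmlsconfig.  A small sketch follows; the parameter names
> are the usual NSD tuning knobs, but confirm they exist at your code level:
>
> #!/usr/bin/env python
> # Sketch: print the NSD server thread/queue tunables that determine how
> # many I/Os can be in flight before requests start queueing for a thread.
> # Parameter names are assumptions to verify against your GPFS release.
> import subprocess
>
> PARAMS = ["nsdMaxWorkerThreads", "nsdMinWorkerThreads",
>           "nsdThreadsPerQueue", "nsdSmallThreadRatio"]
>
> for param in PARAMS:
>     try:
>         value = subprocess.check_output(["mmlsconfig", param]).decode().strip()
>         print(value)
>     except subprocess.CalledProcessError:
>         print("%s: not reported at this code level" % param)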
>
> Jim
>
>
> On Thursday, July 5, 2018, 9:30:30 PM EDT, Buterbaugh, Kevin L <
> Kevin.Buterbaugh at Vanderbilt.Edu> wrote:
>
>
> Hi All,
>
> First off, my apologies for the delay in responding back to the list …
> we’ve actually been working our tails off on this one trying to collect as
> much data as we can on what is a very weird issue.  While I’m responding to
> Aaron’s e-mail, I’m going to try to address the questions raised in all the
> responses.
>
> Steve - this all started last week.  You’re correct about our mixed
> workload.  There have been no new workloads that I am aware of.
>
> Stephen - no, this is not an ESS.  We are running GPFS 4.2.3-8.
>
> Aaron - no, this is not on a DDN, either.
>
> The hardware setup is a vanilla 8 Gb FC SAN.  Commodity hardware for the
> servers and storage.  We have two SAN “stacks” and all NSD servers and
> storage are connected to both stacks.  Linux multipathing handles path
> failures.  10 GbE out to the network.
>
> We first were alerted to this problem by one of our monitoring scripts
> which was designed to alert us to abnormally high I/O times, which, as I
> mentioned previously, in our environment has usually been caused by cache
> battery backup failures in the storage array controllers (but _not_ this
> time).  So I’m getting e-mails that in part read:
>
> Disk eon34Cnsd on nsd2 has a service time of 4625.083 ms.
> Disk eon34Ensd on nsd4 has a service time of 3146.715 ms.
>
> The “34” tells me what storage array and the “C” or “E” tells me what LUN
> on that storage array.  As I’ve mentioned, those two LUNs are by far and
> away my most frequent problem children, but here’s another report from
> today as well:
>
> Disk eon28Bnsd on nsd8 has a service time of 1119.385 ms.
> Disk eon28Ansd on nsd7 has a service time of 1154.002 ms.
> Disk eon31Ansd on nsd3 has a service time of 1068.987 ms.
> Disk eon34Cnsd on nsd2 has a service time of 4991.365 ms.
>
> NSD server hostnames have been changed, BTW, from their real names to nsd1
> - 8.
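>
> For context, the check behind those alert lines boils down to something
> like the following (the sample numbers are made up; in the real script the
> (disk, service time) pairs come from parsing “mmdiag --iohist”):
>
> #!/usr/bin/env python
> # Sketch: report the worst service time seen per disk when it crosses a
> # threshold, in the same format as the alert e-mails quoted above.
> # The sample data is made up for illustration only.
> import socket
>
> THRESHOLD_MS = 1000.0
>
> def worst_per_disk(samples):
>     worst = {}
>     for disk, ms in samples:
>         if ms > worst.get(disk, 0.0):
>             worst[disk] = ms
>     return worst
>
> samples = [("eon34Cnsd", 4625.083), ("eon34Cnsd", 212.4),
>            ("eon34Ensd", 3146.715)]
> host = socket.gethostname()
> for disk, ms in sorted(worst_per_disk(samples).items()):
>     if ms >= THRESHOLD_MS:
>         print("Disk %s on %s has a service time of %.3f ms." % (disk, host, ms))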
>
> Based on Fred’s excellent advice, we took a closer look at the “mmfsadm
> dump nsd” output.  We wrote a Python script to pull out what we think is
> the most pertinent information:
>
> nsd1
> 29 SMALL queues, 50 requests pending, 3741 was the highest number of
> requests pending.
>     348 threads started, 1 threads active, 348 was the highest number of
> threads active.
> 29 LARGE queues, 0 requests pending, 5694 was the highest number of
> requests pending.
>     348 threads started, 124 threads active, 348 was the highest number of
> threads active.
> nsd2
> 29 SMALL queues, 0 requests pending, 1246 was the highest number of
> requests pending.
>     348 threads started, 13 threads active, 348 was the highest number of
> threads active.
> 29 LARGE queues, 470 requests pending, 2404 was the highest number of
> requests pending.
>     348 threads started, 340 threads active, 348 was the highest number of
> threads active.
> nsd3
> 29 SMALL queues, 108 requests pending, 1796 was the highest number of
> requests pending.
>     348 threads started, 0 threads active, 348 was the highest number of
> threads active.
> 29 LARGE queues, 35 requests pending, 3331 was the highest number of
> requests pending.
>     348 threads started, 4 threads active, 348 was the highest number of
> threads active.
> nsd4
> 42 SMALL queues, 0 requests pending, 1529 was the highest number of
> requests pending.
>     504 threads started, 8 threads active, 504 was the highest number of
> threads active.
> 42 LARGE queues, 0 requests pending, 637 was the highest number of
> requests pending.
>     504 threads started, 211 threads active, 504 was the highest number of
> threads active.
> nsd5
> 42 SMALL queues, 182 requests pending, 2798 was the highest number of
> requests pending.
>     504 threads started, 6 threads active, 504 was the highest number of
> threads active.
> 42 LARGE queues, 407 requests pending, 4416 was the highest number of
> requests pending.
>     504 threads started, 8 threads active, 504 was the highest number of
> threads active.
> nsd6
> 42 SMALL queues, 0 requests pending, 1630 was the highest number of
> requests pending.
>     504 threads started, 0 threads active, 504 was the highest number of
> threads active.
> 42 LARGE queues, 0 requests pending, 148 was the highest number of
> requests pending.
>     504 threads started, 9 threads active, 504 was the highest number of
> threads active.
> nsd7
> 42 SMALL queues, 43 requests pending, 2179 was the highest number of
> requests pending.
>     504 threads started, 1 threads active, 504 was the highest number of
> threads active.
> 42 LARGE queues, 0 requests pending, 2551 was the highest number of
> requests pending.
>     504 threads started, 13 threads active, 504 was the highest number of
> threads active.
> nsd8
> 42 SMALL queues, 0 requests pending, 1014 was the highest number of
> requests pending.
>     504 threads started, 4 threads active, 504 was the highest number of
> threads active.
> 42 LARGE queues, 0 requests pending, 3371 was the highest number of
> requests pending.
>     504 threads started, 89 threads active, 504 was the highest number of
> threads active.
>
> Note that we see more “load” on the LARGE queue side of things and that
> nsd2 and nsd4 (the primary NSD servers for the 2 LUNs that show up most
> frequently in our alerts) are the heaviest loaded.
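>
> For anyone curious, the reduction our script does is roughly along these
> lines.  The “mmfsadm dump nsd” output format is undocumented and varies by
> release, so the line shapes assumed by the regular expressions below are
> guesses to be adapted to what the dump actually shows:
>
> #!/usr/bin/env python
> # Sketch: tally queue counts and pending requests per queue type (SMALL vs
> # LARGE) from "mmfsadm dump nsd".  Run as root on an NSD server with
> # /usr/lpp/mmfs/bin on PATH.  The assumed line format is a guess.
> import re
> import subprocess
> from collections import defaultdict
>
> dump = subprocess.check_output(["mmfsadm", "dump", "nsd"]).decode()
>
> queue_re = re.compile(r"(SMALL|LARGE)", re.IGNORECASE)
> pending_re = re.compile(r"requests pending\s+(\d+)", re.IGNORECASE)
>
> stats = defaultdict(lambda: {"queues": 0, "pending": 0})
> current = None
> for line in dump.splitlines():
>     m = queue_re.search(line)
>     if m and "queue" in line.lower():
>         current = m.group(1).upper()
>         stats[current]["queues"] += 1
>     if current:
>         p = pending_re.search(line)
>         if p:
>             stats[current]["pending"] += int(p.group(1))
>
> for qtype, s in sorted(stats.items()):
>     print("%d %s queues, %d requests pending" % (s["queues"], qtype,
>                                                  s["pending"]))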
>
> One other thing we have noted is that our home grown RRDtool monitoring
> plots that are based on netstat, iostat, vmstat, etc. also show an oddity.
> Most of our LUNs show up as 33 - 68% utilized … but all the LUNs on eon34
> (there are 4 in total) show up as 93 - 97% utilized.  And another oddity
> there is that eon34A and eon34B rarely show up on the alert e-mails, while
> eon34C and eon34E show up waaaayyyyyyy more than anything else … the
> difference between them is that A and B are on the storage array itself and
> C and E are on JBODs SAS-attached to the storage array (and yes, we’ve
> actually checked and reseated those connections).
>
> Another reason I could not respond earlier today is that this afternoon I
> upgraded the RAM on nsd2 and nsd4 from 16 and 24 GB respectively to 64 GB
> each, and then upped the pagepool on those two boxes to 40 GB.  That has
> not made a difference.  How can I determine how much of the pagepool is
> actually being used, BTW?  A quick Google search didn’t help me.
>
> So we’re trying to figure out if we have storage hardware issues causing
> GPFS issues or GPFS issues causing storage slowdowns.  The fact that I see
> slowdowns most often on one storage array points in one direction, while
> the fact that at times I see even worse slowdowns on multiple other arrays
> points the other way.  The fact that some NSD servers show better stats
> than others in the analysis of the “mmfsadm dump nsd” output tells me …
> well, I don’t know what it tells me.
>
> I think that’s all for now.  If you have read this entire very long
> e-mail, first off, thank you!  If you’ve read it and have ideas for where I
> should go from here, T-H-A-N-K Y-O-U!
>
> Kevin
>
> > On Jul 4, 2018, at 7:34 AM, Aaron Knister <aaron.s.knister at nasa.gov>
> wrote:
> >
> > Hi Kevin,
> >
> > Just going out on a very weird limb here...but you're not by chance
> seeing this behavior on DDN hardware that runs the SFA OS are you? (e.g.
> SFA12K, 7K, 14K, etc.) We just started seeing some very weird and high
> latency on some of our SFA12ks (that have otherwise been solid both in
> terms of stability and performance) but only on certain volumes and the
> affected volumes change. It's very bizarre and we've been working closely
> with DDN to track down the root cause but we've not yet found a smoking
> gun. The timing and description of your problem sounded eerily similar to
> what we're seeing so I thought I'd ask.
> >
> > -Aaron
> >
> > --
> > Aaron Knister
> > NASA Center for Climate Simulation (Code 606.2)
> > Goddard Space Flight Center
> > (301) 286-2776
> >
> >
> > On Tue, 3 Jul 2018, Buterbaugh, Kevin L wrote:
> >
> >> Hi all,
> >> We are experiencing some high I/O wait times (5 - 20 seconds!) on some
> of our NSDs as reported by “mmdiag --iohist” and are struggling to
> understand why.  One of the
> >> confusing things is that, while certain NSDs tend to show the problem
> more than others, the problem is not consistent … i.e. the problem tends to
> move around from
> >> NSD to NSD (and storage array to storage array) whenever we check …
> which is sometimes just a few minutes apart.
> >> In the past when I have seen “mmdiag --iohist” report high wait times
> like this it has *always* been hardware related.  In our environment, the
> most common cause has
> >> been a battery backup unit on a storage array controller going bad and
> the storage array switching to write straight to disk.  But that’s *not*
> happening this time.
> >> Is there anything within GPFS / outside of a hardware issue that I
> should be looking for??  Thanks!
> >> —
> >> Kevin Buterbaugh - Senior System Administrator
> >> Vanderbilt University - Advanced Computing Center for Research and
> Education
> >> Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633
> > _______________________________________________
> > gpfsug-discuss mailing list
> > gpfsug-discuss at spectrumscale.org
> > http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>
>
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>