[gpfsug-discuss] High I/O wait times

Buterbaugh, Kevin L Kevin.Buterbaugh at Vanderbilt.Edu
Fri Jul 6 02:11:17 BST 2018


Hi All,

First off, my apologies for the delay in responding to the list … we’ve actually been working our tails off on this one, trying to collect as much data as we can on what is a very weird issue.  While I’m responding to Aaron’s e-mail, I’m going to try to address the questions raised in all the responses.

Steve - this all started last week.  You’re correct about our mixed workload.  There have been no new workloads that I am aware of.

Stephen - no, this is not an ESS.  We are running GPFS 4.2.3-8.

Aaron - no, this is not on a DDN, either.

The hardware setup is a vanilla 8 Gb FC SAN.  Commodity hardware for the servers and storage.  We have two SAN “stacks” and all NSD servers and storage are connected to both stacks.  Linux multipathing handles path failures.  10 GbE out to the network.

We were first alerted to this problem by one of our monitoring scripts, which is designed to flag abnormally high I/O service times.  As I mentioned previously, in our environment those have usually been caused by cache battery backup failures in the storage array controllers (but _not_ this time).  So I’m getting e-mails that in part read:

Disk eon34Cnsd on nsd2 has a service time of 4625.083 ms.
Disk eon34Ensd on nsd4 has a service time of 3146.715 ms.

The “34” tells me which storage array and the “C” or “E” tells me which LUN on that storage array.  As I’ve mentioned, those two LUNs are far and away my most frequent problem children, but here’s another report from today as well:

Disk eon28Bnsd on nsd8 has a service time of 1119.385 ms.
Disk eon28Ansd on nsd7 has a service time of 1154.002 ms.
Disk eon31Ansd on nsd3 has a service time of 1068.987 ms.
Disk eon34Cnsd on nsd2 has a service time of 4991.365 ms.

NSD server hostnames have been changed, BTW, from their real names to nsd1 - 8.
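
In case it helps, the alerting logic boils down to something like the following (a heavily simplified sketch, not the actual script; the 1000 ms threshold is just for illustration, and the per-NSD service times are collected separately, e.g. out of "mmdiag --iohist" on each NSD server):

#!/usr/bin/env python
# Heavily simplified sketch of the alerting logic -- not the actual script.
# Assumes per-NSD service times (in ms) have already been collected
# elsewhere, e.g. parsed out of "mmdiag --iohist" on each NSD server.

SERVICE_TIME_THRESHOLD_MS = 1000.0   # illustrative threshold, not our real cutoff

def build_alerts(samples):
    """samples: iterable of (disk, nsd_server, service_time_ms) tuples."""
    alerts = []
    for disk, server, ms in samples:
        if ms >= SERVICE_TIME_THRESHOLD_MS:
            alerts.append("Disk %s on %s has a service time of %.3f ms."
                          % (disk, server, ms))
    return alerts

if __name__ == "__main__":
    # Example data in the same shape as the alert e-mails above.
    samples = [("eon34Cnsd", "nsd2", 4625.083),
               ("eon34Ensd", "nsd4", 3146.715),
               ("eon28Bnsd", "nsd8", 42.7)]
    for line in build_alerts(samples):
        print(line)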

Based on Fred’s excellent advice, we took a closer look at the “mmfsadm dump nsd” output.  We wrote a Python script to pull out what we think is the most pertinent information:

nsd1
29 SMALL queues, 50 requests pending, 3741 was the highest number of requests pending.
	348 threads started, 1 threads active, 348 was the highest number of threads active.
29 LARGE queues, 0 requests pending, 5694 was the highest number of requests pending.
	348 threads started, 124 threads active, 348 was the highest number of threads active.
nsd2
29 SMALL queues, 0 requests pending, 1246 was the highest number of requests pending.
	348 threads started, 13 threads active, 348 was the highest number of threads active.
29 LARGE queues, 470 requests pending, 2404 was the highest number of requests pending.
	348 threads started, 340 threads active, 348 was the highest number of threads active.
nsd3
29 SMALL queues, 108 requests pending, 1796 was the highest number of requests pending.
	348 threads started, 0 threads active, 348 was the highest number of threads active.
29 LARGE queues, 35 requests pending, 3331 was the highest number of requests pending.
	348 threads started, 4 threads active, 348 was the highest number of threads active.
nsd4
42 SMALL queues, 0 requests pending, 1529 was the highest number of requests pending.
	504 threads started, 8 threads active, 504 was the highest number of threads active.
42 LARGE queues, 0 requests pending, 637 was the highest number of requests pending.
	504 threads started, 211 threads active, 504 was the highest number of threads active.
nsd5
42 SMALL queues, 182 requests pending, 2798 was the highest number of requests pending.
	504 threads started, 6 threads active, 504 was the highest number of threads active.
42 LARGE queues, 407 requests pending, 4416 was the highest number of requests pending.
	504 threads started, 8 threads active, 504 was the highest number of threads active.
nsd6
42 SMALL queues, 0 requests pending, 1630 was the highest number of requests pending.
	504 threads started, 0 threads active, 504 was the highest number of threads active.
42 LARGE queues, 0 requests pending, 148 was the highest number of requests pending.
	504 threads started, 9 threads active, 504 was the highest number of threads active.
nsd7
42 SMALL queues, 43 requests pending, 2179 was the highest number of requests pending.
	504 threads started, 1 threads active, 504 was the highest number of threads active.
42 LARGE queues, 0 requests pending, 2551 was the highest number of requests pending.
	504 threads started, 13 threads active, 504 was the highest number of threads active.
nsd8
42 SMALL queues, 0 requests pending, 1014 was the highest number of requests pending.
	504 threads started, 4 threads active, 504 was the highest number of threads active.
42 LARGE queues, 0 requests pending, 3371 was the highest number of requests pending.
	504 threads started, 89 threads active, 504 was the highest number of threads active.

Note that we see more “load” on the LARGE queue side of things and that nsd2 and nsd4 (the primary NSD servers for the 2 LUNs that show up most frequently in our alerts) are the most heavily loaded.
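
In case anyone wants to do something similar, here is a stripped-down sketch of the sort of parsing the script does.  The "mmfsadm dump nsd" output format is undocumented and varies between releases, so the patterns below are assumptions that would need adjusting to match your own dump:

#!/usr/bin/env python
# Stripped-down sketch of parsing "mmfsadm dump nsd" output.
# The dump format is undocumented and varies between GPFS releases, so
# these regexes are assumptions -- adjust them to match your own output.
import re
import sys
from collections import defaultdict

QUEUE_TYPE_RE = re.compile(r'\b(SMALL|LARGE)\b')
PENDING_RE    = re.compile(r'pending\D*(\d+)')
ACTIVE_RE     = re.compile(r'active\D*(\d+)')

def summarize(dump_lines):
    """Aggregate queue counts, pending requests and active threads per queue type."""
    totals = defaultdict(lambda: {'queues': 0, 'pending': 0, 'active': 0})
    for line in dump_lines:
        qtype = QUEUE_TYPE_RE.search(line)
        if not qtype:
            continue
        stats = totals[qtype.group(1)]
        stats['queues'] += 1
        for regex, key in ((PENDING_RE, 'pending'), (ACTIVE_RE, 'active')):
            m = regex.search(line)
            if m:
                stats[key] += int(m.group(1))
    return totals

if __name__ == '__main__':
    # Usage:  mmfsadm dump nsd | python parse_nsd_dump.py
    for qtype, stats in sorted(summarize(sys.stdin).items()):
        print("%s: %d queues, %d requests pending, %d threads active"
              % (qtype, stats['queues'], stats['pending'], stats['active']))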

One other thing we have noted is that our home-grown RRDtool monitoring plots, which are based on netstat, iostat, vmstat, etc., also show an oddity.  Most of our LUNs show up as 33 - 68% utilized … but all the LUNs on eon34 (there are 4 in total) show up as 93 - 97% utilized.  And another oddity there is that eon34A and eon34B rarely show up in the alert e-mails, while eon34C and eon34E show up waaaayyyyyyy more than anything else … the difference between them is that A and B are on the storage array itself, while C and E are on JBODs SAS-attached to the storage array (and yes, we’ve actually checked and reseated those connections).
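
For a quick spot check of those utilization numbers outside of the RRDtool graphs, something along these lines works (a rough sketch only; the "iostat -x" column layout differs between sysstat versions, so it reads the header to find %util rather than hard-coding a column):

#!/usr/bin/env python
# Rough sketch: grab per-device %util from "iostat -dx".
# Column layout differs between sysstat versions, so we locate the %util
# column from the header line instead of hard-coding its position.
import subprocess

def device_utilization(interval=5):
    """Return {device: %util} from the second (interval-based) iostat sample."""
    out = subprocess.check_output(
        ['iostat', '-dx', str(interval), '2'], universal_newlines=True)
    util = {}
    util_col = None
    for line in out.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0].startswith('Device'):
            # New header starts a new sample; remember where %util lives
            # (older sysstat puts it in the last column).
            util_col = fields.index('%util') if '%util' in fields else len(fields) - 1
            util.clear()           # keep only the last (interval) sample
        elif util_col is not None and len(fields) > util_col:
            try:
                util[fields[0]] = float(fields[util_col])
            except ValueError:
                pass
    return util

if __name__ == '__main__':
    for dev, pct in sorted(device_utilization().items()):
        print("%-12s %5.1f%% utilized" % (dev, pct))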

Another reason why I could not respond earlier today is that one of the things I did this afternoon was to upgrade the RAM on nsd2 and nsd4 from 16 GB and 24 GB respectively to 64 GB each … and I then upped the pagepool on those two boxes to 40 GB.  That has not made a difference.  How can I determine how much of the pagepool is actually being used, BTW?  A quick Google search didn’t help me.

So we’re trying to figure out if we have storage hardware issues causing GPFS issues or GPFS issues causing storage slowdowns.  The fact that I see slowdowns most often on one storage array points in one direction, while the fact that at times I see even worse slowdowns on multiple other arrays points the other way.  The fact that some NSD servers show better stats than others in the analysis of the “mmfsadm dump nsd” output tells me … well, I don’t know what it tells me.

I think that’s all for now.  If you have read this entire very long e-mail, first off, thank you!  If you’ve read it and have ideas for where I should go from here, T-H-A-N-K Y-O-U!

Kevin

> On Jul 4, 2018, at 7:34 AM, Aaron Knister <aaron.s.knister at nasa.gov> wrote:
> 
> Hi Kevin,
> 
> Just going out on a very weird limb here...but you're not by chance seeing this behavior on DDN hardware that runs the SFA OS are you? (e.g. SFA12K, 7K, 14K, etc.) We just started seeing some very weird and high latency on some of our SFA12ks (that have otherwise been solid both in terms of stability and performance) but only on certain volumes and the affected volumes change. It's very bizarre and we've been working closely with DDN to track down the root cause but we've not yet found a smoking gun. The timing and description of your problem sounded eerily similar to what we're seeing so I thought I'd ask.
> 
> -Aaron
> 
> --
> Aaron Knister
> NASA Center for Climate Simulation (Code 606.2)
> Goddard Space Flight Center
> (301) 286-2776
> 
> 
> On Tue, 3 Jul 2018, Buterbaugh, Kevin L wrote:
> 
>> Hi all,
>> We are experiencing some high I/O wait times (5 - 20 seconds!) on some of our NSDs as reported by “mmdiag --iohist" and are struggling to understand why.  One of the
>> confusing things is that, while certain NSDs tend to show the problem more than others, the problem is not consistent … i.e. the problem tends to move around from
>> NSD to NSD (and storage array to storage array) whenever we check … which is sometimes just a few minutes apart.
>> In the past when I have seen “mmdiag --iohist” report high wait times like this it has *always* been hardware related.  In our environment, the most common cause has
>> been a battery backup unit on a storage array controller going bad and the storage array switching to write straight to disk.  But that’s *not* happening this time.
>> Is there anything within GPFS / outside of a hardware issue that I should be looking for??  Thanks!
>> Kevin Buterbaugh - Senior System Administrator
>> Vanderbilt University - Advanced Computing Center for Research and Education
>> Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss


