[gpfsug-discuss] High I/O wait times

Stephen M Tee tees at us.ibm.com
Wed Jul 4 03:43:28 BST 2018


You don't state whether you're running GPFS or ESS, and at which level.
One thing you can check is whether the SES and enclosure drivers are being
loaded; the lsmod command will show if they are.
These drivers were found to cause SCSI I/O hangs in Linux RHEL 7.3 and 7.4.
If they are being loaded, you can blacklist and unload them with no impact
to ESS/GNR.
By default, these drivers are blacklisted in ESS.
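
For illustration only, on a stock RHEL system the check and the
blacklist/unload might look roughly like this (ses and enclosure are the
usual module names; the file name under /etc/modprobe.d is just a
placeholder):

    # See whether the SCSI enclosure services drivers are loaded
    lsmod | grep -E '^(ses|enclosure)'

    # Keep them from loading on the next boot (file name is a placeholder)
    printf 'blacklist ses\nblacklist enclosure\n' > /etc/modprobe.d/blacklist-ses.conf

    # Unload them from the running kernel (ses uses enclosure, so
    # remove ses first)
    modprobe -r ses
    modprobe -r enclosure

    # Depending on the distro, the initramfs may also need a rebuild
    # (e.g. dracut -f) for the blacklist to hold across reboots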

 Stephen Tee
 ESS Storage Development
 IBM Systems and Technology
 Austin, TX
 512-963-7177




From:	Steve Crusan <scrusan at ddn.com>
To:	gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
Date:	07/03/2018 05:08 PM
Subject:	Re: [gpfsug-discuss] High I/O wait times
Sent by:	gpfsug-discuss-bounces at spectrumscale.org



Kevin,

    While this is happening, are you able to grab latency stats per LUN
(hardware vendor agnostic) to see if there are any outliers? Also, when
looking at the mmdiag output, are both reads and writes affected? Depending
on the storage hardware, your writes might be hitting cache, so maybe this
problem is being exacerbated by many small reads (that are too random to be
coalesced, take advantage of drive NCQ, etc.).
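
As one vendor-agnostic option, sampling extended device statistics on the
NSD servers while the waits are happening would show whether any LUN stands
out, and whether the latency sits on the read or the write side (a sketch
assuming the sysstat package is installed; older versions report a single
await column instead of r_await/w_await):

    # Per-device latency every 5 seconds, with timestamps.
    # Watch r_await / w_await (average read/write latency in ms)
    # and %util for outlier LUNs.
    iostat -xmt 5

    # If multipathing is in use, map dm-* devices back to LUNs
    # with 'multipath -ll'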

    The other response about the NSD threads is also a good start, but if
the I/O waits shift between different NSD servers and across hardware
vendors, my assumption would be that you are hitting a bottleneck
somewhere and that what you are seeing are symptoms of an I/O backlog,
which can manifest in any number of places. This could be something as low
level as a few slow drives.
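
To see whether the slow I/Os follow particular servers or disks rather than
staying put, one rough approach is to snapshot the I/O history on all NSD
servers at the same moment and compare a few samples taken minutes apart (a
sketch; it assumes the built-in nsdnodes node class covers your NSD
servers):

    # Grab the recent I/O history from every NSD server at once
    mmdsh -N nsdnodes 'mmdiag --iohist' > /tmp/iohist.$(date +%H%M%S).txt

    # Take another sample a few minutes later, then compare which
    # servers, NSDs and devices carry the large "time ms" values
    # in each snapshot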

    Have you just started noticing this behavior? Any new applications on
your system? Going by your institution, you're probably supporting a wide
variety of codes, so if these problems just started happening, it's possible
that someone changed their code, or decided to run new scientific packages.

-Steve
________________________________________
From: gpfsug-discuss-bounces at spectrumscale.org
[gpfsug-discuss-bounces at spectrumscale.org] on behalf of Buterbaugh, Kevin L
[Kevin.Buterbaugh at Vanderbilt.Edu]
Sent: Tuesday, July 03, 2018 11:43 AM
To: gpfsug main discussion list
Subject: [gpfsug-discuss] High I/O wait times

Hi all,

We are experiencing some high I/O wait times (5 - 20 seconds!) on some of
our NSDs as reported by “mmdiag --iohist” and are struggling to understand
why.  One of the confusing things is that, while certain NSDs tend to show
the problem more than others, the problem is not consistent … i.e. the
problem tends to move around from NSD to NSD (and storage array to storage
array) whenever we check … which is sometimes just a few minutes apart.
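
For reference, a rough way to pull the slow entries out of the history on
one server looks something like this (the awk field assumes the "time ms"
column position in our mmdiag output; treat it as a placeholder and adjust
it to match yours):

    # List recent I/Os that took longer than 1000 ms
    mmdiag --iohist | awk '$6+0 > 1000 {print}'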

In the past, when I have seen “mmdiag --iohist” report high wait times like
this, it has *always* been hardware related.  In our environment, the most
common cause has been a battery backup unit on a storage array controller
going bad and the storage array switching to writing straight to disk.  But
that’s *not* happening this time.

Is there anything within GPFS / outside of a hardware issue that I should
be looking for??  Thanks!

—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and
Education
Kevin.Buterbaugh at vanderbilt.edu<mailto:Kevin.Buterbaugh at vanderbilt.edu> -
(615)875-9633



_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss



