[gpfsug-discuss] Monitor NSD server queue?

Knister, Aaron S. (GSFC-606.2)[COMPUTER SCIENCE CORP] aaron.s.knister at nasa.gov
Wed Aug 17 02:46:39 BST 2016


Hi Everyone,

We ran into a rather interesting situation over the past week. We had a job that was pounding the ever loving crap out of one of our filesystems (called dnb02), doing about 15GB/s of reads. Other jobs experienced a slowdown on a different filesystem (called dnb41) that uses entirely separate backend storage. What I can't figure out is why this other filesystem was affected. I've checked IB bandwidth and congestion, Fibre Channel bandwidth and errors, and Ethernet bandwidth and congestion, looked at the mmpmon nsd_ds counters (including disk request wait time), and checked the disk iowait values from collectl. I simply can't account for the slowdown on the other filesystem. The only thing I can think of is that the high latency on dnb02's NSDs caused the mmfsd NSD queues to back up.

Here's my question: how can I monitor the state of the NSD queues? I can't find anything in mmdiag. Running mmfsadm saferdump NSD shows me the queues and their status, but I'm not sure that calling saferdump NSD every 10 seconds to monitor this data is going to end well. I've seen saferdump NSD cause mmfsd to die, and that's from a task we only run every 6 hours.
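To make the polling idea concrete, here is a minimal Python sketch of a throttled wrapper around a diagnostic command. It is only an illustration under stated assumptions: the class name ThrottledCommand and its parameters are hypothetical, the command line is simply the one quoted above, and the dump output is returned as raw text because its format is not described here. Throttling only bounds how often and how long the command runs; it does not make the dump itself any safer for mmfsd.

```python
import subprocess
import time


class ThrottledCommand:
    """Run a diagnostic command at most once per min_interval seconds.

    Hypothetical helper for illustration; not part of GPFS.
    """

    def __init__(self, argv, min_interval=10.0, timeout=30.0,
                 clock=time.monotonic):
        self.argv = list(argv)
        self.min_interval = min_interval  # seconds between real invocations
        self.timeout = timeout            # give up if the command hangs
        self.clock = clock                # injectable for testing
        self._last_run = None
        self._last_output = None

    def poll(self):
        """Return the command's stdout, reusing the cached copy if called
        again before min_interval has elapsed."""
        now = self.clock()
        if (self._last_run is not None
                and now - self._last_run < self.min_interval):
            return self._last_output
        result = subprocess.run(
            self.argv,
            capture_output=True,
            text=True,
            timeout=self.timeout,  # raises subprocess.TimeoutExpired
            check=True,            # raises CalledProcessError on failure
        )
        self._last_run = now
        self._last_output = result.stdout
        return self._last_output


# Assumed usage, with the command as written in this post (output not parsed):
#   dump = ThrottledCommand(["mmfsadm", "saferdump", "NSD"], min_interval=10.0)
#   text = dump.poll()
```

Several collectors could share one ThrottledCommand instance so that the dump runs at most once per interval no matter how many of them ask for it.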

Any thoughts/ideas here would be great.

Thanks!

-Aaron