[gpfsug-discuss] iowait?

Aaron Knister aaron.s.knister at nasa.gov
Mon Aug 29 18:54:12 BST 2016


Sure, we can and we do use both iostat/sar and collectl to collect disk
utilization on our NSD servers. That doesn't give us any insight, though,
into the individual client nodes, of which we have 3500. We do log
mmpmon data from each node, but that doesn't tell us how much time is
being spent waiting on I/O. Having GPFS report iowait on client nodes
would give us that insight.
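For reference, the node-wide iowait figure that sar and iostat print comes from the fifth counter on the aggregate "cpu" line of /proc/stat (see proc(5)), so a per-client collector could sample it directly. Below is a minimal sketch, assuming a collector that takes two readings a few seconds apart and diffs them; the names cpu_sample, parse_cpu_line, read_cpu_sample and iowait_pct are hypothetical. proc(5) itself describes iowait as unreliable and notes the counter can even decrease, so the sketch returns zero rather than wrapping in that case. It would also only capture GPFS waits if GPFS sleeps in an iowait-accounted way.

```c
#include <stdio.h>
#include <string.h>

/* Cumulative tick counters from the aggregate "cpu" line of /proc/stat,
 * in the field order documented in proc(5). Hypothetical helper type. */
struct cpu_sample {
    unsigned long long user, nice, system, idle, iowait, irq, softirq, steal;
};

/* Parse the aggregate "cpu " line (not the per-CPU "cpuN" lines).
 * Returns 0 on success, -1 if the line is not a usable "cpu" line. */
int parse_cpu_line(const char *line, struct cpu_sample *s)
{
    memset(s, 0, sizeof *s);
    if (strncmp(line, "cpu ", 4) != 0)
        return -1;
    /* Trailing fields are absent on old kernels; they stay zero. */
    int n = sscanf(line + 4, "%llu %llu %llu %llu %llu %llu %llu %llu",
                   &s->user, &s->nice, &s->system, &s->idle,
                   &s->iowait, &s->irq, &s->softirq, &s->steal);
    return n >= 5 ? 0 : -1;
}

/* Read the first line of a /proc/stat-formatted file and parse it. */
int read_cpu_sample(const char *path, struct cpu_sample *s)
{
    char line[512];
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    char *got = fgets(line, sizeof line, f);
    fclose(f);
    return got != NULL ? parse_cpu_line(got, s) : -1;
}

static unsigned long long total_ticks(const struct cpu_sample *s)
{
    return s->user + s->nice + s->system + s->idle +
           s->iowait + s->irq + s->softirq + s->steal;
}

/* Percentage of elapsed CPU time spent in iowait between samples a and b. */
double iowait_pct(const struct cpu_sample *a, const struct cpu_sample *b)
{
    unsigned long long ta = total_ticks(a), tb = total_ticks(b);
    /* Counters that did not advance, or went backwards, yield 0. */
    if (tb <= ta || b->iowait < a->iowait)
        return 0.0;
    return 100.0 * (double)(b->iowait - a->iowait) / (double)(tb - ta);
}
```

A collector would call read_cpu_sample("/proc/stat", ...) twice, some interval apart, and log iowait_pct() for each node alongside the mmpmon data.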

On 8/29/16 1:50 PM, Alex Chekholko wrote:
> Any reason you can't just use iostat or collectl or any of a number of
> other standard tools to look at disk utilization?
>
> On 08/29/2016 10:33 AM, Aaron Knister wrote:
>> Hi Everyone,
>>
>> Would it be easy to have GPFS report iowait values in Linux? This would
>> be a huge help for us in determining whether a node's low utilization is
>> due to some issue with the code running on it or if it's blocked on I/O,
>> especially in a historical context.
>>
>> On a test system I naively tried changing the schedule() call in
>> cxiWaitEventWait() (around line 2832 of gpl-linux/cxiSystem.c) to this:
>>
>> again:
>>   /* call the scheduler */
>>   if ( waitFlags & INTERRUPTIBLE )
>>     schedule();      /* interruptible wait: plain sleep */
>>   else
>>     io_schedule();   /* uninterruptible wait: sleep is accounted as iowait */
>>
>> It seems to do what I'm after, but bad things generally happen when
>> I start pretending I'm a kernel developer.
>>
>> Any thoughts? If I open an RFE, would this be something that's
>> relatively easy to implement? I'm not asking for a commitment *to*
>> implement it, just checking that I'm not asking for something that
>> seems simple but is actually fairly hard to implement.
>>
>> -Aaron
>>
>
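One way to check whether a change like the io_schedule() one above is being accounted: on mainline kernels the procs_blocked line of /proc/stat is computed from nr_iowait(), the count of tasks currently in an I/O-wait sleep such as io_schedule() (proc(5) describes it as processes blocked waiting for I/O to complete). If the patched cxiWaitEventWait() path is counted, GPFS threads waiting on I/O should show up there. A minimal sketch of reading it, assuming the standard /proc/stat layout; procs_blocked_from and read_procs_blocked are hypothetical names, not part of GPFS or any library.

```c
#include <stdio.h>

/* Scan a /proc/stat-formatted stream for "procs_blocked N".
 * Returns N, or -1 if the line is not found. Very long lines such as
 * "intr" are read in several pieces; those pieces contain only numbers,
 * so they never match and are skipped harmlessly. */
long procs_blocked_from(FILE *f)
{
    char line[512];
    long value;
    while (fgets(line, sizeof line, f) != NULL) {
        if (sscanf(line, "procs_blocked %ld", &value) == 1)
            return value;
    }
    return -1;
}

/* Convenience wrapper: open a path (normally "/proc/stat") and scan it. */
long read_procs_blocked(const char *path)
{
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    long value = procs_blocked_from(f);
    fclose(f);
    return value;
}
```

Polling read_procs_blocked("/proc/stat") while a client is busy with GPFS I/O, with and without such a change, would show whether those waits are being counted.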

-- 
Aaron Knister
NASA Center for Climate Simulation (Code 606.2)
Goddard Space Flight Center
(301) 286-2776
