[gpfsug-discuss] Waiter identification help - Quota related

Aaron Knister aaron.s.knister at nasa.gov
Fri Jan 27 01:26:49 GMT 2017


This might be a stretch, but do you happen to have a user/fileset/group 
over its hard quota, or over its soft quota and past the grace period? 
We've had this really upset our cluster before. At least with 3.5, each op 
that's done against an over-quota user/group/fileset results in at least 
one RPC from the fs manager to every node in the cluster.
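
Before chasing quota state, it can help to confirm what the waiters are actually piling up on. Below is a rough Python sketch that tallies waiter lines like the ones quoted further down by message handler and wait reason. The regex is modelled only on the waiter text in this thread, and the function name tally_waiters is made up for illustration; other releases may format waiters differently, so treat it as a starting point rather than a supported tool.

```python
import re
from collections import Counter

# Pattern modelled only on the waiter lines quoted in this thread
# (assumption: other GPFS releases may format waiters differently).
WAITER_RE = re.compile(
    r"Waiting\s+(?P<secs>\d+\.\d+)\s+sec\s+since\s+\S+,.*?"
    r"thread\s+\d+\s+Msg handler\s+(?P<handler>\w+):"
    r".*?reason\s+'(?P<reason>[^']*)'",
    re.DOTALL,
)


def tally_waiters(text):
    """Return (Counter keyed by (handler, reason), longest wait in seconds)."""
    # Strip mail quote markers and re-join wrapped lines before matching.
    flat = " ".join(line.lstrip("> ").strip() for line in text.splitlines())
    counts = Counter()
    longest = 0.0
    for match in WAITER_RE.finditer(flat):
        counts[(match.group("handler"), match.group("reason"))] += 1
        longest = max(longest, float(match.group("secs")))
    return counts, longest


if __name__ == "__main__":
    import sys

    # Usage: pipe saved waiter output (or this mail) into the script.
    counts, longest = tally_waiters(sys.stdin.read())
    for (handler, reason), n in counts.most_common():
        print(f"{n:5d}  {handler}  ({reason})")
    print(f"longest wait: {longest:.4f} sec")
```

If nearly everything lands on the quotaMsg* handlers with the same reason, as it does in the waiters quoted below, that at least points the finger at quota handling rather than something else.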

Are those waiters from an fs manager node? If so, perhaps briefly fire up 
tracing (/usr/lpp/mmfs/bin/mmtrace start), let it run for ~10 seconds, 
then stop it (/usr/lpp/mmfs/bin/mmtrace stop) and grep for 
"TRACE_QUOTA" in the resulting trcrpt file. If you see a bunch of 
lines that contain:

TRACE_QUOTA: qu.server revoke reply type

that might be what's going on. You can also see the behavior if you look 
at the output of mmdiag --network on your fs manager nodes and see a 
bunch of RPCs with all of your cluster nodes listed as the recipients. 
I can't recall the name of the RPC you're looking for, though.
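
If the trace report is large, counting the matches may be easier than eyeballing grep output. Here is a minimal Python sketch of that check. It only does plain substring matching on the two strings mentioned above; the function name count_quota_revokes is made up for illustration, and the path to the trcrpt file is whatever mmtrace left on your node (passed as an argument, not assumed here).

```python
import sys


def count_quota_revokes(lines):
    """Return (TRACE_QUOTA line count, revoke-reply line count)."""
    total_quota = 0
    revokes = 0
    for line in lines:
        if "TRACE_QUOTA" not in line:
            continue
        total_quota += 1
        # Substring taken from the trace text quoted in this mail.
        if "qu.server revoke reply type" in line:
            revokes += 1
    return total_quota, revokes


if __name__ == "__main__":
    # Usage: python count_revokes.py <path to the trcrpt file>
    with open(sys.argv[1], errors="replace") as report:
        total_quota, revokes = count_quota_revokes(report)
    print(f"TRACE_QUOTA lines: {total_quota}")
    print(f"revoke reply lines: {revokes}")
```

A large revoke-reply count from a ~10 second trace would be consistent with the over-quota behavior described above, though it doesn't prove it on its own.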

Hope that helps!

-Aaron

On 1/26/17 7:57 PM, Oesterlin, Robert wrote:
> OK, I have a sick cluster, and it seems to be tied up with quota related
> RPCs like this. Any help in narrowing down what the issue is?
>
>
>
> Waiting 3.8729 sec since 19:54:09, monitored, thread 32786 Msg handler
> quotaMsgRequestShare: on ThCond 0x1801919D3C8 (LkObjCondvar), reason
> 'waiting for WA lock'
>
> Waiting 4.3158 sec since 19:54:08, monitored, thread 32771 Msg handler
> quotaMsgRequestShare: on ThCond 0x1801919D3C8 (LkObjCondvar), reason
> 'waiting for WA lock'
>
> Waiting 4.3173 sec since 19:54:08, monitored, thread 35829 Msg handler
> quotaMsgPrefetchShare: on ThCond 0x1801919D3C8 (LkObjCondvar), reason
> 'waiting for WA lock'
>
> Waiting 4.4619 sec since 19:54:08, monitored, thread 9694 Msg handler
> quotaMsgRequestShare: on ThCond 0x1801919D3C8 (LkObjCondvar), reason
> 'waiting for WA lock'
>
> Waiting 4.4967 sec since 19:54:08, monitored, thread 32357 Msg handler
> quotaMsgRelinquish: on ThCond 0x1801919D3C8 (LkObjCondvar), reason
> 'waiting for WA lock'
>
> Waiting 4.6885 sec since 19:54:08, monitored, thread 32305 Msg handler
> quotaMsgRequestShare: on ThCond 0x1801919D3C8 (LkObjCondvar), reason
> 'waiting for WA lock'
>
> Waiting 4.7123 sec since 19:54:08, monitored, thread 32261 Msg handler
> quotaMsgRequestShare: on ThCond 0x1801919D3C8 (LkObjCondvar), reason
> 'waiting for WA lock'
>
> Waiting 4.7932 sec since 19:54:08, monitored, thread 53409 Msg handler
> quotaMsgRelinquish: on ThCond 0x1801919D3C8 (LkObjCondvar), reason
> 'waiting for WA lock'
>
> Waiting 5.2954 sec since 19:54:07, monitored, thread 32905 Msg handler
> quotaMsgRequestShare: on ThCond 0x1801919D3C8 (LkObjCondvar), reason
> 'waiting for WA lock'
>
> Waiting 5.3058 sec since 19:54:07, monitored, thread 32573 Msg handler
> quotaMsgPrefetchShare: on ThCond 0x1801919D3C8 (LkObjCondvar), reason
> 'waiting for WA lock'
>
> Waiting 5.3207 sec since 19:54:07, monitored, thread 32397 Msg handler
> quotaMsgRelinquish: on ThCond 0x1801919D3C8 (LkObjCondvar), reason
> 'waiting for WA lock'
>
> Waiting 5.3274 sec since 19:54:07, monitored, thread 32897 Msg handler
> quotaMsgRelinquish: on ThCond 0x1801919D3C8 (LkObjCondvar), reason
> 'waiting for WA lock'
>
> Waiting 5.3343 sec since 19:54:07, monitored, thread 32691 Msg handler
> quotaMsgRelinquish: on ThCond 0x1801919D3C8 (LkObjCondvar), reason
> 'waiting for WA lock'
>
> Waiting 5.3347 sec since 19:54:07, monitored, thread 32364 Msg handler
> quotaMsgRequestShare: on ThCond 0x1801919D3C8 (LkObjCondvar), reason
> 'waiting for WA lock'
>
> Waiting 5.3348 sec since 19:54:07, monitored, thread 32522 Msg handler
> quotaMsgRelinquish: on ThCond 0x1801919D3C8 (LkObjCondvar), reason
> 'waiting for WA lock'
>
>
>
> Bob Oesterlin
> Sr Principal Storage Engineer, Nuance
> 507-269-0413
>
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>

-- 
Aaron Knister
NASA Center for Climate Simulation (Code 606.2)
Goddard Space Flight Center
(301) 286-2776


