[gpfsug-discuss] Executing Callbacks on other Nodes
Marc A Kaplan
makaplan at us.ibm.com
Tue Apr 12 23:01:40 BST 2016
My understanding is (someone will correct me if I'm wrong) ...
GPFS does not have true deadlock detection. As you say, it has timeouts.
The argument is: as a practical matter, it makes little difference to a
sysadmin or user -- if things are gummed up "too long" they start to smell
like a deadlock, so we may as well intervene as though there were a true
technical deadlock.
A genuine deadlock is a situation where things are gummed up, there
is no progress, and one can prove that there will be no progress, no
matter how long one waits.
E.g., classically: you have locked resource A and I have locked resource B,
and now I decide I need resource A and wait indefinitely for it. You,
meanwhile, have decided you need resource B and are waiting indefinitely
for that. We are then deadlocked. Deadlock can occur on a single node or
over multiple nodes.
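To make the A/B example concrete, here is a minimal sketch in Python of how
such a cycle shows up in a "wait-for" graph. This is only an illustration of
the textbook idea, not anything GPFS does; the names build_wait_for and
find_cycle are made up for this example, and it assumes each party waits on
at most one resource at a time.

```python
from typing import Dict, List, Optional


def build_wait_for(holds: Dict[str, str], wants: Dict[str, str]) -> Dict[str, str]:
    """Map each waiting party to the party holding the resource it wants.

    holds: resource -> party currently holding it
    wants: party -> resource it is blocked on
    """
    return {
        party: holds[resource]
        for party, resource in wants.items()
        if resource in holds and holds[resource] != party
    }


def find_cycle(waits_for: Dict[str, str]) -> Optional[List[str]]:
    """Return one cycle of parties waiting on each other, or None."""
    for start in waits_for:
        path: List[str] = []
        position: Dict[str, int] = {}
        node = start
        # Follow the chain of "X waits for Y" until it ends or repeats.
        while node in waits_for and node not in position:
            position[node] = len(path)
            path.append(node)
            node = waits_for[node]
        if node in position:
            # We came back to a party already on the path: a true cycle.
            return path[position[node]:]
    return None


# You hold A, I hold B; I want A, you want B.
holds = {"A": "you", "B": "me"}
wants = {"me": "A", "you": "B"}
cycle = find_cycle(build_wait_for(holds, wants))
print(cycle)
```

If either of us were waiting on a resource nobody holds, or on a party that
is not itself waiting, the chain would simply end and no cycle would be
reported.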
Technically it may be possible to execute a deadlock detection protocol
that would identify cyclic, deadlocking dependencies, but it was decided
that, for GPFS, it would be more practical to detect "very long
waiters"...
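The "very long waiters" approach is much simpler to sketch: flag anything
that has been waiting longer than some threshold, without trying to prove a
cycle. The following Python is a rough illustration only. The Waiter record,
the long_waiters function and the 300-second threshold are all assumptions
made up for this example, not GPFS data structures or defaults.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Waiter:
    thread: str      # hypothetical identifier of the waiting thread
    reason: str      # what it says it is waiting for
    seconds: float   # how long it has been waiting


def long_waiters(waiters: List[Waiter], threshold_seconds: float) -> List[Waiter]:
    """Return waiters at or above the threshold, longest wait first."""
    flagged = [w for w in waiters if w.seconds >= threshold_seconds]
    return sorted(flagged, key=lambda w: w.seconds, reverse=True)


sample = [
    Waiter("t1", "RPC reply", 12.0),
    Waiter("t2", "lock on resource A", 950.0),
    Waiter("t3", "disk I/O", 310.0),
]

# Threshold chosen arbitrarily for illustration.
for w in long_waiters(sample, threshold_seconds=300.0):
    print(f"{w.thread}: waiting {w.seconds:.0f}s for {w.reason}")
```

Note the trade-off: a threshold like this will also flag a waiter that is
merely slow and would eventually complete, which a true cycle check would
not.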
From: "Oesterlin, Robert" <Robert.Oesterlin at nuance.com>
Some general thoughts on “deadlocks” and automated deadlock detection.
I personally don’t like the term “deadlock” as it implies a condition that
won’t ever resolve itself. In GPFS terms, a deadlock is really a “long RPC
waiter” over a certain threshold. RPCs that wait on certain events can and
do occur and they can take some time to complete. This is not necessarily
a condition that is a problem, but you should be looking into them.
GPFS does have automated deadlock detection and collection, but in the
early releases it was … well.. not very “robust”. With later releases
(4.2) it’s MUCH better. I personally don’t rely on it because in larger
clusters it can be too aggressive and depending on what’s really going on
it can make things worse. This statement is my opinion and it doesn’t mean
it’s not a good thing to have. :-)
...