[gpfsug-discuss] gpfs client expels
Salvatore Di Nardo
sdinardo at ebi.ac.uk
Thu Aug 21 14:04:59 BST 2014
Thanks for the info... it helps a bit in understanding what's going on,
but I think you missed the part that Node A and Node B could also be
the same machine.
If, for instance, I run two cp commands on the same machine, Client B
cannot have problems contacting Client A, since they are the same
machine. By the way, I did the same using two separate clients and the
result is the same.
Nonetheless, your description made me understand a bit better what's
going on.
Regards,
Salvatore
On 21/08/14 13:48, Bryan Banister wrote:
> As I understand GPFS distributed locking semantics, GPFS will not
> allow one node to hold a write lock for a file indefinitely. Once
> Client B opens the file for writing it would have contacted the File
> System Manager to obtain the lock. The FS manager would have told
> Client B that Client A has the lock and that Client B would have to
> contact Client A and revoke the write lock token. If Client A does
> not respond to Client B's request to revoke the write token, then
> Client B will ask that Client A be expelled from the cluster for NOT
> adhering to the proper protocol for write lock contention.
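> That revocation sequence can be sketched as a toy state machine. This
> is a hypothetical, simplified model for illustration (the class and
> method names are invented), not GPFS's actual implementation:

```python
# Toy model of the write-token protocol described above.
# Hypothetical and simplified -- not GPFS's actual implementation.

class Node:
    def __init__(self, name, responsive=True):
        self.name = name
        self.responsive = responsive  # does mmfsd answer revoke requests?

    def revoke_token(self):
        # A real node would flush dirty buffers before releasing the
        # token; here we only model whether it answers at all.
        return self.responsive


class TokenManager:
    """Stands in for the File System Manager that tracks token holders."""

    def __init__(self):
        self.holder = None  # node currently holding the write token

    def request_write_token(self, node):
        if self.holder is None or self.holder is node:
            self.holder = node
            return "granted"
        # Another node holds the token: the requester must contact the
        # holder and ask it to give the token up.
        if self.holder.revoke_token():
            self.holder = node
            return "granted"
        # The holder did not respond: this is the point at which the
        # requester asks the cluster manager to expel the holder.
        expelled, self.holder = self.holder, node
        return "expelled " + expelled.name


mgr = TokenManager()
client_a = Node("clientA", responsive=False)  # simulate a stuck holder
client_b = Node("clientB")
print(mgr.request_write_token(client_a))  # clientA acquires the token
print(mgr.request_write_token(client_b))  # clientA fails to respond
```

> In this sketch, as in the description above, the expel is initiated by
> the requester that got no answer from the token holder.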
>
>
>
> Have you checked the communication path between the two clients at
> this point?
>
> I could not follow the logs that you provided. You should definitely
> look at the exact sequence of log events on the two clients and the
> file system manager (as reported by mmlsmgr).
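> One way to do that is to merge the mmfs.log excerpts from each node
> into a single timeline. A rough sketch (assuming the timestamp format
> visible in the excerpts below, e.g. "Tue Aug 19 11:03:04.980 2014"):

```python
# Merge GPFS log excerpts from several nodes into one timeline,
# so the order of events across nodes is easy to follow.
# Assumes each line starts with a timestamp such as
# "Tue Aug 19 11:03:04.980 2014" (the format seen in this thread).
import re
from datetime import datetime

TS = re.compile(r"^\w{3} (\w{3} +\d+ \d+:\d+:\d+\.\d+ \d{4})")

def parse(node, text):
    """Yield (timestamp, node, line) for every timestamped line."""
    for line in text.splitlines():
        m = TS.match(line)
        if m:
            ts = datetime.strptime(m.group(1), "%b %d %H:%M:%S.%f %Y")
            yield ts, node, line

def merged_timeline(logs):
    """logs maps node name -> raw log text; returns lines in time order."""
    events = [ev for node, text in logs.items() for ev in parse(node, text)]
    events.sort(key=lambda ev: ev[0])
    return ["%s: %s" % (node, line) for _, node, line in events]

logs = {
    "ebi5-220": "Tue Aug 19 11:03:04.981 2014: Request sent to expel gss02b",
    "gss02a":   "Tue Aug 19 11:03:04.980 2014: Expel request from ebi5-220",
}
for entry in merged_timeline(logs):
    print(entry)
```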
>
> Hope that helps,
> -Bryan
>
> ------------------------------------------------------------------------
> *From:* gpfsug-discuss-bounces at gpfsug.org
> [gpfsug-discuss-bounces at gpfsug.org] on behalf of Salvatore Di Nardo
> [sdinardo at ebi.ac.uk]
> *Sent:* Thursday, August 21, 2014 4:04 AM
> *To:* chair at gpfsug.org; gpfsug main discussion list
> *Subject:* Re: [gpfsug-discuss] gpfs client expels
>
> Thanks for the feedback, but we managed to find a scenario that
> excludes network problems.
>
> We have a file called */input_file/* of nearly 100GB.
>
> If from *client A* we do:
>
> cat input_file >> output_file
>
> it starts copying, and we see the waiters go up a bit for a few
> seconds, but then they flush back to 0, so we can say the copy
> proceeds well.
>
>
> If we now do the same from another client (or just another shell on
> the same client), *client B*:
>
> cat input_file >> output_file
>
>
> (in other words, we are trying to write to the same destination), all
> the waiters go up until one node gets expelled.
>
>
> Now, while it's understandable that the destination file is locked by
> one of the "cat" processes, so the other has to wait (and since the
> file is big, it has to wait for a while), it's not understandable why
> this stops the lease renewal.
> Why doesn't it just return a timeout error on the copy instead of
> expelling the node? We can reproduce this every time, and since our
> users do operations like this on files over 100GB each, you can
> imagine the result.
>
>
>
> As you can imagine, even if it's a bit silly to write to the same
> destination at the same time, it's also quite common if we want to
> dump logs to a shared log file and, for some reason, one of the
> writers writes for a long time, keeping the file locked.
> Our expels are not due to network congestion, but to a write attempt
> having to wait for another one. What I really don't understand is why
> such an extreme measure as an expel is taken just because a process is
> waiting "too much time".
>
>
> I have a ticket open with IBM for this and the issue is under
> investigation, but no luck so far.
>
> Regards,
> Salvatore
>
>
>
> On 21/08/14 09:20, Jez Tucker (Chair) wrote:
>> Hi there,
>>
>> I've seen this on several 'stock'? 'core'? GPFS systems (we need a
>> better term now that GSS is out): ping 'working', but alongside
>> ejections from the cluster.
>> The GPFS inter-node 'ping' is somewhat more circumspect than Unix
>> ping - and rightly so.
>>
>> In my experience this has _always_ been a network issue of one sort
>> or another. If the network is experiencing issues, nodes will be
>> ejected.
>> Of course it could be an unresponsive mmfsd or a high loadavg, but
>> I've seen that only twice in 10 years over many versions of GPFS.
>>
>> You need to follow the logs through from each machine in time order
>> to determine who could not see whom, and in what order.
>> Your best way forward is to log a SEV2 case with IBM support,
>> directly or via your OEM, and collect and supply a snap and traces
>> as required by support.
>>
>> Without knowing your full setup, it's hard to help further.
>>
>> Jez
>>
>> On 20/08/14 08:57, Salvatore Di Nardo wrote:
>>> Still problems. Here some more detailed examples:
>>>
>>> *EXAMPLE 1:*
>>>
>>> *EBI5-220**( CLIENT)**
>>> *Tue Aug 19 11:03:04.980 2014: *Timed out waiting for a
>>> reply from node <GSS02B IP> gss02b*
>>> Tue Aug 19 11:03:04.981 2014: Request sent to <GSS02A
>>> IP> (gss02a in GSS.ebi.ac.uk) to expel <GSS02B IP>
>>> (gss02b in GSS.ebi.ac.uk) from cluster GSS.ebi.ac.uk
>>> Tue Aug 19 11:03:04.982 2014: This node will be expelled
>>> from cluster GSS.ebi.ac.uk due to expel msg from
>>> <EBI5-220 IP> (ebi5-220)
>>> Tue Aug 19 11:03:09.319 2014: Cluster Manager connection
>>> broke. Probing cluster GSS.ebi.ac.uk
>>> Tue Aug 19 11:03:10.321 2014: Unable to contact any
>>> quorum nodes during cluster probe.
>>> Tue Aug 19 11:03:10.322 2014: Lost membership in cluster
>>> GSS.ebi.ac.uk. Unmounting file systems.
>>> Tue Aug 19 11:03:10 BST 2014: mmcommon preunmount
>>> invoked. File system: gpfs1 Reason: SGPanic
>>> Tue Aug 19 11:03:12.066 2014: Connecting to <GSS02A IP>
>>> gss02a <c1p687>
>>> Tue Aug 19 11:03:12.070 2014: Connected to <GSS02A IP>
>>> gss02a <c1p687>
>>> Tue Aug 19 11:03:17.071 2014: Connecting to <GSS02B IP>
>>> gss02b <c1p686>
>>> Tue Aug 19 11:03:17.072 2014: Connecting to <GSS03B IP>
>>> gss03b <c1p685>
>>> Tue Aug 19 11:03:17.079 2014: Connecting to <GSS03A IP>
>>> gss03a <c1p684>
>>> Tue Aug 19 11:03:17.080 2014: Connecting to <GSS01B IP>
>>> gss01b <c1p683>
>>> Tue Aug 19 11:03:17.079 2014: Connecting to <GSS01A IP>
>>> gss01a <c1p1>
>>> Tue Aug 19 11:04:23.105 2014: Connected to <GSS02B IP>
>>> gss02b <c1p686>
>>> Tue Aug 19 11:04:23.107 2014: Connected to <GSS03B IP>
>>> gss03b <c1p685>
>>> Tue Aug 19 11:04:23.112 2014: Connected to <GSS03A IP>
>>> gss03a <c1p684>
>>> Tue Aug 19 11:04:23.115 2014: Connected to <GSS01B IP>
>>> gss01b <c1p683>
>>> Tue Aug 19 11:04:23.121 2014: Connected to <GSS01A IP>
>>> gss01a <c1p1>
>>> Tue Aug 19 11:12:28.992 2014: Node <GSS02A IP> (gss02a
>>> in GSS.ebi.ac.uk) is now the Group Leader.
>>>
>>> *GSS02B ( NSD SERVER)*
>>> ...
>>> Tue Aug 19 11:03:17.070 2014: Killing connection from
>>> *<EBI5-220 IP>* because the group is not ready for it to
>>> rejoin, err 46
>>> Tue Aug 19 11:03:25.016 2014: Killing connection from
>>> <EBI5-102 IP> because the group is not ready for it to
>>> rejoin, err 46
>>> Tue Aug 19 11:03:28.080 2014: Killing connection from
>>> *<EBI5-220 IP>* because the group is not ready for it to
>>> rejoin, err 46
>>> Tue Aug 19 11:03:36.019 2014: Killing connection from
>>> <EBI5-102 IP> because the group is not ready for it to
>>> rejoin, err 46
>>> Tue Aug 19 11:03:39.083 2014: Killing connection from
>>> *<EBI5-220 IP>* because the group is not ready for it to
>>> rejoin, err 46
>>> Tue Aug 19 11:03:47.023 2014: Killing connection from
>>> <EBI5-102 IP> because the group is not ready for it to
>>> rejoin, err 46
>>> Tue Aug 19 11:03:50.088 2014: Killing connection from
>>> *<EBI5-220 IP>* because the group is not ready for it to
>>> rejoin, err 46
>>> Tue Aug 19 11:03:52.218 2014: Killing connection from
>>> <EBI5-043 IP> because the group is not ready for it to
>>> rejoin, err 46
>>> Tue Aug 19 11:03:58.030 2014: Killing connection from
>>> <EBI5-102 IP> because the group is not ready for it to
>>> rejoin, err 46
>>> Tue Aug 19 11:04:01.092 2014: Killing connection from
>>> *<EBI5-220 IP>* because the group is not ready for it to
>>> rejoin, err 46
>>> Tue Aug 19 11:04:03.220 2014: Killing connection from
>>> <EBI5-043 IP> because the group is not ready for it to
>>> rejoin, err 46
>>> Tue Aug 19 11:04:09.034 2014: Killing connection from
>>> <EBI5-102 IP> because the group is not ready for it to
>>> rejoin, err 46
>>> Tue Aug 19 11:04:12.096 2014: Killing connection from
>>> *<EBI5-220 IP>* because the group is not ready for it to
>>> rejoin, err 46
>>> Tue Aug 19 11:04:14.224 2014: Killing connection from
>>> <EBI5-043 IP> because the group is not ready for it to
>>> rejoin, err 46
>>> Tue Aug 19 11:04:20.037 2014: Killing connection from
>>> <EBI5-102 IP> because the group is not ready for it to
>>> rejoin, err 46
>>> Tue Aug 19 11:04:23.103 2014: Accepted and connected to
>>> *<EBI5-220 IP>* ebi5-220 <c0n618>
>>> ...
>>>
>>> *GSS02a ( NSD SERVER)*
>>> Tue Aug 19 11:03:04.980 2014: Expel <GSS02B IP> (gss02b)
>>> request from <EBI5-220 IP> (ebi5-220 in
>>> ebi-cluster.ebi.ac.uk). Expelling: <EBI5-220 IP>
>>> (ebi5-220 in ebi-cluster.ebi.ac.uk)
>>> Tue Aug 19 11:03:12.069 2014: Accepted and connected to
>>> <EBI5-220 IP> ebi5-220 <c0n618>
>>>
>>>
>>> ===============================================
>>> *EXAMPLE 2*:
>>>
>>> *EBI5-038*
>>> Tue Aug 19 11:32:34.227 2014: *Disk lease period expired
>>> in cluster GSS.ebi.ac.uk. Attempting to reacquire lease.*
>>> Tue Aug 19 11:33:34.258 2014: *Lease is overdue. Probing
>>> cluster GSS.ebi.ac.uk*
>>> Tue Aug 19 11:35:24.265 2014: Close connection to
>>> <GSS02A IP> gss02a <c1n2> (Connection reset by peer).
>>> Attempting reconnect.
>>> Tue Aug 19 11:35:24.865 2014: Close connection to
>>> <EBI5-014 IP> ebi5-014 <c1n457> (Connection reset by
>>> peer). Attempting reconnect.
>>> ...
>>> LOT MORE RESETS BY PEER
>>> ...
>>> Tue Aug 19 11:35:25.096 2014: Close connection to
>>> <EBI5-167 IP> ebi5-167 <c1n155> (Connection reset by
>>> peer). Attempting reconnect.
>>> Tue Aug 19 11:35:25.267 2014: Connecting to <GSS02A IP>
>>> gss02a <c1n2>
>>> Tue Aug 19 11:35:25.268 2014: Close connection to
>>> <GSS02A IP> gss02a <c1n2> (Connection failed because
>>> destination is still processing previous node failure)
>>> Tue Aug 19 11:35:26.267 2014: Retry connection to
>>> <GSS02A IP> gss02a <c1n2>
>>> Tue Aug 19 11:35:26.268 2014: Close connection to
>>> <GSS02A IP> gss02a <c1n2> (Connection failed because
>>> destination is still processing previous node failure)
>>> Tue Aug 19 11:36:24.276 2014: Unable to contact any
>>> quorum nodes during cluster probe.
>>> Tue Aug 19 11:36:24.277 2014: *Lost membership in
>>> cluster GSS.ebi.ac.uk. Unmounting file systems.*
>>>
>>> *GSS02a*
>>> Tue Aug 19 11:35:24.263 2014: Node <EBI5-038 IP>
>>> (ebi5-038 in ebi-cluster.ebi.ac.uk) *is being expelled
>>> because of an expired lease.* Pings sent: 60. Replies
>>> received: 60.
>>>
>>>
>>>
>>>
>>> In example 1 it seems that an NSD server was not replying to the
>>> client, but the servers seem to be working fine. How can I trace
>>> this better (to solve the problem)?
>>>
>>> In example 2 it seems to me that, for some reason, the managers are
>>> not renewing the lease in time. When this happens, it's not a single
>>> client: lots of them fail to get their lease renewed. Why is this
>>> happening? How can I trace it to the source of the problem?
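>>> As a toy model of the lease mechanism in example 2 (a hypothetical
>>> simplification; not GPFS's actual algorithm, parameters, or
>>> tunables):

```python
class ClusterManager:
    """Toy cluster manager tracking disk leases. A hypothetical
    simplification, not GPFS's actual algorithm or parameters."""

    def __init__(self, lease_duration=35.0):
        self.lease_duration = lease_duration
        self.last_renewal = {}  # node name -> time of last renewal

    def renew(self, node, now):
        self.last_renewal[node] = now

    def overdue(self, now):
        """Nodes whose lease has expired and are candidates for expel."""
        return [n for n, t in self.last_renewal.items()
                if now - t > self.lease_duration]


mgr = ClusterManager(lease_duration=35.0)
mgr.renew("ebi5-038", now=0.0)
mgr.renew("ebi5-220", now=0.0)
mgr.renew("ebi5-038", now=30.0)   # renewed in time
# ebi5-220 never renews again: by t=60 its lease is overdue, so the
# cluster manager expels it -- even if it still answers ICMP pings.
print(mgr.overdue(now=60.0))
```

>>> Note the parallel with the "Pings sent: 60. Replies received: 60"
>>> log line: the node kept answering pings, yet still missed its lease
>>> renewal, so the manager expelled it anyway.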
>>>
>>>
>>>
>>> Thanks in advance for any tips.
>>>
>>> Regards,
>>> Salvatore
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> gpfsug-discuss mailing list
>>> gpfsug-discuss at gpfsug.org
>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>>
>>
>>
>
>