[gpfsug-discuss] gpfs client expels
Salvatore Di Nardo
sdinardo at ebi.ac.uk
Thu Aug 21 14:04:59 BST 2014
Thanks for the info... it helps a bit in understanding what's going on,
but I think you missed the part that Node A and Node B could also be
the same machine.
If, for instance, I run two cp commands on the same machine, Client B
cannot have problems contacting Client A, since they are the same
machine. By the way, I did the same using two separate clients and the
result is the same.
Nonetheless, your description made me understand a bit better what's
going on.
Regards,
Salvatore
On 21/08/14 13:48, Bryan Banister wrote:
> As I understand GPFS distributed locking semantics, GPFS will not
> allow one node to hold a write lock for a file indefinitely. Once
> Client B opens the file for writing it would have contacted the File
> System Manager to obtain the lock. The FS manager would have told
> Client B that Client A has the lock and that Client B would have to
> contact Client A and revoke the write lock token. If Client A does
> not respond to Client B's request to revoke the write token, then
> Client B will ask that Client A be expelled from the cluster for NOT
> adhering to the proper protocol for write lock contention.
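> That revocation sequence can be sketched as a toy state machine. This
> is a hypothetical, simplified model for illustration (the class and
> method names are invented), not GPFS's actual implementation:

```python
# Toy model of the write-token protocol described above.
# Hypothetical and simplified -- not GPFS's actual implementation.

class Node:
    def __init__(self, name, responsive=True):
        self.name = name
        self.responsive = responsive  # does mmfsd answer revoke requests?

    def revoke_token(self):
        # A real node would flush dirty buffers before releasing the
        # token; here we only model whether it answers at all.
        return self.responsive


class TokenManager:
    """Stands in for the File System Manager that tracks token holders."""

    def __init__(self):
        self.holder = None  # node currently holding the write token

    def request_write_token(self, node):
        if self.holder is None or self.holder is node:
            self.holder = node
            return "granted"
        # Another node holds the token: the requester must contact the
        # holder and ask it to give the token up.
        if self.holder.revoke_token():
            self.holder = node
            return "granted"
        # The holder did not respond: this is the point at which the
        # requester asks the cluster manager to expel the holder.
        expelled, self.holder = self.holder, node
        return "expelled " + expelled.name


mgr = TokenManager()
client_a = Node("clientA", responsive=False)  # simulate a stuck holder
client_b = Node("clientB")
print(mgr.request_write_token(client_a))  # clientA acquires the token
print(mgr.request_write_token(client_b))  # clientA fails to respond
```

> In this sketch, as in the description above, the expel is initiated by
> the requester that got no answer from the token holder.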
>
>
>
> Have you checked the communication path between the two clients at
> this point?
>
> I could not follow the logs that you provided. You should definitely
> look at the exact sequence of log events on the two clients and the
> file system manager (as reported by mmlsmgr).
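> One way to do that is to merge the mmfs.log excerpts from each node
> into a single timeline. A rough sketch (assuming the timestamp format
> visible in the excerpts below, e.g. "Tue Aug 19 11:03:04.980 2014"):

```python
# Merge GPFS log excerpts from several nodes into one timeline,
# so the order of events across nodes is easy to follow.
# Assumes each line starts with a timestamp such as
# "Tue Aug 19 11:03:04.980 2014" (the format seen in this thread).
import re
from datetime import datetime

TS = re.compile(r"^\w{3} (\w{3} +\d+ \d+:\d+:\d+\.\d+ \d{4})")

def parse(node, text):
    """Yield (timestamp, node, line) for every timestamped line."""
    for line in text.splitlines():
        m = TS.match(line)
        if m:
            ts = datetime.strptime(m.group(1), "%b %d %H:%M:%S.%f %Y")
            yield ts, node, line

def merged_timeline(logs):
    """logs maps node name -> raw log text; returns lines in time order."""
    events = [ev for node, text in logs.items() for ev in parse(node, text)]
    events.sort(key=lambda ev: ev[0])
    return ["%s: %s" % (node, line) for _, node, line in events]

logs = {
    "ebi5-220": "Tue Aug 19 11:03:04.981 2014: Request sent to expel gss02b",
    "gss02a":   "Tue Aug 19 11:03:04.980 2014: Expel request from ebi5-220",
}
for entry in merged_timeline(logs):
    print(entry)
```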
>
> Hope that helps,
> -Bryan
>
> ------------------------------------------------------------------------
> *From:* gpfsug-discuss-bounces at gpfsug.org
> [gpfsug-discuss-bounces at gpfsug.org] on behalf of Salvatore Di Nardo
> [sdinardo at ebi.ac.uk]
> *Sent:* Thursday, August 21, 2014 4:04 AM
> *To:* chair at gpfsug.org; gpfsug main discussion list
> *Subject:* Re: [gpfsug-discuss] gpfs client expels
>
> Thanks for the feedback, but we managed to find a scenario that
> excludes network problems.
>
> We have a file called */input_file/* of nearly 100GB.
>
> If from *client A* we do:
>
> cat input_file >> output_file
>
> it starts copying, and we see the waiters go up a bit for a few
> seconds, but then they flush back to 0, so we can say the copy
> proceeds well.
>
>
> If we now do the same from another client (or just another shell on
> the same client), *client B*:
>
> cat input_file >> output_file
>
>
> (in other words, we are trying to write to the same destination), all
> the waiters go up until one node gets expelled.
>
>
> Now, while it's understandable that the destination file is locked by
> one of the "cat" processes, so the other has to wait (and since the
> file is big, it has to wait for a while), it's not understandable why
> this stops the lease renewal.
> Why doesn't it just return a timeout error on the copy instead of
> expelling the node? We can reproduce this every time, and since our
> users do operations like this on files over 100GB each, you can
> imagine the result.
>
>
>
> As you can imagine, even if it's a bit silly to write to the same
> destination at the same time, it's also quite common if we want to
> dump logs to a shared log file and, for some reason, one of the
> writers writes for a long time, keeping the file locked.
> Our expels are not due to network congestion, but to a write attempt
> having to wait for another one. What I really don't understand is why
> such an extreme measure as an expel is taken just because a process is
> waiting "too much time".
>
>
> I have a ticket open with IBM for this and the issue is under
> investigation, but no luck so far.
>
> Regards,
> Salvatore
>
>
>
> On 21/08/14 09:20, Jez Tucker (Chair) wrote:
>> Hi there,
>>
>> I've seen this on several 'stock'? 'core'? GPFS systems (we need a
>> better term now that GSS is out): ping 'working', but alongside
>> ejections from the cluster.
>> The GPFS inter-node 'ping' is somewhat more circumspect than Unix
>> ping - and rightly so.
>>
>> In my experience this has _always_ been a network issue of one sort
>> or another. If the network is experiencing issues, nodes will be
>> ejected.
>> Of course it could be an unresponsive mmfsd or a high loadavg, but
>> I've seen that only twice in 10 years over many versions of GPFS.
>>
>> You need to follow the logs through from each machine in time order
>> to determine who could not see whom, and in what order.
>> Your best way forward is to log a SEV2 case with IBM support,
>> directly or via your OEM, and collect and supply a snap and traces
>> as required by support.
>>
>> Without knowing your full setup, it's hard to help further.
>>
>> Jez
>>
>> On 20/08/14 08:57, Salvatore Di Nardo wrote:
>>> Still problems. Here some more detailed examples:
>>>
>>> *EXAMPLE 1:*
>>>
>>> *EBI5-220**( CLIENT)**
>>> *Tue Aug 19 11:03:04.980 2014: *Timed out waiting for a
>>> reply from node <GSS02B IP> gss02b*
>>> Tue Aug 19 11:03:04.981 2014: Request sent to <GSS02A
>>> IP> (gss02a in GSS.ebi.ac.uk) to expel <GSS02B IP>
>>> (gss02b in GSS.ebi.ac.uk) from cluster GSS.ebi.ac.uk
>>> Tue Aug 19 11:03:04.982 2014: This node will be expelled
>>> from cluster GSS.ebi.ac.uk due to expel msg from
>>> <EBI5-220 IP> (ebi5-220)
>>> Tue Aug 19 11:03:09.319 2014: Cluster Manager connection
>>> broke. Probing cluster GSS.ebi.ac.uk
>>> Tue Aug 19 11:03:10.321 2014: Unable to contact any
>>> quorum nodes during cluster probe.
>>> Tue Aug 19 11:03:10.322 2014: Lost membership in cluster
>>> GSS.ebi.ac.uk. Unmounting file systems.
>>> Tue Aug 19 11:03:10 BST 2014: mmcommon preunmount
>>> invoked. File system: gpfs1 Reason: SGPanic
>>> Tue Aug 19 11:03:12.066 2014: Connecting to <GSS02A IP>
>>> gss02a <c1p687>
>>> Tue Aug 19 11:03:12.070 2014: Connected to <GSS02A IP>
>>> gss02a <c1p687>
>>> Tue Aug 19 11:03:17.071 2014: Connecting to <GSS02B IP>
>>> gss02b <c1p686>
>>> Tue Aug 19 11:03:17.072 2014: Connecting to <GSS03B IP>
>>> gss03b <c1p685>
>>> Tue Aug 19 11:03:17.079 2014: Connecting to <GSS03A IP>
>>> gss03a <c1p684>
>>> Tue Aug 19 11:03:17.080 2014: Connecting to <GSS01B IP>
>>> gss01b <c1p683>
>>> Tue Aug 19 11:03:17.079 2014: Connecting to <GSS01A IP>
>>> gss01a <c1p1>
>>> Tue Aug 19 11:04:23.105 2014: Connected to <GSS02B IP>
>>> gss02b <c1p686>
>>> Tue Aug 19 11:04:23.107 2014: Connected to <GSS03B IP>
>>> gss03b <c1p685>
>>> Tue Aug 19 11:04:23.112 2014: Connected to <GSS03A IP>
>>> gss03a <c1p684>
>>> Tue Aug 19 11:04:23.115 2014: Connected to <GSS01B IP>
>>> gss01b <c1p683>
>>> Tue Aug 19 11:04:23.121 2014: Connected to <GSS01A IP>
>>> gss01a <c1p1>
>>> Tue Aug 19 11:12:28.992 2014: Node <GSS02A IP> (gss02a
>>> in GSS.ebi.ac.uk) is now the Group Leader.
>>>
>>> *GSS02B ( NSD SERVER)*
>>> ...
>>> Tue Aug 19 11:03:17.070 2014: Killing connection from
>>> *<EBI5-220 IP>* because the group is not ready for it to
>>> rejoin, err 46
>>> Tue Aug 19 11:03:25.016 2014: Killing connection from
>>> <EBI5-102 IP> because the group is not ready for it to
>>> rejoin, err 46
>>> Tue Aug 19 11:03:28.080 2014: Killing connection from
>>> *<EBI5-220 IP>* because the group is not ready for it to
>>> rejoin, err 46
>>> Tue Aug 19 11:03:36.019 2014: Killing connection from
>>> <EBI5-102 IP> because the group is not ready for it to
>>> rejoin, err 46
>>> Tue Aug 19 11:03:39.083 2014: Killing connection from
>>> *<EBI5-220 IP>* because the group is not ready for it to
>>> rejoin, err 46
>>> Tue Aug 19 11:03:47.023 2014: Killing connection from
>>> <EBI5-102 IP> because the group is not ready for it to
>>> rejoin, err 46
>>> Tue Aug 19 11:03:50.088 2014: Killing connection from
>>> *<EBI5-220 IP>* because the group is not ready for it to
>>> rejoin, err 46
>>> Tue Aug 19 11:03:52.218 2014: Killing connection from
>>> <EBI5-043 IP> because the group is not ready for it to
>>> rejoin, err 46
>>> Tue Aug 19 11:03:58.030 2014: Killing connection from
>>> <EBI5-102 IP> because the group is not ready for it to
>>> rejoin, err 46
>>> Tue Aug 19 11:04:01.092 2014: Killing connection from
>>> *<EBI5-220 IP>* because the group is not ready for it to
>>> rejoin, err 46
>>> Tue Aug 19 11:04:03.220 2014: Killing connection from
>>> <EBI5-043 IP> because the group is not ready for it to
>>> rejoin, err 46
>>> Tue Aug 19 11:04:09.034 2014: Killing connection from
>>> <EBI5-102 IP> because the group is not ready for it to
>>> rejoin, err 46
>>> Tue Aug 19 11:04:12.096 2014: Killing connection from
>>> *<EBI5-220 IP>* because the group is not ready for it to
>>> rejoin, err 46
>>> Tue Aug 19 11:04:14.224 2014: Killing connection from
>>> <EBI5-043 IP> because the group is not ready for it to
>>> rejoin, err 46
>>> Tue Aug 19 11:04:20.037 2014: Killing connection from
>>> <EBI5-102 IP> because the group is not ready for it to
>>> rejoin, err 46
>>> Tue Aug 19 11:04:23.103 2014: Accepted and connected to
>>> *<EBI5-220 IP>* ebi5-220 <c0n618>
>>> ...
>>>
>>> *GSS02a ( NSD SERVER)*
>>> Tue Aug 19 11:03:04.980 2014: Expel <GSS02B IP> (gss02b)
>>> request from <EBI5-220 IP> (ebi5-220 in
>>> ebi-cluster.ebi.ac.uk). Expelling: <EBI5-220 IP>
>>> (ebi5-220 in ebi-cluster.ebi.ac.uk)
>>> Tue Aug 19 11:03:12.069 2014: Accepted and connected to
>>> <EBI5-220 IP> ebi5-220 <c0n618>
>>>
>>>
>>> ===============================================
>>> *EXAMPLE 2*:
>>>
>>> *EBI5-038*
>>> Tue Aug 19 11:32:34.227 2014: *Disk lease period expired
>>> in cluster GSS.ebi.ac.uk. Attempting to reacquire lease.*
>>> Tue Aug 19 11:33:34.258 2014: *Lease is overdue. Probing
>>> cluster GSS.ebi.ac.uk*
>>> Tue Aug 19 11:35:24.265 2014: Close connection to
>>> <GSS02A IP> gss02a <c1n2> (Connection reset by peer).
>>> Attempting reconnect.
>>> Tue Aug 19 11:35:24.865 2014: Close connection to
>>> <EBI5-014 IP> ebi5-014 <c1n457> (Connection reset by
>>> peer). Attempting reconnect.
>>> ...
>>> LOT MORE RESETS BY PEER
>>> ...
>>> Tue Aug 19 11:35:25.096 2014: Close connection to
>>> <EBI5-167 IP> ebi5-167 <c1n155> (Connection reset by
>>> peer). Attempting reconnect.
>>> Tue Aug 19 11:35:25.267 2014: Connecting to <GSS02A IP>
>>> gss02a <c1n2>
>>> Tue Aug 19 11:35:25.268 2014: Close connection to
>>> <GSS02A IP> gss02a <c1n2> (Connection failed because
>>> destination is still processing previous node failure)
>>> Tue Aug 19 11:35:26.267 2014: Retry connection to
>>> <GSS02A IP> gss02a <c1n2>
>>> Tue Aug 19 11:35:26.268 2014: Close connection to
>>> <GSS02A IP> gss02a <c1n2> (Connection failed because
>>> destination is still processing previous node failure)
>>> Tue Aug 19 11:36:24.276 2014: Unable to contact any
>>> quorum nodes during cluster probe.
>>> Tue Aug 19 11:36:24.277 2014: *Lost membership in
>>> cluster GSS.ebi.ac.uk. Unmounting file systems.*
>>>
>>> *GSS02a*
>>> Tue Aug 19 11:35:24.263 2014: Node <EBI5-038 IP>
>>> (ebi5-038 in ebi-cluster.ebi.ac.uk) *is being expelled
>>> because of an expired lease.* Pings sent: 60. Replies
>>> received: 60.
>>>
>>>
>>>
>>>
>>> In example 1 it seems that an NSD server was not replying to the
>>> client, but the servers seem to be working fine. How can I trace
>>> this better (to solve the problem)?
>>>
>>> In example 2 it seems to me that, for some reason, the managers are
>>> not renewing the lease in time. When this happens, it's not a single
>>> client: lots of them fail to get their lease renewed. Why is this
>>> happening? How can I trace it to the source of the problem?
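>>> As a toy model of the lease mechanism in example 2 (a hypothetical
>>> simplification; not GPFS's actual algorithm, parameters, or
>>> tunables):

```python
class ClusterManager:
    """Toy cluster manager tracking disk leases. A hypothetical
    simplification, not GPFS's actual algorithm or parameters."""

    def __init__(self, lease_duration=35.0):
        self.lease_duration = lease_duration
        self.last_renewal = {}  # node name -> time of last renewal

    def renew(self, node, now):
        self.last_renewal[node] = now

    def overdue(self, now):
        """Nodes whose lease has expired and are candidates for expel."""
        return [n for n, t in self.last_renewal.items()
                if now - t > self.lease_duration]


mgr = ClusterManager(lease_duration=35.0)
mgr.renew("ebi5-038", now=0.0)
mgr.renew("ebi5-220", now=0.0)
mgr.renew("ebi5-038", now=30.0)   # renewed in time
# ebi5-220 never renews again: by t=60 its lease is overdue, so the
# cluster manager expels it -- even if it still answers ICMP pings.
print(mgr.overdue(now=60.0))
```

>>> Note the parallel with the "Pings sent: 60. Replies received: 60"
>>> log line: the node kept answering pings, yet still missed its lease
>>> renewal, so the manager expelled it anyway.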
>>>
>>>
>>>
>>> Thanks in advance for any tips.
>>>
>>> Regards,
>>> Salvatore
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> gpfsug-discuss mailing list
>>> gpfsug-discuss at gpfsug.org
>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>>
>>
>>
>
>