[gpfsug-discuss] gpfs client expels
Salvatore Di Nardo
sdinardo at ebi.ac.uk
Thu Aug 21 14:18:19 BST 2014
This is an interesting point!
We use ethernet (10GbE links on the clients), but we don't have a
separate network for the admin traffic.

Could you explain this a bit further? The clients and the servers are
on different subnets, so the packets are routed, and I don't see a
practical way to separate them. The clients are blades in a chassis,
so even if I create two interfaces, they would physically use the same
"cable" to reach the first switch. The clients themselves (600 of
them) are also spread across different subnets.

I will forward this consideration to our network admin, to see if we
can work on a dedicated network.
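If we do manage to get a second interface onto the nodes, my
understanding is that the split would be done roughly as below. This
is only a sketch: the hostnames are made up, and I still need to check
the exact options against our GPFS level.

    # hypothetical: ebi5-001-adm is a second hostname bound to the
    # dedicated interface; GPFS would then use it for admin traffic
    mmchnode --admin-interface=ebi5-001-adm -N ebi5-001

    # and/or tell the daemons to prefer a given subnet for data traffic
    mmchconfig subnets="10.20.0.0"

    mmlscluster    # the "Admin node name" column should reflect the change
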
Thanks for your tip.
Regards,
Salvatore
On 21/08/14 14:03, Vic Cornell wrote:
> Hi Salvatore,
>
> Are you using ethernet or infiniband as the GPFS interconnect to your
> clients?
>
> If 10/40GbE - do you have a separate admin network?
>
> I have seen behaviour similar to this where the storage traffic causes
> congestion and the "admin" traffic gets lost or delayed, causing expels.
>
> Vic
>
>
>
> On 21 Aug 2014, at 10:04, Salvatore Di Nardo <sdinardo at ebi.ac.uk> wrote:
>
>> Thanks for the feedback, but we managed to find a scenario that
>> excludes network problems.
>>
>> We have a file called *input_file* of nearly 100GB:
>>
>> If from *client A* we do:
>>
>> cat input_file >> output_file
>>
>> it starts copying, and we see the waiters go up a bit (a few secs),
>> but then they flush back to 0, so we can say the copy proceeds well.
>>
>>
>> If we now do the same from another client (or just another shell on
>> the same client), *client B*:
>>
>> cat input_file >> output_file
>>
>>
>> (in other words, we are trying to write to the same destination), all
>> the waiters go up until one node gets expelled.
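>>
>> For reference, the whole reproduction is just this (the paths are
>> made up; mmdiag shows the waiter list on whichever node you run it):
>>
>>     # shell 1, on client A
>>     cat /gpfs1/input_file >> /gpfs1/output_file
>>
>>     # shell 2, on client B (or the same client), while A still runs
>>     cat /gpfs1/input_file >> /gpfs1/output_file
>>
>>     # elsewhere: watch the waiters climb until the expel happens
>>     while true; do mmdiag --waiters; sleep 5; done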
>>
>>
>> Now, while it's understandable that the destination file is locked by
>> one of the "cat" processes, so the other has to wait (and since the
>> file is BIG, it has to wait for a while), it's not understandable why
>> the lease renewal stops.
>> Why doesn't it just return a timeout error on the copy instead of
>> expelling the node? We can reproduce this every time, and since our
>> users do operations like this on files over 100GB each, you can
>> imagine the result.
>>
>>
>>
>> As you can imagine, even if it's a bit silly to write to the same
>> destination at the same time, it's also quite common, for example
>> when several writers dump logs to the same file and one of them
>> writes for a long time, keeping the file locked.
>> Our expels are not due to network congestion, but to one write
>> attempt having to wait for another. What I really don't understand is
>> why such an extreme measure as an expel is taken just because a
>> process is waiting "too long".
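>>
>> While we wait on IBM we have been looking at the lease-related
>> tunables. A sketch of how we inspect them (parameter names from the
>> docs; changing them is another matter, and some need a daemon
>> restart):
>>
>>     mmlsconfig failureDetectionTime   # time before a node is declared dead
>>     mmlsconfig leaseRecoveryWait      # wait after lease expiry before recovery
>>     # example only -- relaxing the ping tolerance on the managers:
>>     # mmchconfig minMissedPingTimeout=60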
>>
>>
>> I have a ticket open with IBM for this and the issue is under
>> investigation, but no luck so far.
>>
>> Regards,
>> Salvatore
>>
>>
>>
>> On 21/08/14 09:20, Jez Tucker (Chair) wrote:
>>> Hi there,
>>>
>>> I've seen this on several 'stock'? 'core'? GPFS systems (we need a
>>> better term now that GSS is out), with ping 'working' but alongside
>>> ejections from the cluster.
>>> The GPFS internode 'ping' is somewhat more circumspect than unix
>>> ping - and rightly so.
>>>
>>> In my experience this has _always_ been a network issue of one sort
>>> or another. If the network is experiencing issues, nodes will be
>>> ejected.
>>> Of course it could be an unresponsive mmfsd or high loadavg, but I've
>>> seen that only twice in 10 years over many versions of GPFS.
>>>
>>> You need to follow the logs through from each machine in time order
>>> to determine who could not see who and in what order.
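>>>
>>> A crude sketch of doing that, assuming a nodes.txt list of hosts and
>>> that all entries are from the same day (field 5 is the HH:MM:SS.mmm
>>> timestamp once the hostname prefix is added):
>>>
>>>     for n in $(cat nodes.txt); do
>>>         ssh "$n" cat /var/adm/ras/mmfs.log.latest | sed "s/^/$n /"
>>>     done | sort -k5,5 > merged.log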
>>> Your best way forward is to log a SEV2 case with IBM support,
>>> directly or via your OEM, and collect and supply a snap and traces
>>> as required by support.
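>>>
>>> Roughly the following, though check the exact flags for your GPFS
>>> level:
>>>
>>>     /usr/lpp/mmfs/bin/gpfs.snap      # cluster-wide support snap
>>>     mmtracectl --set --trace=def --tracedev-write-mode=overwrite
>>>     mmtracectl --start    # ...reproduce the expel...
>>>     mmtracectl --stop     # trace files land under /tmp/mmfs by default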
>>>
>>> Without knowing your full setup, it's hard to help further.
>>>
>>> Jez
>>>
>>> On 20/08/14 08:57, Salvatore Di Nardo wrote:
>>>> Still problems. Here are some more detailed examples:
>>>>
>>>> *EXAMPLE 1:*
>>>>
>>>> *EBI5-220* (CLIENT)
>>>>     Tue Aug 19 11:03:04.980 2014: *Timed out waiting for a reply from node <GSS02B IP> gss02b*
>>>>     Tue Aug 19 11:03:04.981 2014: Request sent to <GSS02A IP> (gss02a in GSS.ebi.ac.uk) to expel <GSS02B IP> (gss02b in GSS.ebi.ac.uk) from cluster GSS.ebi.ac.uk
>>>>     Tue Aug 19 11:03:04.982 2014: This node will be expelled from cluster GSS.ebi.ac.uk due to expel msg from <EBI5-220 IP> (ebi5-220)
>>>>     Tue Aug 19 11:03:09.319 2014: Cluster Manager connection broke. Probing cluster GSS.ebi.ac.uk
>>>>     Tue Aug 19 11:03:10.321 2014: Unable to contact any quorum nodes during cluster probe.
>>>>     Tue Aug 19 11:03:10.322 2014: Lost membership in cluster GSS.ebi.ac.uk. Unmounting file systems.
>>>>     Tue Aug 19 11:03:10 BST 2014: mmcommon preunmount invoked. File system: gpfs1 Reason: SGPanic
>>>>     Tue Aug 19 11:03:12.066 2014: Connecting to <GSS02A IP> gss02a <c1p687>
>>>>     Tue Aug 19 11:03:12.070 2014: Connected to <GSS02A IP> gss02a <c1p687>
>>>>     Tue Aug 19 11:03:17.071 2014: Connecting to <GSS02B IP> gss02b <c1p686>
>>>>     Tue Aug 19 11:03:17.072 2014: Connecting to <GSS03B IP> gss03b <c1p685>
>>>>     Tue Aug 19 11:03:17.079 2014: Connecting to <GSS03A IP> gss03a <c1p684>
>>>>     Tue Aug 19 11:03:17.080 2014: Connecting to <GSS01B IP> gss01b <c1p683>
>>>>     Tue Aug 19 11:03:17.079 2014: Connecting to <GSS01A IP> gss01a <c1p1>
>>>>     Tue Aug 19 11:04:23.105 2014: Connected to <GSS02B IP> gss02b <c1p686>
>>>>     Tue Aug 19 11:04:23.107 2014: Connected to <GSS03B IP> gss03b <c1p685>
>>>>     Tue Aug 19 11:04:23.112 2014: Connected to <GSS03A IP> gss03a <c1p684>
>>>>     Tue Aug 19 11:04:23.115 2014: Connected to <GSS01B IP> gss01b <c1p683>
>>>>     Tue Aug 19 11:04:23.121 2014: Connected to <GSS01A IP> gss01a <c1p1>
>>>>     Tue Aug 19 11:12:28.992 2014: Node <GSS02A IP> (gss02a in GSS.ebi.ac.uk) is now the Group Leader.
>>>>
>>>> *GSS02B* (NSD SERVER)
>>>>     ...
>>>>     Tue Aug 19 11:03:17.070 2014: Killing connection from *<EBI5-220 IP>* because the group is not ready for it to rejoin, err 46
>>>>     Tue Aug 19 11:03:25.016 2014: Killing connection from <EBI5-102 IP> because the group is not ready for it to rejoin, err 46
>>>>     Tue Aug 19 11:03:28.080 2014: Killing connection from *<EBI5-220 IP>* because the group is not ready for it to rejoin, err 46
>>>>     Tue Aug 19 11:03:36.019 2014: Killing connection from <EBI5-102 IP> because the group is not ready for it to rejoin, err 46
>>>>     Tue Aug 19 11:03:39.083 2014: Killing connection from *<EBI5-220 IP>* because the group is not ready for it to rejoin, err 46
>>>>     Tue Aug 19 11:03:47.023 2014: Killing connection from <EBI5-102 IP> because the group is not ready for it to rejoin, err 46
>>>>     Tue Aug 19 11:03:50.088 2014: Killing connection from *<EBI5-220 IP>* because the group is not ready for it to rejoin, err 46
>>>>     Tue Aug 19 11:03:52.218 2014: Killing connection from <EBI5-043 IP> because the group is not ready for it to rejoin, err 46
>>>>     Tue Aug 19 11:03:58.030 2014: Killing connection from <EBI5-102 IP> because the group is not ready for it to rejoin, err 46
>>>>     Tue Aug 19 11:04:01.092 2014: Killing connection from *<EBI5-220 IP>* because the group is not ready for it to rejoin, err 46
>>>>     Tue Aug 19 11:04:03.220 2014: Killing connection from <EBI5-043 IP> because the group is not ready for it to rejoin, err 46
>>>>     Tue Aug 19 11:04:09.034 2014: Killing connection from <EBI5-102 IP> because the group is not ready for it to rejoin, err 46
>>>>     Tue Aug 19 11:04:12.096 2014: Killing connection from *<EBI5-220 IP>* because the group is not ready for it to rejoin, err 46
>>>>     Tue Aug 19 11:04:14.224 2014: Killing connection from <EBI5-043 IP> because the group is not ready for it to rejoin, err 46
>>>>     Tue Aug 19 11:04:20.037 2014: Killing connection from <EBI5-102 IP> because the group is not ready for it to rejoin, err 46
>>>>     Tue Aug 19 11:04:23.103 2014: Accepted and connected to *<EBI5-220 IP>* ebi5-220 <c0n618>
>>>>     ...
>>>>
>>>> *GSS02a* (NSD SERVER)
>>>>     Tue Aug 19 11:03:04.980 2014: Expel <GSS02B IP> (gss02b) request from <EBI5-220 IP> (ebi5-220 in ebi-cluster.ebi.ac.uk). Expelling: <EBI5-220 IP> (ebi5-220 in ebi-cluster.ebi.ac.uk)
>>>>     Tue Aug 19 11:03:12.069 2014: Accepted and connected to <EBI5-220 IP> ebi5-220 <c0n618>
>>>>
>>>>
>>>> ===============================================
>>>> *EXAMPLE 2*:
>>>>
>>>> *EBI5-038*
>>>>     Tue Aug 19 11:32:34.227 2014: *Disk lease period expired in cluster GSS.ebi.ac.uk. Attempting to reacquire lease.*
>>>>     Tue Aug 19 11:33:34.258 2014: *Lease is overdue. Probing cluster GSS.ebi.ac.uk*
>>>>     Tue Aug 19 11:35:24.265 2014: Close connection to <GSS02A IP> gss02a <c1n2> (Connection reset by peer). Attempting reconnect.
>>>>     Tue Aug 19 11:35:24.865 2014: Close connection to <EBI5-014 IP> ebi5-014 <c1n457> (Connection reset by peer). Attempting reconnect.
>>>>     ...
>>>>     LOT MORE RESETS BY PEER
>>>>     ...
>>>>     Tue Aug 19 11:35:25.096 2014: Close connection to <EBI5-167 IP> ebi5-167 <c1n155> (Connection reset by peer). Attempting reconnect.
>>>>     Tue Aug 19 11:35:25.267 2014: Connecting to <GSS02A IP> gss02a <c1n2>
>>>>     Tue Aug 19 11:35:25.268 2014: Close connection to <GSS02A IP> gss02a <c1n2> (Connection failed because destination is still processing previous node failure)
>>>>     Tue Aug 19 11:35:26.267 2014: Retry connection to <GSS02A IP> gss02a <c1n2>
>>>>     Tue Aug 19 11:35:26.268 2014: Close connection to <GSS02A IP> gss02a <c1n2> (Connection failed because destination is still processing previous node failure)
>>>>     Tue Aug 19 11:36:24.276 2014: Unable to contact any quorum nodes during cluster probe.
>>>>     Tue Aug 19 11:36:24.277 2014: *Lost membership in cluster GSS.ebi.ac.uk. Unmounting file systems.*
>>>>
>>>> *GSS02a*
>>>>     Tue Aug 19 11:35:24.263 2014: Node <EBI5-038 IP> (ebi5-038 in ebi-cluster.ebi.ac.uk) *is being expelled because of an expired lease.* Pings sent: 60. Replies received: 60.
>>>>
>>>>
>>>>
>>>>
>>>> In example 1 it seems that an NSD server was not replying to the
>>>> client, but the servers seem to be working fine. How can I trace
>>>> this better (to solve the problem)?
>>>>
>>>> In example 2 it seems to me that for some reason the manager is not
>>>> renewing the leases in time. When this happens it's not a single
>>>> client: loads of them fail to get their lease renewed. Why is this
>>>> happening? How can I trace it back to the source of the problem?
>>>>
>>>>
>>>>
>>>> Thanks in advance for any tips.
>>>>
>>>> Regards,
>>>> Salvatore
>>>>
>>>
>>
>
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at gpfsug.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss