[gpfsug-discuss] NFS issues
Peter Serocka
peserocka at gmail.com
Wed Apr 26 18:53:51 BST 2017
> On 2017 Apr 26 Wed, at 16:20, Simon Thompson (IT Research Support) <S.J.Thompson at bham.ac.uk> wrote:
>
> Nope, the clients are all L3 connected, so not an arp issue.
...not on the client, but the server-facing L3 switch
still needs to manage its ARP table, and it might miss
the IP moving to a new MAC.
Cisco switches have a default ARP cache timeout of 4 hours, FWIW.
Can your network team provide you with the ARP status
from the switch when you see a failover get stuck?
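One quick sanity check along these lines is to compare the MAC the switch reports for the CES IP before and after a failover. A minimal sketch with made-up `show ip arp` lines standing in for what the network team would provide (in Cisco output the hardware address is the fourth column):

```shell
# Made-up 'show ip arp' lines for the CES IP before and after the failover;
# real lines would come from the server-facing switch.
arp_before="Internet  10.10.0.50   0   0050.56aa.0001  ARPA  Vlan10"
arp_after="Internet  10.10.0.50   0   0050.56bb.0002  ARPA  Vlan10"
mac_before=$(printf '%s\n' "$arp_before" | awk '{print $4}')
mac_after=$(printf '%s\n' "$arp_after" | awk '{print $4}')
if [ "$mac_before" = "$mac_after" ]; then
    echo "switch ARP entry is stale"    # switch still points at the old node
else
    echo "switch ARP entry updated"     # MAC followed the IP move
fi
```

If the MAC is unchanged after the IP has moved, the switch is the suspect rather than the CES nodes.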
— Peter
>
> Two things we have observed:
>
> 1. It triggers when one of the CES IPs moves and quickly moves back again.
> The move occurs because the NFS server goes into grace:
>
> 2017-04-25 20:36:49 : epoch 00040183 : <NODENAME> :
> ganesha.nfsd-1261[dbus] nfs4_start_grace :STATE :EVENT :NFS Server Now IN
> GRACE, duration 60
> 2017-04-25 20:36:49 : epoch 00040183 : <NODENAME> :
> ganesha.nfsd-1261[dbus] nfs4_start_grace :STATE :EVENT :NFS Server
> recovery event 2 nodeid -1 ip <CESIP>
> 2017-04-25 20:36:49 : epoch 00040183 : <NODENAME> :
> ganesha.nfsd-1261[dbus] nfs_release_v4_client :STATE :EVENT :NFS Server V4
> recovery release ip <CESIP>
> 2017-04-25 20:36:49 : epoch 00040183 : <NODENAME> :
> ganesha.nfsd-1261[dbus] nfs_in_grace :STATE :EVENT :NFS Server Now IN GRACE
> 2017-04-25 20:37:42 : epoch 00040183 : <NODENAME> :
> ganesha.nfsd-1261[dbus] nfs4_start_grace :STATE :EVENT :NFS Server Now IN
> GRACE, duration 60
> 2017-04-25 20:37:44 : epoch 00040183 : <NODENAME> :
> ganesha.nfsd-1261[dbus] nfs4_start_grace :STATE :EVENT :NFS Server Now IN
> GRACE, duration 60
> 2017-04-25 20:37:44 : epoch 00040183 : <NODENAME> :
> ganesha.nfsd-1261[dbus] nfs4_start_grace :STATE :EVENT :NFS Server
> recovery event 4 nodeid 2 ip
>
>
>
> We can't see in any of the logs WHY ganesha is going into grace. Any
> suggestions on how to debug this further? (I.e. if we can stop the grace
> events, we can mostly solve the problem.)
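One low-tech way to see how often, and exactly when, grace starts is to pull the timestamps of every grace event out of the Ganesha log and line them up against the CES IP-move events. A sketch using two of the lines quoted above as stand-in input (the real log path varies by install):

```shell
# Two of the log lines quoted above, standing in for the real ganesha log:
log_sample='2017-04-25 20:36:49 : epoch 00040183 : node1 : ganesha.nfsd-1261[dbus] nfs4_start_grace :STATE :EVENT :NFS Server Now IN GRACE, duration 60
2017-04-25 20:37:42 : epoch 00040183 : node1 : ganesha.nfsd-1261[dbus] nfs4_start_grace :STATE :EVENT :NFS Server Now IN GRACE, duration 60'
# Timestamp of every grace start, to correlate with IP moves:
grace_times=$(printf '%s\n' "$log_sample" | awk '/Now IN GRACE/ {print $1, $2}')
echo "$grace_times"
```

Run against the real log on each protocol node, the output gives a timeline to compare with whatever triggers the IP moves.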
>
>
> 2. Our clients are using LDAP which is bound to the CES IPs. If we
> shut down nslcd on the client, we can get the client to recover once all the
> TIME_WAIT connections have gone. Maybe this was a bad choice on our side
> to bind to the CES IPs - we figured it would handily move the IPs for us,
> but I guess mmcesfuncs isn't aware of this and so doesn't kill the
> connections to the IP as it goes away.
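For reference, one mitigation short of moving LDAP off the CES IPs entirely is to shorten nslcd's idle timeout so stale connections to a moved IP are dropped quickly. A sketch of the relevant nslcd.conf lines (the URI is a placeholder; `idle_timelimit` is a standard nslcd.conf option, but the value here is illustrative):

```
# /etc/nslcd.conf (sketch; uri is a placeholder, timeout illustrative)
uri ldap://ldap.example.org/
# Drop idle LDAP connections quickly so a moved CES IP is not held open:
idle_timelimit 60
```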
>
>
> So two approaches we are going to try. Reconfigure nslcd on a couple
> of clients and see if they still show the issues when failover occurs.
> Second is to work out why the NFS servers are going into grace in the
> first place.
>
> Simon
>
> On 26/04/2017, 00:46, "gpfsug-discuss-bounces at spectrumscale.org on behalf
> of Greg.Lehmann at csiro.au" <gpfsug-discuss-bounces at spectrumscale.org on
> behalf of Greg.Lehmann at csiro.au> wrote:
>
>> Are you using InfiniBand or Ethernet? I'm wondering if IBM have solved
>> the gratuitous ARP issue which we see with our non-protocols NFS
>> implementation.
>>
>> -----Original Message-----
>> From: gpfsug-discuss-bounces at spectrumscale.org
>> [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Simon
>> Thompson (IT Research Support)
>> Sent: Wednesday, 26 April 2017 3:31 AM
>> To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
>> Subject: Re: [gpfsug-discuss] NFS issues
>>
>> I did some digging in mmcesfuncs to see what happens server-side on
>> failover.
>>
>> Basically, the server losing the IP is supposed to terminate all sessions
>> and the receiving server sends ACK tickles.
>>
>> My current supposition is that, for whatever reason, the losing server
>> isn't releasing something and the client still has hold of a connection
>> which is mostly dead. The tickle from the new server then fails to reach
>> the client.
>>
>> This would explain why failing the IP back to the original server usually
>> brings the client back to life.
>>
>> This is only my working theory at the moment as we can't reliably
>> reproduce this. Next time it happens we plan to grab some netstat from
>> each side.
>>
>> Then we plan to issue "mmcmi tcpack $cesIpPort $clientIpPort" on the
>> server that received the IP and see if that fixes it (i.e. the receiving
>> server didn't tickle properly). (Usage extracted from mmcesfuncs, which is
>> ksh of course.) CesIpPort is a colon-separated IP:port (of the NFSd),
>> for anyone interested.
>>
>> Then try and kill the sessions on the losing server to check if there is
>> stuff still open, and re-tickle the client.
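The netstat comparison described above can be mechanised: any NFS session the client still lists but the new server does not is a candidate for a manual tickle or kill. A sketch with stand-in data (real captures would come from something like `netstat -tn` on each host, filtered to port 2049):

```shell
# Stand-in captures: the client still holds a session the server has dropped.
printf '10.10.0.50:2049 10.20.1.21:876\n' > client.sessions
printf '' > server.sessions               # server no longer lists it
sort client.sessions > client.sorted
sort server.sessions > server.sorted
# Sessions the client believes are open but the server does not list:
half_dead=$(comm -23 client.sorted server.sorted)
echo "$half_dead"
rm -f client.sessions server.sessions client.sorted server.sorted
```

Anything printed is exactly the mostly-dead-connection situation described above, and gives the IP:port pair to feed to the tickle.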
>>
>> If we can get steps to work around it, I'll log a PMR. I suppose I could do
>> that now, but given it's non-deterministic and we want to be 100% sure
>> it's not us doing something wrong, I'm inclined to wait until we do some
>> more testing.
>>
>> I agree with the suggestion that it's probably nodes with pending IO that
>> are affected, but I don't have any data to back that up yet. We did try with
>> a read workload on a client, but maybe we need either long IO-blocked reads
>> or writes (from the GPFS end).
>>
>> We also originally had soft as the default option, but saw issues then
>> and the docs suggested hard, so we switched and also enabled sync (we
>> figured maybe it was NFS clients with uncommitted writes), but neither has
>> resolved the issues entirely. It's difficult for me to say if they improved
>> things, though, given it's sporadic.
>>
>> Appreciate people's suggestions!
>>
>> Thanks
>>
>> Simon
>> ________________________________________
>> From: gpfsug-discuss-bounces at spectrumscale.org
>> [gpfsug-discuss-bounces at spectrumscale.org] on behalf of Jan-Frode
>> Myklebust [janfrode at tanso.net]
>> Sent: 25 April 2017 18:04
>> To: gpfsug main discussion list
>> Subject: Re: [gpfsug-discuss] NFS issues
>>
>> I *think* I've seen this, and that we then had open TCP connections from
>> client to NFS server according to netstat, but these connections were not
>> visible in netstat on the NFS-server side.
>>
>> Unfortunately I don't remember what the fix was...
>>
>>
>>
>> -jf
>>
>> On Tue, 25 Apr 2017 at 16:06, Simon Thompson (IT Research Support)
>> <S.J.Thompson at bham.ac.uk> wrote:
>> Hi,
>>
>> From what I can see, Ganesha uses the Export_Id option in the config file
>> (which is managed by CES) for this. I did find some references on the
>> Ganesha devs list that if it's not set, it would read the FSID from
>> the GPFS file-system; either way, they should surely be consistent across
>> all the nodes. The posts I found were from someone with an IBM email
>> address, so I guess someone in the IBM teams.
>>
>> I checked a couple of my protocol nodes and they use the same Export_Id
>> consistently, though I guess that might not be the same as the FSID value.
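Checking Export_Id consistency across protocol nodes can be scripted. A sketch parsing a stand-in fragment of a Ganesha export config (paths are placeholders; on 4.2.x the CES-managed exports can reportedly be listed with `mmnfs export list`, though the exact output format may differ):

```shell
# Stand-in fragment of a Ganesha export config:
conf='EXPORT {
    Export_Id = 1;
    Path = "/gpfs/fs1/export1";
}
EXPORT {
    Export_Id = 2;
    Path = "/gpfs/fs1/export2";
}'
# Pull out the Export_Id values; run per node and diff the results to
# confirm all protocol nodes agree:
ids=$(printf '%s\n' "$conf" | awk '/Export_Id/ {gsub(/;/, ""); print $3}')
echo "$ids"
```

Diffing the output between nodes would confirm (or rule out) an Export_Id mismatch as the source of stale filehandles.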
>>
>> Perhaps someone from IBM could comment on whether FSID is likely to be the
>> cause of my problems?
>>
>> Thanks
>>
>> Simon
>>
>> On 25/04/2017, 14:51,
>> "gpfsug-discuss-bounces at spectrumscale.org on behalf of Ouwehand, JJ"
>> <gpfsug-discuss-bounces at spectrumscale.org on behalf of
>> j.ouwehand at vumc.nl> wrote:
>>
>>> Hello,
>>>
>>> At first a short introduction. My name is Jaap Jan Ouwehand, I work at
>>> a Dutch hospital "VU Medical Center" in Amsterdam. We make daily use of
>>> IBM Spectrum Scale, Spectrum Archive and Spectrum Protect in our
>>> critical (office, research and clinical data) business process. We have
>>> three large GPFS filesystems for different purposes.
>>>
>>> We also had such a situation with cNFS. A failover (IP takeover) was
>>> technically fine, only the clients experienced "stale filehandles". We
>>> opened a PMR at IBM and, after testing, delivering logs and tcpdumps, and
>>> a few months, the solution appeared to be the fsid option.
>>>
>>> An NFS filehandle is built from a combination of the fsid and a hash
>>> function on the inode. After a failover, the fsid value can be different
>>> and the client gets a "stale filehandle". To avoid this, the fsid value
>>> can be statically specified. See:
>>>
>>> https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.2/com.ibm.spectrum.scale.v4r22.doc/bl1adm_nfslin.htm
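For the cNFS/kernel-NFS case, statically pinning the fsid as described above uses the standard `fsid=` export option; a sketch of an /etc/exports line (path, client spec and fsid value are all illustrative):

```
# /etc/exports sketch -- fsid pinned so filehandles survive an IP takeover
/gpfs/fs1  *(rw,sync,no_root_squash,fsid=745)
```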
>>>
>>> Maybe there is also a value in Ganesha that changes after a failover,
>>> certainly since most sessions will be re-established after a failback.
>>> Maybe you will see more debug information with tcpdump.
>>>
>>>
>>> Kind regards,
>>>
>>> Jaap Jan Ouwehand
>>> ICT Specialist (Storage & Linux)
>>> VUmc - ICT
>>> E: jj.ouwehand at vumc.nl
>>> W: www.vumc.com
>>>
>>>
>>>
>>> -----Original Message-----
>>> From: gpfsug-discuss-bounces at spectrumscale.org
>>> [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Simon
>>> Thompson (IT Research Support)
>>> Sent: Tuesday, 25 April 2017 13:21
>>> To: gpfsug-discuss at spectrumscale.org
>>> Subject: [gpfsug-discuss] NFS issues
>>>
>>> Hi,
>>>
>>> We have recently started deploying NFS in addition to our existing SMB
>>> exports on our protocol nodes.
>>>
>>> We use a RR DNS name that points to 4 VIPs for SMB services and
>>> failover seems to work fine with SMB clients. We figured we could use
>>> the same name and IPs and run Ganesha on the protocol servers, however
>>> we are seeing issues with NFS clients when IP failover occurs.
>>>
>>> In normal operation on a client, we might see several mounts from
>>> different IPs obviously due to the way the DNS RR is working, but it
>>> all works fine.
>>>
>>> In a failover situation, the IP will move to another node and some
>>> clients will carry on, others will hang IO to the mount points referred
>>> to by the IP which has moved. We can *sometimes* trigger this by
>>> manually suspending a CES node, but not always and some clients
>>> mounting from the IP moving will be fine, others won't.
>>>
>>> If we resume a node and it fails back, the clients that were hanging will
>>> usually recover fine. We can reboot a client prior to failback and it
>>> will be fine; stopping and starting the ganesha service on a protocol
>>> node will also sometimes resolve the issues.
>>>
>>> So, has anyone seen this sort of issue and any suggestions for how we
>>> could either debug more or workaround?
>>>
>>> We are currently running the packages
>>> nfs-ganesha-2.3.2-0.ibm32_1.el7.x86_64 (4.2.2-2 release ones).
>>>
>>> At one point we were seeing it a lot, and could track it back to an
>>> underlying GPFS network issue that was causing protocol nodes to be
>>> expelled occasionally. We resolved that and the issues became less
>>> apparent, but maybe we just fixed one failure mode and so see it less
>>> often.
>>>
>>> On the clients, we use -o sync,hard BTW as in the IBM docs.
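For reference, the corresponding client-side fstab entry might look like this (server name and paths are placeholders; the hard,sync options are as described in the post):

```
# /etc/fstab sketch -- hard,sync as per the IBM docs mentioned above
cesname.example.org:/gpfs/fs1  /mnt/gpfs  nfs  hard,sync  0 0
```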
>>>
>>> On a client showing the issues, we'll see in dmesg, NFS related
>>> messages
>>> like:
>>> [Wed Apr 12 16:59:53 2017] nfs: server MYNFSSERVER.bham.ac.uk not
>>> responding, timed out
>>>
>>> Which explains the client hang on certain mount points.
>>>
>>> The symptoms feel very much like those logged in this Gluster/ganesha
>>> bug:
>>> https://bugzilla.redhat.com/show_bug.cgi?id=1354439
>>>
>>>
>>> Thanks
>>>
>>> Simon
>>>
>>> _______________________________________________
>>> gpfsug-discuss mailing list
>>> gpfsug-discuss at spectrumscale.org
>>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>>
>