[gpfsug-discuss] NFS issues

Simon Thompson (IT Research Support) S.J.Thompson at bham.ac.uk
Wed Apr 26 15:20:30 BST 2017


Nope, the clients are all L3-connected, so it is not an ARP issue.

Two things we have observed:

1. It triggers when one of the CES IPs moves and quickly moves back again.
The move occurs because the NFS server goes into grace:

2017-04-25 20:36:49 : epoch 00040183 : <NODENAME> :
ganesha.nfsd-1261[dbus] nfs4_start_grace :STATE :EVENT :NFS Server Now IN
GRACE, duration 60
2017-04-25 20:36:49 : epoch 00040183 : <NODENAME> :
ganesha.nfsd-1261[dbus] nfs4_start_grace :STATE :EVENT :NFS Server
recovery event 2 nodeid -1 ip <CESIP>
2017-04-25 20:36:49 : epoch 00040183 : <NODENAME> :
ganesha.nfsd-1261[dbus] nfs_release_v4_client :STATE :EVENT :NFS Server V4
recovery release ip <CESIP>
2017-04-25 20:36:49 : epoch 00040183 : <NODENAME> :
ganesha.nfsd-1261[dbus] nfs_in_grace :STATE :EVENT :NFS Server Now IN GRACE
2017-04-25 20:37:42 : epoch 00040183 : <NODENAME> :
ganesha.nfsd-1261[dbus] nfs4_start_grace :STATE :EVENT :NFS Server Now IN
GRACE, duration 60
2017-04-25 20:37:44 : epoch 00040183 : <NODENAME> :
ganesha.nfsd-1261[dbus] nfs4_start_grace :STATE :EVENT :NFS Server Now IN
GRACE, duration 60
2017-04-25 20:37:44 : epoch 00040183 : <NODENAME> :
ganesha.nfsd-1261[dbus] nfs4_start_grace :STATE :EVENT :NFS Server
recovery event 4 nodeid 2 ip



We can't see in any of the logs WHY Ganesha is going into grace. Any
suggestions on how to debug this further? (I.e. if we can stop the grace
events, we can mostly solve the problem.)
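
So far the only plan we have for digging further is to grep the Ganesha log
on each protocol node and line the grace entries up against CES IP moves,
roughly along these lines (assuming the default CES Ganesha log location -
adjust if yours is configured differently):

  # grace/recovery events on this protocol node
  grep -E 'IN GRACE|recovery event' /var/log/ganesha.log
  # current CES address assignment, to compare against the timestamps above
  mmces address list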


2. Our clients use LDAP bound to the CES IPs. If we shut down nslcd on the
client, we can get the client to recover once all the TIME_WAIT connections
have gone. Maybe binding to the CES IPs was a bad choice on our side - we
figured CES would handily move the IPs for us, but I guess mmcesfuncs isn't
aware of the LDAP connections and so doesn't kill them as the IP goes away.


So there are two approaches we are going to try. First, reconfigure nslcd on
a couple of clients (sketch below) and see if they still show the issue when
failover occurs. Second, work out why the NFS servers are going into grace
in the first place.
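
Roughly what we have in mind for the nslcd change - the hostname and
timeouts here are placeholders, not our real config:

  # /etc/nslcd.conf (sketch): point at a stable LDAP name rather than the
  # CES IPs, and drop idle connections so a dead TCP session isn't reused
  # after an IP move
  uri ldap://ldap.example.bham.ac.uk/
  idle_timelimit 60
  bind_timelimit 10
  timelimit 15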

Simon

On 26/04/2017, 00:46, "gpfsug-discuss-bounces at spectrumscale.org on behalf
of Greg.Lehmann at csiro.au" <gpfsug-discuss-bounces at spectrumscale.org on
behalf of Greg.Lehmann at csiro.au> wrote:

>Are you using InfiniBand or Ethernet? I'm wondering if IBM have solved
>the gratuitous ARP issue which we see with our non-protocols NFS
>implementation.
>
>-----Original Message-----
>From: gpfsug-discuss-bounces at spectrumscale.org
>[mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Simon
>Thompson (IT Research Support)
>Sent: Wednesday, 26 April 2017 3:31 AM
>To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
>Subject: Re: [gpfsug-discuss] NFS issues
>
>I did some digging in mmcesfuncs to see what happens server side on
>failover.
>
>Basically, the server losing the IP is supposed to terminate all sessions,
>and the receiving server sends ACK tickles.
>
>My current supposition is that, for whatever reason, the losing server
>isn't releasing something, and the client still has hold of a connection
>which is mostly dead. The tickle from the new server to the client then
>fails.
>
>This would explain why failing the IP back to the original server usually
>brings the client back to life.
>
>This is only my working theory at the moment, as we can't reliably
>reproduce this. Next time it happens we plan to grab some netstat output
>from each side.
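>
>Roughly, something like this on the client and on each CES node involved
>(2049 being the NFS port; the CES IP below is a placeholder):
>
>  netstat -tn | grep ':2049'       # on the client: connections to nfsd
>  netstat -tn | grep '10.10.0.11'  # on each CES node: sessions for the moved IP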
>
>Then we plan to issue "mmcmi tcpack $cesIpPort $clientIpPort" on the
>server that received the IP and see if that fixes it (i.e. the receiving
>server didn't tickle properly). (Usage extracted from mmcesfuncs, which is
>ksh of course.) ... cesIpPort is a colon-separated IP:port (of nfsd), for
>anyone interested.
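>
>For example (made-up addresses; 2049 is the nfsd port on the CES side):
>
>  mmcmi tcpack 10.10.0.11:2049 10.20.0.5:876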
>
>Then try and kill the sessions on the losing server to check if there is
>stuff still open, and re-tickle the client.
>
>If we can get steps to work around it, I'll log a PMR. I suppose I could do
>that now, but given it's non-deterministic and we want to be 100% sure
>it's not us doing something wrong, I'm inclined to wait until we do some
>more testing.
>
>I agree with the suggestion that it's probably nodes with pending IO that
>are affected, but we don't have any data to back that up yet. We did try
>with a read workload on a client, but maybe we need either long IO-blocked
>reads or writes (from the GPFS end).
>
>We also originally had soft as the default option, but saw issues then too,
>and the docs suggested hard, so we switched and also enabled sync (we
>figured maybe it was the NFS client with uncommitted writes), but neither
>has resolved the issues entirely. It's difficult for me to say whether they
>improved things, though, given it's sporadic.
>
>Appreciate people's suggestions!
>
>Thanks
>
>Simon
>________________________________________
>From: gpfsug-discuss-bounces at spectrumscale.org
>[gpfsug-discuss-bounces at spectrumscale.org] on behalf of Jan-Frode
>Myklebust [janfrode at tanso.net]
>Sent: 25 April 2017 18:04
>To: gpfsug main discussion list
>Subject: Re: [gpfsug-discuss] NFS issues
>
>I *think* I've seen this, and that we then had open TCP connections from
>client to NFS server according to netstat, but these connections were not
>visible in netstat on the NFS server side.
>
>Unfortunately I don't remember what the fix was...
>
>
>
>  -jf
>
>On Tue, 25 Apr 2017 at 16:06, Simon Thompson (IT Research Support)
><S.J.Thompson at bham.ac.uk> wrote:
>Hi,
>
>From what I can see, Ganesha uses the Export_Id option in the config file
>(which is managed by CES) for this. I did find some reference on the
>Ganesha devs list that if it's not set, then it would read the FSID from
>the GPFS file-system; either way, they should surely be consistent across
>all the nodes. The posts I found were from someone with an IBM email
>address, so I guess someone on the IBM team.
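>
>For reference, the Export_Id I mean is the one in the per-export block of
>the Ganesha config, something like this (paths and values here are
>illustrative, not our real config):
>
>  EXPORT {
>      Export_Id = 1;
>      Path = "/gpfs/fs01";
>      Pseudo = "/gpfs/fs01";
>      FSAL { Name = GPFS; }
>  }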
>
>I checked a couple of my protocol nodes and they use the same Export_Id
>consistently, though I guess that might not be the same as the FSID value.
>
>Perhaps someone from IBM could comment on whether FSID is likely to be the
>cause of my problems?
>
>Thanks
>
>Simon
>
>On 25/04/2017, 14:51, "gpfsug-discuss-bounces at spectrumscale.org on behalf
>of Ouwehand, JJ" <gpfsug-discuss-bounces at spectrumscale.org on behalf of
>j.ouwehand at vumc.nl> wrote:
>
>>Hello,
>>
>>First, a short introduction. My name is Jaap Jan Ouwehand; I work at a
>>Dutch hospital, the "VU Medical Center" in Amsterdam. We make daily use
>>of IBM Spectrum Scale, Spectrum Archive and Spectrum Protect in our
>>critical (office, research and clinical data) business processes. We have
>>three large GPFS filesystems for different purposes.
>>
>>We also had such a situation with cNFS. A failover (IP takeover) was
>>technically fine, but clients experienced "stale filehandles". We opened
>>a PMR with IBM and, after testing, delivering logs and tcpdumps, and a
>>few months, the solution turned out to be the fsid option.
>>
>>An NFS filehandle is built from a combination of the fsid and a hash of
>>the inode. After a failover, the fsid value can be different, and the
>>client then sees a "stale filehandle". To avoid this, the fsid value can
>>be specified statically. See:
>>
>>https://www.ibm.com/support/knowledgecenter/STXKQY_4.2.2/com.ibm.spectrum.scale.v4r22.doc/bl1adm_nfslin.htm
>>
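>>As an illustration (not our actual export), a statically assigned fsid in
>>a kernel-NFS /etc/exports entry looks something like:
>>
>>  /gpfs/fs01  *(rw,sync,fsid=745)
>>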
>>Maybe there is also a value in Ganesha that changes after a failover -
>>certainly, since most sessions will be re-established after a failback.
>>Maybe you will see more debug information with tcpdump.
>>
>>
>>Kind regards,
>>
>>Jaap Jan Ouwehand
>>ICT Specialist (Storage & Linux)
>>VUmc - ICT
>>E: jj.ouwehand at vumc.nl
>>W: www.vumc.com
>>
>>
>>
>>-----Original Message-----
>>From: gpfsug-discuss-bounces at spectrumscale.org
>>[mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Simon
>>Thompson (IT Research Support)
>>Sent: Tuesday, 25 April 2017 13:21
>>To: gpfsug-discuss at spectrumscale.org
>>Subject: [gpfsug-discuss] NFS issues
>>
>>Hi,
>>
>>We have recently started deploying NFS in addition to our existing SMB
>>exports on our protocol nodes.
>>
>>We use an RR DNS name that points to 4 VIPs for SMB services, and
>>failover seems to work fine with SMB clients. We figured we could use
>>the same name and IPs and run Ganesha on the protocol servers; however,
>>we are seeing issues with NFS clients when IP failover occurs.
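>>
>>For context, the round-robin name is just four A records on one name,
>>along these lines (the name and addresses here are placeholders):
>>
>>  nfs.example.bham.ac.uk.  IN  A  10.10.0.11
>>  nfs.example.bham.ac.uk.  IN  A  10.10.0.12
>>  nfs.example.bham.ac.uk.  IN  A  10.10.0.13
>>  nfs.example.bham.ac.uk.  IN  A  10.10.0.14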
>>
>>In normal operation on a client, we might see several mounts from
>>different IPs obviously due to the way the DNS RR is working, but it
>>all works fine.
>>
>>In a failover situation, the IP will move to another node; some clients
>>will carry on, while others will hang IO to the mount points referred to
>>by the IP that has moved. We can *sometimes* trigger this by manually
>>suspending a CES node, but not always, and some clients mounting from
>>the moving IP will be fine while others won't.
>>
>>If we resume a node and it fails back, the clients that are hanging will
>>usually recover fine. We can reboot a client prior to failback and it
>>will be fine; stopping and starting the Ganesha service on a protocol
>>node will also sometimes resolve the issues.
>>
>>So, has anyone seen this sort of issue, and does anyone have suggestions
>>for how we could either debug it further or work around it?
>>
>>We are currently running the packages
>>nfs-ganesha-2.3.2-0.ibm32_1.el7.x86_64 (4.2.2-2 release ones).
>>
>>At one point we were seeing it a lot, and could track it back to an
>>underlying GPFS network issue that was occasionally causing protocol
>>nodes to be expelled. We resolved that and the issues became less
>>apparent, but maybe we just fixed one failure mode and so see it less
>>often.
>>
>>On the clients, we use -o sync,hard BTW as in the IBM docs.
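>>
>>For example, a mount along these lines (the mount point and export path
>>here are illustrative, not our actual ones):
>>
>>  mount -t nfs -o sync,hard MYNFSSERVER.bham.ac.uk:/gpfs/fs01 /mnt/gpfs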
>>
>>On a client showing the issues, we'll see NFS-related messages in dmesg
>>like:
>>[Wed Apr 12 16:59:53 2017] nfs: server MYNFSSERVER.bham.ac.uk not
>>responding, timed out
>>
>>Which explains the client hang on certain mount points.
>>
>>The symptoms feel very much like those logged in this Gluster/ganesha
>>bug:
>>https://bugzilla.redhat.com/show_bug.cgi?id=1354439
>>
>>
>>Thanks
>>
>>Simon
>>
>>_______________________________________________
>>gpfsug-discuss mailing list
>>gpfsug-discuss at spectrumscale.org
>>http://gpfsug.org/mailman/listinfo/gpfsug-discuss



