[gpfsug-discuss] help with multi-cluster setup: Network is unreachable

Simon Thompson (IT Research Support) S.J.Thompson at bham.ac.uk
Mon May 8 17:12:35 BST 2017


Do you have multiple networks on the hosts? We've seen this sort of thing when rp_filter is dropping traffic because of asymmetric routing.
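
A quick way to see whether strict reverse-path filtering is in effect on a host is to read the rp_filter values under /proc. The following is only a minimal sketch, not a GPFS tool: the function names are made up for illustration, and it relies on the documented Linux behaviour that the kernel uses the larger of the conf/all and per-interface values (0 = off, 1 = strict, 2 = loose).

```python
from pathlib import Path


def effective_rp_filter(all_value: int, iface_value: int) -> int:
    """The kernel applies max(conf/all, conf/<iface>) when validating sources."""
    return max(all_value, iface_value)


def read_rp_filter(conf_dir: str = "/proc/sys/net/ipv4/conf") -> dict:
    """Return {interface: raw rp_filter value}; empty if the directory is absent."""
    values = {}
    root = Path(conf_dir)
    if not root.is_dir():
        return values
    for entry in sorted(root.iterdir()):
        setting = entry / "rp_filter"
        if setting.is_file():
            values[entry.name] = int(setting.read_text().strip())
    return values


def strict_interfaces(values: dict) -> list:
    """List interfaces whose effective mode is strict (1), skipping pseudo-entries."""
    all_value = values.get("all", 0)
    return [
        name
        for name, value in sorted(values.items())
        if name not in ("all", "default")
        and effective_rp_filter(all_value, value) == 1
    ]


if __name__ == "__main__":
    for name in strict_interfaces(read_rp_filter()):
        print(f"{name}: strict reverse-path filtering is in effect")
```

If an IB or Ethernet interface shows up here on a multi-homed node, replies leaving by a different interface than the one the request arrived on are a plausible suspect.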

I know you said it's set to only go over IB, but if you have names that resolve onto your Ethernet, and the admin node names etc. are not correct, that might be your problem.
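
To check where a node name actually resolves, something like the sketch below could be run on a client. It is an illustration only: the helper names are invented, and the 10.20.0.0/16 network is an assumption inferred from the 10.20.x.x addresses and the 10.20.255.255 broadcast mentioned in the original message. The node name wos-gateway01-ib0 is taken from the log excerpt in the thread.

```python
import ipaddress
import socket

# Assumption: the IB fabric is 10.20.0.0/16, inferred from broadcast 10.20.255.255.
IB_NETWORK = "10.20.0.0/16"


def resolve_ipv4(name):
    """Return the sorted IPv4 addresses a name resolves to ([] if it does not resolve)."""
    try:
        infos = socket.getaddrinfo(name, None, family=socket.AF_INET)
    except socket.gaierror:
        return []
    return sorted({info[4][0] for info in infos})


def addresses_outside(addresses, network=IB_NETWORK):
    """Return the addresses that do NOT fall inside the given network."""
    net = ipaddress.ip_network(network)
    return [addr for addr in addresses if ipaddress.ip_address(addr) not in net]


if __name__ == "__main__":
    name = "wos-gateway01-ib0"  # node name from the thread
    resolved = resolve_ipv4(name)
    if not resolved:
        print(f"{name}: does not resolve on this host")
    for addr in addresses_outside(resolved):
        print(f"{name}: resolves to {addr}, which is outside {IB_NETWORK}")
```

Any address reported as outside the IB network would suggest the daemon or admin node name points at another interface.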

If you had 4.2, I'd suggest mmnetverify. I suppose that might work if you copied it out of the 4.x packages anyway?

Simon 
________________________________________
From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of pinto at scinet.utoronto.ca [pinto at scinet.utoronto.ca]
Sent: 08 May 2017 17:06
To: gpfsug main discussion list
Subject: [gpfsug-discuss] help with multi-cluster setup: Network is unreachable

We have a setup in which "cluster 0" is made up of clients only on
GPFS v3.5, i.e., no NSDs or formal storage on this primary membership.

All storage for those clients comes in a multi-cluster fashion, from
clusters 1 (3.5.0-23), 2 (3.5.0-11) and 3 (4.1.1-7).

We recently added a new storage cluster 4 (4.1.1-14), and for some
obscure reason we keep getting "Network is unreachable" during mount
by clients, even though there were no issues or errors with the
multi-cluster setup, i.e., 'mmremotecluster add' and 'mmremotefs add'
worked fine, and all clients have an entry in /etc/fstab for the file
system associated with the new cluster 4. The weird thing is that we
can mount cluster 3 fine (also 4.1).

Another piece of information is that as far as GPFS goes all clusters
are configured to communicate exclusively over InfiniBand, each on a
different 10.20.x.x network, but all with broadcast 10.20.255.255. As far as
the IB network goes there are no problems routing/pinging around all
the clusters. So this must be internal to GPFS.

None of the clusters have the subnets parameter set explicitly at
configuration, and on reading the 3.5 and 4.1 manuals it doesn't seem
we need to. All have cipherList AUTHONLY. One difference is that
cluster 4 has DMAPI enabled (don't think it matters).

Below is an excerpt of the /var/mmfs/gen/mmfslog in one of the clients
during mount (10.20.179.1 is one of the NSDs on cluster 4):
Mon May  8 11:35:27.773 2017: [I] Waiting to join remote cluster wosgpfs.wos-gateway01-ib0
Mon May  8 11:35:28.777 2017: [W] The TLS handshake with node 10.20.179.1 failed with error 447 (client side).
Mon May  8 11:35:28.781 2017: [E] Failed to join remote cluster wosgpfs.wos-gateway01-ib0
Mon May  8 11:35:28.782 2017: [W] Command: err 719: mount wosgpfs.wos-gateway01-ib0:wosgpfs
Mon May  8 11:35:28.783 2017: Network is unreachable


I see this reference to "TLS handshake" and error 447; however,
according to the manual, TLS only becomes the default from 4.2
onwards, not in the 4.1.1-14 that we have now, where the default is
supposed to be EMPTY.

mmdiag --network on some of the clients gives this excerpt (broken status):
     tapenode-ib0                        <c4p1>   10.20.83.5      broken     233  -1    0         0          Linux/L
     gpc-f114n014-ib0                    <c4p2>   10.20.114.14    broken     233  -1    0         0          Linux/L
     gpc-f114n015-ib0                    <c4p3>   10.20.114.15    broken     233  -1    0         0          Linux/L
     gpc-f114n016-ib0                    <c4p4>   10.20.114.16    broken     233  -1    0         0          Linux/L
     wos-gateway01-ib0                   <c4p5>   10.20.179.1     broken     233  -1    0         0          Linux/L
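
When this has to be checked across many clients, the "broken" rows can be pulled out of saved mmdiag --network output with a few lines of Python. This is a rough sketch under assumptions: it only relies on the row shape visible in the excerpt above (node name, <cXpY> tag, IPv4 address, status), it tolerates rows wrapped over two lines by mail clients, and the function name is made up.

```python
import re

# Row shape assumed from the excerpt: name, <cXpY> tag, IPv4 address, status.
ROW = re.compile(
    r"(\S+)\s+<(c\d+\w\d+)>\s+(\d{1,3}(?:\.\d{1,3}){3})\s+(\S+)"
)

SAMPLE = """\
     tapenode-ib0                        <c4p1>   10.20.83.5
broken     233  -1    0         0          Linux/L
     wos-gateway01-ib0                   <c4p5>   10.20.179.1
broken     233  -1    0         0          Linux/L
"""


def broken_nodes(text):
    """Return (node name, address) pairs whose status column reads 'broken'."""
    return [
        (match.group(1), match.group(3))
        for match in ROW.finditer(text)
        if match.group(4) == "broken"
    ]


if __name__ == "__main__":
    for name, address in broken_nodes(SAMPLE):
        print(f"{name} ({address}) is reported as broken")
```

Comparing the resulting list between a client that mounts cluster 3 fine and one that fails on cluster 4 may help narrow down which connections never come up.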



I guess I just need a hint on how to troubleshoot this situation (the
4.1 troubleshooting guide is not helping).

Thanks
Jaime



---
Jaime Pinto
SciNet HPC Consortium - Compute/Calcul Canada
www.scinet.utoronto.ca - www.computecanada.ca
University of Toronto
661 University Ave. (MaRS), Suite 1140
Toronto, ON, M5G1M1
P: 416-978-2755
C: 416-505-1477

