[gpfsug-discuss] nodes being ejected out of the cluster
Damir Krstic
damir.krstic at gmail.com
Wed Jan 11 17:53:50 GMT 2017
Thanks for all the suggestions. Here is our mmlsconfig output. We just
purchased another GL6. During the installation of the new GL6, IBM will
upgrade our existing GL6 to the latest code level. This will happen
during the week of January 23rd.
I am skeptical that the upgrade is going to fix the issue.
On our IO servers we are running in connected mode (note that the IB
interfaces are bonded):
[root at gssio1 ~]# cat /sys/class/net/ib0/mode
connected
[root at gssio1 ~]# cat /sys/class/net/ib1/mode
connected
[root at gssio1 ~]# cat /sys/class/net/ib2/mode
connected
[root at gssio1 ~]# cat /sys/class/net/ib3/mode
connected
[root at gssio2 ~]# cat /sys/class/net/ib0/mode
connected
[root at gssio2 ~]# cat /sys/class/net/ib1/mode
connected
[root at gssio2 ~]# cat /sys/class/net/ib2/mode
connected
[root at gssio2 ~]# cat /sys/class/net/ib3/mode
connected
Our login nodes are running in connected mode as well.
All of our compute nodes, however, are running in datagram mode:
[root at mgt ~]# psh compute cat /sys/class/net/ib0/mode
qnode0758: datagram
qnode0763: datagram
qnode0760: datagram
qnode0772: datagram
qnode0773: datagram
....etc.
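Since mixed connected/datagram mode is one of the things being questioned, here is a small sketch for flagging any node whose IPoIB mode differs from the expected one. It assumes `psh`-style `node: mode` output as shown above; the here-doc sample uses hypothetical node states purely for illustration.

```shell
# Flag nodes whose IPoIB mode differs from an expected value.
# Feed it the output of e.g. `psh compute cat /sys/class/net/ib0/mode`;
# the here-doc below is a hypothetical sample for illustration only.
expected=connected

check_modes() {
    awk -v want="$expected" -F': *' \
        '$2 != want { print $1 ": " $2 " (expected " want ")" }'
}

check_modes <<'EOF'
qnode0758: datagram
quser10: connected
gssio1: connected
EOF
```

In real use you would pipe `psh compute cat /sys/class/net/ib0/mode | check_modes` and expect empty output when every node agrees.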
Here is our mmlsconfig:
[root at gssio1 ~]# mmlsconfig
Configuration data for cluster ess-qstorage.it.northwestern.edu:
----------------------------------------------------------------
clusterName ess-qstorage.it.northwestern.edu
clusterId 17746506346828356609
dmapiFileHandleSize 32
minReleaseLevel 4.2.0.1
ccrEnabled yes
cipherList AUTHONLY
[gss_ppc64]
nsdRAIDBufferPoolSizePct 80
maxBufferDescs 2m
prefetchPct 5
nsdRAIDTracks 128k
nsdRAIDSmallBufferSize 256k
nsdMaxWorkerThreads 3k
nsdMinWorkerThreads 3k
nsdRAIDSmallThreadRatio 2
nsdRAIDThreadsPerQueue 16
nsdRAIDEventLogToConsole all
nsdRAIDFastWriteFSDataLimit 256k
nsdRAIDFastWriteFSMetadataLimit 1M
nsdRAIDReconstructAggressiveness 1
nsdRAIDFlusherBuffersLowWatermarkPct 20
nsdRAIDFlusherBuffersLimitPct 80
nsdRAIDFlusherTracksLowWatermarkPct 20
nsdRAIDFlusherTracksLimitPct 80
nsdRAIDFlusherFWLogHighWatermarkMB 1000
nsdRAIDFlusherFWLogLimitMB 5000
nsdRAIDFlusherThreadsLowWatermark 1
nsdRAIDFlusherThreadsHighWatermark 512
nsdRAIDBlockDeviceMaxSectorsKB 8192
nsdRAIDBlockDeviceNrRequests 32
nsdRAIDBlockDeviceQueueDepth 16
nsdRAIDBlockDeviceScheduler deadline
nsdRAIDMaxTransientStale2FT 1
nsdRAIDMaxTransientStale3FT 1
nsdMultiQueue 512
syncWorkerThreads 256
nsdInlineWriteMax 32k
maxGeneralThreads 1280
maxReceiverThreads 128
nspdQueues 64
[common]
maxblocksize 16m
[ems1-fdr,compute,gss_ppc64]
numaMemoryInterleave yes
[gss_ppc64]
maxFilesToCache 12k
[ems1-fdr,compute]
maxFilesToCache 128k
[ems1-fdr,compute,gss_ppc64]
flushedDataTarget 1024
flushedInodeTarget 1024
maxFileCleaners 1024
maxBufferCleaners 1024
logBufferCount 20
logWrapAmountPct 2
logWrapThreads 128
maxAllocRegionsPerNode 32
maxBackgroundDeletionThreads 16
maxInodeDeallocPrefetch 128
[gss_ppc64]
maxMBpS 16000
[ems1-fdr,compute]
maxMBpS 10000
[ems1-fdr,compute,gss_ppc64]
worker1Threads 1024
worker3Threads 32
[gss_ppc64]
ioHistorySize 64k
[ems1-fdr,compute]
ioHistorySize 4k
[gss_ppc64]
verbsRdmaMinBytes 16k
[ems1-fdr,compute]
verbsRdmaMinBytes 32k
[ems1-fdr,compute,gss_ppc64]
verbsRdmaSend yes
[gss_ppc64]
verbsRdmasPerConnection 16
[ems1-fdr,compute]
verbsRdmasPerConnection 256
[gss_ppc64]
verbsRdmasPerNode 3200
[ems1-fdr,compute]
verbsRdmasPerNode 1024
[ems1-fdr,compute,gss_ppc64]
verbsSendBufferMemoryMB 1024
verbsRdmasPerNodeOptimize yes
verbsRdmaUseMultiCqThreads yes
[ems1-fdr,compute]
ignorePrefetchLUNCount yes
[gss_ppc64]
scatterBufferSize 256K
[ems1-fdr,compute]
scatterBufferSize 256k
syncIntervalStrict yes
[ems1-fdr,compute,gss_ppc64]
nsdClientCksumTypeLocal ck64
nsdClientCksumTypeRemote ck64
[gss_ppc64]
pagepool 72856M
[ems1-fdr]
pagepool 17544M
[compute]
pagepool 4g
[ems1-fdr,qsched03-ib0,quser10-fdr,compute,gss_ppc64]
verbsRdma enable
[gss_ppc64]
verbsPorts mlx5_0/1 mlx5_0/2 mlx5_1/1 mlx5_1/2
[ems1-fdr]
verbsPorts mlx5_0/1 mlx5_0/2
[qsched03-ib0,quser10-fdr,compute]
verbsPorts mlx4_0/1
[common]
autoload no
[ems1-fdr,compute,gss_ppc64]
maxStatCache 0
[common]
envVar MLX4_USE_MUTEX=1 MLX5_SHUT_UP_BF=1 MLX5_USE_MUTEX=1
deadlockOverloadThreshold 0
deadlockDetectionThreshold 0
adminMode central
File systems in cluster ess-qstorage.it.northwestern.edu:
---------------------------------------------------------
/dev/home
/dev/hpc
/dev/projects
/dev/tthome
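For anyone chasing the same symptom, a quick sketch for tallying the RNR-retry send errors per peer and HCA port from mmfs syslog lines like the ones quoted further down in this thread. The embedded sample lines are copies of those messages; in practice you would point the pipeline at /var/log/messages on each node.

```shell
# Count "VERBS RDMA rdma send error" messages per peer and HCA port.
# Parses syslog lines of the form seen in this thread; sample data below.
tally_rdma_errors() {
    awk '/VERBS RDMA rdma send error/ {
        # Locate the "(peer-name)" token after "to <ip>" and the HCA after "on".
        for (i = 1; i <= NF; i++) {
            if ($i == "to") peer = $(i + 2)
            if ($i == "on") hca = $(i + 1) " port " $(i + 3)
        }
        count[peer " via " hca]++
    }
    END { for (k in count) print count[k], k }'
}

tally_rdma_errors <<'EOF'
Jan 11 02:03:15 quser13 mmfs: [E] VERBS RDMA rdma send error IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum 0 vendor_err 135
Jan 11 02:03:26 quser13 mmfs: [E] VERBS RDMA rdma send error IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum 0 vendor_err 135
Jan 10 11:23:42 gssio2 mmfs: [E] VERBS RDMA rdma send error IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.1 (gssio1-fdr) on mlx5_1 port 1 fabnum 0 vendor_err 135
EOF
```

If the errors cluster on one peer or one HCA port, that points at a specific link or switch port rather than a cluster-wide configuration problem.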
On Wed, Jan 11, 2017 at 9:16 AM Luis Bolinches <luis.bolinches at fi.ibm.com>
wrote:
> In addition to what Olaf has said
>
> ESS upgrades include Mellanox module upgrades on the ESS nodes. In fact,
> on those nodes you should not update those modules on their own (unless
> support says so in your PMR), so if that's been the recommendation, I
> suggest you look at it.
>
> Changelog on ESS 4.0.4 (no idea what ESS level you are running)
>
>
> c) Support of MLNX_OFED_LINUX-3.2-2.0.0.1
> - Updated from MLNX_OFED_LINUX-3.1-1.0.6.1 (ESS 4.0, 4.0.1, 4.0.2)
> - Updated from MLNX_OFED_LINUX-3.1-1.0.0.2 (ESS 3.5.x)
> - Updated from MLNX_OFED_LINUX-2.4-1.0.2 (ESS 3.0.x)
> - Support for PCIe3 LP 2-port 100 Gb EDR InfiniBand adapter x16 (FC EC3E)
> - Requires System FW level FW840.20 (SV840_104)
> - No changes from ESS 4.0.3
>
>
> --
> Ystävällisin terveisin / Kind regards / Saludos cordiales / Salutations
>
> Luis Bolinches
> Lab Services
> http://www-03.ibm.com/systems/services/labservices/
>
> IBM Laajalahdentie 23 (main Entrance) Helsinki, 00330 Finland
> Phone: +358 503112585
>
> "If you continually give you will continually have." Anonymous
>
>
>
> ----- Original message -----
> From: "Olaf Weiser" <olaf.weiser at de.ibm.com>
> Sent by: gpfsug-discuss-bounces at spectrumscale.org
> To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
>
> Cc:
> Subject: Re: [gpfsug-discuss] nodes being ejected out of the cluster
> Date: Wed, Jan 11, 2017 5:03 PM
>
> most likely, there's something wrong with your IB fabric ...
> you say you run ~700 nodes? ...
> Are you running with *verbsRdmaSend* enabled? If so, please consider
> disabling it - and discuss this within the PMR
> another thing you may check: are you running IPoIB in connected
> mode or datagram ... but as I said, please discuss this within the PMR ..
> there are too many dependencies to discuss this here ..
>
>
> cheers
>
>
> Mit freundlichen Grüßen / Kind regards
>
>
> Olaf Weiser
>
> EMEA Storage Competence Center Mainz, German / IBM Systems, Storage
> Platform,
>
> -------------------------------------------------------------------------------------------------------------------------------------------
> IBM Deutschland
> IBM Allee 1
> 71139 Ehningen
> Phone: +49-170-579-44-66
> E-Mail: olaf.weiser at de.ibm.com
>
> -------------------------------------------------------------------------------------------------------------------------------------------
> IBM Deutschland GmbH / Vorsitzender des Aufsichtsrats: Martin Jetter
> Geschäftsführung: Martina Koederitz (Vorsitzende), Susanne Peter, Norbert
> Janzen, Dr. Christian Keller, Ivo Koerner, Markus Koerner
> Sitz der Gesellschaft: Ehningen / Registergericht: Amtsgericht Stuttgart,
> HRB 14562 / WEEE-Reg.-Nr. DE 99369940
>
>
>
> From: Damir Krstic <damir.krstic at gmail.com>
> To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
> Date: 01/11/2017 03:39 PM
> Subject: [gpfsug-discuss] nodes being ejected out of the cluster
> Sent by: gpfsug-discuss-bounces at spectrumscale.org
> ------------------------------
>
>
>
> We are running GPFS 4.2 on our cluster (around 700 compute nodes). Our
> storage (ESS GL6) is also running GPFS 4.2. Compute nodes and storage are
> connected via Infiniband (FDR14). At the time of implementation of ESS, we
> were instructed to enable RDMA in addition to IPoIB. Previously we only ran
> IPoIB on our GPFS3.5 cluster.
>
> Ever since the implementation (sometime back in July of 2016) we have
> seen a lot of compute nodes being ejected. What usually precedes the
> ejection are the following messages:
>
> Jan 11 02:03:15 quser13 mmfs: [E] VERBS RDMA rdma send error
> IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum
> 0 vendor_err 135
> Jan 11 02:03:15 quser13 mmfs: [E] VERBS RDMA closed connection to
> 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum 0 due to send error
> IBV_WC_RNR_RETRY_EXC_ERR index 2
> Jan 11 02:03:26 quser13 mmfs: [E] VERBS RDMA rdma send error
> IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum
> 0 vendor_err 135
> Jan 11 02:03:26 quser13 mmfs: [E] VERBS RDMA closed connection to
> 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum 0 due to send error
> IBV_WC_WR_FLUSH_ERR index 1
> Jan 11 02:03:26 quser13 mmfs: [E] VERBS RDMA rdma send error
> IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum
> 0 vendor_err 135
> Jan 11 02:03:26 quser13 mmfs: [E] VERBS RDMA closed connection to
> 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum 0 due to send error
> IBV_WC_RNR_RETRY_EXC_ERR index 2
> Jan 11 02:06:38 quser11 mmfs: [E] VERBS RDMA rdma send error
> IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum
> 0 vendor_err 135
> Jan 11 02:06:38 quser11 mmfs: [E] VERBS RDMA closed connection to
> 172.41.2.5 (gssio2-fdr) on mlx4_0 port 1 fabnum 0 due to send error
> IBV_WC_WR_FLUSH_ERR index 400
>
> Even our ESS IO server sometimes ends up being ejected (case in point -
> yesterday morning):
>
> Jan 10 11:23:42 gssio2 mmfs: [E] VERBS RDMA rdma send error
> IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.1 (gssio1-fdr) on mlx5_1 port 1 fabnum
> 0 vendor_err 135
> Jan 10 11:23:42 gssio2 mmfs: [E] VERBS RDMA closed connection to
> 172.41.2.1 (gssio1-fdr) on mlx5_1 port 1 fabnum 0 due to send error
> IBV_WC_RNR_RETRY_EXC_ERR index 3001
> Jan 10 11:23:43 gssio2 mmfs: [E] VERBS RDMA rdma send error
> IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.1 (gssio1-fdr) on mlx5_1 port 2 fabnum
> 0 vendor_err 135
> Jan 10 11:23:43 gssio2 mmfs: [E] VERBS RDMA closed connection to
> 172.41.2.1 (gssio1-fdr) on mlx5_1 port 2 fabnum 0 due to send error
> IBV_WC_RNR_RETRY_EXC_ERR index 2671
> Jan 10 11:23:43 gssio2 mmfs: [E] VERBS RDMA rdma send error
> IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.1 (gssio1-fdr) on mlx5_0 port 2 fabnum
> 0 vendor_err 135
> Jan 10 11:23:43 gssio2 mmfs: [E] VERBS RDMA closed connection to
> 172.41.2.1 (gssio1-fdr) on mlx5_0 port 2 fabnum 0 due to send error
> IBV_WC_RNR_RETRY_EXC_ERR index 2495
> Jan 10 11:23:44 gssio2 mmfs: [E] VERBS RDMA rdma send error
> IBV_WC_RNR_RETRY_EXC_ERR to 172.41.2.1 (gssio1-fdr) on mlx5_0 port 1 fabnum
> 0 vendor_err 135
> Jan 10 11:23:44 gssio2 mmfs: [E] VERBS RDMA closed connection to
> 172.41.2.1 (gssio1-fdr) on mlx5_0 port 1 fabnum 0 due to send error
> IBV_WC_RNR_RETRY_EXC_ERR index 3077
> Jan 10 11:24:11 gssio2 mmfs: [N] Node 172.41.2.1 (gssio1-fdr) lease
> renewal is overdue. Pinging to check if it is alive
>
> I've had multiple PMRs open for this issue, and I am told that our ESS
> needs code level upgrades in order to fix this issue. Looking at the
> errors, I think the issue is Infiniband related, and I am wondering if
> anyone on this list has seen similar issues?
>
> Thanks for your help in advance.
>
> Damir
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>
>
>
>
>
>
> Ellei edellä ole toisin mainittu: / Unless stated otherwise above:
> Oy IBM Finland Ab
> PL 265, 00101 Helsinki, Finland
> Business ID, Y-tunnus: 0195876-3
> Registered in Finland
>
>