[gpfsug-discuss] waiting for conn rdmas < conn maxrdmas
Aaron Knister
aaron.s.knister at nasa.gov
Fri Feb 24 19:31:08 GMT 2017
Interesting, thanks Sven!
Could the "resources" I'm running out of include NSD server queues?
On 2/23/17 12:12 PM, Sven Oehme wrote:
> All this waiter shows is that you have more in flight than the node or
> connection can currently serve. The cause can be misconfiguration, or you
> may simply be running out of resources on the node rather than on the
> connection. With the latest code you shouldn't see this anymore for node
> limits, as the system automatically adjusts the maximum number of RDMAs
> according to the node's capabilities:
>
> You should see messages like these in your mmfs.log:
>
> 2017-02-23_06:19:50.056-0800: [I] VERBS RDMA starting with
> verbsRdmaCm=no verbsRdmaSend=yes verbsRdmaUseMultiCqThreads=yes
> verbsRdmaUseCompVectors=yes
> 2017-02-23_06:19:50.078-0800: [I] VERBS RDMA library libibverbs.so
> (version >= 1.1) loaded and initialized.
> 2017-02-23_06:19:50.078-0800: [I] VERBS RDMA verbsRdmasPerNode increased
> from 3072 to 3740 because verbsRdmasPerNodeOptimize is set to yes.
> 2017-02-23_06:19:50.121-0800: [I] VERBS RDMA discover mlx5_5 port 1
> transport IB link IB NUMA node 16 pkey[0] 0xFFFF gid[0] subnet
> 0xFEC0000000000013 id 0xE41D2D0300FDB9CD state ACTIVE
> 2017-02-23_06:19:50.137-0800: [I] VERBS RDMA discover mlx5_4 port 1
> transport IB link IB NUMA node 16 pkey[0] 0xFFFF gid[0] subnet
> 0xFEC0000000000015 id 0xE41D2D0300FDB9CC state ACTIVE
> 2017-02-23_06:19:50.153-0800: [I] VERBS RDMA discover mlx5_3 port 1
> transport IB link IB NUMA node 1 pkey[0] 0xFFFF gid[0] subnet
> 0xFEC0000000000013 id 0xE41D2D0300FDB751 state ACTIVE
> 2017-02-23_06:19:50.169-0800: [I] VERBS RDMA discover mlx5_2 port 1
> transport IB link IB NUMA node 1 pkey[0] 0xFFFF gid[0] subnet
> 0xFEC0000000000015 id 0xE41D2D0300FDB750 state ACTIVE
> 2017-02-23_06:19:50.185-0800: [I] VERBS RDMA discover mlx5_1 port 1
> transport IB link IB NUMA node 0 pkey[0] 0xFFFF gid[0] subnet
> 0xFEC0000000000013 id 0xE41D2D0300FDB78D state ACTIVE
> 2017-02-23_06:19:50.201-0800: [I] VERBS RDMA discover mlx5_0 port 1
> transport IB link IB NUMA node 0 pkey[0] 0xFFFF gid[0] subnet
> 0xFEC0000000000015 id 0xE41D2D0300FDB78C state ACTIVE
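To check whether a node has applied this automatic adjustment, one could scan its log for the "verbsRdmasPerNode increased" message. The sketch below is a hypothetical helper (the names PATTERN and find_rdma_adjustments are illustrative, not part of Spectrum Scale), and it assumes each log record sits on a single line; the wrapping above comes from the email, and the exact wording may differ between releases.

```python
import re

# Hypothetical pattern for the auto-tuning message quoted above; assumes one
# log record per line (the line breaks in the email are just wrapping).
PATTERN = re.compile(
    r"^(?P<ts>\S+): \[I\] VERBS RDMA verbsRdmasPerNode increased\s+"
    r"from (?P<old>\d+) to (?P<new>\d+)"
)


def find_rdma_adjustments(lines):
    """Return (timestamp, old_limit, new_limit) for each adjustment message."""
    results = []
    for line in lines:
        m = PATTERN.search(line)
        if m:
            results.append((m.group("ts"), int(m.group("old")), int(m.group("new"))))
    return results


if __name__ == "__main__":
    sample = [
        "2017-02-23_06:19:50.056-0800: [I] VERBS RDMA starting with verbsRdmaCm=no",
        "2017-02-23_06:19:50.078-0800: [I] VERBS RDMA verbsRdmasPerNode increased "
        "from 3072 to 3740 because verbsRdmasPerNodeOptimize is set to yes.",
    ]
    for ts, old, new in find_rdma_adjustments(sample):
        print(f"{ts}: verbsRdmasPerNode {old} -> {new}")
```

Pointing it at the node's actual log file instead of the inline sample would show whether, and by how much, the node raised its limit at startup.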
>
> We want to eliminate all these configurable limits eventually. That takes
> time, but as you can see above, we make progress with each release :-)
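The waiter described above ("more in flight than the node or connection can currently serve") behaves like a counting gate guarded by a condition variable. The toy model below is an assumption for illustration only and is not GPFS source: ConnRdmaGate and simulate are hypothetical names, and max_rdmas merely stands in for whatever per-connection cap applies. Callers block until "in flight < max" holds, which mirrors the 'waiting for conn rdmas < conn maxrdmas' text.

```python
import threading
import time


class ConnRdmaGate:
    """Toy model (not GPFS code): caps in-flight RDMAs on one connection.

    Callers that find the connection full wait on a condition variable
    until in_flight < max_rdmas holds again.
    """

    def __init__(self, max_rdmas):
        self.max_rdmas = max_rdmas
        self.in_flight = 0
        self.peak = 0   # highest concurrency observed
        self.waits = 0  # number of callers that found the gate full
        self._cond = threading.Condition()

    def acquire(self):
        with self._cond:
            if self.in_flight >= self.max_rdmas:
                self.waits += 1
            # Block until 'conn rdmas < conn maxrdmas' holds.
            self._cond.wait_for(lambda: self.in_flight < self.max_rdmas)
            self.in_flight += 1
            self.peak = max(self.peak, self.in_flight)

    def release(self):
        with self._cond:
            self.in_flight -= 1
            # Exactly one slot was freed, so waking one waiter is enough.
            self._cond.notify()


def simulate(max_rdmas, n_requests, hold_seconds=0.01):
    """Push n_requests concurrent 'transfers' through one gated connection."""
    gate = ConnRdmaGate(max_rdmas)

    def worker():
        gate.acquire()
        try:
            time.sleep(hold_seconds)  # stand-in for the RDMA transfer itself
        finally:
            gate.release()

    threads = [threading.Thread(target=worker) for _ in range(n_requests)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return gate


if __name__ == "__main__":
    gate = simulate(max_rdmas=4, n_requests=32)
    print(f"peak in flight: {gate.peak}, callers that waited: {gate.waits}")
```

In this model, raising max_rdmas only reduces waiting if this gate is the binding limit; if the shortage sits elsewhere (for example on the node, as Sven suggests), a larger per-connection cap just moves the queue.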
>
> Sven
>
>
>
>
> On Thu, Feb 23, 2017 at 9:05 AM Aaron Knister <aaron.s.knister at nasa.gov> wrote:
>
> On a particularly heavily loaded NSD server I'm seeing a lot of these
> messages:
>
> 0x7FFFF08B63E0 ( 15539) waiting 0.004139456 seconds, NSDThread: on
> ThCond 0x7FFFA80772C8 (0x7FFFA80772C8) (VERBSEventWaitCondvar), reason
> 'waiting for conn rdmas < conn maxrdmas'
> 0x7FFFF08EED80 ( 15584) waiting 0.004075718 seconds, NSDThread: on
> ThCond 0x7FFF680008F8 (0x7FFF680008F8) (VERBSEventWaitCondvar), reason
> 'waiting for conn rdmas < conn maxrdmas'
> 0x7FFFF08FDF00 ( 15596) waiting 0.003965504 seconds, NSDThread: on
> ThCond 0x7FFF8C00E288 (0x7FFF8C00E288) (VERBSEventWaitCondvar), reason
> 'waiting for conn rdmas < conn maxrdmas'
> 0x7FFFF09185A0 ( 15617) waiting 0.003916346 seconds, NSDThread: on
> ThCond 0x7FFF9000CB18 (0x7FFF9000CB18) (VERBSEventWaitCondvar), reason
> 'waiting for conn rdmas < conn maxrdmas'
> 0x7FFFF092B380 ( 15632) waiting 0.003659610 seconds, NSDThread: on
> ThCond 0x1DB04B8 (0x1DB04B8) (VERBSEventWaitCondvar), reason 'waiting
> for conn rdmas < conn maxrdmas'
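When a server shows many waiters like these, a quick tally by reason and longest wait helps tell whether this RDMA waiter dominates. The sketch below is a hypothetical helper (summarize_waiters and WAITER_RE are illustrative names); its regex assumes the layout of the lines quoted above with one waiter per line, as printed by `mmdiag --waiters`, and may need adjusting for other releases.

```python
import re
from collections import defaultdict

# Assumed layout, taken from the waiter lines quoted above (one waiter per line).
WAITER_RE = re.compile(
    r"waiting (?P<secs>\d+\.\d+) seconds, \w+:.*reason '(?P<reason>[^']*)'"
)


def summarize_waiters(lines):
    """Map each waiter reason to (count, longest wait in seconds)."""
    stats = defaultdict(lambda: [0, 0.0])
    for line in lines:
        m = WAITER_RE.search(line)
        if m is None:
            continue  # not a waiter line (headers, blank lines, etc.)
        entry = stats[m.group("reason")]
        entry[0] += 1
        entry[1] = max(entry[1], float(m.group("secs")))
    return {reason: (count, longest) for reason, (count, longest) in stats.items()}


if __name__ == "__main__":
    import sys

    # Example: mmdiag --waiters | python3 summarize_waiters.py
    ranked = sorted(summarize_waiters(sys.stdin).items(), key=lambda kv: -kv[1][0])
    for reason, (count, longest) in ranked:
        print(f"{count:5d}  max {longest:.6f}s  {reason}")
```

Sampling this a few times under load would show whether the count and the longest wait grow, or whether these are just many short waits like the roughly 4 ms ones above.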
>
> I've tried tweaking verbsRdmasPerConnection, but the issue seems to
> persist. Has anyone encountered this, and if so, how did you fix it?
>
> -Aaron
>
> --
> Aaron Knister
> NASA Center for Climate Simulation (Code 606.2)
> Goddard Space Flight Center
> (301) 286-2776
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>
--
Aaron Knister
NASA Center for Climate Simulation (Code 606.2)
Goddard Space Flight Center
(301) 286-2776