[gpfsug-discuss] waiting for conn rdmas < conn maxrdmas

Sven Oehme oehmes at gmail.com
Thu Feb 23 17:12:40 GMT 2017


All this waiter shows is that you have more RDMAs in flight than the node or
the connection can currently serve. The reason can be a misconfiguration, or
you are simply running out of resources on the node, not the connection. With
the latest code you shouldn't see this anymore for node limits, as the system
automatically adjusts the maximum number of RDMAs according to the node's
capabilities.

You should see messages in your mmfslog like:

2017-02-23_06:19:50.056-0800: [I] VERBS RDMA starting with verbsRdmaCm=no
verbsRdmaSend=yes verbsRdmaUseMultiCqThreads=yes verbsRdmaUseCompVectors=yes
2017-02-23_06:19:50.078-0800: [I] VERBS RDMA library libibverbs.so (version
>= 1.1) loaded and initialized.
2017-02-23_06:19:50.078-0800: [I] VERBS RDMA verbsRdmasPerNode increased
from 3072 to 3740 because verbsRdmasPerNodeOptimize is set to yes.
2017-02-23_06:19:50.121-0800: [I] VERBS RDMA discover mlx5_5 port 1
transport IB link  IB NUMA node 16 pkey[0] 0xFFFF gid[0] subnet
0xFEC0000000000013 id 0xE41D2D0300FDB9CD state ACTIVE
2017-02-23_06:19:50.137-0800: [I] VERBS RDMA discover mlx5_4 port 1
transport IB link  IB NUMA node 16 pkey[0] 0xFFFF gid[0] subnet
0xFEC0000000000015 id 0xE41D2D0300FDB9CC state ACTIVE
2017-02-23_06:19:50.153-0800: [I] VERBS RDMA discover mlx5_3 port 1
transport IB link  IB NUMA node  1 pkey[0] 0xFFFF gid[0] subnet
0xFEC0000000000013 id 0xE41D2D0300FDB751 state ACTIVE
2017-02-23_06:19:50.169-0800: [I] VERBS RDMA discover mlx5_2 port 1
transport IB link  IB NUMA node  1 pkey[0] 0xFFFF gid[0] subnet
0xFEC0000000000015 id 0xE41D2D0300FDB750 state ACTIVE
2017-02-23_06:19:50.185-0800: [I] VERBS RDMA discover mlx5_1 port 1
transport IB link  IB NUMA node  0 pkey[0] 0xFFFF gid[0] subnet
0xFEC0000000000013 id 0xE41D2D0300FDB78D state ACTIVE
2017-02-23_06:19:50.201-0800: [I] VERBS RDMA discover mlx5_0 port 1
transport IB link  IB NUMA node  0 pkey[0] 0xFFFF gid[0] subnet
0xFEC0000000000015 id 0xE41D2D0300FDB78C state ACTIVE
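If you want to double-check what a node is actually running with, you can read
the values back from the configuration and from the running daemon. A rough
sketch (assuming mmlsconfig and mmdiag are available on that node, and that the
grep pattern matches your output) would be:

  # values stored in the cluster configuration
  mmlsconfig | grep -i verbsRdmas

  # effective in-memory values on this node (reflects the automatic increase above)
  mmdiag --config | grep -i verbsRdmas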

We want to eliminate all these configurable limits eventually, but that takes
time. As you can see above, we make progress with each release  :-)
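
If the connection limit rather than the node limit is the bottleneck, the
per-connection cap is the verbsRdmasPerConnection tunable you already tried.
As a rough sketch only (the value 64 and the nsdNodes node class are just
examples, and the change typically takes effect only after the daemon is
restarted on those nodes), the usual approach looks like:

  # raise the per-connection RDMA limit on the NSD servers (example value and node class)
  mmchconfig verbsRdmasPerConnection=64 -N nsdNodes

  # restart GPFS on the affected nodes so the verbs settings are picked up
  mmshutdown -N nsdNodes && mmstartup -N nsdNodes

  # afterwards, watch whether the 'waiting for conn rdmas < conn maxrdmas' waiters go away
  mmdiag --waiters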

Sven




On Thu, Feb 23, 2017 at 9:05 AM Aaron Knister <aaron.s.knister at nasa.gov>
wrote:

> On a particularly heavy loaded NSD server I'm seeing a lot of these
> messages:
>
> 0x7FFFF08B63E0 (  15539) waiting 0.004139456 seconds, NSDThread: on
> ThCond 0x7FFFA80772C8 (0x7FFFA80772C8) (VERBSEventWaitCondvar), reason
> 'waiting for conn rdmas < conn maxrdmas'
> 0x7FFFF08EED80 (  15584) waiting 0.004075718 seconds, NSDThread: on
> ThCond 0x7FFF680008F8 (0x7FFF680008F8) (VERBSEventWaitCondvar), reason
> 'waiting for conn rdmas < conn maxrdmas'
> 0x7FFFF08FDF00 (  15596) waiting 0.003965504 seconds, NSDThread: on
> ThCond 0x7FFF8C00E288 (0x7FFF8C00E288) (VERBSEventWaitCondvar), reason
> 'waiting for conn rdmas < conn maxrdmas'
> 0x7FFFF09185A0 (  15617) waiting 0.003916346 seconds, NSDThread: on
> ThCond 0x7FFF9000CB18 (0x7FFF9000CB18) (VERBSEventWaitCondvar), reason
> 'waiting for conn rdmas < conn maxrdmas'
> 0x7FFFF092B380 (  15632) waiting 0.003659610 seconds, NSDThread: on
> ThCond 0x1DB04B8 (0x1DB04B8) (VERBSEventWaitCondvar), reason 'waiting
> for conn rdmas < conn maxrdmas'
>
> I've tried tweaking verbsRdmasPerConnection but the issue seems to
> persist. Has anyone encountered this, and if so, how did you fix it?
>
> -Aaron
>
> --
> Aaron Knister
> NASA Center for Climate Simulation (Code 606.2)
> Goddard Space Flight Center
> (301) 286-2776
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>

