[gpfsug-discuss] Infiniband: device mlx4_0 not found

Simon Thompson (IT Research Support) S.J.Thompson at bham.ac.uk
Sun Jun 18 18:53:28 BST 2017


There used to be issues with CX-3 cards and specific ports if you wanted to use both IB and Ethernet, but those went away in later firmware, as did a whole load of problems with the card being slow to detect the media type. So check whether you are running up-to-date Mellanox firmware (assuming it's a VPI card).

On CX-4 there is no media auto-detection, but the default is IB unless you changed it.
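
For anyone wanting to check the firmware point quickly, here is a minimal sketch that pulls the firmware version per adapter out of ibstat output so it can be compared against the current Mellanox release. It assumes the usual ibstat layout (a "CA 'name'" line followed by an indented "Firmware version:" line); fw_versions is a hypothetical helper name, not part of any tool.

```shell
# Print "<device> <firmware version>" for each adapter described on stdin.
# Assumption: input looks like `ibstat` output, i.e. a line "CA 'mlx4_0'"
# followed later by a line containing "Firmware version: x.y.z".
fw_versions() {
    awk -F"'" '
        /^CA /              { ca = $2 }   # remember the current adapter name
        /Firmware version:/ {
            sub(/.*Firmware version:[ \t]*/, "")   # keep only the version
            print ca, $0
        }'
}
```

Typical use would be "ibstat | fw_versions" on the affected node.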

Simon 
________________________________________
From: gpfsug-discuss-bounces at spectrumscale.org [gpfsug-discuss-bounces at spectrumscale.org] on behalf of jcatana at gmail.com [jcatana at gmail.com]
Sent: 18 June 2017 16:30
To: gpfsug main discussion list
Subject: Re: [gpfsug-discuss] Infiniband: device mlx4_0 not found

Are any of the cards VPI, i.e. able to do both Ethernet and IB? I remember reading in the documentation that there is a bus order to observe when mixing media with Mellanox cards. There is a module setting at init where you can choose eth, ib, or auto-detect. If the card is on auto, it might be coming up as eth and making the driver flake out because it's in the wrong order.
I'm responding from my phone, so I can't look up the proper order right now, but maybe this will help with troubleshooting.
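
A rough sketch of how one might check what each port is currently set to, for what it's worth. It assumes the mlx4_core driver exposes per-port files named mlx4_port1 and mlx4_port2 under each bound PCI device in sysfs (holding values such as ib, eth or auto); if your driver build does not, the function simply prints nothing. mlx4_port_types is a hypothetical helper name. The load-time setting itself is, as far as I know, the mlx4_core module parameter port_type_array; "modinfo -p mlx4_core" should list the values your driver accepts.

```shell
# Print "<pci address> <port file> <configured type>" for every mlx4 port.
# $1 (optional): directory to scan; defaults to the mlx4_core driver dir.
# Assumption: each device directory contains mlx4_port1 / mlx4_port2 files.
mlx4_port_types() {
    root="${1:-/sys/bus/pci/drivers/mlx4_core}"
    for f in "$root"/*/mlx4_port[0-9]*; do
        [ -r "$f" ] || continue          # skip when nothing matched
        printf '%s %s %s\n' \
            "$(basename "$(dirname "$f")")" "$(basename "$f")" "$(cat "$f")"
    done
}
```

Running mlx4_port_types with no argument on the node would show whether a port has come up as eth when IB was expected.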

On Jun 18, 2017 12:58 AM, "Frank Tower" <frank.tower at outlook.com> wrote:

Hi,


You were right, ibv_devinfo -v doesn't return anything if both cards are connected. I didn't check the ibv_* tools; I assumed that once the IP stack and ibstat were OK, the rest would work. I'm stupid 😊


Anyway, once I disconnect one card, ibv_devinfo shows me output, but with both cards connected I get no output except "device not found".

And what is weird is that it works only when one card is connected, no matter which card (both are identical: model, firmware, revision, vendor)... Really strange, I will dig further into the issue.
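
The mismatch described here (ibstat lists a device, the verbs layer does not) can be checked mechanically. A minimal sketch, assuming the device lists have been saved to files with "ibstat -l" and "ibv_devinfo -l"; missing_verbs_devices is a hypothetical helper name.

```shell
# Print every device named in the ibstat list that the verbs list lacks.
# $1: file holding `ibstat -l` output (one device name per line)
# $2: file holding `ibv_devinfo -l` output (or its error message)
missing_verbs_devices() {
    ibstat_list="$1"
    verbs_list="$2"
    while read -r dev; do
        [ -n "$dev" ] || continue        # ignore blank lines
        # -w: whole-word match, -F: fixed string, -q: quiet
        grep -qwF -- "$dev" "$verbs_list" || echo "$dev"
    done < "$ibstat_list"
}
```

For example: "ibstat -l > /tmp/ibstat.txt; ibv_devinfo -l > /tmp/verbs.txt 2>&1; missing_verbs_devices /tmp/ibstat.txt /tmp/verbs.txt". Any name printed is a device GPFS will not be able to open through libibverbs.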


Stupid and bad workaround: I connected a dual-port InfiniBand card. But the production system won't wait...


Thanks for your help,
Frank

________________________________
From: Aaron Knister <aaron.knister at gmail.com>
Sent: Saturday, June 10, 2017 2:05 PM
To: gpfsug main discussion list
Subject: Re: [gpfsug-discuss] Infiniband: device mlx4_0 not found

Out of curiosity could you send us the output of "ibv_devinfo -v"?

-Aaron

Sent from my iPhone

On Jun 10, 2017, at 06:55, Frank Tower <frank.tower at outlook.com> wrote:


Hi everybody,


I don't get why one of our compute nodes cannot start GPFS over IB.


I get the following error:


[I] VERBS RDMA starting with verbsRdmaCm=no verbsRdmaSend=no verbsRdmaUseMultiCqThreads=yes verbsRdmaUseCompVectors=yes

[I] VERBS RDMA library libibverbs.so (version >= 1.1) loaded and initialized.

[I] VERBS RDMA verbsRdmasPerNode reduced from 1000 to 514 to match (nsdMaxWorkerThreads 512 + (nspdThreadsPerQueue 2 * nspdQueues 1)).

[I] VERBS RDMA parse verbsPorts mlx4_0/1

[W] VERBS RDMA parse error   verbsPort mlx4_0/1   ignored due to device mlx4_0 not found

[I] VERBS RDMA library libibverbs.so unloaded.

[E] VERBS RDMA failed to start, no valid verbsPorts defined.
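
To spot this condition across several nodes, a small sketch that extracts the device names GPFS rejected from a log in the format shown above. rejected_verbs_devices is a hypothetical helper name; the log is normally /var/adm/ras/mmfs.log.latest, and "mmlsconfig verbsPorts" shows the configured value to compare against.

```shell
# Read a GPFS log on stdin and print each device named in a
# "VERBS RDMA parse error ... ignored due to device <name> not found" line.
rejected_verbs_devices() {
    sed -n 's/.*VERBS RDMA parse error.*ignored due to device \([^ ]*\) not found.*/\1/p' |
        sort -u                          # report each device once
}
```

Example: "rejected_verbs_devices < /var/adm/ras/mmfs.log.latest".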



I'm using CentOS 7.3, kernel 3.10.0-514.21.1.el7.x86_64.


I have 2 InfiniBand cards; both have an IP and are working well.


[root at rdx110 ~]# ibstat -l

mlx4_0

mlx4_1

[root at rdx110 ~]#


I tried configurations with both cards, and neither works with GPFS.


I also tried with mlx4_0/1, but same problem.


Has anyone already had this issue?


Kind Regards,

Frank




_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss