[gpfsug-discuss] data interface and management interface.

Scott D sdenham at gmail.com
Mon Jul 13 17:45:48 BST 2015


I spent a good deal of time exploring this topic when I was at IBM. I think
there are two key aspects here: congestion of the actual interfaces on
the [cluster, FS, token] management nodes, and competition for other
resources, such as CPU cycles, on those nodes.  When using a single Ethernet
interface (or, for that matter, IB RDMA plus IPoIB over the same interface), at
some point the two kinds of traffic begin to conflict, and the management
traffic, being much more time-sensitive, suffers as a result.  One solution
is to separate the traffic.  For larger clusters, though (1000s of nodes), a
better solution, which may avoid the need for a second interface on every
client node, is to add dedicated nodes as managers rather than relying on the
NSD servers for this role.  It does cost you some modest servers and GPFS
server licenses.  My previous client generally used retired previous-generation
compute nodes for this job.
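As a rough sketch of what that change looks like on the GPFS command line
(the node names below are hypothetical, and the exact steps will vary with
your cluster), the repurposed compute nodes would be added, licensed, and
designated as managers, while the NSD servers drop that role:

    # Hypothetical node names -- substitute your own.
    # Add two retired compute nodes and license them as servers:
    mmaddnode -N mgr01,mgr02
    mmchlicense server --accept -N mgr01,mgr02
    # Designate them as manager nodes:
    mmchnode --manager -N mgr01,mgr02
    # Remove the manager designation from the NSD servers:
    mmchnode --client -N nsd01,nsd02
    # Verify the resulting designations:
    mmlscluster

The cluster then draws its file system and token managers from nodes that
are not also serving NSD I/O, which is the separation described above.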

Scott

Date: Mon, 13 Jul 2015 15:25:32 +0100
> From: Vic Cornell <viccornell at gmail.com>
> Subject: Re: [gpfsug-discuss] data interface and management interface.
>
> Hi Salvatore,
>
> I agree that that is what the manual (and some of the wiki entries) says.
>
> However, when we have had problems (typically congestion) with Ethernet
> networks in the past (20GbE or 40GbE), we have resolved them by setting up
> a separate "Admin" network.
>
> The before-and-after difference in cluster health, measured in the number
> of expels and waiters, has been very marked.
>
> Maybe someone "in the know" could comment on this split.
>
> Regards,
>
> Vic
>
>
>
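For readers wondering how the separate "Admin" network Vic mentions is
expressed in GPFS terms, one common approach is to give each node a distinct
admin interface (for command traffic) and/or to steer daemon traffic onto its
own subnet. This is only a sketch; the host names and subnet below are
hypothetical:

    # Hypothetical host names and subnet -- adjust to your environment.
    # Put admin (command) traffic on a second interface for a node:
    mmchnode --admin-interface=node01-adm -N node01
    # Alternatively, prefer a dedicated subnet for daemon (data) traffic:
    mmchconfig subnets="192.168.10.0"
    # mmlscluster shows which admin node name each node now uses.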

