[gpfsug-discuss] GPFS heartbeat network specifications and resilience

Brian Marshall mimarsh2 at vt.edu
Fri Jul 22 15:39:55 BST 2016


Sort of trailing on this thread - is a bonded active-active 10 GbE
network enough bandwidth to run data and heartbeat/admin traffic on the
same network?  I assume it comes down to a question of latency and
congestion, but I would like to hear others' stories.

Is anyone doing anything fancy with QoS to make sure admin/heartbeat
traffic is not delayed?
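
(For context, the sort of thing I had in mind is sketched below - purely
illustrative, not something we run in production. It assumes the GPFS
daemon traffic is on its registered TCP port 1191 and that the bond
device is called bond0.)

  # put a 3-band prio qdisc on the bond and steer GPFS daemon traffic
  # (TCP port 1191) into the highest-priority band
  tc qdisc add dev bond0 root handle 1: prio
  tc filter add dev bond0 parent 1: protocol ip prio 1 u32 \
      match ip dport 1191 0xffff flowid 1:1
  tc filter add dev bond0 parent 1: protocol ip prio 1 u32 \
      match ip sport 1191 0xffff flowid 1:1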

All of our current clusters use InfiniBand for data and management
traffic, but we are building a cluster that has dual 10 GbE to each
compute node. The NSD servers have 40 GbE connections to the core
network, into which the 10 GbE switches uplink.

On Fri, Jul 22, 2016 at 4:57 AM, Ashish Thandavan <
ashish.thandavan at cs.ox.ac.uk> wrote:

> Hi Richard,
>
> Thank you, that is very good to know!
>
> Regards,
> Ash
>
>
> On 22/07/16 09:36, Sobey, Richard A wrote:
>
>> Hi Ash
>>
>> Our ifcfg files for the bonded interfaces (this applies to the GPFS, data
>> and mgmt networks) are set to mode 1:
>>
>> BONDING_OPTS="mode=1 miimon=200"
>>
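>> (For reference, a minimal sketch of an ifcfg-bond0 along those lines -
>> the device name and addressing below are placeholders, so adjust for
>> your own environment:
>>
>>   DEVICE=bond0
>>   TYPE=Bond
>>   BONDING_MASTER=yes
>>   BONDING_OPTS="mode=1 miimon=200"
>>   BOOTPROTO=none
>>   IPADDR=192.0.2.10
>>   NETMASK=255.255.255.0
>>   ONBOOT=yes
>>
>> plus an ifcfg file per slave NIC containing MASTER=bond0 and SLAVE=yes.)
>>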
>> If we have ever had a network outage on the ports for these interfaces
>> (apart from pulling a cable for testing when they went in), we've never
>> noticed an issue, so I guess we have it set up right. Mode 1 specifically
>> was asked for by our networks team.
>>
>> Richard
>>
>> -----Original Message-----
>> From: gpfsug-discuss-bounces at spectrumscale.org [mailto:
>> gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Ashish Thandavan
>> Sent: 21 July 2016 11:26
>> To: gpfsug-discuss at spectrumscale.org
>> Subject: [gpfsug-discuss] GPFS heartbeat network specifications and
>> resilience
>>
>> Dear all,
>>
>> Could anyone please point me at the specifications required for the GPFS
>> heartbeat network? Are there any figures for latency, jitter, etc. that
>> one should be aware of?
>>
>> I also have a related question about resilience. Our three GPFS NSD
>> servers each use a single network port and carry heartbeat traffic over a
>> private VLAN. We are looking at improving the resilience of this setup by
>> adding a second network link on each server (going to a different member
>> of a pair of stacked switches than the existing one) and running the
>> heartbeat network over bonded interfaces on the three servers. Are there
>> any recommendations as to which network bonding mode to use?
>>
>> Based on the name alone, Mode 1 (active-backup) appears to be the ideal
>> choice, and I believe the switches do not need any special configuration.
>> However, it has been suggested that Mode 4 (802.3ad, i.e. LACP) bonding
>> might be the way to go; this aggregates the two ports, but it does require
>> the relevant switch ports to be configured to support it.
>> Is there a recommended bonding mode?
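>>
>> For illustration, I imagine the bonding driver settings for the two
>> options would look roughly like the following (a sketch only - the exact
>> options will depend on the distribution and the switches):
>>
>>   Mode 1, no switch-side configuration needed:
>>     BONDING_OPTS="mode=1 miimon=100"
>>
>>   Mode 4, with the two switch ports configured as an LACP port-channel:
>>     BONDING_OPTS="mode=4 miimon=100 lacp_rate=fast xmit_hash_policy=layer3+4"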
>>
>> If anyone here currently uses bonded interfaces for their GPFS heartbeat
>> traffic, may I ask what type of bond you have configured? Have you had any
>> problems with the setup? And, more importantly, has it been of use in
>> keeping the cluster up and running when one network link has gone down?
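>>
>> (I assume failover behaviour can be checked on the servers with something
>> along the lines of:
>>
>>   cat /proc/net/bonding/bond0    # shows the currently active slave
>>   ip link set eth1 down          # simulate a link failure on one slave
>>
>> where eth1 stands in for one of the slave NICs - but first-hand experience
>> from a live cluster would be far more useful.)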
>>
>> Thank you,
>>
>> Regards,
>> Ash
>>
>>
>>
>> --
>> -------------------------
>> Ashish Thandavan
>>
>> UNIX Support Computing Officer
>> Department of Computer Science
>> University of Oxford
>> Wolfson Building
>> Parks Road
>> Oxford OX1 3QD
>>
>> Phone: 01865 610733
>> Email: ashish.thandavan at cs.ox.ac.uk
>>
>> _______________________________________________
>> gpfsug-discuss mailing list
>> gpfsug-discuss at spectrumscale.org
>> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>>
>
> --
> -------------------------
> Ashish Thandavan
>
> UNIX Support Computing Officer
> Department of Computer Science
> University of Oxford
> Wolfson Building
> Parks Road
> Oxford OX1 3QD
>
> Phone: 01865 610733
> Email: ashish.thandavan at cs.ox.ac.uk
>
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>

