[gpfsug-discuss] connected v. datagram mode

Stijn De Weirdt stijn.deweirdt at ugent.be
Sun May 14 10:16:12 BST 2017


hi all,

does anyone know about the impact on memory usage? afaik, connected mode
keeps buffers for each QP. old-ish Mellanox (ConnectX-2, MLNX OFED 2)
instructions suggested not to use CM for large-ish clusters (>128 nodes
at that time).

we never turned it back on, and now have 700 nodes.
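The per-QP memory concern above can be made concrete with a back-of-envelope estimate. This is a hypothetical worst-case model, assuming every peer connection holds its own private receive ring and no shared receive queue (SRQ) is used. The function name, the ring depth and the buffer size below are illustrative assumptions, not verified Mellanox or kernel defaults.

```python
def cm_receive_buffer_bytes(peers: int, ring_depth: int, buf_bytes: int) -> int:
    """Worst-case IPoIB connected-mode receive memory on one node.

    Hypothetical model: each peer gets its own connection with a private
    receive ring of `ring_depth` buffers, each `buf_bytes` long (no SRQ).
    """
    return peers * ring_depth * buf_bytes


# Illustrative numbers only (assumed, not driver defaults):
# 699 peers in a 700-node cluster, 256 buffers per ring, 64 KiB per buffer.
total = cm_receive_buffer_bytes(peers=699, ring_depth=256, buf_bytes=64 * 1024)
print(f"{total / 2**30:.1f} GiB")
```

Whatever the real per-connection numbers are, the model shows the scaling: receive memory grows linearly with the number of peers. That is consistent with the old advice to avoid CM on large clusters, and it is why buffer sharing matters.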

wrt IPoIB MTU, UD can go up to 4092 with the correct opensm
configuration.
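The datagram-mode MTU figures in this thread follow from simple arithmetic. In UD mode the IPoIB MTU is the IB link MTU minus a 4-byte IPoIB encapsulation header. A 2K IB MTU therefore gives 2044, and a 4K IB MTU (which needs the matching opensm configuration) gives 4092. Connected mode is not tied to the link MTU; the Linux driver caps it at 65520. Here is a minimal sketch of that arithmetic. The function and constant names are illustrative, not part of any driver API.

```python
IPOIB_HEADER_BYTES = 4      # IPoIB encapsulation header carried in each UD packet
IPOIB_CM_MAX_MTU = 65520    # connected-mode limit in the Linux IPoIB driver (64 KiB - 16)

VALID_IB_MTUS = (256, 512, 1024, 2048, 4096)


def ipoib_ud_mtu(ib_mtu: int) -> int:
    """Largest IPoIB MTU in datagram (UD) mode for a given IB link MTU."""
    if ib_mtu not in VALID_IB_MTUS:
        raise ValueError(f"not a valid IB MTU: {ib_mtu}")
    return ib_mtu - IPOIB_HEADER_BYTES


for ib_mtu in (2048, 4096):
    print(f"IB MTU {ib_mtu}: UD IPoIB MTU {ipoib_ud_mtu(ib_mtu)}")
print(f"CM IPoIB MTU limit: {IPOIB_CM_MAX_MTU}")
```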


stijn

On 05/13/2017 01:27 AM, Laurence Horrocks-Barlow wrote:
> It also depends on the adapter.
> 
> We have seen better performance using datagram mode with MLNX adapters;
> however, we see better performance in connected mode when using Intel
> TrueScale. Again, as Jonathon mentioned, we have also seen better
> performance when using connected mode on active/slave bonded interfaces
> (even on a mixed MLNX/TS fabric).
> 
> There is also a significant difference in the MTU size you can use in
> datagram vs. connected mode, with datagram being limited to 2044 (if
> memory serves), whereas connected mode can use up to 65520.
> 
> I typically now run qperf and nsdperf benchmarks to find the best
> configuration.
> 
> -- Lauz
> 
> On 12/05/2017 16:05, Jonathon A Anderson wrote:
>> It may be true that you should always favor connected mode, but those
>> instructions seem to apply specifically to the case where you have
>> bonded interfaces.
>>
>> ~jonathon
>>
>>
>> On 5/12/17, 9:03 AM, "gpfsug-discuss-bounces at spectrumscale.org on
>> behalf of Jan-Frode Myklebust"
>> <gpfsug-discuss-bounces at spectrumscale.org on behalf of
>> janfrode at tanso.net> wrote:
>>
>>      I also don't know much about this, but the ESS quick deployment
>>      guide is quite clear that we should use connected mode for IPoIB:
>>      --------------
>>      Note: If using bonded IP over IB, do the following: Ensure that the
>>      CONNECTED_MODE=yes statement exists in the corresponding slave-bond
>>      interface scripts located in /etc/sysconfig/network-scripts directory
>>      of the management server and I/O server nodes. These scripts are
>>      created as part of the IP over IB bond creation. An example of the
>>      slave-bond interface with the modification is shown below.
>>      --------------
>>      -jf
>>
>>      On Fri, 12 May 2017 at 16:48, Aaron Knister
>>      <aaron.s.knister at nasa.gov> wrote:
>>      For what it's worth, we've seen *significantly* better performance
>>      in IPoIB streaming benchmarks with connected mode vs. datagram mode
>>      on IB.
>>
>>      -Aaron
>>
>>      On 5/12/17 10:43 AM, Jonathon A Anderson wrote:
>>      > This won’t tell you which to use, but datagram mode and connected
>>      > mode in IB are roughly analogous to UDP vs. TCP in IP. Datagram
>>      > mode is “unreliable” in that there’s no checking/retry built into
>>      > the protocol; connected mode is “reliable” and detects whether data
>>      > is received completely and in the correct order.
>>      >
>>      > The last advice I heard for traditional IB was that the overhead of
>>      > connected mode isn’t worth it, particularly if you’re using IPoIB
>>      > (where you’re likely to be using TCP anyway). That said, on our OPA
>>      > network we’re seeing the opposite advice, so I, too, am often unsure
>>      > what the best configuration would be for any given fabric.
>>      >
>>      > ~jonathon
>>      >
>>      >
>>      > On 5/12/17, 4:42 AM, "gpfsug-discuss-bounces at spectrumscale.org on
>>      > behalf of Damir Krstic" <gpfsug-discuss-bounces at spectrumscale.org
>>      > on behalf of damir.krstic at gmail.com> wrote:
>>      >
>>      >     I never fully understood the difference between connected
>>      >     vs. datagram mode besides the obvious packet-size difference.
>>      >     Our NSD servers (ESS GL6 nodes) run Red Hat 7 and are in
>>      >     connected mode. Our 700+ clients run Red Hat 6 and are in
>>      >     datagram mode.
>>      >
>>      >
>>      >     In a month we are upgrading our cluster to Red Hat 7 and are
>>      >     debating whether to leave the compute nodes in datagram mode or
>>      >     switch them to connected mode. What is the right thing to do?
>>      >
>>      >
>>      >     Thanks in advance.
>>      >     Damir
>>      >
>>      >
>>      >
>>      > _______________________________________________
>>      > gpfsug-discuss mailing list
>>      > gpfsug-discuss at spectrumscale.org
>>      > http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>>      >
>>           --
>>      Aaron Knister
>>      NASA Center for Climate Simulation (Code 606.2)
>>      Goddard Space Flight Center
>>      (301) 286-2776


