[gpfsug-discuss] Tuning AFM for high throughput/high IO over _really_ long distances

Jan-Frode Myklebust janfrode at tanso.net
Wed Nov 9 18:05:21 GMT 2016


Mostly curious, as I don't have experience in such environments, but: is this
AFM over NFS or over the NSD protocol? It might be interesting to try the
other option, and also to check how nsdperf performs over such
distance/latency.
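
Roughly what I would try with nsdperf, as a sketch from memory (it ships as
source under /usr/lpp/mmfs/samples/net, so the exact command set is worth
checking against the comments in nsdperf.C; hostnames and the values below
are placeholders only):

# build as per the README in that directory, then run a server instance
# on one node at each site
./nsdperf -s

# from another node, open an interactive session and point it at both sites;
# threads/socksize/ttime here just mean: several parallel streams, big
# socket buffers (in line with the sysctls below), 60-second runs
./nsdperf
server home-node
client cache-node
threads 32
socksize 268435456
ttime 60
test
quit

That would at least show whether the raw NSD/RPC path can fill the link,
independently of AFM itself.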



-jf
On Wed, 9 Nov 2016 at 18:39, Jake Carroll <jake.carroll at uq.edu.au> wrote:

> Hi.
>
>
>
> I’ve got a GPFS-to-GPFS AFM cache/home (IW) relationship set up over a
> really long distance: about 180ms of latency between the two clusters and
> around 13,000km of optical path. Fortunately for me, I’m getting close to
> the theoretical maximum IO over the NICs between the clusters, and I’m
> iPerf’ing at around 8.9 to 9.2Gbit/sec over a 10GbE circuit, MTU 9000 all
> the way through.
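>
> (For context, the bandwidth-delay product on this path is the
> back-of-the-envelope reason the big socket buffers below matter:
>
> 10 Gbit/s x 0.180 s RTT = 1.8 Gbit ~ 225 MB in flight
>
> so a single TCP stream needs a window on the order of 225 MB to keep the
> pipe full, which is what the 256MB tcp_rmem/tcp_wmem maximums are sized
> for.)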
>
>
>
> Anyway – I’m finding my AFM traffic to be dragging its feet and I don’t
> really understand why that might be. I’ve verified the link’s and the
> transport’s capability, as I said above, with iPerf and with CERN’s FDT,
> both reaching close to 10Gbit/sec.
>
>
>
> I also verified the clusters on both sides in terms of disk IO and they
> both seem easily capable in IOZone and IOR tests of multiple GB/sec of
> throughput.
>
>
>
> So – my questions:
>
>
>
> 1.  Are there very specific tunings AFM needs for high-latency, long-
> distance IO?
>
> 2.  Are there very specific NIC/TCP-stack tunings (beyond the kind of
> thing we already have in place) that benefit AFM over really long
> distances and high latency?
>
> 3.  We are seeing, on the “cache” side, really lazy/sticky “ls -als” in
> the home mount. It sometimes takes 20 to 30 seconds before the command
> line reports back with a long listing of files. Any ideas why it would
> take that long to get a response from “home”?
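>
> (My working guess on #3 is that every lookup is being revalidated against
> home over the 180ms link. The knobs I haven’t touched yet are the fileset
> revalidation intervals; a sketch only, with the filesystem/fileset names
> and the values purely illustrative:
>
> mmchfileset fs1 cachefileset -p afmDirLookupRefreshInterval=300
>
> mmchfileset fs1 cachefileset -p afmFileLookupRefreshInterval=300
>
> Longer intervals should mean fewer round trips to home on a plain “ls”,
> at the cost of staler metadata in the cache.)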
>
>
>
> We’ve got our TCP stack set up fairly aggressively on all hosts that
> participate in these two clusters:
>
>
>
> ethtool -C enp2s0f0 adaptive-rx off
>
> ifconfig enp2s0f0 txqueuelen 10000
>
> sysctl -w net.core.rmem_max=536870912
>
> sysctl -w net.core.wmem_max=536870912
>
> sysctl -w net.ipv4.tcp_rmem="4096 87380 268435456"
>
> sysctl -w net.ipv4.tcp_wmem="4096 65536 268435456"
>
> sysctl -w net.core.netdev_max_backlog=250000
>
> sysctl -w net.ipv4.tcp_congestion_control=htcp
>
> sysctl -w net.ipv4.tcp_mtu_probing=1
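>
> (One sanity check on the htcp line: it only takes effect if the tcp_htcp
> module is available, and none of these settings survive a reboot unless
> they are also dropped into /etc/sysctl.d/. To confirm what the stack is
> actually using:
>
> modprobe tcp_htcp
>
> sysctl net.ipv4.tcp_available_congestion_control
>
> sysctl net.ipv4.tcp_congestion_control)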
>
>
>
> I modified a couple of small things on the AFM “cache” side to see if
> they’d make a difference, such as:
>
>
>
> mmchconfig afmNumWriteThreads=4
>
> mmchconfig afmNumReadThreads=4
>
>
>
> But no difference so far.
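>
> (Still on my list to try, going from the documented AFM tuning parameters;
> the fileset names and values here are just placeholders, and the units and
> defaults want checking against the docs:
>
> mmchfileset fs1 cachefileset -p afmNumFlushThreads=32
>
> mmchconfig afmParallelWriteThreshold=1024 -i
>
> mmchconfig afmParallelWriteChunkSize=134217728 -i
>
> i.e. more gateway flush threads per fileset, plus splitting large files
> into parallel chunks across gateways, rather than just the read/write
> thread counts above.)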
>
>
>
> Thoughts would be appreciated. I’ve done this before over much shorter
> distances (30km) and I’ve flattened a 10GbE wire without really
> tuning…anything. Are the large number of packets in flight and the long
> time-to-acknowledgement going to hurt here? I really thought AFM might be
> well designed for exactly this kind of work at long distance **and** high
> throughput – so I must be missing something!
>
>
>
> -jc
>
>
>
>
>
>
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>