[gpfsug-discuss] mmchdisk performance/behavior in a stretch cluster config?

Valdis Kletnieks Valdis.Kletnieks at vt.edu
Fri Nov 18 19:05:39 GMT 2016


So as a basis for our archive solution, we're using a GPFS cluster
in a stretch configuration, with 2 sites separated by about 20ms worth
of 10G link.  Each end has 2 protocol servers doing NFS and 3 NSD servers.
Identical disk arrays and LTFS/EE at both ends, and all metadata and
userdata are replicated to both sites.
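
(For what it's worth, the replication setup can be confirmed with something
like

    mmlsfs archive -m -M -r -R    # default/max metadata and data replicas

assuming 'archive' is the file system name used in the mmchdisk command
below; -m and -r should both come back as 2 for a fully replicated setup
like this.)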

We had a fiber issue for about 8 hours yesterday, and as expected (since there
are only 5 quorum nodes, 3 local and 2 at the far end) the far end fell off the
cluster, and all the NSDs on the remote arrays were marked down.

There's about 123T of data at each end, 6 million files in there so far.

So after the fiber came back up from the several-hour outage, I ran
'mmchdisk archive start -a'.  That was at 17:45 yesterday.
I'm now 20 hours in, at:

  62.15 % complete on Fri Nov 18 13:52:59 2016  (   4768429 inodes with total  173675926 MB data processed)
  62.17 % complete on Fri Nov 18 13:53:20 2016  (   4769416 inodes with total  173710731 MB data processed)
  62.18 % complete on Fri Nov 18 13:53:40 2016  (   4772481 inodes with total  173762456 MB data processed)
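
(While that grinds along, something like

    mmlsdisk archive -e    # only list disks that are not up/ready

should show which of the remote NSDs are still down or being recovered --
as far as I recall, -e limits the output to disks that aren't in the normal
up/ready state.)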

Network statistics indicate that the 3 local NSD servers are each pushing out
packets at about 400 Mbytes/second, which means the 10G pipe is pretty damned
close to totally packed full, and the 3 remote NSD servers are sending back
ACKs for all that data.
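
Spelling that out as a quick sanity check (assuming each of the 3 NSD
servers really is pushing ~400 Mbytes/second):

    awk 'BEGIN { printf "%.1f Gbit/s\n", 3 * 400 * 8 / 1000 }'
    # => 9.6 Gbit/s aggregate, essentially line rate on a 10G link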

Rough back-of-envelope calculations indicate that (a) if I'm at 62% after
20 hours, it will take about 30 hours to finish, and (b) a 10G link takes about
29 hours at full blast to move 123T of data.  So it certainly *looks*
like it's resending everything.
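
For reference, spelling out those rough numbers (treating 123T as decimal
terabytes; closer to 30 hours if that's really TiB):

    # (a) progress-based estimate: 62% done after 20 hours
    awk 'BEGIN { printf "%.1f hours total\n", 20 / 0.62 }'            # ~32 hours
    # (b) raw transfer time for 123 TB over a saturated 10 Gbit/s link
    awk 'BEGIN { printf "%.1f hours\n", 123e12 * 8 / 10e9 / 3600 }'   # ~27 hours

Both land in the same ballpark as the link simply re-sending everything.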

And that's even though at least 100T of that 123T is test data that was
written by one of our users back on Nov 12/13, and thus theoretically *should*
already have been at the remote site.

Any ideas what's going on here?


