[gpfsug-discuss] mmchdisk performance/behavior in a stretch cluster config?
Valdis Kletnieks
Valdis.Kletnieks at vt.edu
Fri Nov 18 19:05:39 GMT 2016
So as a basis for our archive solution, we're using a GPFS cluster
in a stretch configuration, with 2 sites separated by about 20ms worth
of 10G link. Each end has 2 protocol servers doing NFS and 3 NSD servers.
Identical disk arrays and LTFS/EE at both ends, and all metadata and
userdata are replicated to both sites.
We had a fiber issue for about 8 hours yesterday, and as expected (since there
are only 5 quorum nodes, 3 local and 2 at the far end) the far end fell off the
cluster and GPFS marked all the NSDs on the remote arrays down.
There's about 123T of data at each end, 6 million files in there so far.
So after the fiber came back up after a several-hour downtime, I
did the 'mmchdisk archive start -a'. That was at 17:45 yesterday.
I'm now 20 hours in, at:
62.15 % complete on Fri Nov 18 13:52:59 2016 ( 4768429 inodes with total 173675926 MB data processed)
62.17 % complete on Fri Nov 18 13:53:20 2016 ( 4769416 inodes with total 173710731 MB data processed)
62.18 % complete on Fri Nov 18 13:53:40 2016 ( 4772481 inodes with total 173762456 MB data processed)
Network statistics indicate that the 3 local NSD servers are all pushing out
data at about 400 Mbytes/second, which means the 10G pipe is pretty damned
close to totally packed full, and the 3 remote NSD servers are sending back
ACKs for all of it.
Rough back-of-envelope calculations indicate that (a) if I'm at 62% after
20 hours, the whole pass will take about 32 hours, and (b) a 10G link takes
about 29 hours at full blast to move 123T of data. So it certainly *looks*
like it's resending everything.
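For what it's worth, here's the arithmetic behind those two estimates as a quick Python sketch (my assumptions: constant resync rate, decimal terabytes, and line-rate transfer with no protocol overhead; binary units or overhead would push the transfer figure up toward the 29 hours quoted above):

```python
# Back-of-envelope sanity checks for the resync numbers above.
# Assumed inputs: 123 TB of replicated data, a 10 Gbit/s link,
# and 62% complete after 20 hours of mmchdisk running.

def total_hours_from_progress(hours_elapsed: float, fraction_done: float) -> float:
    """Extrapolate total runtime, assuming a constant resync rate."""
    return hours_elapsed / fraction_done

def transfer_hours(terabytes: float, gbit_per_s: float) -> float:
    """Hours to move the payload over the link at full line rate."""
    bits = terabytes * 1e12 * 8              # decimal TB -> bits
    return bits / (gbit_per_s * 1e9) / 3600  # bits / (bits/s) -> hours

print(round(total_hours_from_progress(20, 0.62), 1))  # ~32.3 hours total
print(round(transfer_hours(123, 10), 1))              # ~27.3 hours at line rate
```

The two numbers being so close is exactly why it looks like a full resend rather than an incremental catch-up.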
And that's even though at least 100T of that 123T is test data that was
written by one of our users back on Nov 12/13, and thus theoretically *should*
already have been at the remote site.
Any ideas what's going on here?