[gpfsug-discuss] mmchdisk hung / proceeding at a glacial pace?
Buterbaugh, Kevin L
Kevin.Buterbaugh at Vanderbilt.Edu
Sun Jul 15 18:24:43 BST 2018
Hi All,
We are in a partial cluster downtime today to do firmware upgrades on our storage arrays. It is a partial downtime because we have two GPFS filesystems:
1. gpfs23 - 900+ TB, which corresponds to /scratch and /data. I’ve unmounted it across the cluster because it has data replication set to 1.
2. gpfs22 - 42 TB, which corresponds to /home. It has data replication set to 2, so for each array we run “mmchdisk gpfs22 suspend -d <the gpfs22 NSD>”, do the firmware upgrade, and once the array is back we run “mmchdisk gpfs22 resume -d <NSD>” followed by “mmchdisk gpfs22 start -d <NSD>”.
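For reference, the per-array sequence above can be sketched as a small shell helper. This is only a sketch: the NSD name example_nsd, the function names, and the DRY_RUN switch are illustrative assumptions, and by default it only prints the commands rather than running them.

```shell
#!/bin/sh
# Sketch of the suspend / resume / start sequence for one NSD.
# DRY_RUN defaults to 1, so commands are printed instead of executed.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# before_upgrade FS NSD: suspend the NSD before the array goes down.
before_upgrade() {
    run mmchdisk "$1" suspend -d "$2"
}

# after_upgrade FS NSD: once the array is back, resume and then start the NSD.
after_upgrade() {
    run mmchdisk "$1" resume -d "$2" &&
    run mmchdisk "$1" start -d "$2"
}

# example_nsd is a placeholder, not a real NSD name from this cluster.
before_upgrade gpfs22 example_nsd
# ... firmware upgrade happens here ...
after_upgrade gpfs22 example_nsd
```

Setting DRY_RUN=0 would execute the commands for real, so the printed output is worth checking first.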
On the 1st storage array this went very smoothly … the mmchdisk took about 5 minutes, which is what I would expect.
But on the 2nd storage array the mmchdisk appears to either be hung or proceeding at a glacial pace. For more than an hour it’s been stuck at:
mmchdisk: Processing continues ...
Scanning file system metadata, phase 1 ...
There are no waiters of any significance and “mmdiag --iohist” doesn’t show any issues either.
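The checks mentioned above can be gathered into one helper, roughly as sketched here. The function name and the DRY_RUN switch are assumptions for illustration, and the “mmlsdisk gpfs22 -e” call is an addition not mentioned in this message (it lists disks that are not both up and ready). By default the commands are only printed.

```shell
#!/bin/sh
# Sketch of the diagnostics to look at while mmchdisk is running.
# DRY_RUN defaults to 1, so commands are printed instead of executed.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# check_progress FS: show waiters, recent I/O history, and problem disks.
check_progress() {
    run mmdiag --waiters
    run mmdiag --iohist
    # Addition, not from the original message: disks not in up/ready state.
    run mmlsdisk "$1" -e
}

check_progress gpfs22
```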
Any ideas, anyone? Unless I can figure this out I’m hosed for this downtime, as I’ve got 7 more arrays to do after this one!
Thanks!
--
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633