[gpfsug-discuss] mmchdisk hung / proceeding at a glacial pace?

Buterbaugh, Kevin L Kevin.Buterbaugh at Vanderbilt.Edu
Sun Jul 15 18:24:43 BST 2018


Hi All,

We are in a partial cluster downtime today to do firmware upgrades on our storage arrays.  It is a partial downtime because we have two GPFS filesystems:

1.  gpfs23 - 900+ TB, corresponding to /scratch and /data.  I’ve unmounted it across the cluster because it has data replication set to 1.

2.  gpfs22 - 42 TB, corresponding to /home.  It has data replication set to 2, so for each array we run “mmchdisk gpfs22 suspend -d <the gpfs22 NSD>”, do the firmware upgrade, and once the array is back run “mmchdisk gpfs22 resume -d <NSD>” followed by “mmchdisk gpfs22 start -d <NSD>”.
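
The per-array sequence for gpfs22 can be sketched as a small shell wrapper. This is only a minimal sketch, not the script actually used during the downtime: the function names, the DRY_RUN switch, and the NSD names in the usage note are hypothetical, and the firmware upgrade itself happens outside the script, between the two calls.

```shell
#!/bin/sh
# Sketch of the suspend / resume / start sequence described above.
# Assumption: DRY_RUN=1 (the default here) only prints the commands;
# set DRY_RUN=0 on a real GPFS node to execute them.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage: suspend_disks <filesystem> "<nsd1;nsd2;...>"
# Run this before taking the array down for the firmware upgrade.
suspend_disks() {
    run mmchdisk "$1" suspend -d "$2"
}

# Usage: resume_and_start_disks <filesystem> "<nsd1;nsd2;...>"
# Run this once the array is back; start is only attempted if resume succeeds.
resume_and_start_disks() {
    run mmchdisk "$1" resume -d "$2" &&
    run mmchdisk "$1" start -d "$2"
}
```

With placeholder NSD names, a dry run would be: suspend_disks gpfs22 "nsd_a;nsd_b", then the firmware upgrade, then resume_and_start_disks gpfs22 "nsd_a;nsd_b". Multiple disks are passed to -d as one semicolon-separated, quoted string.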

On the 1st storage array this went very smoothly … the mmchdisk took about 5 minutes, which is what I would expect.

But on the 2nd storage array the mmchdisk appears to either be hung or proceeding at a glacial pace.  For more than an hour it’s been stuck at:

mmchdisk: Processing continues ...
Scanning file system metadata, phase 1 ...

There are no waiters of any significance and “mmdiag --iohist” doesn’t show any issues either.
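
The checks mentioned above can be collected into one helper so they are easy to repeat while mmchdisk runs. This is a hedged sketch: the function name and DRY_RUN switch are hypothetical, and the mmlsdisk call is an extra check not mentioned in the text above, added on the assumption that listing disks that are not both ready and up is useful here. Note that mmdiag reports on the local node only, so it would need to be run on the node doing the work.

```shell
#!/bin/sh
# Sketch of the progress checks for a slow mmchdisk.
# Assumption: DRY_RUN=1 (the default here) only prints the commands.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage: check_progress <filesystem>
check_progress() {
    run mmdiag --waiters     # long-running waiters on this node
    run mmdiag --iohist      # recent I/O history on this node
    run mmlsdisk "$1" -e     # disks that are not in ready/up state
}
```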

Any ideas, anyone?  Unless I can figure this out I’m hosed for this downtime, as I’ve got 7 more arrays to do after this one!

Thanks!

--
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633




