[gpfsug-discuss] CCR cluster down for the count?
Oesterlin, Robert
Robert.Oesterlin at nuance.com
Wed Sep 20 00:39:37 BST 2017
OK, I’ve run across this before, and as I recall it’s caused by a bug involving CCR and quorum. What I think you can do is set the cluster to non-CCR (mmchcluster --ccr-disable) with all the nodes down, bring it back up, and then re-enable CCR.
I’ll see if I can find this in one of the recent 4.2 release notes.
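As a rough sketch of that sequence, not a verified procedure: the script below only prints the commands in order unless RUN=1 is set. Using mmshutdown -a and mmstartup -a for the "down" and "up" steps, and mmchcluster --ccr-enable for the re-enable step, are assumptions; the message itself only names mmchcluster --ccr-disable.

```shell
#!/bin/sh
# Sketch only. The ordering follows the message above; every command other
# than "mmchcluster --ccr-disable" is an assumption about how to carry it out.

# ccr_reset_plan: print the assumed command sequence, one command per line.
ccr_reset_plan() {
    cat <<'EOF'
mmshutdown -a
mmchcluster --ccr-disable
mmstartup -a
mmchcluster --ccr-enable
EOF
}

# run_plan: echo each command; execute it only when RUN=1 is set.
# Stops at the first command that fails.
run_plan() {
    ccr_reset_plan | while IFS= read -r cmd; do
        if [ "${RUN:-0}" = "1" ]; then
            echo "+ $cmd"
            # Unquoted on purpose so the line splits into command and arguments.
            $cmd || break
        else
            echo "would run: $cmd"
        fi
    done
}

run_plan
```

Run without RUN set, this is a dry run that lists the four steps, which makes it easy to review the order before touching the cluster.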
Bob Oesterlin
Sr Principal Storage Engineer, Nuance
From: <gpfsug-discuss-bounces at spectrumscale.org> on behalf of "Buterbaugh, Kevin L" <Kevin.Buterbaugh at Vanderbilt.Edu>
Reply-To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
Date: Tuesday, September 19, 2017 at 4:03 PM
To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
Subject: [EXTERNAL] [gpfsug-discuss] CCR cluster down for the count?
Hi All,
We have a small test cluster that is CCR enabled. It only had/has 3 NSD servers (testnsd1, 2, and 3) and maybe 3-6 clients. testnsd3 died a while back. I did nothing about it at the time because it was due to be life-cycled as soon as I finished a couple of higher priority projects.
Yesterday, testnsd1 also died, which took the whole cluster down. So now resolving this has become higher priority… ;-)
I took two other boxes and set them up as testnsd1 and 3, respectively. I’ve done a “mmsdrrestore -p testnsd2 -R /usr/bin/scp” on both of them. I’ve also done a “mmccr setup -F” and copied the ccr.disks and ccr.nodes files from testnsd2 to them. And I’ve copied /var/mmfs/gen/mmsdrfs from testnsd2 to testnsd1 and 3. In case it’s not obvious from the above, networking is fine … ssh without a password between those 3 boxes is fine.
However, when I try to start up GPFS … or run any GPFS command I get:
/root
root at testnsd2# mmstartup -a
get file failed: Not enough CCR quorum nodes available (err 809)
gpfsClusterInit: Unexpected error from ccr fget mmsdrfs. Return code: 158
mmstartup: Command failed. Examine previous error messages to determine cause.
/root
root at testnsd2#
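For context on the err 809 message, here is a small sketch of the majority arithmetic, assuming CCR needs more than half of the quorum nodes to agree. It also assumes the three NSD servers are the quorum nodes and that only testnsd2 still holds valid CCR state; neither is stated explicitly above.

```shell
#!/bin/sh
# majority_needed N: smallest count that is more than half of N quorum nodes.
majority_needed() {
    echo $(( $1 / 2 + 1 ))
}

# has_quorum TOTAL AVAILABLE: succeeds when AVAILABLE reaches the majority.
has_quorum() {
    [ "$2" -ge "$(majority_needed "$1")" ]
}

total=3      # assumed: testnsd1, testnsd2 and testnsd3 are the quorum nodes
available=1  # assumed: only testnsd2 still holds valid CCR state

if has_quorum "$total" "$available"; then
    echo "quorum reached"
else
    echo "no quorum: $available of $total available, $(majority_needed "$total") needed"
fi
# prints: no quorum: 1 of 3 available, 2 needed
```

Under those assumptions, one surviving node out of three falls short of the two needed, which is consistent with the cluster going down once the second NSD server died.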
I’ve got to run to a meeting right now, so I hope I’m not leaving out any crucial details here … does anyone have an idea what I need to do? Thanks…
—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and Education
Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633