[gpfsug-discuss] CCR cluster down for the count?

IBM Spectrum Scale scale at us.ibm.com
Wed Sep 20 04:07:18 BST 2017


Hi Kevin,

Let me try to understand the problem you have. What do you mean by "node
died" here? Do you mean that there is a hardware/OS issue which cannot
be fixed, so the OS cannot come up anymore?

I agree with Bob that you can try disabling CCR temporarily, restoring the
cluster configuration, and then enabling it again.

For example:


1. Log in to a node which has a proper GPFS configuration, e.g. NodeA
2. Shut down the GPFS daemon on all nodes in the cluster.
3. mmchcluster --ccr-disable -p NodeA
4. mmsdrrestore -a -p NodeA
5. mmauth genkey propagate -N testnsd1,testnsd3
6. mmchcluster --ccr-enable
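
The steps above can be sketched as a small dry-run script, so the order can be reviewed before anything touches the cluster. This is only a sketch under assumptions: the run() helper and the DRY_RUN and PRIMARY variables are hypothetical names introduced here, and "mmshutdown -a" is one reading of step 2, not something stated in the steps. NodeA, testnsd1 and testnsd3 are the names used in this thread.

```shell
#!/bin/sh
# Dry-run sketch of the CCR disable / restore / re-enable sequence.
# DRY_RUN, PRIMARY and run() are hypothetical helpers for illustration.
DRY_RUN=${DRY_RUN:-1}        # 1 = only print the commands, 0 = execute them
PRIMARY=${PRIMARY:-NodeA}    # node that still has a proper GPFS config

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

ccr_recover() {
    # Assumption: step 2 means stopping the daemon on every node.
    run mmshutdown -a || return 1
    run mmchcluster --ccr-disable -p "$PRIMARY" || return 1
    run mmsdrrestore -a -p "$PRIMARY" || return 1
    run mmauth genkey propagate -N testnsd1,testnsd3 || return 1
    run mmchcluster --ccr-enable || return 1
}

ccr_recover
```

With the default DRY_RUN=1 the script only echoes the five commands in order. Setting DRY_RUN=0 would execute them and stop at the first one that fails.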

Regards, The Spectrum Scale (GPFS) team

------------------------------------------------------------------------------------------------------------------

If you feel that your question can benefit other users of Spectrum Scale
(GPFS), then please post it to the public IBM developerWorks Forum at
https://www.ibm.com/developerworks/community/forums/html/forum?id=11111111-0000-0000-0000-000000000479.


If your query concerns a potential software error in Spectrum Scale (GPFS)
and you have an IBM software maintenance contract please contact
1-800-237-5511 in the United States or your local IBM Service Center in
other countries.

The forum is informally monitored as time permits and should not be used
for priority messages to the Spectrum Scale (GPFS) team.



From:	"Oesterlin, Robert" <Robert.Oesterlin at nuance.com>
To:	gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
Date:	09/20/2017 07:39 AM
Subject:	Re: [gpfsug-discuss] CCR cluster down for the count?
Sent by:	gpfsug-discuss-bounces at spectrumscale.org



OK, I’ve run across this before, and it’s because of a bug (as I recall)
having to do with CCR and quorum. What I think you can do is set the
cluster to non-CCR (mmchcluster --ccr-disable) with all the nodes down,
bring it back up, and then re-enable CCR.
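
One way to confirm which mode the cluster is in before and after such a switch is to look at the "Repository type:" line that mmlscluster prints. The helper below is a minimal sketch: repo_type is a hypothetical name, and the sample input line is illustrative only, not output captured from this cluster.

```shell
# repo_type: print the value of the "Repository type:" line read from stdin.
# Hypothetical helper; it only filters text and runs no GPFS command itself.
repo_type() {
    sed -n 's/^[[:space:]]*Repository type:[[:space:]]*//p'
}

# Illustrative input line (assumed layout of the mmlscluster output):
printf '  Repository type:           CCR\n' | repo_type

# On a cluster node the intended use would be:
#   mmlscluster | repo_type
```

The function prints nothing if no such line is present, which is itself a hint that the command did not return normal cluster information.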

I’ll see if I can find this in one of the recent 4.2 release notes.


Bob Oesterlin
Sr Principal Storage Engineer, Nuance


From: <gpfsug-discuss-bounces at spectrumscale.org> on behalf of "Buterbaugh,
Kevin L" <Kevin.Buterbaugh at Vanderbilt.Edu>
Reply-To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
Date: Tuesday, September 19, 2017 at 4:03 PM
To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
Subject: [EXTERNAL] [gpfsug-discuss] CCR cluster down for the count?

Hi All,

We have a small test cluster that is CCR enabled.  It only had/has 3 NSD
servers (testnsd1, 2, and 3) and maybe 3-6 clients.  testnsd3 died a while
back.  I did nothing about it at the time because it was due to be
life-cycled as soon as I finished a couple of higher priority projects.

Yesterday, testnsd1 also died, which took the whole cluster down.  So now
resolving this has become higher priority… ;-)

I took two other boxes and set them up as testnsd1 and 3, respectively.
I’ve done a “mmsdrrestore -p testnsd2 -R /usr/bin/scp” on both of them.
I’ve also done a “mmccr setup -F” and copied the ccr.disks and ccr.nodes
files from testnsd2 to them.  And I’ve copied /var/mmfs/gen/mmsdrfs from
testnsd2 to testnsd1 and 3.  In case it’s not obvious from the above,
networking is fine … ssh without a password between those 3 boxes is fine.

However, when I try to start up GPFS … or run any GPFS command I get:

/root
root at testnsd2# mmstartup -a
get file failed: Not enough CCR quorum nodes available (err 809)
gpfsClusterInit: Unexpected error from ccr fget mmsdrfs.  Return code: 158
mmstartup: Command failed. Examine previous error messages to determine
cause.
/root
root at testnsd2#
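
The "Not enough CCR quorum nodes available" message is consistent with simple majority arithmetic. The sketch below assumes that all three NSD servers were the quorum nodes and that only testnsd2 still holds valid CCR state. Neither is stated explicitly in this thread, and quorum_needed and has_quorum are hypothetical helper names.

```shell
# quorum_needed N: smallest strict majority of N quorum nodes.
quorum_needed() {
    echo $(( $1 / 2 + 1 ))
}

# has_quorum TOTAL AVAILABLE: succeed if AVAILABLE is a majority of TOTAL.
has_quorum() {
    [ "$2" -ge "$(quorum_needed "$1")" ]
}

total=3      # assumption: testnsd1, testnsd2 and testnsd3 were all quorum nodes
available=1  # assumption: only testnsd2 still has valid CCR state

if has_quorum "$total" "$available"; then
    echo "quorum reachable"
else
    echo "no quorum: have $available, need $(quorum_needed "$total")"
fi
```

Under those assumptions one surviving node out of three cannot form a majority, which would explain why every GPFS command fails until CCR is bypassed or the other nodes regain valid state.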

I’ve got to run to a meeting right now, so I hope I’m not leaving out any
crucial details here … does anyone have an idea what I need to do?  Thanks…

—
Kevin Buterbaugh - Senior System Administrator
Vanderbilt University - Advanced Computing Center for Research and
Education
Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633




