[gpfsug-discuss] CCR troubles

Marc A Kaplan makaplan at us.ibm.com
Wed Jul 27 19:03:05 BST 2016


I understand you are having problems with your cluster, but you do NOT 
need to have GPFS "started" to 
display and/or change configuration parameters.  You do need at least a 
majority of the quorum nodes to be up and in communication (e.g. able to 
talk to each other over TCP/IP).

--ccr-enable
Enables the configuration server repository (CCR), which stores redundant 
copies of configuration data files on all quorum nodes. The advantage of 
CCR over the traditional primary or backup configuration server semantics 
is that when using CCR, all GPFS administration commands as well as file 
system mounts and daemon startups work normally as long as a majority of 
quorum nodes are accessible.
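
To make "a majority of quorum nodes" concrete, here is a minimal sketch of 
the arithmetic, assuming the usual strict-majority rule (more than half). 
The function names are hypothetical and are not part of any GPFS tool or 
API:

```python
def majority_needed(quorum_nodes: int) -> int:
    """Smallest number of quorum nodes that is strictly more than half."""
    return quorum_nodes // 2 + 1


def has_majority(quorum_nodes: int, reachable: int) -> bool:
    """True if the reachable quorum nodes form a strict majority."""
    return reachable >= majority_needed(quorum_nodes)


# With 3 quorum nodes, 2 must be reachable; 1 alone is not enough.
print(majority_needed(3), has_majority(3, 2), has_majority(3, 1))
```

Under that assumption, a cluster with 3 quorum nodes tolerates the loss of 
one of them, and a cluster with 5 tolerates the loss of two.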

Think about how this must work (I have the advantage of actually NOT 
knowing the details, but one can reason...): 
to maintain a consistent single configuration database, a majority of 
quorum nodes MUST agree on every bit of data in the configuration 
database.

Even to query the database and get a correct answer, you'd have to know 
that a majority agree on the answer.

(You could ask 1 guy, but then how would you know if he was telling you 
what the majority opinion is? The minority need not even lie to mislead 
you (I don't think CCR guards against Byzantine failures); 
the minority guy could just have been out of touch for a while...)
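
The reasoning above can be sketched as a toy majority read. This is only an 
illustration of the general idea, not how CCR is actually implemented; the 
function name and the reply format (one entry per quorum node, None for a 
node that cannot be reached) are assumptions made for the example:

```python
from collections import Counter
from typing import Optional, Sequence


def majority_read(replies: Sequence[Optional[str]]) -> str:
    """Return the value that a strict majority of quorum nodes agree on.

    replies holds one entry per quorum node; None means the node did not
    answer. Raises RuntimeError if no value reaches a majority.
    """
    needed = len(replies) // 2 + 1
    counts = Counter(r for r in replies if r is not None)
    if counts:
        value, votes = counts.most_common(1)[0]
        if votes >= needed:
            return value
    raise RuntimeError("not enough quorum nodes agree on an answer")


# Two of three nodes hold the newer config; the stale node is outvoted.
print(majority_read(["config-v2", "config-v2", "config-v1"]))
```

If only one of the three nodes answers, the function raises instead of 
returning that node's value, because a lone answer might be stale.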

I advise that you do some testing on a test cluster (could be virtual)... 



From:   Bryan Banister <bbanister at jumptrading.com>
To:     "gpfsug main discussion list (gpfsug-discuss at spectrumscale.org)" 
<gpfsug-discuss at spectrumscale.org>
Date:   07/27/2016 01:37 PM
Subject:        [gpfsug-discuss] CCR troubles
Sent by:        gpfsug-discuss-bounces at spectrumscale.org



When I have the GPFS cluster down, some GPFS commands no longer work as 
they should, or at least as they did without CCR:
 
# mmgetstate -aL       # Which stalls for a really stupid amount of time 
and then spits out:
get file failed: Not enough CCR quorum nodes available (err 809)
gpfsClusterInit: Unexpected error from ccr fget mmsdrfs.  Return code: 158
mmgetstate: Command failed. Examine previous error messages to determine 
cause.
 
And trying to change tuning parameters now also barfs when GPFS is down:
# [root at fpia-gpfs-jcsdr01 ~]# mmlsconfig
get file failed: Not enough CCR quorum nodes available (err 809)
gpfsClusterInit: Unexpected error from ccr fget mmsdrfs.  Return code: 158
mmlsconfig: Command failed. Examine previous error messages to determine 
cause.
 
# mmchconfig worker1Threads=128,prefetchThreads=128
mmchconfig: Unable to obtain the GPFS configuration file lock.
mmchconfig: GPFS was unable to obtain a lock from node 
fpia-gpfs-jcsdr01.grid.jumptrading.com.
mmchconfig: Command failed. Examine previous error messages to determine 
cause.
 
Which means I will have to start GPFS, change the parameter, shut GPFS 
down again, and start GPFS up again just to get the new setting.
 
Is this really the new mode of operation for CCR enabled clusters?
 
I searched for CCR in the Concepts, Planning, and Install Guide and also 
the Adv. Admin Guide, without finding an explanation. 
 
If so, then maybe I’ll go back to non CCR,
-Bryan

