[gpfsug-discuss] gpfs client cluster, lost quorum, ccr issues

Renata Maria Dart renata at slac.stanford.edu
Wed Jun 27 19:09:40 BST 2018


Hi, we have a client cluster of 4 nodes, 3 of which are quorum nodes.  One of
the quorum nodes is no longer in service and another was reinstalled with
a newer OS, both without informing the gpfs admins.  Gpfs is still
"working" on the two remaining nodes, that is, they continue to have access
to the gpfs data on the remote clusters.  But I can no longer get
any gpfs commands to work.  On one of the 2 nodes that are still serving data:

[root@ocio-gpu01 ~]# mmlscluster
get file failed: Not enough CCR quorum nodes available (err 809)
gpfsClusterInit: Unexpected error from ccr fget mmsdrfs.  Return code: 158
mmlscluster: Command failed. Examine previous error messages to determine cause.


On the reinstalled node, this fails in the same way:

[root@ocio-gpu02 ccr]# mmstartup
get file failed: Not enough CCR quorum nodes available (err 809)
gpfsClusterInit: Unexpected error from ccr fget mmsdrfs.  Return code: 158
mmstartup: Command failed. Examine previous error messages to determine cause.
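
As a rough sketch of why both commands report "Not enough CCR quorum nodes
available": this assumes CCR needs a strict majority of the configured quorum
nodes, and that only one of the three quorum nodes still has intact CCR state
(the decommissioned and the reinstalled node both being lost). The function
name and the counts below are illustrative, not taken from any gpfs tool.

```python
def ccr_majority(quorum_nodes: int) -> int:
    """Smallest number of quorum nodes that forms a strict majority."""
    return quorum_nodes // 2 + 1


# Assumed situation from the description above (hypothetical counts):
configured = 3  # quorum nodes defined in the client cluster
reachable = 1   # quorum nodes still in service with their CCR state

needed = ccr_majority(configured)
has_quorum = reachable >= needed

print(f"need {needed} of {configured} quorum nodes, have {reachable}: "
      f"quorum={has_quorum}")
```

Under these assumptions the cluster is one quorum node short of a majority,
which would be consistent with err 809 on both nodes.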


I have looked through the users group interchanges but didn't find anything
that seems to fit this scenario.

Is there a way to salvage this cluster?  Can it be done without
shutting gpfs down on the 2 nodes that continue to work?

Thanks for any advice,

Renata Dart
SLAC National Accelerator Lab



