[gpfsug-discuss] nsd not adding with one quorum node down?

McPheeters, Gordon gmcpheeters at anl.gov
Thu Jan 5 23:34:04 GMT 2017


You might want to check the GPFS logs on node cl003. The message “Lost connection to file system daemon.” often means that the daemon asserted while it was doing something, hence the lost connection.
If you check the state and see the node in arbitrating mode immediately after the command fails, that also makes sense, since it is now rejoining the cluster.
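
One way to see which nodes are not yet `active` before retrying the command is to parse the output of `mmgetstate -a`. This is a minimal Python sketch, assuming the usual three-column table (node number, node name, GPFS state) with single-word state values. `parse_gpfs_states` and `not_active` are hypothetical helpers, and the sample table is illustrative, not output captured from this cluster.

```python
from typing import Dict


def parse_gpfs_states(text: str) -> Dict[str, str]:
    """Map node name -> GPFS state from mmgetstate-style table text.

    Assumes data rows look like '<number> <name> <state>' and that the
    header and dashed separator lines do not start with a number
    (assumed layout, not verified against every release).
    """
    states: Dict[str, str] = {}
    for line in text.splitlines():
        fields = line.split()
        # Only data rows begin with a numeric node number.
        if len(fields) >= 3 and fields[0].isdigit():
            states[fields[1]] = fields[2]
    return states


def not_active(states: Dict[str, str]) -> Dict[str, str]:
    """Keep only nodes whose state is anything other than 'active'."""
    return {node: st for node, st in states.items() if st != "active"}


if __name__ == "__main__":
    # Illustrative sample only; cl001 is a hypothetical node name.
    sample = """\
 Node number  Node name        GPFS state
------------------------------------------
       1      cl001            active
       2      cl003            arbitrating
"""
    print(not_active(parse_gpfs_states(sample)))  # prints {'cl003': 'arbitrating'}
```

In practice the text could come from `subprocess.run(["mmgetstate", "-a"], capture_output=True, text=True).stdout`. If cl003 still reports `arbitrating`, waiting for it to return to `active` before re-running the NSD command seems safer than retrying immediately.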
If you aren’t watching carefully, you can miss these events because of the way mmfsd resumes the old mounts: you check the node with ‘df’, see that the file system is still mounted, and assume all is well, when in fact mmfsd has died and restarted.
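
Because `df` still showing the mount does not prove the daemon survived, one hedged cross-check is to compare the daemon's PID before and after the operation: if mmfsd is running afterwards under none of the PIDs seen earlier, it was most likely restarted. This minimal Python sketch assumes a Linux node with `/proc` and assumes the daemon's process `comm` name is `mmfsd` (adjust `name` if your build differs). `mmfsd_pids` and `daemon_restarted` are hypothetical helpers, not GPFS tools, and the GPFS logs remain the place to find out why the daemon went down.

```python
import os
from typing import Set


def mmfsd_pids(proc_root: str = "/proc", name: str = "mmfsd") -> Set[int]:
    """Return PIDs whose /proc/<pid>/comm equals `name` (Linux only).

    The default name 'mmfsd' is an assumption about the daemon's
    process name.
    """
    pids: Set[int] = set()
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as 'self'
        try:
            with open(os.path.join(proc_root, entry, "comm")) as f:
                if f.read().strip() == name:
                    pids.add(int(entry))
        except OSError:
            # Process exited between listdir and open, or no permission.
            continue
    return pids


def daemon_restarted(before: Set[int], after: Set[int]) -> bool:
    """True if the daemon ran before and runs now, but under new PIDs."""
    return bool(before) and bool(after) and before.isdisjoint(after)
```

Usage would be to take `before = mmfsd_pids()` on cl003, run the failing command, then call `daemon_restarted(before, mmfsd_pids())`. An empty `after` set means the daemon is down rather than restarted, which the function deliberately reports as `False`.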


Gordon McPheeters
ALCF Storage
(630) 252-6430
gmcpheeters at anl.gov



On Jan 5, 2017, at 3:38 PM, Valdis.Kletnieks at vt.edu wrote:

On Thu, 05 Jan 2017 20:44:33 +0000, Bryan Banister said:

Looking at this further, the output says “The following disks of home
will be formatted on node cl003:”; however, that node is the node in
‘arbitrating’ state, so I don’t see how that would work.

The bigger question:  If it was in "arbitrating", why was it selected as
the node to do the formatting?
_______________________________________________
gpfsug-discuss mailing list
gpfsug-discuss at spectrumscale.org
http://gpfsug.org/mailman/listinfo/gpfsug-discuss


