[gpfsug-discuss] [Newsletter] Re: Problem with mmlscluster and callback scripts
Matthias Knigge
Matthias.Knigge at rohde-schwarz.com
Mon Sep 10 12:19:14 BST 2018
Hi Aaron,
in my setup I have no way to define a tiebreaker disk. So if one node goes down, I would change the role of that node to non-quorum:
mmchnode --nonquorum -N nodename --force
After that I can start the filesystem and mount it.
Thanks,
Matthias
Best Regards
Matthias Knigge
R&D File Based Media Solutions
Rohde & Schwarz
GmbH & Co. KG
Hanomaghof 1
30449 Hannover
Telefon +49 511 67 80 7 213
Fax +49 511 37 19 74
Internet: Matthias.Knigge at rohde-schwarz.com
------------------------------------------------------------
Geschäftsführung / Executive Board: Christian Leicher (Vorsitzender / Chairman), Peter Riedel, Sitz der Gesellschaft / Company's Place of Business: München, Registereintrag / Commercial Register No.: HRA 16 270, Persönlich haftender Gesellschafter / Personally Liable Partner: RUSEG Verwaltungs-GmbH, Sitz der Gesellschaft / Company's Place of Business: München, Registereintrag / Commercial Register No.: HRB 7 534, Umsatzsteuer-Identifikationsnummer (USt-IdNr.) / VAT Identification No.: DE 130 256 683, Elektro-Altgeräte Register (EAR) / WEEE Register No.: DE 240 437 86
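The workaround described in this reply can be sketched as a small shell helper. This is only an illustration: the mmchnode invocation is taken from the message above, while reading "start the filesystem and mount it" as mmstartup followed by mmmount all is an assumption, and the names run, demote_and_restart and DRY_RUN are hypothetical. By default it only prints the commands.

```shell
#!/bin/sh
# Sketch only: prints the recovery commands unless DRY_RUN=0 is set.
DRY_RUN=${DRY_RUN:-1}

# run CMD...: print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# demote_and_restart NODE: remove the quorum role from the failed node,
# then bring GPFS back up on the local node.
demote_and_restart() {
    failed_node=$1
    # Command as given in the message above.
    run mmchnode --nonquorum -N "$failed_node" --force || return 1
    # Assumption: "start the filesystem" means starting the GPFS daemon locally.
    run mmstartup || return 1
    # Assumption: mount all GPFS file systems on this node.
    run mmmount all || return 1
}
```

Calling demote_and_restart gpfs-tier2 on the surviving node would then list the three commands in order; setting DRY_RUN=0 would execute them instead.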
-----Original Message-----
From: gpfsug-discuss-bounces at spectrumscale.org <gpfsug-discuss-bounces at spectrumscale.org> On Behalf Of Aaron Knister
Sent: Friday, September 07, 2018 3:35 PM
To: gpfsug-discuss at spectrumscale.org
Subject: *EXT* [Newsletter] Re: [gpfsug-discuss] Problem with mmlscluster and callback scripts
Hi Matthias,
Looks like you lost quorum in the cluster. With node-based quorum, at least n/2+1 (rounded down) of the n quorum nodes have to be up, so in a cluster with two quorum nodes both of them must be up. Do you have a tiebreaker disk defined? (You can check with mmlsconfig tiebreakerdisk.)
-Aaron
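The quorum rule mentioned above can be checked with a little arithmetic. The following is a minimal sketch, assuming integer division for n/2; the function names quorum_needed and has_quorum are hypothetical and not part of GPFS.

```shell
#!/bin/sh
# quorum_needed N: minimum number of quorum nodes that must be up
# under node-based quorum, computed as floor(N/2) + 1.
quorum_needed() {
    echo $(( $1 / 2 + 1 ))
}

# has_quorum UP TOTAL: succeeds if UP nodes out of TOTAL quorum nodes
# are enough to keep quorum.
has_quorum() {
    [ "$1" -ge "$(quorum_needed "$2")" ]
}
```

For the two-node cluster in this thread, quorum_needed 2 gives 2, so has_quorum 1 2 fails as soon as one node is down. With three quorum nodes, one node could be lost.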
On 9/7/18 7:51 AM, Matthias Knigge wrote:
> Hello all,
>
> I am using GPFS version 5.0.2.0 and have problems with the
> mmlscluster command and callback scripts. It is a small cluster of
> only two nodes. If I shut down one of the nodes, mmgetstate and
> mmlscluster sometimes report the following output:
>
> [root@gpfs-tier1 gpfs5.2]# mmgetstate
>
>  Node number  Node name   GPFS state
> -------------------------------------------
>       1       gpfs-tier1  arbitrating
>
> [root@gpfs-tier1 gpfs5.2]# mmlscluster
> ssh: connect to host gpfs-tier2 port 22: No route to host
> mmlscluster: Unable to retrieve GPFS cluster files from node gpfs-tier2
> mmlscluster: Command failed. Examine previous error messages to determine cause.
>
> Normally the output is like this:
>
> [root@gpfs-tier1 gpfs5.2]# mmlscluster
>
> GPFS cluster information
> ========================
>   GPFS cluster name:         TIERCLUSTER.gpfs-tier1
>   GPFS cluster id:           12458173498278694815
>   GPFS UID domain:           TIERCLUSTER.gpfs-tier1
>   Remote shell command:      /usr/bin/ssh
>   Remote file copy command:  /usr/bin/scp
>   Repository type:           server-based
>
> GPFS cluster configuration servers:
> -----------------------------------
>   Primary server:    gpfs-tier2
>   Secondary server:  gpfs-tier1
>
>  Node  Daemon node name  IP address      Admin node name  Designation
> ----------------------------------------------------------------------
>    1   gpfs-tier1        192.168.178.10  gpfs-tier1       quorum-manager
>    2   gpfs-tier2        192.168.178.11  gpfs-tier2       quorum-manager
>
> [root@gpfs-tier1 gpfs5.2]# mmlscallback
> NodeDownCallback
>   command = /var/mmfs/rs/nodedown.ksh
>   priority = 1
>   event = quorumNodeLeave
>   parms = %eventNode %quorumNodes
>
> NodeUpCallback
>   command = /var/mmfs/rs/nodeup.ksh
>   priority = 1
>   event = quorumNodeJoin
>   parms = %eventNode %quorumNodes
>
> If I shut down the filesystem via mmshutdown, the callback script
> runs, but if I shut down the whole node, the scripts do not run.
>
> The latest entries in mmfs.log.latest show only this information:
>
> 2018-09-07_13:12:36.724+0200: [I] Cluster Manager connection broke. Probing cluster TIERCLUSTER.gpfs-tier1
> 2018-09-07_13:12:37.226+0200: [E] Unable to contact enough other quorum nodes during cluster probe.
> 2018-09-07_13:12:37.226+0200: [E] Lost membership in cluster TIERCLUSTER.gpfs-tier1. Unmounting file systems.
> 2018-09-07_13:12:38.448+0200: [N] Connecting to 192.168.178.11 gpfs-tier2 <c0p1>
>
> Could anybody help me with this? I want to run a script when a node
> goes down or comes up, in order to change the roles so that the
> filesystem can be started. The callback events nodeLeave and nodeJoin
> do not fire either.
>
> Any more information required? If yes, please let me know!
>
> Many thanks in advance and a nice weekend!
>
> Matthias
>
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>
--
Aaron Knister
NASA Center for Climate Simulation (Code 606.2) Goddard Space Flight Center
(301) 286-2776
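For readers who want to experiment with the callbacks shown in the quoted mmlscallback output, here is a minimal sketch of what a handler such as nodedown.ksh might look like. It is not the script used in this thread. It assumes the two arguments arrive in the order of the parms line (%eventNode, then %quorumNodes) and that %quorumNodes is a comma-separated list. The log path and the function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical handler in the style of /var/mmfs/rs/nodedown.ksh.
# Assumed arguments, following "parms = %eventNode %quorumNodes":
#   $1 = node that triggered the event
#   $2 = comma-separated list of quorum nodes
LOG_FILE=${LOG_FILE:-/tmp/nodedown.log}   # assumption: illustrative log path

# count_nodes LIST: number of entries in a comma-separated node list.
count_nodes() {
    if [ -z "$1" ]; then
        echo 0
        return
    fi
    old_ifs=$IFS
    IFS=,
    set -- $1          # split the list on commas into positional parameters
    IFS=$old_ifs
    echo $#
}

# handle_node_down EVENT_NODE QUORUM_NODES: record the event in the log file.
handle_node_down() {
    event_node=$1
    quorum_nodes=$2
    count=$(count_nodes "$quorum_nodes")
    echo "quorumNodeLeave: node=$event_node quorum_nodes=$quorum_nodes count=$count" >> "$LOG_FILE"
}
```

A real callback script would end with handle_node_down "$@" so that the arguments passed by GPFS reach the function. Logging first, before attempting any role change, makes it easier to see whether the callback is invoked at all in the node-shutdown case discussed above.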