[gpfsug-discuss] proper gpfs shutdown when node disappears

Aaron Knister aaron.s.knister at nasa.gov
Thu Feb 2 19:33:41 GMT 2017


You could forcibly expel the node (one of my favorite GPFS commands):

mmexpelnode -N $nodename

then power it off once the expulsion is complete, and finally run

mmexpelnode -r -N $nodename

which will allow it to rejoin the cluster the next time you start GPFS 
on it. You'll still likely have to go through recovery, but you'll skip 
the part where GPFS wonders where the node went before expelling it.
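
Putting that together, a minimal sketch of the whole sequence (the BMC 
hostname, credentials, and lanplus interface in the ipmitool step are 
placeholders -- adjust them for however you reach the node's BMC):

# Expel the sick node so the cluster stops waiting on it
mmexpelnode -N $nodename

# Once the expulsion completes, power the node off out-of-band
# (BMC address and credentials here are placeholders)
ipmitool -I lanplus -H $nodename-bmc -U admin -P password chassis power off

# Clear the expulsion so the node can rejoin when GPFS starts on it again
mmexpelnode -r -N $nodename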

-Aaron

On 2/2/17 2:28 PM, valdis.kletnieks at vt.edu wrote:
> On Thu, 02 Feb 2017 18:28:22 +0100, "Olaf Weiser" said:
>
>> but the /var/mmfs directory is obviously damaged or empty, whatever the
>> cause; that's why you see a message like this.
>> Have you reinstalled that node, or done any backup/restore on it?
>
> The internal RAID controller died a horrid death and basically took
> all the OS partitions with it.  So the node was just sort of limping along,
> where the mmfsd process was still coping because it wasn't doing any
> I/O to the OS partitions - but 'ssh bad-node mmshutdown' wouldn't work
> because that requires accessing stuff in /var.
>
> At that point, it starts getting tempting to just use ipmitool from
> another node to power the comatose one down - but that often causes
> a cascade of other issues while things are stuck waiting for timeouts.
>
>
>

-- 
Aaron Knister
NASA Center for Climate Simulation (Code 606.2)
Goddard Space Flight Center
(301) 286-2776


