[gpfsug-discuss] mmchdisk suspend / stop

Edward Wahl ewahl at osc.edu
Thu Feb 8 18:33:32 GMT 2018


I'm with Richard on this one. Sounds dubious to me.
Even older-style hardware could start a new controller in a 'failed' or
'service' state and push firmware to it, back in the 20th century...  ;)

Ed


On Thu, 8 Feb 2018 16:23:33 +0000
"Sobey, Richard A" <r.sobey at imperial.ac.uk> wrote:

> Sorry I can’t help… the only thing going round and round my head right now is
> why on earth the existing controller cannot push the required firmware to the
> new one when it comes online. Never heard of anything else! Feel free to name
> and shame so I can avoid 😊
> 
> Richard
> 
> From: gpfsug-discuss-bounces at spectrumscale.org
> [mailto:gpfsug-discuss-bounces at spectrumscale.org] On Behalf Of Buterbaugh,
> Kevin L
> Sent: 08 February 2018 16:00
> To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
> Subject: [gpfsug-discuss] mmchdisk suspend / stop
> 
> Hi All,
> 
> We are in a bit of a difficult situation right now with one of our non-IBM
> hardware vendors (I know, I know, I KNOW - buy IBM hardware! <grin>) and are
> looking for some advice on how to deal with this unfortunate situation.
> 
> We have a non-IBM FC storage array with dual-“redundant” controllers.  One of
> those controllers is dead and the vendor is sending us a replacement.
> However, the replacement controller will have mis-matched firmware with the
> surviving controller and - long story short - the vendor says there is no way
> to resolve that without taking the storage array down for firmware upgrades.
> Needless to say there’s more to that story than what I’ve included here, but
> I won’t bore everyone with unnecessary details.
> 
> The storage array has 5 NSDs on it, but fortunately enough they are part of
> our “capacity” pool … i.e. the only way a file lands here is if an
> mmapplypolicy scan moved it there because the *access* time is greater than
> 90 days.  Filesystem data replication is set to one.
> 
> So … what I was wondering is whether I could use mmchdisk to either suspend
> or (preferably) stop those NSDs, do the firmware upgrade, and resume the
> NSDs?  The problem I see is that suspend doesn’t stop I/O, it only prevents
> the allocation of new blocks … so, in theory, if a user suddenly decided to
> start using a file they hadn’t needed for 3 months then I’ve got a problem.
> Stopping all I/O to the disks is what I really want to do.  However,
> according to the mmchdisk man page stop cannot be used on a filesystem with
> replication set to one.
> 
> There’s over 250 TB of data on those 5 NSDs, so restriping off of them or
> setting replication to two are not options.
> 
> It is very unlikely that anyone would try to access a file on those NSDs
> during the hour or so I’d need to do the firmware upgrades, but how would
> GPFS itself react to those (suspended) disks going away for a while?  I’m
> thinking I could be OK if there was just a way to actually stop them rather
> than suspend them.  Any undocumented options to mmchdisk that I’m not aware
> of???
> 
> Are there other options - besides buying IBM hardware - that I am
> overlooking?  Thanks...
> 
> Kevin Buterbaugh - Senior System Administrator
> Vanderbilt University - Advanced Computing Center for Research and Education
> Kevin.Buterbaugh at vanderbilt.edu - (615)875-9633
> 
> 
> 



-- 

Ed Wahl
Ohio Supercomputer Center
614-292-9302


