[gpfsug-discuss] cross-cluster mounting different versions of gpfs

Jonathan Buzzard jonathan at buzzard.me.uk
Wed Mar 16 19:45:35 GMT 2016


On 16/03/16 19:06, Damir Krstic wrote:

[SNIP]

>
> In our case, however, even though we will upgrade our clients to 4.2
> (some gradually as pointed elsewhere in this conversation, and most in
> June), we have to be able to mount the new ESS filesystem on our compute
> cluster before the clients are upgraded.

What is preventing a gradual if not rapid upgrade of the compute clients 
now?

The usual approach, once you have verified the upgrade works, is simply to 
disable the queues on all the nodes and, as jobs finish, upgrade the nodes 
as they become free.

Again, because the usual approach is to have a maximum run time for jobs 
(that is, jobs can't just run forever and will be culled if they run too 
long), you can achieve this piecemeal upgrade in a relatively short 
period of time. Most places have a maximum run time of one to two weeks, 
so if you are within the norm this could be done by the end of the month.

It's basically the same procedure as you would use to, say, push a 
security update that requires a reboot.

The really neat way is to script it up and then make it a job that you 
keep dumping in the queue till all nodes are updated :D
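
To make that concrete, here is a rough sketch of what such a script might 
look like -- assuming Slurm as the scheduler and RPM-packaged GPFS, neither 
of which I know applies to your site, so treat every name below as 
illustrative rather than gospel:

    #!/bin/bash
    # Rough sketch of a rolling GPFS client upgrade. Assumes Slurm and
    # RPM packages; adjust for whatever scheduler/packaging you run.

    # Stop new jobs landing on every compute node.
    for node in $(sinfo -h -N -o "%N" | sort -u); do
        scontrol update NodeName=$node State=DRAIN Reason="GPFS 4.2 upgrade"
    done

    # As nodes drain, upgrade GPFS on each one and put it back in service.
    for node in $(sinfo -h -N -t drained -o "%N" | sort -u); do
        ssh $node "mmshutdown && \
                   yum -y update gpfs.base gpfs.gpl gpfs.msg.en_US gpfs.docs && \
                   mmbuildgpl && \
                   mmstartup"
        scontrol update NodeName=$node State=RESUME
    done

Keep dumping the second half into the queue (or cron) and it chews through 
the cluster as jobs complete.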

>
> It seems like, even though Sven is recommending against it, building a
> filesystem with --version flag is our only option. I guess we have
> another option, and that is to upgrade all our clients first, but we
> can't do that until June so I guess it's really not an option at this time.
>

I would add my voice to that. The "this feature is not available because 
you created the file system as version x.y.z" problem is likely to cause 
you trouble at some point down the line. It has certainly caused me 
headaches in the past.
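
For what it is worth, the relevant knobs are easy to inspect and, one day, 
move forward. A sketch only, using a made-up file system name (gpfs01) and 
an example version string -- check what version your existing clients 
actually need before trusting any of it:

    # Create the ESS file system at an older format so existing clients can mount it
    mmcrfs gpfs01 -F /tmp/nsd.stanza -T /gpfs01 --version 3.5.0.7

    # Show the current file system format version (and what the code supports)
    mmlsfs gpfs01 -V

    # Later, once every client is on 4.2, bring the format up to date.
    # Note this is a one-way operation.
    mmchfs gpfs01 -V full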

> I hope this makes our constraints clear: mainly, without being able to
> take downtime on our compute cluster, we are forced to build a
> filesystem on ESS using --version flag.
>

Again, there is not, or at least should not be, *ANY* requirement for 
downtime of the compute cluster that the users will notice. Certainly 
nothing worse than nodes going down due to hardware failures or pushing 
urgent security patches.

Taking a different tack, is it not possible for the ESS storage to be 
added to the existing file system? That is, you get a bunch of NSDs on 
the disks with NSD servers, add them all to the existing cluster, then 
issue some "mmchdisk <device> suspend" on the existing disks followed by 
some "mmdeldisk <device>", and have the whole lot move over to the new 
storage in a manner utterly transparent to the end users (well, other 
than a performance impact)?
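
Roughly, and purely as a sketch with made-up NSD and file system names:

    # Create the new NSDs on the ESS storage and add them to the existing file system
    mmcrnsd -F /tmp/ess_nsd.stanza
    mmadddisk gpfs01 -F /tmp/ess_nsd.stanza

    # Stop new allocations going to the old disks
    mmchdisk gpfs01 suspend -d "old_nsd1;old_nsd2;old_nsd3"

    # Delete the old disks; GPFS migrates their data onto the remaining (new) disks
    mmdeldisk gpfs01 "old_nsd1;old_nsd2;old_nsd3" -r

    # Optionally rebalance across the new disks afterwards
    mmrestripefs gpfs01 -b

The mmdeldisk is where the data actually moves, so expect it to take a 
while and eat some bandwidth.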

This approach certainly works (I have done it myself), but IBM might have 
placed restrictions on the ESS offering, which I am not familiar with, 
that prevent you from doing this while maintaining support. If there are, 
I personally would see that as a barrier to purchase of ESS, but then I am 
old school when it comes to GPFS and not at all familiar with ESS.

JAB.

-- 
Jonathan A. Buzzard                 Email: jonathan (at) buzzard.me.uk
Fife, United Kingdom.


