[gpfsug-discuss] Replicated and non replicated data

Simon Thompson (IT Research Support) S.J.Thompson at bham.ac.uk
Mon Apr 16 09:42:04 BST 2018


Yeah, that did it; it was set to the default value of “no”.

What exactly does “no” mean as opposed to “yes”? The docs
https://www.ibm.com/support/knowledgecenter/en/STXKQY_5.0.0/com.ibm.spectrum.scale.v5r00.doc/bl1adm_tuningguide.htm
aren’t very forthcoming on this …

(Note that it looks like we also have to set this on the client clusters in multi-cluster environments.)
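
For anyone else hitting this, the check and the change look roughly like the following; “remoteClients” is just a placeholder for whatever node class covers the client cluster nodes:

    # Show the current value (ours was the default, "no")
    mmlsconfig unmountOnDiskFail

    # Set it on the client cluster as well; -i applies it immediately
    # and persists it across GPFS restarts ("remoteClients" is a
    # made-up node class name)
    mmchconfig unmountOnDiskFail=meta -i -N remoteClients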

Simon

From: "Robert.Oesterlin at nuance.com" <Robert.Oesterlin at nuance.com>
Date: Friday, 13 April 2018 at 21:17
To: "gpfsug-discuss at spectrumscale.org" <gpfsug-discuss at spectrumscale.org>
Cc: "Simon Thompson (IT Research Support)" <S.J.Thompson at bham.ac.uk>
Subject: Re: Replicated and non replicated data

Add:

unmountOnDiskFail=meta

to your config. You can add it with “-I” to have it take effect without a restart of GPFS.
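
i.e. roughly:

    # Takes effect immediately without restarting GPFS; note that -I
    # does not survive a GPFS restart, use -i if you want it to persist
    mmchconfig unmountOnDiskFail=meta -I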


Bob Oesterlin
Sr Principal Storage Engineer, Nuance


From: <gpfsug-discuss-bounces at spectrumscale.org> on behalf of "Simon Thompson (IT Research Support)" <S.J.Thompson at bham.ac.uk>
Reply-To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
Date: Friday, April 13, 2018 at 3:06 PM
To: "gpfsug-discuss at spectrumscale.org" <gpfsug-discuss at spectrumscale.org>
Subject: [EXTERNAL] [gpfsug-discuss] Replicated and non replicated data

I have a question about file systems with replicated and non-replicated data.

We have a file system where metadata is set to copies=2 and data to copies=2; we then use a placement policy to selectively replicate some data only once, based on fileset. We also place the non-replicated data into a specific pool (6tnlsas) so that we know where it is placed.
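
For context, the placement rules are along these lines (the rule names, the “scratch” fileset and the “data” default pool are illustrative, not our real ones):

    /* Files in the 'scratch' fileset get a single data copy,
       placed in the non-replicated 6tnlsas pool */
    RULE 'nonRepl' SET POOL '6tnlsas' REPLICATE(1) FOR FILESET ('scratch')

    /* Everything else goes to a replicated pool and keeps the
       file system default of two data copies */
    RULE 'default' SET POOL 'data'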

My understanding was that, in doing this, if we took the disks holding the non-replicated data offline, we’d still have the FS available for users, as the metadata is replicated. Sure, accessing a non-replicated data file would give an IO error, but the rest of the FS should stay up.

We had a situation today where we wanted to take stg01 offline, so we tried using mmchdisk stop -d …. Once we got to about disk stg01-01_12_12, GPFS would refuse to stop any more disks and complain about too many disks being unavailable; similarly, if we shut down the NSD servers hosting the disks, the file system would SGPanic and force unmount.
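
For reference, the sequence was roughly as below; the file system name and disk list are abbreviated placeholders rather than the real set:

    # Check pool, failure group and state of each disk first
    mmlsdisk gpfs01 -L

    # Stop the stg01 NSDs in batches
    mmchdisk gpfs01 stop -d "stg01-01_0_0;stg01-01_1_0;stg01-01_2_0"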

First, am I correct in thinking that an FS with non-replicated data but replicated metadata should still be accessible (apart from the non-replicated data) when the LUNs hosting that data are down?

If so, any suggestions as to why my FS is panicking when we take down that one set of disks?

I thought at first we had some non-replicated metadata, so I tried an mmrestripefs -R --metadata-only to force it to ensure two replicas of everything, but this didn’t help.
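
i.e. (the file system name is a placeholder):

    # Re-replicate metadata so everything matches the default of two copies
    mmrestripefs gpfs01 -R --metadata-only

    # Sanity-check the replication settings afterwards
    mmlsfs gpfs01 -m -M -r -R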

Running 5.0.0.2 on the NSD server nodes.

(The first time we went round this we didn’t have an FS descriptor disk, but you can see below that we added one.)
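
Roughly like the following, with the NSD name, device, servers and failure group as illustrative values only:

    # desc.stanza - a small descriptor-only NSD
    %nsd: nsd=descnsd01 device=/dev/sdX servers=nsd01,nsd02 usage=descOnly failureGroup=30

    # Create the NSD and add it to the file system (name is a placeholder)
    mmcrnsd -F desc.stanza
    mmadddisk gpfs01 -F desc.stanza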

Thanks

Simon



