[gpfsug-discuss] DSS-G

Jonathan Buzzard jonathan.buzzard at strath.ac.uk
Fri Jan 18 17:14:52 GMT 2019


On Fri, 2019-01-18 at 16:02 +0000, Simon Thompson wrote:

[SNIP]

> 
> If you bond the LOM ports together then you can't use the XCC in
> shared mode. But the installer scripts will make it shared when you
> reinstall/upgrade. Well, it can half work in some cases depending on
> how you have your switch connected. For example we set the switch to
> fail back to non-bond mode (relatively common now), which is fine
> when the OS is not booted: you can talk to the XCC. But as soon as the OS
> boots and it bonds, the switch port turns into a bond/trunk port and
> BAM, you can no longer talk to the XCC port.
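For reference, the fail-back behaviour described above is what, for example, Cumulus Linux calls "LACP bypass"; a minimal sketch of the switch-side bond config (port names are hypothetical, syntax per Cumulus ifupdown2):

```
auto bond1
iface bond1
    # the two switch ports the node's LOM is cabled to (hypothetical names)
    bond-slaves swp1 swp2
    # allow a single port to pass traffic before LACP negotiates,
    # so the shared XCC stays reachable while the OS is down
    bond-lacp-bypass-allow yes
```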

We don't have that issue :-) Currently there is nothing plugged into
the LOM, because we are using the Mellanox ConnectX4 cards for bonded
40Gbps Ethernet to carry the GPFS traffic in the main, with one of the
ports on each of the two cards set to InfiniBand so the storage can be
mounted on an old cluster which only has 1Gb Ethernet (the new cluster
uses 10GbE networking to carry storage).

However, we have a shortage of 10GbE ports, and the documentation says
it should be 1GbE anyway; hence my asking what Lenovo might have
shipped to other people, as there is a disparity between what we have
been shipped and what the documentation says it should look like.

[SNIP]

> And if you read the upgrade guide, then it tells you to unplug the
> SAS ports before doing the reinstall (OK I haven't checked the 2.2a
> upgrade guide, but it always did).

Well, the 2.2a documentation does not say anything about that :-) I had
basically decided it was going to be necessary anyway, for safety's
sake. While I do have a full backup of the file system, I don't want to
have to use it.

>  HOWEVER, the xcat template for DSS-G should also blacklist the SAS
> driver to prevent it seeing the attached JBOD storage. AND GPFS now
> writes proper GPT headers as well to the disks which the installer
> should then leave alone. (But yes, haven't we all done an install and
> wiped the disk headers ... GPFS works great until you try to mount
> the file-system sometime later)
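For illustration, blacklisting a SAS driver as described is normally a one-line modprobe.d fragment; a minimal sketch, assuming the HBA uses the mpt3sas driver (the actual driver name depends on the card, and the file path is hypothetical):

```
# /etc/modprobe.d/dssg-blacklist.conf (hypothetical path)
# Stop the SAS driver loading so the installer never sees the JBOD disks
blacklist mpt3sas
```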

Well, I have never wiped my NSDs myself; it was the numpty getting
ready to prepare the CentOS 6 upgrade for the cluster who forgot to
unzone the storage arrays (the cluster had FC-attached storage on all
nodes for performance reasons; back in the day 4Gb FC was a lot cheaper
than 10GbE and 1GbE was not fast enough) and wiped them for me :-(

> On the needing to reinstall ... I agree I don't like the reinstall to
> upgrade between releases, but if you look at what it's doing it sorta
> half makes sense. For example it force flashes an exact validated
> firmware onto the SAS cards and forces the port config etc onto the
> card to being in a known current state. I don't like it, but I see
> why it's done like that.

Except that does not require a reinstall of the OS to achieve.
Reinstalling from scratch for an update is complete madness IMHO.

> 
> If you go to 2.2a as well, the gui is now moved out (it was a bad
> idea to install on the DSS-G nodes anyway I'm sure), and the
> pmcollector package magically doesn't get installed either on the
> DSS-G nodes.
> 

Currently we don't have the GUI installed anywhere. I am not yet sure I
trust IBM not to change the GUI completely yet again, so I can't be
bothered getting it to work.

> Oh AND, the LOM ports ... if you upgrade to DSS-G 2.2a, that will
> flash the firmware to Intel 4.0 release for the X722. And that
> doesn't work if you have Mellanox Ethernet switches running
> Cumulus.  (we proved it was the firmware by upgrading another SR650
> to the latest firmware, and suddenly it no longer worked) - you won't
> get a link up, even at PXE time, so it is not a driver issue. And if you
> have a VDX switch you need another workaround ...
> 

We have Lenovo switches, so hopefully Lenovo tested with their own
switches and it works ;-)

Mind you, I get this when running the dssgcktopology tool:

    Warning: Unsupported configuration of odd number of enclosures detected.

Which nitwit wrote that script then? From the "Manufacturing Preload
Procedure" for 2.2a, on page 9:

    For the high density DSS models DSS-G210, DSS-G220, DSS-G240 and
    DSS-G260 with 3.5” NL-SAS disks (7.2k RPM), the DSS-G building
    block contains one, two, four or six Lenovo D3284 disk enclosures.

Right, so which is it then? One enclosure, which is clearly an odd
number of enclosures, is allegedly an unsupported configuration
according to the tool, but supported according to the documentation!


JAB.

-- 
Jonathan A. Buzzard                         Tel: +44141-5483420
HPC System Administrator, ARCHIE-WeSt.
University of Strathclyde, John Anderson Building, Glasgow. G4 0NG





