[gpfsug-discuss] DSS-G

Simon Thompson S.J.Thompson at bham.ac.uk
Fri Jan 18 16:02:48 GMT 2019


I have several. One of mine was shipped in a customer rack (which happened to be an existing Lenovo rack anyway), and the other was based on the x3650 M5 so was cabled differently anyway (it's now a franken-DSS-G, as we upgraded the servers to SR650 and added an SSD tray, but then I have so much non-standard Lenovo config stuff in our systems ...)

If you bond the LOM ports together then you can't use the XCC in shared mode. But the installer scripts will make it shared when you reinstall/upgrade. Well, it can half work in some cases, depending on how your switch is connected. For example, we set the switch to fall back to non-bond mode (relatively common now), which is fine while the OS is not booted: you can talk to the XCC. But as soon as the OS boots and it bonds, the switch port turns into a bond/trunk port and BAM, you can no longer talk to the XCC port.
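For what it's worth, the switch-side fallback I mean is LACP bypass, i.e. the switch passes traffic on the bond members before LACP has negotiated, which is what keeps the shared XCC reachable while the node sits in PXE. On Cumulus Linux with NCLU it looks roughly like this (bond and port names here are made up for illustration):

```shell
# Define the bond towards the DSS-G node's LOM ports and allow it
# to carry traffic before LACP negotiates (LACP bypass), so the
# shared XCC is reachable while the OS is not up:
net add bond dssg1-bond bond slaves swp1,swp2
net add bond dssg1-bond bond lacp-bypass-allow
net commit
```

Once the OS boots and LACP comes up, bypass ends and the port behaves as a normal bond/trunk, which is exactly when we lose the XCC as described above.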

We have an xCAT postscript to put it back to being dedicated on the XCC port. So during an install you lose access for a little while, whilst the Lenovo script runs, before my script puts it back again.
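The postscript is essentially a one-liner. A sketch of the idea is below, but note the OneCLI setting name here is illustrative, not copied from our script, so check what your node actually exposes (e.g. with onecli config show IMM) before trusting it:

```shell
#!/bin/sh
# xCAT postscript sketch: force the XCC network interface back to the
# dedicated management port after the Lenovo installer has flipped it
# to shared mode on the LOM.
# NOTE: "IMM.SharedNicMode" is a hypothetical setting name used for
# illustration only -- list the real ones on your hardware first with:
#   onecli config show IMM
/opt/lenovo/onecli/onecli config set IMM.SharedNicMode dedicated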

And if you read the upgrade guide, it tells you to unplug the SAS ports before doing the reinstall (OK, I haven't checked the 2.2a upgrade guide, but it always did). HOWEVER, the xCAT template for DSS-G should also blacklist the SAS driver to prevent the installer seeing the attached JBOD storage. AND GPFS now writes proper GPT headers to the disks as well, which the installer should then leave alone. (But yes, haven't we all done an install and wiped the disk headers ... GPFS works great until you try to mount the file-system some time later.)
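If you want to check your own template for that, the blacklisting is just a kernel argument (or a modprobe.d entry). Assuming the HBAs in these boxes use the mpt3sas driver, it looks something like:

```shell
# On the installer kernel command line in the xCAT/kickstart template,
# stop the SAS HBA driver loading so the installer never sees the JBODs:
#   ... modprobe.blacklist=mpt3sas ...

# Or, equivalently, as a persistent modprobe.d entry in the image:
echo "blacklist mpt3sas" > /etc/modprobe.d/dssg-no-sas.conf
```

Belt and braces: I'd still unplug the SAS cables as the guide says, since one missing blacklist line is all it takes.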

On the needing to reinstall ... I agree, I don't like the reinstall-to-upgrade between releases, but if you look at what it's doing it sorta half makes sense. For example, it force-flashes an exact validated firmware onto the SAS cards and forces the port config etc. on the card into a known current state. I don't like it, but I see why it's done like that. We have in the past picked the relevant bits out (e.g. disk firmware and GPFS packages) and applied just those. THIS IS NOT SUPPORTED, but we did pick it apart to see what had changed.
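If you do go down that (again, NOT SUPPORTED) road, the drive firmware side is just the normal Spectrum Scale RAID commands, roughly:

```shell
# THIS IS NOT SUPPORTED as a substitute for the DSS-G upgrade --
# shown only to illustrate what the installer does under the covers.

# Compare the current drive firmware with what the release ships:
mmlsfirmware --type drive

# Flash the drives up to the firmware bundled with the release:
mmchfirmware --type drive
```

The SAS adapter flashing and port config is the bit the reinstall really exists for, and that part I would not try to replicate by hand.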

If you go to 2.2a as well, the GUI has now moved out (installing it on the DSS-G nodes was a bad idea anyway, I'm sure), and the pmcollector package magically doesn't get installed on the DSS-G nodes either.

Oh, AND the LOM ports ... if you upgrade to DSS-G 2.2a, that will flash the firmware to the Intel 4.0 release for the X722. And that doesn't work if you have Mellanox Ethernet switches running Cumulus (we proved it was the firmware by upgrading another SR650 to the latest firmware, after which it suddenly no longer worked). You won't get a link up, even at PXE time, so it's not a driver issue. And if you have a VDX switch you need another workaround ...

Simon

On 18/01/2019, 15:38, "gpfsug-discuss-bounces at spectrumscale.org on behalf of Jonathan Buzzard" <gpfsug-discuss-bounces at spectrumscale.org on behalf of jonathan.buzzard at strath.ac.uk> wrote:

    
    Anyone out there with a DSS-G using SR650 servers?
    
    We have one, and after some hassle we have finally got access to the
    software downloads, and I have been reading through the documentation
    to familiarize myself with the upgrade procedure.
    
    Skipping over the sheer madness of that, which appears to involve
    doing a complete netboot reinstall of the nodes for every upgrade, it
    looks like we have the wrong hardware. It all came in a Lenovo rack
    with factory cabling, so one assumes it would be correct.
    
    However the "Manufacturing Preload Procedure" document says
    
        The DSS-G installation scripts assume that IPMI access to the
        servers is set up through the first regular 1GbE Ethernet port
        of the server (marked with a green star in figure 21) in shared
        mode, not through the dedicated IPMI port under the first three
        PCIe slots of the SR650 server’s back, and not on the lower left
        side of the x3650 M5 server’s back.
    
    Except our SR650's have 2x10GbE SFP+ LOM and the XCC is connected to
    the dedicated IPMI port. Oh great, reinstalling the OS for an update
    is already giving me the screaming heebie-jeebies, and now my
    factory-delivered setup is wrong too. So in my book there is an
    increased chance of the install procedure writing all over the disks
    during install and blowing away the NSDs. Last time I was involved in
    a net install of RHEL (well, CentOS, but it makes little difference)
    onto a GPFS node with attached disks, the installer wrote all over
    the NSD descriptors and destroyed the file system.
    
    So before one plays war with Lenovo for shipping an unsupported
    configuration, I was wondering how other DSS-Gs with SR650s have come
    from the factory.
    
    JAB.
    
    -- 
    Jonathan A. Buzzard                         Tel: +44141-5483420
    HPC System Administrator, ARCHIE-WeSt.
    University of Strathclyde, John Anderson Building, Glasgow. G4 0NG
    
    
    
    _______________________________________________
    gpfsug-discuss mailing list
    gpfsug-discuss at spectrumscale.org
    http://gpfsug.org/mailman/listinfo/gpfsug-discuss
    


