[gpfsug-discuss] gpfsug-discuss Digest, Vol 62, Issue 33

Aaron Knister aaron.s.knister at nasa.gov
Thu Mar 16 14:43:47 GMT 2017


Perhaps an environment where one has both OPA and IB fabrics. From the GPFS FAQ 
(https://www.ibm.com/support/knowledgecenter/en/STXKQY/gpfsclustersfaq.html):

RDMA is not supported on a node when both Mellanox HCAs and Intel 
Omni-Path HFIs are enabled for RDMA.
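That restriction bites on dual-fabric NSD servers. As a hedged sketch (the device and node names below are made up; check yours with ibstat), keeping RDMA on only the Mellanox fabric would look roughly like this:

```shell
# Sketch only: enable verbs RDMA, but list only the Mellanox HCA ports in
# verbsPorts so the Omni-Path HFI falls back to IP rather than RDMA.
# "mlx5_0", "mlx5_1", and "nsdserver01" are hypothetical names.
mmchconfig verbsRdma=enable -N nsdserver01
mmchconfig verbsPorts="mlx5_0/1 mlx5_1/1" -N nsdserver01
```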

The alternative is a situation where multiple IB fabrics exist that 
require different OFED versions from each other (and most likely from 
the ESS) for support reasons (speaking from experience). That is to say, 
if $VENDOR supports OFED version X on an IB fabric, and the ESS/GSS 
ships with version Y, then when there's a problem on the IB fabric 
$VENDOR may point at the different OFED version on the ESS/GSS, say 
they don't support it, and one is left in a bad spot.
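One quick way to see this kind of mismatch coming (a sketch: the hostnames are made up, and ofed_info assumes Mellanox OFED is installed on each node) is to compare the OFED release on a vendor-supported client against the ESS/GSS I/O nodes:

```shell
# ofed_info -s prints the installed OFED release string on each node.
# Compare a $VENDOR-supported client against an ESS/GSS I/O node;
# "client01" and "essio01" are hypothetical hostnames.
for host in client01 essio01; do
    printf '%s: ' "$host"
    ssh "$host" ofed_info -s
done
```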

-Aaron

On 3/16/17 9:50 AM, Jan-Frode Myklebust wrote:
> Why would you need an NSD protocol router when the NSD servers can have a
> mix of InfiniBand and Ethernet adapters? For example, 4x EDR + 2x 100GbE per
> io-node in an ESS should give you lots of bandwidth on your common
> Ethernet medium.
>
>
>   -jf
>
> On Thu, Mar 16, 2017 at 1:52 AM, Aaron Knister <aaron.knister at gmail.com> wrote:
>
>     *drags out soapbox*
>
>     Sorry in advance for the rant; this is one of my huge pet peeves :)
>
>     There are some serious blockers for GNR adoption in my environment.
>     It drives me up a wall that the only way to get end-to-end checksums
>     in GPFS is with vendor hardware lock-in. I find it infuriating.
>     Lustre can do this for free with ZFS. Historically it has also
>     offered other features, like eating your data, so I guess it's a
>     tradeoff ;-) I believe that either GNR should be available for any
>     hardware that passes a validation suite, or GPFS should support
>     checksums on non-GNR NSDs, either by leveraging T10-PI information
>     or by checksumming blocks/subblocks and storing that somewhere. I
>     opened an RFE for this and it was rejected; I was effectively told
>     to go use GNR/ESS, but, well... I can't do GNR.
>
>     But let's say I could run GNR on any hardware of my choosing, after
>     perhaps paying some modest licensing fee and passing a hardware
>     validation test; there's still another blocker for me. Because GPFS
>     doesn't support anything like an LNet router, I'm fairly limited in
>     the number of high-speed verbs RDMA fabrics I can connect GNR to.
>     Furthermore, even if I had enough PCIe slots, the configuration may
>     not be supported (e.g. a site with an OPA and an IB fabric that
>     would like to use RDMA verbs on both). There could even be a
>     situation where a vendor of an HPC solution requires a specific OFED
>     version for support purposes that's not the version running on the
>     GNR nodes. If an NSD protocol router were available, I could perhaps
>     use Ethernet as a common medium to work around this.
>
>     I'd really like IBM to *do* something about this situation but I've
>     not gotten any traction on it so far.
>
>     -Aaron
>
>
>
>     On Wed, Mar 15, 2017 at 8:26 PM, Steve Duersch <duersch at us.ibm.com> wrote:
>
>         >> For me it's the protection against bitrot and added protection
>         >> against silent data corruption
>         GNR has this functionality. Right now it is available only through
>         ESS, though; not yet as software-only.
>
>         Steve Duersch
>         Spectrum Scale
>         845-433-7902
>         IBM Poughkeepsie, New York
>
>
>
>
>         gpfsug-discuss-bounces at spectrumscale.org wrote on
>         03/15/2017 10:25:59 AM:
>
>
>         >
>         > Message: 6
>         > Date: Wed, 15 Mar 2017 14:25:41 +0000
>         > From: "Buterbaugh, Kevin L" <Kevin.Buterbaugh at Vanderbilt.Edu>
>         > To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
>         > Subject: Re: [gpfsug-discuss] mmcrfs issue
>         > Message-ID: <F5D928E7-5ADF-4491-A8FB-AF3885E9A8A3 at vanderbilt.edu>
>         > Content-Type: text/plain; charset="utf-8"
>         >
>         > Hi All,
>         >
>         > Since I started this thread I guess I should chime in, too: for us
>         > it was simply that we were testing a device that did not have
>         > hardware RAID controllers, and we wanted to implement something
>         > roughly equivalent to RAID 6 LUNs.
>         >
>         > Kevin
>         >
>         > > On Mar 14, 2017, at 5:16 PM, Aaron Knister <aaron.s.knister at nasa.gov> wrote:
>         > >
>         > > For me it's the protection against bitrot and added protection
>         > against silent data corruption and in theory the write caching
>         > offered by adding log devices that could help with small random
>         > writes (although there are other problems with ZFS + synchronous
>         > workloads that stop this from actually materializing).
>         > >
>         > > -Aaron
>         > >
>
>
>         _______________________________________________
>         gpfsug-discuss mailing list
>         gpfsug-discuss at spectrumscale.org
>         http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>
>
>
>
>
>
>
>

-- 
Aaron Knister
NASA Center for Climate Simulation (Code 606.2)
Goddard Space Flight Center
(301) 286-2776


