[gpfsug-discuss] gpfsug-discuss Digest, Vol 62, Issue 33
Aaron Knister
aaron.s.knister at nasa.gov
Thu Mar 16 14:43:47 GMT 2017
Perhaps an environment where one has OPA and IB fabrics. Taken from here
(https://www.ibm.com/support/knowledgecenter/en/STXKQY/gpfsclustersfaq.html):
RDMA is not supported on a node when both Mellanox HCAs and Intel
Omni-Path HFIs are enabled for RDMA.
The alternative is a situation where multiple IB fabrics exist that
require different OFED versions from each other (and most likely from
the ESS) for support reasons (speaking from experience). That is to
say, if $VENDOR supports OFED version X on an IB fabric while the
ESS/GSS ships with version Y, then when there's a problem on the IB
fabric $VENDOR may point at the different OFED version on the ESS/GSS,
decline to support it, and one is in a bad spot.
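
In concrete terms, the restriction means picking a single fabric for verbs RDMA. A minimal sketch of what that looks like with the standard Spectrum Scale config commands; the device/port names (mlx5_0 and friends) are illustrative and would differ per site:

```shell
# Enable RDMA, but list verbs ports from one fabric only (Mellanox here);
# traffic on the other fabric has to fall back to IP over the daemon network.
mmchconfig verbsRdma=enable
mmchconfig verbsPorts="mlx5_0/1 mlx5_1/1"

# Confirm what the cluster is actually configured to use.
mmlsconfig verbsRdma verbsPorts
```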
-Aaron
On 3/16/17 9:50 AM, Jan-Frode Myklebust wrote:
> Why would you need an NSD protocol router when the NSD servers can have a
> mix of InfiniBand and ethernet adapters? For example, 4x EDR + 2x 100GbE per
> io-node in an ESS should give you lots of bandwidth for your common
> ethernet medium.
>
>
> -jf
>
> On Thu, Mar 16, 2017 at 1:52 AM, Aaron Knister <aaron.knister at gmail.com> wrote:
>
> *drags out soapbox*
>
> Sorry in advance for the rant, this is one of my huge pet peeves :)
>
> There are some serious blockers for GNR adoption in my environment.
> It drives me up a wall that the only way to get end-to-end checksums
> in GPFS is with vendor hardware lock-in. I find it infuriating.
> Lustre can do this for free with ZFS. Historically Lustre has also
> offered various other features, like eating your data, so I guess
> it's a tradeoff ;-) I believe that either GNR should be available
> for any hardware that passes a validation suite, or GPFS should
> support checksums on non-GNR NSDs, either by leveraging T10-PI
> information or by checksumming blocks/subblocks and storing that
> somewhere. I opened an RFE for this and it was rejected; I was
> effectively told to go use GNR/ESS, but well... can't do GNR.
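
The block-checksum idea being proposed is simple enough to sketch. This is a toy illustration only, not how GPFS (or GNR) implements anything; the block size and the out-of-band checksum list are made-up stand-ins:

```python
import zlib

BLOCK_SIZE = 4096  # hypothetical subblock size

def checksum_blocks(data: bytes) -> list[int]:
    """Compute a CRC32 per fixed-size block, to be stored out-of-band."""
    return [zlib.crc32(data[i:i + BLOCK_SIZE])
            for i in range(0, len(data), BLOCK_SIZE)]

def verify_blocks(data: bytes, sums: list[int]) -> list[int]:
    """Return the indices of blocks whose checksum no longer matches."""
    offsets = range(0, len(data), BLOCK_SIZE)
    return [n for n, (i, s) in enumerate(zip(offsets, sums))
            if zlib.crc32(data[i:i + BLOCK_SIZE]) != s]

data = bytearray(b"x" * (3 * BLOCK_SIZE))
sums = checksum_blocks(bytes(data))
data[BLOCK_SIZE + 10] ^= 0xFF            # simulate a flipped bit (bitrot)
print(verify_blocks(bytes(data), sums))  # -> [1]
```

On read, a mismatch would mean returning an error (or repairing from redundancy) instead of silently handing back rotted data, which is the end-to-end property being asked for.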
>
> But let's say I could run GNR on any hardware of my choosing, after
> perhaps paying some modest licensing fee and passing a hardware
> validation test; there's still another blocker for me. Because GPFS
> doesn't support anything like an LNet router, I'm fairly limited in
> the number of high-speed verbs RDMA fabrics I can connect GNR to.
> Furthermore, even if I had enough PCIe slots the configuration may
> not be supported (e.g. a site with an OPA and an IB fabric that
> would like to use RDMA verbs on both). There could even be a
> situation where a vendor of an HPC solution requires a specific OFED
> version for support purposes that's not the version running on the
> GNR nodes. If an NSD protocol router were available I could perhaps
> use ethernet as a common medium to work around this.
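
For comparison, the Lustre feature being alluded to lets a router node forward LNet traffic between an RDMA fabric and TCP, so clients never need a direct path to the servers' fabric. A rough sketch of the configuration, with made-up interface names and addresses:

```shell
# On a hypothetical LNet router node bridging IB verbs and ethernet
# (/etc/modprobe.d/lnet.conf):
options lnet networks="o2ib0(ib0),tcp0(eth0)" forwarding="enabled"

# On an o2ib-only client, reach tcp0 through the router at 10.0.0.1@o2ib0:
options lnet networks="o2ib0(ib0)" routes="tcp0 10.0.0.1@o2ib0"
```

An NSD-protocol equivalent of this is what's being asked for: a node that speaks verbs on one side and plain ethernet on the other.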
>
> I'd really like IBM to *do* something about this situation but I've
> not gotten any traction on it so far.
>
> -Aaron
>
>
>
> On Wed, Mar 15, 2017 at 8:26 PM, Steve Duersch <duersch at us.ibm.com> wrote:
>
> >>For me it's the protection against bitrot and added protection
> against silent data corruption
> GNR has this functionality. Right now that is available through
> ESS though. Not yet as software only.
>
> Steve Duersch
> Spectrum Scale
> 845-433-7902
> IBM Poughkeepsie, New York
>
>
>
>
> gpfsug-discuss-bounces at spectrumscale.org wrote on
> 03/15/2017 10:25:59 AM:
>
>
> >
> > Message: 6
> > Date: Wed, 15 Mar 2017 14:25:41 +0000
> > From: "Buterbaugh, Kevin L" <Kevin.Buterbaugh at Vanderbilt.Edu>
> > To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
> > Subject: Re: [gpfsug-discuss] mmcrfs issue
> > Message-ID: <F5D928E7-5ADF-4491-A8FB-AF3885E9A8A3 at vanderbilt.edu>
> > Content-Type: text/plain; charset="utf-8"
> >
> > Hi All,
> >
> > Since I started this thread I guess I should chime in, too. For us
> > it was simply that we were testing a device that did not have
> > hardware RAID controllers, and we wanted to implement something
> > roughly equivalent to RAID 6 LUNs.
> >
> > Kevin
> >
> > > On Mar 14, 2017, at 5:16 PM, Aaron Knister <aaron.s.knister at nasa.gov> wrote:
> > >
> > > For me it's the protection against bitrot and added protection
> > against silent data corruption, and in theory the write caching
> > offered by adding log devices, which could help with small random
> > writes (although there are other problems with ZFS + synchronous
> > workloads that stop this from actually materializing).
> > >
> > > -Aaron
> > >
>
>
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>
>
--
Aaron Knister
NASA Center for Climate Simulation (Code 606.2)
Goddard Space Flight Center
(301) 286-2776