[gpfsug-discuss] gpfsug-discuss Digest, Vol 62, Issue 33

Aaron Knister aaron.knister at gmail.com
Thu Mar 16 00:52:58 GMT 2017


*drags out soapbox*

Sorry in advance for the rant; this is one of my biggest pet peeves :)

There are some serious blockers for GNR adoption in my environment. It
drives me up a wall that the only way to get end-to-end checksums in GPFS
is with vendor hardware lock-in. I find it infuriating. Lustre can do this
for free with ZFS. Historically it has also offered various other features,
like eating your data, so I guess it's a tradeoff ;-) I believe that either
GNR should be available for any hardware that passes a validation suite, or
GPFS should support checksums on non-GNR NSDs, either by leveraging T10-PI
information or by checksumming blocks/subblocks and storing those checksums
somewhere. I opened an RFE for this and it was rejected; I was effectively
told to go use GNR/ESS, but, well... I can't do GNR.
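
To be concrete about what I mean by checksumming blocks/subblocks and
storing them somewhere, here's a rough sketch of the general idea in Python
(this is obviously not GPFS internals; the block size, hash, and file name
are arbitrary choices for illustration): compute a checksum per block at
write time, keep it out of band, and verify it on every read so that
corruption is detected instead of being silently handed back to the
application.

    import hashlib
    import os

    BLOCK_SIZE = 4 * 1024 * 1024  # 4 MiB "blocks", purely illustrative

    def write_with_checksums(path, data):
        """Write data and return a per-block checksum table."""
        checksums = []
        with open(path, "wb") as f:
            for off in range(0, len(data), BLOCK_SIZE):
                block = data[off:off + BLOCK_SIZE]
                checksums.append(hashlib.sha256(block).hexdigest())
                f.write(block)
        return checksums  # a real filesystem would keep this in its metadata

    def read_with_verify(path, checksums):
        """Read data back, raising if any block fails verification."""
        out = bytearray()
        with open(path, "rb") as f:
            for i, expected in enumerate(checksums):
                block = f.read(BLOCK_SIZE)
                if hashlib.sha256(block).hexdigest() != expected:
                    raise IOError("silent corruption detected in block %d" % i)
                out += block
        return bytes(out)

    if __name__ == "__main__":
        payload = os.urandom(10 * 1024 * 1024)
        table = write_with_checksums("/tmp/demo.dat", payload)
        assert read_with_verify("/tmp/demo.dat", table) == payload

The point of "end to end" is that this kind of check travels with the data
from the client all the way down to disk, not just inside a RAID layer.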

But let's say I could run GNR on any hardware of my choosing, after perhaps
paying some modest licensing fee and passing a hardware validation test.
There's another blocker for me: because GPFS doesn't support anything like
an LNet router, I'm fairly limited in the number of high-speed verbs RDMA
fabrics I can connect GNR to. Furthermore, even if I had enough PCIe slots,
the configuration might not be supported (e.g. a site with both an OPA and
an IB fabric that would like to use RDMA verbs on both). There could even
be a situation where the vendor of an HPC solution requires a specific OFED
version for support purposes that isn't the version running on the GNR
nodes. If an NSD protocol router were available, I could perhaps use
Ethernet as a common medium to work around all of this.
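
For anyone who hasn't run Lustre: an LNet router is basically a node with a
leg on each fabric that forwards the filesystem traffic between them, so the
servers only need to sit on one network. As a toy illustration of that
routing concept (and nothing more -- this is emphatically not how a real
NSD router would be built), here's a user-space relay that accepts
connections on an Ethernet-facing address and forwards the byte stream to a
server reachable on another fabric. The hostname is made up, and port 1191
(the GPFS daemon port) is used purely for flavor.

    import asyncio

    UPSTREAM_HOST = "nsd-server.ib-fabric.example"  # hypothetical server on fabric A
    UPSTREAM_PORT = 1191                            # GPFS daemon port, for flavor only
    LISTEN_ADDR = "0.0.0.0"                         # the Ethernet-facing side
    LISTEN_PORT = 1191

    async def pump(reader, writer):
        # Copy bytes in one direction until EOF, then close the write side.
        try:
            while True:
                data = await reader.read(65536)
                if not data:
                    break
                writer.write(data)
                await writer.drain()
        finally:
            writer.close()

    async def handle_client(client_reader, client_writer):
        # For each client on the Ethernet side, open a connection to the
        # server on the other fabric and relay bytes in both directions.
        upstream_reader, upstream_writer = await asyncio.open_connection(
            UPSTREAM_HOST, UPSTREAM_PORT)
        await asyncio.gather(
            pump(client_reader, upstream_writer),
            pump(upstream_reader, client_writer))

    async def main():
        server = await asyncio.start_server(handle_client, LISTEN_ADDR, LISTEN_PORT)
        async with server:
            await server.serve_forever()

    if __name__ == "__main__":
        asyncio.run(main())

A real router would of course speak the NSD protocol and RDMA natively; the
point is just that a routing tier would let the GNR servers live on a single
fabric while clients on other fabrics still reach them.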

I'd really like IBM to *do* something about this situation, but I've not
gotten any traction on it so far.

-Aaron



On Wed, Mar 15, 2017 at 8:26 PM, Steve Duersch <duersch at us.ibm.com> wrote:

> >> For me it's the protection against bitrot and added protection against
> >> silent data corruption
> GNR has this functionality. Right now that is available through ESS
> though. Not yet as software only.
>
> Steve Duersch
> Spectrum Scale
> 845-433-7902
> IBM Poughkeepsie, New York
>
>
>
>
> gpfsug-discuss-bounces at spectrumscale.org wrote on 03/15/2017 10:25:59 AM:
>
>
> >
> > Message: 6
> > Date: Wed, 15 Mar 2017 14:25:41 +0000
> > From: "Buterbaugh, Kevin L" <Kevin.Buterbaugh at Vanderbilt.Edu>
> > To: gpfsug main discussion list <gpfsug-discuss at spectrumscale.org>
> > Subject: Re: [gpfsug-discuss] mmcrfs issue
> > Message-ID: <F5D928E7-5ADF-4491-A8FB-AF3885E9A8A3 at vanderbilt.edu>
> > Content-Type: text/plain; charset="utf-8"
> >
> > Hi All,
> >
> > Since I started this thread I guess I should chime in, too - for us
> > it was simply that we were testing a device that did not have
> > hardware RAID controllers and we were wanting to implement something
> > roughly equivalent to RAID 6 LUNs.
> >
> > Kevin
> >
> > > On Mar 14, 2017, at 5:16 PM, Aaron Knister <aaron.s.knister at nasa.gov>
> wrote:
> > >
> > > For me it's the protection against bitrot and the added protection
> > > against silent data corruption, and in theory the write caching
> > > offered by adding log devices, which could help with small random
> > > writes (although there are other problems with ZFS + synchronous
> > > workloads that stop this from actually materializing).
> > >
> > > -Aaron
> > >
>
>
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>
>

