[gpfsug-discuss] data integrity documentation

Sven Oehme oehmes at gmail.com
Wed Aug 2 22:05:18 BST 2017


Before I answer the rest of your questions, can you share what exact version of
GPFS you are on? 'mmfsadm dump version' would be the best source for that.
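
For example, a minimal sketch of that check (run on the affected nodes; 'mmdiag --version' is assumed here to be available on your release as a less intrusive alternative):

# exact daemon build level, as asked above
mmfsadm dump version | head -n 5
# version reported by the admin tooling
mmdiag --version
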
If you have 2 inodes and you know the exact addresses where they are
stored on disk, one could 'dd' them off the disk and compare whether they are
really equal.
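
A minimal sketch of such a comparison (the NSD device names, sector offsets and block count are placeholders; take them from the actual replica addresses and the file system block size):

# pull the same data block from each replica's disk
dd if=/dev/nsd01 of=/tmp/replica1.bin bs=512 skip=<sector-of-replica-1> count=<blocksize/512>
dd if=/dev/nsd02 of=/tmp/replica2.bin bs=512 skip=<sector-of-replica-2> count=<blocksize/512>
# byte-for-byte compare; no output from cmp means the replicas are identical
cmp /tmp/replica1.bin /tmp/replica2.bin
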
We only support checksums when you use GNR-based systems; they cover the
network as well as the disk side.
The nsdChksum code you refer to is the one I mentioned above that is only
supported with GNR. At least, I am not aware that we ever claimed it to be
supported outside of it, but I can check that.

sven

On Wed, Aug 2, 2017 at 12:20 PM Stijn De Weirdt <stijn.deweirdt at ugent.be>
wrote:

> hi sven,
>
> the data is not corrupted. mmfsck compares 2 replicas of an inode and says they
> don't match, but checking the data with tsdbfs reveals they are equal.
> (one replica has to be fetched over the network; the NSD servers cannot
> access all the disks)
>
> with some nsdChksum... settings we get, during this mmfsck, a lot of
> "Encountered XYZ checksum errors on network I/O to NSD Client disk"
>
> IBM support says these are hardware issues, but that with respect to mmfsck
> they are false positives.
>
> anyway, our current question is: if these are hardware issues, is there
> anything in the GPFS client->NSD path (on the network side) that would detect
> such errors? i.e. can we trust the data (and metadata)?
> I was under the impression that client-to-disk is not covered, but I
> assumed that at least client-to-NSD (the network part) was checksummed.
>
> stijn
>
>
> On 08/02/2017 09:10 PM, Sven Oehme wrote:
> > OK, I think I understand now: the data was already corrupted. The config
> > change I proposed only prevents a known, potential future on-the-wire
> > corruption; it will not fix something that already made it to disk.
> >
> > Sven
> >
> >
> >
> > On Wed, Aug 2, 2017 at 11:53 AM Stijn De Weirdt <stijn.deweirdt at ugent.be>
> > wrote:
> >
> >> yes ;)
> >>
> >> the system is in preproduction, so there is nothing that can't be stopped and
> >> started in a few minutes (the current setup has only 4 NSDs, and no clients).
> >> mmfsck triggers the errors very early, during the inode replica compare.
> >>
> >>
> >> stijn
> >>
> >> On 08/02/2017 08:47 PM, Sven Oehme wrote:
> >>> How can you reproduce this so quickly?
> >>> Did you restart all the daemons after that?
> >>>
> >>> On Wed, Aug 2, 2017, 11:43 AM Stijn De Weirdt <stijn.deweirdt at ugent.be>
> >>> wrote:
> >>>
> >>>> hi sven,
> >>>>
> >>>>
> >>>>> the very first thing you should check is if you have this setting set:
> >>>> maybe the very first thing to check should be the faq/wiki that has
> >>>> this documented?
> >>>>
> >>>>>
> >>>>> mmlsconfig envVar
> >>>>>
> >>>>> envVar MLX4_POST_SEND_PREFER_BF 0 MLX4_USE_MUTEX 1 MLX5_SHUT_UP_BF 1
> >>>>> MLX5_USE_MUTEX 1
> >>>>>
> >>>>> if that doesn't come back the way shown above, you need to set it:
> >>>>>
> >>>>> mmchconfig envVar="MLX4_POST_SEND_PREFER_BF=0 MLX5_SHUT_UP_BF=1
> >>>>> MLX5_USE_MUTEX=1 MLX4_USE_MUTEX=1"
> >>>> I just set this (it wasn't set before), but the problem is still present.
> >>>>
> >>>>>
> >>>>> there was a problem in the Mellanox FW in various versions that was never
> >>>>> completely addressed (bugs were found and fixed, but it was never fully
> >>>>> proven to be addressed). The above environment variables turn on code in
> >>>>> the Mellanox driver that prevents this potential code path from being
> >>>>> used to begin with.
> >>>>>
> >>>>> in Spectrum Scale 4.2.4 (not yet released) we added a workaround in Scale
> >>>>> so that even if you don't set these variables the problem can't happen
> >>>>> anymore. Until then the only choice you have is the envVar above (which
> >>>>> btw ships as the default on all ESS systems).
> >>>>>
> >>>>> you also should be on the latest available Mellanox FW & drivers, as not
> >>>>> all versions even have the code that is activated by the environment
> >>>>> variables above. I think at a minimum you need to be at 3.4, but I don't
> >>>>> remember the exact version. There have been multiple defects opened
> >>>>> around this area; the last one I remember was:
> >>>> we run MLNX OFED 4.1; the FW is not the latest. We have EDR cards from
> >>>> Dell, and their FW is a bit behind. I'm trying to convince Dell to make
> >>>> a new one. Mellanox used to allow you to make your own, but they don't
> >>>> anymore.
> >>>>
> >>>>>
> >>>>> 00154843 : ESS ConnectX-3 performance issue - spinning on
> >>>>> pthread_spin_lock
> >>>>>
> >>>>> you may ask your Mellanox representative if they can get you access to
> >>>>> this defect. While it was found on ESS, i.e. on PPC64 and with ConnectX-3
> >>>>> cards, it is a general issue that affects all cards, on Intel as well
> >>>>> as Power.
> >>>> ok, thanks for this. Maybe such a reference is enough for Dell to
> >>>> update their firmware.
> >>>>
> >>>> stijn
> >>>>
> >>>>>
> >>>>> On Wed, Aug 2, 2017 at 8:58 AM Stijn De Weirdt <stijn.deweirdt at ugent.be>
> >>>>> wrote:
> >>>>>
> >>>>>> hi all,
> >>>>>>
> >>>>>> is there any documentation wrt data integrity in Spectrum Scale?
> >>>>>> Assuming a crappy network, does GPFS somehow guarantee that data written
> >>>>>> by a client ends up safe in the NSD GPFS daemon, and similarly from the
> >>>>>> NSD GPFS daemon to disk?
> >>>>>>
> >>>>>> And wrt a crappy network, what about RDMA on a crappy network? Is it the
> >>>>>> same?
> >>>>>>
> >>>>>> (we are hunting down a crappy InfiniBand issue; IBM support says it's a
> >>>>>> network issue, and we see no errors anywhere...)
> >>>>>>
> >>>>>> thanks a lot,
> >>>>>>
> >>>>>> stijn
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>