[gpfsug-discuss] data integrity documentation

Sven Oehme oehmes at gmail.com
Wed Aug 2 22:23:44 BST 2017


ok, you can't be any newer than that. i just wonder why you have 512b
inodes if this is a new system ?
are these raw disks in this setup or raid controllers ? what's the disk
sector size, and how was the filesystem created (mmlsfs FSNAME would show
the answer to the last question)
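
as a rough sketch, something along these lines would answer those questions
(somefs and /dev/sdX are placeholders here, not your actual names):

   mmlsfs somefs -i                      # inode size the filesystem was created with
   mmlsfs somefs -B                      # filesystem block size
   mmlsnsd -m                            # map the nsds to their local block devices
   blockdev --getss --getpbsz /dev/sdX   # logical and physical sector size of one device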

on the tsdbfs question, i am not sure if it gave wrong results, but it would
be worth a test to see what's actually on the disk.
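
a minimal sketch of such a test, assuming the two replicas live on /dev/sdA
and /dev/sdB (placeholders; map the gpfs disk numbers from your tsdbfs comp
output below to devices via mmlsdisk/mmlsnsd first) and assuming the tsdbfs
addresses are 512-byte sectors:

   mmlsdisk somefs    # list the disks backing the filesystem
   mmlsnsd -m         # find the local device path behind each nsd
   dd if=/dev/sdA bs=512 skip=5137408 count=1024 2>/dev/null | md5sum
   dd if=/dev/sdB bs=512 skip=221785088 count=1024 2>/dev/null | md5sum
   # matching checksums would mean both replicas really are identical on disk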

you are correct that GNR extends this to the disk, but the network part is
covered by the nsdchecksums you turned on.
when you enable the not to be named checksum parameter, do you actually
still get an error from fsck ?
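if you want to re-run the check without touching anything, something like this
should do (somefs is a placeholder, and the filesystem has to be unmounted for
the offline check); sweeping the daemon logs on the nsd servers for the
checksum messages is also worth a try (the node class and grep pattern are
just an assumption here):

   mmfsck somefs -n -v    # check only, report but don't repair, verbose
   mmdsh -N nsdNodes "grep -i 'checksum error' /var/adm/ras/mmfs.log.latest"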

sven


On Wed, Aug 2, 2017 at 2:14 PM Stijn De Weirdt <stijn.deweirdt at ugent.be>
wrote:

> hi sven,
>
> > before i answer the rest of your questions, can you share what version of
> > GPFS exactly you are on? mmfsadm dump version would be the best source for
> > that.
> it returns
> Build branch "4.2.3.3 ".
>
> > if you have 2 inodes and you know the exact address of where they are
> > stored on disk, one could 'dd' them off the disk and compare if they are
> > really equal.
> ok, i can try that later. are you suggesting that the "tsdbfs comp"
> might have given wrong results? because we ran that and got e.g.
>
> > # tsdbfs somefs comp 7:5137408 25:221785088 1024
> > Comparing 1024 sectors at 7:5137408 = 0x7:4E6400 and 25:221785088 = 0x19:D382C00:
> >   All sectors identical
>
>
> > we only support checksums when you use GNR based systems; they cover the
> > network as well as the disk side for that.
> > the nsdchecksum code you refer to is the one i mentioned above that's only
> > supported with GNR. at least i am not aware that we ever claimed it to be
> > supported outside of it, but i can check that.
> ok, maybe i'm a bit confused. we have a GNR too, but it's not this one,
> and they are not in the same gpfs cluster.
>
> i thought the GNR extended the checksumming to disk, and that it was
> already there for the network part. thanks for clearing this up. but
> that is worse than i thought...
>
> stijn
>
> >
> > sven
> >
> > On Wed, Aug 2, 2017 at 12:20 PM Stijn De Weirdt <stijn.deweirdt at ugent.be>
> > wrote:
> >
> >> hi sven,
> >>
> >> the data is not corrupted. mmfsck compares 2 inodes, says they don't
> >> match, but checking the data with tsdbfs reveals they are equal.
> >> (one replica has to be fetched over the network; the nsds cannot access
> >> all disks)
> >>
> >> with some nsdChksum... settings we get during this mmfsck a lot of
> >> "Encountered XYZ checksum errors on network I/O to NSD Client disk"
> >>
> >> ibm support says these are hardware issues, but wrt mmfsck they are
> >> false positives.
> >>
> >> anyway, our current question is: if these are hardware issues, is there
> >> anything in gpfs client->nsd (on the network side) that would detect
> >> such errors, i.e. can we trust the data (and metadata)?
> >> i was under the impression that client to disk is not covered, but i
> >> assumed that at least client to nsd (the network part) was checksummed.
> >>
> >> stijn
> >>
> >>
> >> On 08/02/2017 09:10 PM, Sven Oehme wrote:
> >>> ok, i think i understand now, the data was already corrupted. the config
> >>> change i proposed only prevents a known potential future on-the-wire
> >>> corruption; this will not fix something that made it to the disk already.
> >>>
> >>> Sven
> >>>
> >>>
> >>>
> >>> On Wed, Aug 2, 2017 at 11:53 AM Stijn De Weirdt <stijn.deweirdt at ugent.be>
> >>> wrote:
> >>>
> >>>> yes ;)
> >>>>
> >>>> the system is in preproduction, so nothing that can't be stopped/started
> >>>> in a few minutes (current setup has only 4 nsds, and no clients).
> >>>> mmfsck triggers the errors very early during inode replica compare.
> >>>>
> >>>>
> >>>> stijn
> >>>>
> >>>> On 08/02/2017 08:47 PM, Sven Oehme wrote:
> >>>>> How can you reproduce this so quickly ?
> >>>>> Did you restart all daemons after that ?
> >>>>>
> >>>>> On Wed, Aug 2, 2017, 11:43 AM Stijn De Weirdt <stijn.deweirdt at ugent.be>
> >>>>> wrote:
> >>>>>
> >>>>>> hi sven,
> >>>>>>
> >>>>>>
> >>>>>>> the very first thing you should check is if you have this setting set:
> >>>>>> maybe the very first thing to check should be the faq/wiki that has
> >>>>>> this documented?
> >>>>>>
> >>>>>>>
> >>>>>>> mmlsconfig envVar
> >>>>>>>
> >>>>>>> envVar MLX4_POST_SEND_PREFER_BF 0 MLX4_USE_MUTEX 1 MLX5_SHUT_UP_BF 1
> >>>>>>> MLX5_USE_MUTEX 1
> >>>>>>>
> >>>>>>> if that doesn't come back the way above, you need to set it:
> >>>>>>>
> >>>>>>> mmchconfig envVar="MLX4_POST_SEND_PREFER_BF=0 MLX5_SHUT_UP_BF=1
> >>>>>>> MLX5_USE_MUTEX=1 MLX4_USE_MUTEX=1"
> >>>>>> i just set this (wasn't set before), but the problem is still present.
> >>>>>>
> >>>>>>>
> >>>>>>> there was a problem in the Mellanox FW in various versions that was
> >>>>>>> never completely addressed (bugs were found and fixed, but it was
> >>>>>>> never fully proven to be addressed). the above environment variables
> >>>>>>> turn code on in the mellanox driver that prevents this potential code
> >>>>>>> path from being used to begin with.
> >>>>>>>
> >>>>>>> in Spectrum Scale 4.2.4 (not yet released) we added a workaround in
> >>>>>>> Scale so that even if you don't set these variables the problem can't
> >>>>>>> happen anymore. until then the only choice you have is the envVar
> >>>>>>> above (which btw ships as default on all ESS systems).
> >>>>>>>
> >>>>>>> you also should be on the latest available Mellanox FW & Drivers, as
> >>>>>>> not all versions even have the code that is activated by the
> >>>>>>> environment variables above. i think at a minimum you need to be at
> >>>>>>> 3.4, but i don't remember the exact version. there have been multiple
> >>>>>>> defects opened around this area, the last one i remember was:
> >>>>>> we run mlnx ofed 4.1. the fw is not the latest; we have edr cards from
> >>>>>> dell, and their fw is a bit behind. i'm trying to convince dell to make
> >>>>>> a new one. mellanox used to allow you to make your own, but they don't
> >>>>>> anymore.
> >>>>>>
> >>>>>>>
> >>>>>>> 00154843 : ESS ConnectX-3 performance issue - spinning on
> >>>>>>> pthread_spin_lock
> >>>>>>>
> >>>>>>> you may ask your mellanox representative if they can get you access
> >>>>>>> to this defect. while it was found on ESS, meaning on PPC64 and with
> >>>>>>> ConnectX-3 cards, it's a general issue that affects all cards, on
> >>>>>>> intel as well as Power.
> >>>>>> ok, thanks for this. maybe such a reference is enough for dell to
> >>>>>> update their firmware.
> >>>>>>
> >>>>>> stijn
> >>>>>>
> >>>>>>>
> >>>>>>> On Wed, Aug 2, 2017 at 8:58 AM Stijn De Weirdt <stijn.deweirdt at ugent.be>
> >>>>>>> wrote:
> >>>>>>>
> >>>>>>>> hi all,
> >>>>>>>>
> >>>>>>>> is there any documentation wrt data integrity in spectrum scale:
> >>>>>>>> assuming a crappy network, does gpfs guarantee somehow that data
> >>>>>>>> written by a client ends up safe in the nsd gpfs daemon; and
> >>>>>>>> similarly from the nsd gpfs daemon to disk?
> >>>>>>>>
> >>>>>>>> and wrt crappy network, what about rdma on a crappy network? is it
> >>>>>>>> the same?
> >>>>>>>>
> >>>>>>>> (we are hunting down a crappy infiniband issue; ibm support says
> >>>>>>>> it's a network issue; and we see no errors anywhere...)
> >>>>>>>>
> >>>>>>>> thanks a lot,
> >>>>>>>>
> >>>>>>>> stijn
> _______________________________________________
> gpfsug-discuss mailing list
> gpfsug-discuss at spectrumscale.org
> http://gpfsug.org/mailman/listinfo/gpfsug-discuss
>

