[gpfsug-discuss] data integrity documentation

Edward Wahl ewahl at osc.edu
Wed Aug 2 21:11:53 BST 2017


What version of GPFS?  Are you generating a patch file?
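
If you're not sure of the exact version, something like this should tell
you what each node is actually running (adjust the node list to taste):

mmdsh -N all mmdiag --version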

Try using this before your mmfsck:

mmdsh -N <nsdnodes|all> mmfsadm test fsck usePatchQueue 0

my notes say "all", but I would have had only the NSD nodes up at the time.
Supposedly the mmfsck mess in 4.1 and 4.2.x was fixed in 4.2.2.3. 
I won't know for sure until late August.

Ed


On Wed, 2 Aug 2017 21:20:14 +0200
Stijn De Weirdt <stijn.deweirdt at ugent.be> wrote:

> hi sven,
> 
> the data is not corrupted. mmfsck compares 2 inode replicas, says they
> don't match, but checking the data with tsdbfs reveals they are equal.
> (one replica has to be fetched over the network; the nsds cannot access
> all disks)
> 
> with some nsdChksum... settings enabled, we get a lot of
> "Encountered XYZ checksum errors on network I/O to NSD Client disk"
> messages during this mmfsck.
> 
> ibm support says these are hardware issues, but that wrt mmfsck these
> are false positives.
> 
> anyway, our current question is: if these are hardware issues, is there
> anything in the gpfs client->nsd path (on the network side) that would
> detect such errors? i.e. can we trust the data (and metadata)?
> i was under the impression that client to disk is not covered, but i
> assumed that at least client to nsd (the network part) was checksummed.
> 
> stijn
> 
> 
> On 08/02/2017 09:10 PM, Sven Oehme wrote:
> > ok, i think i understand now: the data was already corrupted. the config
> > change i proposed only prevents a known, potential on-the-wire corruption
> > in the future; it will not fix something that already made it to disk.
> > 
> > Sven
> > 
> > 
> > 
> > On Wed, Aug 2, 2017 at 11:53 AM Stijn De Weirdt <stijn.deweirdt at ugent.be>
> > wrote:
> >   
> >> yes ;)
> >>
> >> the system is in preproduction, so nothing that can't be stopped/started
> >> in a few minutes (the current setup has only 4 nsds, and no clients).
> >> mmfsck triggers the errors very early during inode replica compare.
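> >>
> >> for completeness, the check runs offline and in report-only mode,
> >> roughly like this ("fs0" is just a placeholder device name):
> >>
> >>    mmumount fs0 -a    # unmount on all nodes first
> >>    mmfsck fs0 -n -v   # -n: report problems only, don't fix anything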
> >>
> >>
> >> stijn
> >>
> >> On 08/02/2017 08:47 PM, Sven Oehme wrote:  
> >>> How can you reproduce this so quickly?
> >>> Did you restart all the daemons after that?
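> >>> (the envVar values are only picked up when mmfsd starts, so presumably
> >>> something along these lines is needed after the mmchconfig:
> >>>
> >>>    mmshutdown -a   # stop gpfs on all nodes
> >>>    mmstartup -a    # start it again so the new daemon environment is used
> >>>
> >>> )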
> >>>
> >>> On Wed, Aug 2, 2017, 11:43 AM Stijn De Weirdt <stijn.deweirdt at ugent.be>
> >>> wrote:
> >>>  
> >>>> hi sven,
> >>>>
> >>>>  
> >>>>> the very first thing you should check is if you have this setting
> >>>>> set :  
> >>>> maybe the very first thing to check should be the faq/wiki that has this
> >>>> documented?
> >>>>  
> >>>>>
> >>>>> mmlsconfig envVar
> >>>>>
> >>>>> envVar MLX4_POST_SEND_PREFER_BF 0 MLX4_USE_MUTEX 1 MLX5_SHUT_UP_BF 1
> >>>>> MLX5_USE_MUTEX 1
> >>>>>
> >>>>> if that doesn't come back the way above you need to set it :
> >>>>>
> >>>>> mmchconfig envVar="MLX4_POST_SEND_PREFER_BF=0 MLX5_SHUT_UP_BF=1
> >>>>> MLX5_USE_MUTEX=1 MLX4_USE_MUTEX=1"  
> >>>> i just set this (wasn't set before), but problem is still present.
> >>>>  
> >>>>>
> >>>>> there was a problem in the Mellanox FW in various versions that was
> >>>>> never completely addressed (bugs were found and fixed, but it was
> >>>>> never fully proven to be addressed). the above environment variables
> >>>>> turn on code in the mellanox driver that prevents this potential code
> >>>>> path from being used to begin with.
> >>>>>
> >>>>> in Spectrum Scale 4.2.4 (not yet released) we added a workaround in
> >>>>> Scale so that even if you don't set these variables the problem can't
> >>>>> happen anymore. until then the only choice you have is the envVar
> >>>>> above (which btw ships as default on all ESS systems).
> >>>>>
> >>>>> you also should be on the latest available Mellanox FW & Drivers, as
> >>>>> not all versions even have the code that is activated by the
> >>>>> environment variables above; i think at a minimum you need to be at
> >>>>> 3.4 but i don't remember the exact version. There had been multiple
> >>>>> defects opened around this area, the last one i remember was :
> >>>> we run mlnx ofed 4.1. the fw is not the latest: we have edr cards
> >>>> from dell, and their fw is a bit behind. i'm trying to convince dell
> >>>> to make a new one. mellanox used to allow you to make your own, but
> >>>> they don't anymore.
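> >>>>
> >>>> fwiw, the versions being compared here are roughly what these report
> >>>> (adjust for your setup):
> >>>>
> >>>>    ofed_info -s                  # installed MLNX OFED version
> >>>>    ibstat | grep -i 'firmware'   # firmware version of the hca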
> >>>>  
> >>>>>
> >>>>> 00154843 : ESS ConnectX-3 performance issue - spinning on
> >>>>> pthread_spin_lock
> >>>>>
> >>>>> you may ask your mellanox representative if they can get you access
> >>>>> to this defect. while it was found on ESS, meaning on PPC64 and with
> >>>>> ConnectX-3 cards, it is a general issue that affects all cards, on
> >>>>> intel as well as Power.
> >>>> ok, thanks for this. maybe such a reference is enough for dell to update
> >>>> their firmware.
> >>>>
> >>>> stijn
> >>>>  
> >>>>>
> >>>>> On Wed, Aug 2, 2017 at 8:58 AM Stijn De Weirdt
> >>>>> <stijn.deweirdt at ugent.be> wrote:
> >>>>>  
> >>>>>> hi all,
> >>>>>>
> >>>>>> is there any documentation wrt data integrity in spectrum scale:
> >>>>>> assuming a crappy network, does gpfs guarantee somehow that data
> >>>>>> written by a client ends up safe in the nsd gpfs daemon, and
> >>>>>> similarly from the nsd gpfs daemon to disk?
> >>>>>>
> >>>>>> and what about rdma on a crappy network, is it the same?
> >>>>>>
> >>>>>> (we are hunting down a crappy infiniband issue; ibm support says it's
> >>>>>> a network issue, and we see no errors anywhere...)
> >>>>>>
> >>>>>> thanks a lot,
> >>>>>>
> >>>>>> stijn



-- 

Ed Wahl
Ohio Supercomputer Center
614-292-9302


